from:"Zhijie Shen \(JIRA\)"

[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660952#comment-14660952
 ] 

Zhijie Shen commented on YARN-3049:
---

As the issue is not blocking the whole reader implementation, how about letting 
this patch in first? [~sjlee0]?

Some more comments about the issue:

1. ColumnHelper needs to be updated as well to return a byte[] column name 
instead of a String one.

2. I'm worried that Bytes.toString() doesn't store the long integer the way we 
want. If it isn't stored as 8 bytes, we cannot guarantee the order of the event 
columns.

3. FlowRunId in the row key should be fine, because the row key is never 
converted to String again. But it's good to double check.
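Point 2 depends on HBase keeping column qualifiers in unsigned lexicographic byte order. The JDK-only sketch below (no HBase dependency; class and method names are illustrative) shows why a fixed-width 8-byte big-endian encoding, the layout HBase's Bytes.toBytes(long) produces, keeps non-negative longs in numeric order, while a variable-length decimal string does not:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LongKeyOrder {
    // 8-byte big-endian encoding (ByteBuffer's default byte order)
    public static byte[] toFixedWidth(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Decimal text encoding, e.g. 999 -> "999"
    public static byte[] toDecimalText(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison, the order HBase keeps keys and qualifiers in
    public static int compareBytes(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        long earlier = 999L;
        long later = 1000L;
        // fixed width: 999 sorts before 1000, matching numeric order
        System.out.println(compareBytes(toFixedWidth(earlier), toFixedWidth(later)) < 0);   // true
        // decimal text: "1000" sorts before "999", breaking chronological order
        System.out.println(compareBytes(toDecimalText(earlier), toDecimalText(later)) < 0); // false
    }
}
```

The fixed-width ordering only holds for non-negative values (the sign bit would otherwise sort negatives last), which is the case for epoch timestamps.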

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
> YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch, 
> YARN-3049-YARN-2928.7.patch
>
>
> Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660853#comment-14660853
 ] 

Zhijie Shen commented on YARN-3049:
---

Here's a quick example:
{code}
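import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class TestBytesRoundTrip {
  @Test
  public void test() {
    // imitate the process to write a long
    Long a = 1234567890L;
    byte[] b = Bytes.toBytes(a);   // 8-byte big-endian encoding
    String c = Bytes.toString(b);  // decodes the bytes as UTF-8
    // imitate the process to read a long
    byte[] d = Bytes.toBytes(c);   // re-encodes the string as UTF-8
    Long e = Bytes.toLong(d);
    assertEquals(a, e);
  }
}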
{code}
Then b and d are different byte arrays. Am I using Bytes the wrong way?
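The mismatch is what one would expect from decoding arbitrary bytes as text: Bytes.toString() decodes with UTF-8, and 1234567890L is 0x00000000499602D2, whose bytes 0x96 and 0xD2 are not valid UTF-8 sequences at those positions. Java replaces each malformed byte with U+FFFD while decoding, so re-encoding cannot reproduce the original 8 bytes. The JDK-only sketch below mirrors the test, with ByteBuffer and StandardCharsets standing in for HBase's Bytes (an assumption made so it runs without HBase), and contrasts it with keeping the value as byte[] throughout, in line with returning byte[] column names from ColumnHelper:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LongStringRoundTrip {
    // 8-byte big-endian encoding, the same layout as HBase's Bytes.toBytes(long)
    public static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Lossy path: bytes -> UTF-8 String -> bytes, like Bytes.toString() then Bytes.toBytes(String)
    public static byte[] viaUtf8String(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8); // malformed bytes become U+FFFD
        return s.getBytes(StandardCharsets.UTF_8);          // each U+FFFD re-encodes to 3 bytes
    }

    public static void main(String[] args) {
        long a = 1234567890L;               // 0x00000000499602D2
        byte[] b = encode(a);
        byte[] d = viaUtf8String(b);
        System.out.println(b.length + " -> " + d.length); // 8 -> 12
        System.out.println(Arrays.equals(b, d));          // false

        // Lossless path: keep the bytes as byte[] and decode the long directly
        long e = ByteBuffer.wrap(b).getLong();
        System.out.println(a == e);                       // true
    }
}
```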

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
> YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch, 
> YARN-3049-YARN-2928.7.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-05 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-YARN-2928.7.patch

Attach a new patch:

1. Rebase against YARN-3984.
2. Address Sangjin and Li's comments.

One issue remains: the timestamp is not serialized/deserialized correctly when 
going through UTF-8. I haven't figured out the reason yet, but in an experiment 
where the bytes were converted into a string and then back into bytes, they came 
out different. This needs more investigation.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
> YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch, 
> YARN-3049-YARN-2928.7.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-08-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659183#comment-14659183
 ] 

Zhijie Shen commented on YARN-3984:
---

bq. If the info map is not empty, this record would be redundant and will take 
up storage space.

Makes sense. The patch looks good to me. Will commit it.

> Rethink event column key issue
> --
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Fix For: YARN-2928
>
> Attachments: YARN-3984-YARN-2928.001.patch
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not 
> so friendly to fetching all the events of an entity and sorting them in a 
> chronologic order. IMHO, timestamp?event_id?info_key may be a better key 
> schema. I open this jira to continue the discussion about it which was 
> commented on YARN-3908.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-04 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-YARN-2928.6.patch

Uploaded a new patch in which the HBase backend makes the decision locally.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
> YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Created] (YARN-4013) Publisher V2 should write the unmanaged AM flag too

2015-08-03 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-4013:
-

 Summary: Publisher V2 should write the unmanaged AM flag too
 Key: YARN-4013
 URL: https://issues.apache.org/jira/browse/YARN-4013
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen


Upon rebasing the branch, I found that we need to redo similar work for the V2 
publisher:

https://issues.apache.org/jira/browse/YARN-3543




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-03 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652806#comment-14652806
 ] 

Zhijie Shen commented on YARN-3049:
---

Okay, what will the timestamp be used for? If too much context information is 
required, I agree it's not elegant to expose it to the backend piece by piece.

Taking a step back, I'm starting to see that the real situation deviates from 
what I originally envisioned for the storage layer. When defining the data model, 
I defined a generic TimelineEntity and made the other first-class entities extend 
it, so that we could process all entities uniformly regardless of their type. What 
we have discussed so far implies that we cannot treat the entities that 
generically. For the application entity, we may need an additional step that 
parses its start/finish events to write extra records.
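The extra step for application entities can be sketched as a type check in the write path. This is a minimal illustration only: the Entity/Event classes are hypothetical stand-ins rather than the YARN TimelineEntity API, the records are plain strings, and the type and event names "YARN_APPLICATION" and "YARN_APPLICATION_CREATED" are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AppEntityWriterSketch {
    // Hypothetical stand-ins for a timeline event and entity (not the YARN classes)
    public static final class Event {
        final String id;

        public Event(String id) {
            this.id = id;
        }
    }

    public static final class Entity {
        final String type;
        final String id;
        final List<Event> events;

        public Entity(String type, String id, List<Event> events) {
            this.type = type;
            this.id = id;
            this.events = events;
        }
    }

    // Assumed type and event names, used only for illustration
    static final String APP_TYPE = "YARN_APPLICATION";
    static final String APP_CREATED = "YARN_APPLICATION_CREATED";

    // Every entity yields its generic record; an application entity that carries
    // its created event additionally yields an app-to-flow mapping record.
    public static List<String> recordsFor(Entity entity, String flowName) {
        List<String> records = new ArrayList<>();
        records.add("entity:" + entity.type + "/" + entity.id);
        if (APP_TYPE.equals(entity.type)) {
            for (Event event : entity.events) {
                if (APP_CREATED.equals(event.id)) {
                    records.add("app_to_flow:" + entity.id + "->" + flowName);
                    break;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Entity app = new Entity(APP_TYPE, "application_1_0001",
            Arrays.asList(new Event(APP_CREATED)));
        Entity container = new Entity("YARN_CONTAINER", "container_1_0001_01_000001",
            Collections.<Event>emptyList());
        System.out.println(recordsFor(app, "flow1").size());       // 2
        System.out.println(recordsFor(container, "flow1").size()); // 1
    }
}
```

Note that the extra record hinges on matching an event id string, which is exactly the kind of coupling between business logic and storage that makes a rename of the event risky.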

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
> YARN-3049-YARN-2928.5.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container

2015-08-03 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3993:
--
Attachment: YARN-3993-YARN-2928.0001.patch

Renamed the patch so that it targets the branch.

> Change to use the AM flag in ContainerContext determine AM container
> 
>
> Key: YARN-3993
> URL: https://issues.apache.org/jira/browse/YARN-3993
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Zhijie Shen
>Assignee: Sunil G
>  Labels: newbie
> Attachments: 0001-YARN-3993-YARN-2928.patch, 
> YARN-3993-YARN-2928.0001.patch
>
>
> After YARN-3116, we will have a flag in ContainerContext to determine if the 
> container is AM or not in aux service. We need to change accordingly to make 
> use of this feature instead of depending on container ID.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-03 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-YARN-2928.5.patch

The new patch reflects my last proposal. The user doesn't need to explicitly 
indicate the start of a new app. Instead, I added "firstRequest" to the context of 
the app collector. The RM collector manager sets this flag to true when it adds a 
new app collector on the RM side.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
> YARN-3049-YARN-2928.5.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container

2015-08-03 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652225#comment-14652225
 ] 

Zhijie Shen commented on YARN-3993:
---

+1 LGTM.

Will commit after Jenkins reports back.

> Change to use the AM flag in ContainerContext determine AM container
> 
>
> Key: YARN-3993
> URL: https://issues.apache.org/jira/browse/YARN-3993
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Zhijie Shen
>Assignee: Sunil G
>  Labels: newbie
> Attachments: 0001-YARN-3993-YARN-2928.patch
>
>
> After YARN-3116, we will have a flag in ContainerContext to determine if the 
> container is AM or not in aux service. We need to change accordingly to make 
> use of this feature instead of depending on container ID.




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-03 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652206#comment-14652206
 ] 

Zhijie Shen commented on YARN-3049:
---

Hi Sangjin,


Thanks for your comments. The proposed method will work for now and minimizes the 
change we need to make. In fact, I considered this method too. I abandoned it 
because it couples the business logic and the data storage, which increases the 
risk that a change in the business logic breaks the storage layer. For example, 
suppose we rename app_created to app_started. That may still be easy to fix, but 
maintenance is likely to get harder as the logic grows more complex. That's why I 
think we should let the app collector tell the backend that it's the first request.


On the other hand, I agree the RM should be responsible for this too; that is also 
what the current patch does. If you agree with my proposal of letting the app 
collector determine whether it is the first request, we can extend the RM app 
collector and implement this logic there.


Thanks,

Zhijie



> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650106#comment-14650106
 ] 

Zhijie Shen commented on YARN-3049:
---

What I meant before is that HBaseTimelineWriterImpl is not aware of the 
application's life cycle/session, so it's hard to detect the app creation event 
inside HBaseTimelineWriterImpl and keep it transparent to the caller. The app 
collector, by contrast, can know whether a put request is the first one sent to 
the writer for this app.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes

2015-07-31 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-4006:
--
Assignee: Greg Senia

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Assignee: Greg Senia
> Fix For: 2.8.0
>
> Attachments: YARN-4006-branch-trunk.patch, YARN-4006-branch2.6.0.patch
>
>
> When attempting to use The Hadoop Alternate Authentication Classes. They do 
> not exactly work with what was built with 
> https://issues.apache.org/jira/browse/YARN-1935.
> I went ahead and made the following changes to support using a Custom 
> AltKerberos DelegationToken custom class.
> Changes to: TimelineAuthenticationFilterInitializer.class
>String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: "+authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   PseudoDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) || 
> (UserGroupInformation.isSecurityEnabled() && 
> conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE)))
>  {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   authType);
> LOG.info("AuthType: "+authType);
>   } else {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   KerberosDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   } 
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>   filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
> try {
>   principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
> } catch (IOException ex) {
>   throw new RuntimeException(
>   "Could not resolve Kerberos principal name: " + ex.toString(), 
> ex);
> }
> filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
> principal);
>   }
> }
>  
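The branch structure of the quoted change can be restated as a small, null-safe decision function. This is a sketch under assumptions: handler classes are represented by plain class-name strings so it runs without Hadoop on the classpath, "simple" and "kerberos" are the literal values of PseudoAuthenticationHandler.TYPE and KerberosAuthenticationHandler.TYPE, and leaving a missing auth type unchanged is a choice made here, not something the quoted patch specifies (the quoted code would throw a NullPointerException in that case).

```java
public class AuthTypeSelection {
    static final String SIMPLE = "simple";     // PseudoAuthenticationHandler.TYPE
    static final String KERBEROS = "kerberos"; // KerberosAuthenticationHandler.TYPE

    // Outcome: the value for AuthenticationFilter.AUTH_TYPE and whether _HOST resolution runs
    public static final class Decision {
        public final String authType;
        public final boolean resolvePrincipal;

        Decision(String authType, boolean resolvePrincipal) {
            this.authType = authType;
            this.resolvePrincipal = resolvePrincipal;
        }
    }

    public static Decision select(String configured, boolean securityEnabled, String hadoopAuth) {
        // Constant-first equals() avoids NPEs when either setting is missing
        if (SIMPLE.equals(configured)) {
            return new Decision("PseudoDelegationTokenAuthenticationHandler", false);
        }
        if (KERBEROS.equals(configured)) {
            return new Decision("KerberosDelegationTokenAuthenticationHandler", true);
        }
        boolean kerberosCluster = securityEnabled && KERBEROS.equals(hadoopAuth);
        if (kerberosCluster && configured != null) {
            // A custom handler class (e.g. an AltKerberos-based one) is kept as configured
            return new Decision(configured, true);
        }
        // Anything else is left untouched
        return new Decision(configured, false);
    }
}
```

For example, select("com.example.AltHandler", true, "kerberos") keeps the hypothetical custom handler name and still asks for principal resolution.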




[jira] [Commented] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650082#comment-14650082
 ] 

Zhijie Shen commented on YARN-4006:
---

Sure, take your time. I'll cancel the patch until the complete one is ready.

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
> Fix For: 2.8.0
>
> Attachments: YARN-4006-branch-trunk.patch, YARN-4006-branch2.6.0.patch
>
>
> When attempting to use The Hadoop Alternate Authentication Classes. They do 
> not exactly work with what was built with 
> https://issues.apache.org/jira/browse/YARN-1935.
> I went ahead and made the following changes to support using a Custom 
> AltKerberos DelegationToken custom class.
> Changes to: TimelineAuthenticationFilterInitializer.class
>String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: "+authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   PseudoDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) || 
> (UserGroupInformation.isSecurityEnabled() && 
> conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE)))
>  {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   authType);
> LOG.info("AuthType: "+authType);
>   } else {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   KerberosDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   } 
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>   filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
> try {
>   principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
> } catch (IOException ex) {
>   throw new RuntimeException(
>   "Could not resolve Kerberos principal name: " + ex.toString(), 
> ex);
> }
> filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
> principal);
>   }
> }
>  




[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650069#comment-14650069
 ] 

Zhijie Shen commented on YARN-3904:
---

bq. I'm not 100% sure if that's what we would like to do. Maybe we would like 
to decouple the offline aggregation module from our normal entity storage. 
Therefore, maybe it's also appealing to allow users specify if they need to 
create data schema in the offline aggregation process? Such as, setting one 
flag in the offline aggregator to create data schema?

Makes sense, but can we still keep table creation centralized? I think we can add 
an option to create the raw entity tables and the aggregation tables separately. 
Thoughts?

bq. After the changes in this JIRA, we will only have two types of 
TimelineWriters, one for FS (test only) and one for HBase. The setting on the 
offline storage should be independent from this setting, I assume?

Yeah, I meant that we currently have TIMELINE_SERVICE_READER|WRITER_CLASS pointing 
to a specific reader/writer implementation. It would be better to have a config 
such as "blah.blah.backend.type". When backend.type = hbase, users can access HBase 
both directly and via Phoenix, and we allow aggregation. This may not need to be 
part of this jira; I'm just thinking out loud.

> Refactor timelineservice.storage to add support to online and offline 
> aggregation writers
> -
>
> Key: YARN-3904
> URL: https://issues.apache.org/jira/browse/YARN-3904
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-3904-YARN-2928.001.patch, 
> YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
> YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
> YARN-3904-YARN-2928.006.patch
>
>
> After we finished the design for time-based aggregation, we can adopt our 
> existing Phoenix storage into the storage of the aggregated data. In this 
> JIRA, I'm proposing to refactor writers to add support to aggregation 
> writers. Offline aggregation writers typically has less contextual 
> information. We can distinguish these writers by special naming. We can also 
> use CollectorContexts to model all contextual information and use it in our 
> writer interfaces. 




[jira] [Commented] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650015#comment-14650015
 ] 

Zhijie Shen commented on YARN-4006:
---

[~gss2002], it seems that the patch doesn't include any alt Kerberos auth code. Is 
it a WIP patch?

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
> Fix For: 2.8.0
>
> Attachments: YARN-4006-branch-trunk.patch, YARN-4006-branch2.6.0.patch
>
>
> When attempting to use The Hadoop Alternate Authentication Classes. They do 
> not exactly work with what was built with 
> https://issues.apache.org/jira/browse/YARN-1935.
> I went ahead and made the following changes to support using a Custom 
> AltKerberos DelegationToken custom class.
> Changes to: TimelineAuthenticationFilterInitializer.class
>String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: "+authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   PseudoDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) || 
> (UserGroupInformation.isSecurityEnabled() && 
> conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE)))
>  {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   authType);
> LOG.info("AuthType: "+authType);
>   } else {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   KerberosDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   } 
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>   filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
> try {
>   principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
> } catch (IOException ex) {
>   throw new RuntimeException(
>   "Could not resolve Kerberos principal name: " + ex.toString(), 
> ex);
> }
> filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
> principal);
>   }
> }
>  




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649915#comment-14649915
 ] 

Zhijie Shen commented on YARN-3049:
---

I uploaded a new patch that addresses Sangjin's comments except the ones below:

bq. l.93: What does it mean to indicate newApp for a set of entities? What if 
the set of entities contains bunch of different applications?

I'm not worried about this, because a put request to the app collector always 
relates to a single app.

bq. See comments above; rather than relying on the boolean flag in the 
arguments, can we detect the case of the application created event and do it?

See my comments above.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-31 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-YARN-2928.4.patch

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649770#comment-14649770
 ] 

Zhijie Shen commented on YARN-3049:
---

[~sjlee0], yeah, I agree it's not a clean solution to let user code trigger writing 
the app-to-flow mapping. I did it that way to avoid a check-and-put for every 
individual entity put request, which would obviously slow down the write path. 
Detecting the application created event sounds like a reasonable option. However, 
I'm afraid we cannot hide it inside the writer as an implementation detail, because 
the writer is not bound to the session of an application. One solution I can think 
of is handling the session start in the app collector: upon the first put request 
received by the app collector, we tell the writer to also write the app-to-flow 
mapping. What do you think?
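Detecting the first put request inside the app collector can be sketched with an AtomicBoolean, so that exactly one request observes "first" even when several arrive concurrently. The Writer interface, its write signature, and the class name are hypothetical and do not reflect the actual TimelineWriter API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class AppCollectorSketch {
    // Hypothetical writer callback; not the actual TimelineWriter API
    public interface Writer {
        void write(String appId, List<String> entities, boolean writeAppToFlowMapping);
    }

    private final String appId;
    private final Writer writer;
    // true until the first put request for this app has been handed to the writer
    private final AtomicBoolean firstRequest = new AtomicBoolean(true);

    public AppCollectorSketch(String appId, Writer writer) {
        this.appId = appId;
        this.writer = writer;
    }

    public void putEntities(List<String> entities) {
        // compareAndSet succeeds for exactly one caller, even under concurrent puts
        boolean isFirst = firstRequest.compareAndSet(true, false);
        writer.write(appId, entities, isFirst);
    }

    public static void main(String[] args) {
        List<Boolean> flags = new ArrayList<>();
        AppCollectorSketch collector = new AppCollectorSketch("application_1_0001",
            (app, entities, writeMapping) -> flags.add(writeMapping));
        collector.putEntities(Collections.singletonList("entity-1"));
        collector.putEntities(Collections.singletonList("entity-2"));
        collector.putEntities(Collections.singletonList("entity-3"));
        System.out.println(flags); // [true, false, false]
    }
}
```

One caveat of this sketch: the flag flips before the write happens, so if the first write fails the mapping is not retried; a real implementation would have to decide how to handle that case.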

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649598#comment-14649598
 ] 

Zhijie Shen commented on YARN-3984:
---

Sure, that's fine too. One question: do we make sure every event has such a 
column, or only events without info? Personally, I prefer the former, which makes 
event processing uniform.

> Rethink event column key issue
> --
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Fix For: YARN-2928
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not 
> so friendly to fetching all the events of an entity and sorting them in a 
> chronologic order. IMHO, timestamp?event_id?info_key may be a better key 
> schema. I open this jira to continue the discussion about it which was 
> commented on YARN-3908.




[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-31 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649557#comment-14649557
 ] 

Zhijie Shen commented on YARN-3984:
---

Okay, "e! eventid # inverse_event_timestamp ? eventkey" sounds like a reasonable 
compromise.

Secondly, how do we deal with an event without any info? How about creating a 
column "e! eventid # inverse_event_timestamp ? dummy_key : empty_value"?
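The compromise key, together with the dummy column for events without info, can be sketched as below. This is a minimal illustration, not the patch: it reads the spaces in the quoted key as cosmetic, assumes the inverse timestamp is `Long.MAX_VALUE - timestamp` written as 8 big-endian bytes, and assumes event IDs and info keys never contain the separator characters. `EventColumnKeys`, `DUMMY_KEY` and the method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for the e!eventId#inverseTs?infoKey column layout. */
final class EventColumnKeys {
  // Hypothetical marker key for an event that carries no info.
  static final String DUMMY_KEY = "";

  private EventColumnKeys() {}

  /** Builds e!eventId#<8-byte inverse timestamp>?infoKey (timestamp >= 0). */
  static byte[] qualifier(String eventId, long timestamp, String infoKey) {
    byte[] prefix = ("e!" + eventId + "#").getBytes(StandardCharsets.UTF_8);
    byte[] suffix = ("?" + infoKey).getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(prefix.length + Long.BYTES + suffix.length)
        .put(prefix)
        // Inverting makes newer occurrences of the same event ID sort first.
        .putLong(Long.MAX_VALUE - timestamp)
        .put(suffix)
        .array();
  }

  /** One column per info key, or a single dummy column if info is empty. */
  static List<byte[]> qualifiers(String eventId, long timestamp, Map<String, ?> info) {
    if (info.isEmpty()) {
      return Collections.singletonList(qualifier(eventId, timestamp, DUMMY_KEY));
    }
    List<byte[]> result = new ArrayList<>();
    for (String key : info.keySet()) {
      result.add(qualifier(eventId, timestamp, key));
    }
    return result;
  }

  /** Unsigned lexicographic comparison, the order HBase uses for qualifiers. */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

HBase's own `Bytes` utilities would normally do this encoding; plain `ByteBuffer` is used only to keep the sketch dependency-free.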

> Rethink event column key issue
> --
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Fix For: YARN-2928
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not 
> so friendly to fetching all the events of an entity and sorting them in a 
> chronological order. IMHO, timestamp?event_id?info_key may be a better key 
> schema. I'm opening this jira to continue the discussion that started on 
> YARN-3908.




[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-30 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648457#comment-14648457
 ] 

Zhijie Shen commented on YARN-3904:
---

[~gtCarrera9], thanks for the patch. Below are my comments:

bq. The two failed tests passed on my local machine, and the failures appeared 
to be irrelevant. This said, we may still need to fix those intermittent test 
failures.

Do we plan to fix it in this patch?

Some high level comments:

1. As also mentioned in YARN-3049, how about refactoring the reader/writer 
method signatures in a separate jira to avoid conflicts?

2. I suggest moving the table creation code into TimelineSchemaCreator.

3. As the HBase backend is accessed both directly and via Phoenix, it would be 
good to clean up the configuration so that it says we're using the HBase 
backend (as opposed to the FS backend), instead of specifically the HBase or 
Phoenix writer/reader.

Other patch details:

1. Make OfflineAggregationWriter extend Service, such that you don't need to 
define init.

2. Now that we're working towards a production-quality patch, would you please 
write some javadoc to explain the schema of the aggregation tables, like we did 
for the HBase tables?

3. The connection config should be moved to YarnConfiguration.

4. Why is the info column family kept? I expect the aggregation table will only 
have metrics data.

5. Let's also have a default PhoenixOfflineAggregationWriterImpl constructor to 
be used in the production code.

6. {{Class.forName(DRIVER_CLASS_NAME);}} doesn't need to be invoked every time 
we get a connection.

> Refactor timelineservice.storage to add support to online and offline 
> aggregation writers
> -
>
> Key: YARN-3904
> URL: https://issues.apache.org/jira/browse/YARN-3904
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-3904-YARN-2928.001.patch, 
> YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
> YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch
>
>
> After we finished the design for time-based aggregation, we can adopt our 
> existing Phoenix storage for storing the aggregated data. In this JIRA, I'm 
> proposing to refactor the writers to add support for aggregation writers. 
> Offline aggregation writers typically have less contextual 
> information. We can distinguish these writers by special naming. We can also 
> use CollectorContexts to model all contextual information and use it in our 
> writer interfaces. 




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-30 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648218#comment-14648218
 ] 

Zhijie Shen commented on YARN-3049:
---

[~gtCarrera9], thanks for the review. I've addressed most of your comments in 
the new patch, except the following:

bq. However, I still incline to proceed the changes in this JIRA so that we can 
speed up consolidating our POC patches.

Exactly.

bq. Reader interface: use TimelineCollectorContext to package reader arguments?

Yeah, I can see the rationale behind it, but maybe the container shouldn't be 
TimelineCollectorContext. Since the reader interface (as well as the writer one) 
has a lot of arguments, and its signature may keep changing in the future (e.g., 
adding newApp in this patch), I've started to think about grouping the primitive 
arguments into category objects, such as EntityContext, EntityFilters, Opts and 
so on, and using these as the arguments of the interface instead. Then, if we 
want to add newApp here, we don't need to change the method signature; we just 
add a getter/setter in Opts. Please let me know what you think of the idea. I 
can file another jira to deal with the issue.
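A rough sketch of the grouping idea, using the category names from the comment (`EntityContext`, `EntityFilters`, `Opts`). The fields and the `EntityReader` interface are hypothetical stand-ins, not the actual `TimelineReader` API. The point is that a new flag such as `newApp` lands in `Opts` while the method signature stays the same.

```java
import java.util.Set;

/** Identifies where the entities live (hypothetical grouping). */
final class EntityContext {
  final String clusterId;
  final String userId;
  final String flowId;   // may be null if unknown to the caller
  final Long flowRunId;  // may be null if unknown to the caller
  final String appId;
  final String entityType;

  EntityContext(String clusterId, String userId, String flowId, Long flowRunId,
      String appId, String entityType) {
    this.clusterId = clusterId;
    this.userId = userId;
    this.flowId = flowId;
    this.flowRunId = flowRunId;
    this.appId = appId;
    this.entityType = entityType;
  }
}

/** Narrows which entities are returned (hypothetical grouping). */
final class EntityFilters {
  Long limit;
  Long createdTimeBegin;
  Long createdTimeEnd;
}

/** Read options; new switches are added here instead of to the method. */
final class Opts {
  private boolean newApp;

  boolean isNewApp() {
    return newApp;
  }

  Opts setNewApp(boolean newApp) {
    this.newApp = newApp;
    return this;
  }
}

/** Stand-in reader: one stable signature built from the groups above. */
interface EntityReader {
  Set<String> getEntityIds(EntityContext context, EntityFilters filters, Opts opts);
}
```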

bq. We're now performing filters by ourselves in memory. I'm wondering if it 
will be more efficient to translate some of our filter specifications into 
HBase filters?

That sounds like a good idea, which should potentially improve read 
performance. Let me investigate how to map our filters to HBase filters and 
push them to the backend. Given it may be non-trivial work, can we get this 
patch in and follow up on the filter change in another jira, just in case?

bq. Add a specific test in TestHBaseTimelineWriterImpl for App2FlowTable?

In fact, it has been tested. I changed the write path to set newApp = true, and 
checked that we can query the entity successfully without giving the 
flow/flowRun explicitly. However, I didn't add many assertions on the fields of 
the retrieved entities, because I'd like to defer that work until we rewrite the 
whole HBase backend unit test. The current tests are too preliminary to catch 
potential bugs in the DB operations.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-30 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-YARN-2928.3.patch

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
> YARN-3049-YARN-2928.3.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-30 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648048#comment-14648048
 ] 

Zhijie Shen commented on YARN-3942:
---

Yeah, I prefer creating a TimelineEntityFileClient to modifying the current 
TimelineClientImpl, because it should minimize the effect on the existing code 
path. However, I'm afraid no matter which way we choose, we cannot make the 
change seamless to users. We cannot avoid the additional step at the client 
side to set the app/app-attempt ID, can we? In the Hive/Tez client (and other 
potential app clients), you also have to switch the context app/app-attempt ID 
once the client detects that a new YARN app/app-attempt has been created. 
Therefore, if an application wants to make use of it, it will also involve code 
changes in user land.

BTW, why do you need the app-attempt ID? Is the log file per app or per 
app-attempt?

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.




[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647217#comment-14647217
 ] 

Zhijie Shen commented on YARN-3942:
---

[~jlowe], thanks for sharing more information about the limitation. It sounds 
like a reasonable tradeoff, and it only affects cross-app queries. One concern 
is that the patch only contains the read path, while the write path only exists 
in Tez. Therefore, it's not a complete solution from the perspective of YARN 
alone. Is it possible to generalize the write path in Tez and promote it to 
YARN?

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.




[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-29 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3814:
--
Attachment: YARN-3814.reference.patch

I've attached a patch that contains only the web services from the POC uber 
patch, for your reference.

The reason I propose to have the cluster ID in the path is to make it more like 
a *REST* API, with a hierarchical path from the cluster down to the entityId. 
The reason I only put clusterId, appId, entityType and entityId in the path is 
that, as we said, these 4 pieces uniquely identify an entity in the taxonomy 
(at least for now).

I'm not too worried about the default cluster ID problem. The user can read it 
from the YARN configuration. When we create the client lib that wraps the REST 
API, we can load the default from there if the user doesn't supply the 
clusterId.
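A small sketch of the path layout described above, with the client filling in the default cluster ID when the caller omits it. The `/ws/v2/timeline/entity` prefix and the `EntityPaths` helper are assumptions for illustration only. `yarn.resourcemanager.cluster-id` is the existing YARN key for the cluster ID; it is read from a plain `Map` here, rather than `YarnConfiguration`, to keep the sketch self-contained.

```java
import java.util.Map;

/** Hypothetical helper that lays out the hierarchical entity path. */
final class EntityPaths {
  // Existing YARN configuration key for the cluster ID.
  static final String RM_CLUSTER_ID = "yarn.resourcemanager.cluster-id";
  // Hypothetical path prefix; the real web service root may differ.
  static final String PREFIX = "/ws/v2/timeline/entity";

  private EntityPaths() {}

  /**
   * Builds PREFIX/{clusterId}/{appId}/{entityType}/{entityId}, falling back to
   * the configured cluster ID when the caller passes null. IDs are assumed to
   * be URL-safe, so no escaping is done.
   */
  static String entityPath(String clusterId, String appId, String entityType,
      String entityId, Map<String, String> conf) {
    String cluster = clusterId != null ? clusterId : conf.get(RM_CLUSTER_ID);
    if (cluster == null) {
      throw new IllegalArgumentException("no cluster ID given or configured");
    }
    return String.join("/", PREFIX, cluster, appId, entityType, entityId);
  }
}
```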

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch, 
> YARN-3814-YARN-2928.02.patch, YARN-3814.reference.patch
>
>





[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646935#comment-14646935
 ] 

Zhijie Shen commented on YARN-3984:
---

In fact, metrics have the same problem, but it may still be okay to ignore a 
metric without any data.

> Rethink event column key issue
> --
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Fix For: YARN-2928
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not 
> so friendly to fetching all the events of an entity and sorting them in a 
> chronological order. IMHO, timestamp?event_id?info_key may be a better key 
> schema. I'm opening this jira to continue the discussion that started on 
> YARN-3908.




[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646924#comment-14646924
 ] 

Zhijie Shen commented on YARN-3984:
---

[~vrushalic], thanks for picking it up. The aforementioned cases are definitely 
good to support, while the query we need to support now (in YARN-3051 and 
YARN-3049) is to retrieve all events belonging to an entity (e.g. application, 
attempt, container, etc.). With this basic query, we can easily distill what 
happened to the entity, such as the diagnostic message of the KILL event. In 
this case, the most efficient approach is to put the timestamp even before the 
event ID, so that we don't need to order the events in memory.

In addition to the key composition, I found another significant problem with 
the event store schema. If an event doesn't contain any info, it will be 
ignored entirely, and we cannot guarantee that users will always put something 
into info. For example, a user may define a KILL event without any diagnostic 
message.

> Rethink event column key issue
> --
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Fix For: YARN-2928
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not 
> so friendly to fetching all the events of an entity and sorting them in a 
> chronological order. IMHO, timestamp?event_id?info_key may be a better key 
> schema. I'm opening this jira to continue the discussion that started on 
> YARN-3908.




[jira] [Updated] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container

2015-07-29 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3993:
--
Labels: newbie  (was: )

> Change to use the AM flag in ContainerContext determine AM container
> 
>
> Key: YARN-3993
> URL: https://issues.apache.org/jira/browse/YARN-3993
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>  Labels: newbie
>
> After YARN-3116, we will have a flag in ContainerContext to determine, in the 
> aux service, whether the container is the AM. We need to change accordingly 
> to make use of this feature instead of depending on the container ID.




[jira] [Comment Edited] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container

2015-07-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646558#comment-14646558
 ] 

Zhijie Shen edited comment on YARN-3993 at 7/29/15 6:26 PM:


[~sunilg], thanks for your interest. It's not related to the RM. In YARN-3116, 
we already built the channel to propagate the AM flag to the aux service. All 
we need to do here is update the way PerNodeTimelineCollectorsAuxService 
determines whether the container is the AM. Feel free to pick it up if you want 
to ramp up on TS v2.


was (Author: zjshen):
[~sunilg], thanks for your interest. It's not related to the RM. In YARN-3116, 
we already built the channel to propagate the AM flag to the aux service. All 
we need to do here is update the way PerNodeTimelineCollectorsAuxService 
determines whether the container is the AM.

> Change to use the AM flag in ContainerContext determine AM container
> 
>
> Key: YARN-3993
> URL: https://issues.apache.org/jira/browse/YARN-3993
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>  Labels: newbie
>
> After YARN-3116, we will have a flag in ContainerContext to determine, in the 
> aux service, whether the container is the AM. We need to change accordingly 
> to make use of this feature instead of depending on the container ID.




[jira] [Commented] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container

2015-07-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646558#comment-14646558
 ] 

Zhijie Shen commented on YARN-3993:
---

[~sunilg], thanks for your interest. It's not related to the RM. In YARN-3116, 
we already built the channel to propagate the AM flag to the aux service. All 
we need to do here is update the way PerNodeTimelineCollectorsAuxService 
determines whether the container is the AM.

> Change to use the AM flag in ContainerContext determine AM container
> 
>
> Key: YARN-3993
> URL: https://issues.apache.org/jira/browse/YARN-3993
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>
> After YARN-3116, we will have a flag in ContainerContext to determine, in the 
> aux service, whether the container is the AM. We need to change accordingly 
> to make use of this feature instead of depending on the container ID.




[jira] [Created] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container

2015-07-29 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-3993:
-

 Summary: Change to use the AM flag in ContainerContext determine 
AM container
 Key: YARN-3993
 URL: https://issues.apache.org/jira/browse/YARN-3993
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen


After YARN-3116, we will have a flag in ContainerContext to determine, in the 
aux service, whether the container is the AM. We need to change accordingly to 
make use of this feature instead of depending on the container ID.
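A toy sketch of the requested change, using simplified stand-in types (hypothetical, not the real `ContainerContext` or `ContainerId` classes). The ID-based check assumes the AM is always the first container of its attempt; that this is how the aux service currently decides is itself an assumption here. The flag-based check reads the container type propagated to the aux service directly.

```java
/** Hypothetical stand-in for the container type flag. */
enum ContainerKind {
  APPLICATION_MASTER,
  TASK
}

/** Hypothetical, simplified view of what the aux service receives. */
final class SimpleContainerContext {
  final long containerSequence; // last numeric field of the container ID
  final ContainerKind kind;     // flag propagated to the aux service

  SimpleContainerContext(long containerSequence, ContainerKind kind) {
    this.containerSequence = containerSequence;
    this.kind = kind;
  }
}

final class AmDetection {
  private AmDetection() {}

  /** ID-based guess: assumes the AM is always container number 1. */
  static boolean isAmById(SimpleContainerContext context) {
    return context.containerSequence == 1L;
  }

  /** Flag-based check: reads the container type instead of guessing. */
  static boolean isAmByFlag(SimpleContainerContext context) {
    return context.kind == ContainerKind.APPLICATION_MASTER;
  }
}
```

The two checks agree only as long as the numbering assumption holds; the flag removes that dependency.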




[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-07-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646443#comment-14646443
 ] 

Zhijie Shen commented on YARN-3992:
---

The problem was found in the Jenkins build for YARN-3049: 
https://builds.apache.org/job/PreCommit-YARN-Build/8701/testReport/

> TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
> --
>
> Key: YARN-3992
> URL: https://issues.apache.org/jira/browse/YARN-3992
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Zhijie Shen
>
> {code}
> java.lang.AssertionError: expected:<7> but was:<5>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
> {code}




[jira] [Created] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-07-29 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-3992:
-

 Summary: TestApplicationPriority.testApplicationPriorityAllocation 
fails intermittently
 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen


{code}
java.lang.AssertionError: expected:<7> but was:<5>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
{code}




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-29 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646339#comment-14646339
 ] 

Zhijie Shen commented on YARN-3049:
---

TestApplicationPriority.testApplicationPriorityAllocation seems to have a race 
condition. I cannot reproduce it locally, either on trunk or on YARN-2928 with 
this patch. Anyway, it doesn't seem to be related to this jira. I will file a 
separate jira to track the test failure.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-28 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-YARN-2928.2.patch

After YARN-3908, I updated the patch according to the HBase write fixes. I've 
decoupled the wire-up of the REST APIs and worked towards a review-ready HBase 
implementation patch. This patch still includes the implementation of writing 
and reading the app2flow table, because without it the reader may not work 
properly. Please let me know if you want to split it into two patches.
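Roughly why the reader depends on the app2flow table, as a hedged sketch: when a query gives only the cluster and application IDs, the reader must first resolve the user, flow and flow run before it can build the entity row key. A `HashMap` stands in for the HBase table; `FlowContext`, `App2FlowLookup` and the `!`-joined row key prefix are illustrative assumptions, not the patch's actual schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** What an app2flow lookup yields (hypothetical type). */
final class FlowContext {
  final String userId;
  final String flowId;
  final long flowRunId;

  FlowContext(String userId, String flowId, long flowRunId) {
    this.userId = userId;
    this.flowId = flowId;
    this.flowRunId = flowRunId;
  }
}

/** In-memory stand-in for the app2flow table (hypothetical). */
final class App2FlowLookup {
  // Keyed by clusterId!appId in this sketch.
  private final Map<String, FlowContext> table = new HashMap<>();

  /** Write path: remember the flow context when a new app is first seen. */
  void recordNewApp(String clusterId, String appId, FlowContext flow) {
    table.put(clusterId + "!" + appId, flow);
  }

  /** Read path: resolve user/flow/flowRun so the entity row key can be built. */
  Optional<String> entityRowKeyPrefix(String clusterId, String appId) {
    FlowContext flow = table.get(clusterId + "!" + appId);
    if (flow == null) {
      // Without the mapping the reader cannot locate the app's rows.
      return Optional.empty();
    }
    return Optional.of(String.join("!", clusterId, flow.userId, flow.flowId,
        Long.toString(flow.flowRunId), appId));
  }
}
```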

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-27 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643790#comment-14643790
 ] 

Zhijie Shen commented on YARN-3942:
---

Thanks for this work. I agree it's a good interim step between v1 and v2. I've 
done a first pass over this patch, and am fine with the idea overall. As far as 
I can tell, the unsupported case is getting entities of the same type across 
applications. Other than that, the HDFS data path seems to work fine. [~jlowe], 
it would be helpful if you could elaborate on the drawback a bit.

Will continue to review the patch, and post more detailed comments.

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.




[jira] [Resolved] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-27 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3908.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: YARN-2928

Committed the patch to branch YARN-2928. Thanks for the patch, Vrushali and 
Sangjin, and thanks to the other folks for contributing your thoughts.

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Fix For: YARN-2928
>
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Created] (YARN-3984) Rethink event column key issue

2015-07-27 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-3984:
-

 Summary: Rethink event column key issue
 Key: YARN-3984
 URL: https://issues.apache.org/jira/browse/YARN-3984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
 Fix For: YARN-2928


Currently, the event column key is event_id?info_key?timestamp, which is not so 
friendly to fetching all the events of an entity and sorting them in a 
chronological order. IMHO, timestamp?event_id?info_key may be a better key 
schema. I'm opening this jira to continue the discussion that started on 
YARN-3908.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-27 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643476#comment-14643476
 ] 

Zhijie Shen commented on YARN-3908:
---

Sure, as most folks are comfortable with the latest patch, let's get this in. 
I'll file a separate jira to track the discussion about the event column key.

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-27 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643267#comment-14643267
 ] 

Zhijie Shen commented on YARN-3908:
---

Okay, it's a fair point. It seems that the key design depends significantly on 
how we want to operate on the events. The current key design is most friendly 
to checking whether there are events that match a given event ID and some given 
info key (and its value). But if you want to fetch everything that belongs to 
an event (our query needs to do this, as an event is implicitly an atomic unit 
for now), it seems inevitable to scan through all the columns that have the 
given event ID (correct me if I'm wrong :-). If so, there seems to be little 
gain from this key design, while it complicates the event encapsulation logic.

And after rethinking the query we currently need to support (YARN-3051), I want 
to amend my suggestion. It seems more reasonable to use 
{{e!eventTimestamp?eventId?eventInfoKey}}, so that we can natively scan through 
the events of one entity one by one and return them in chronological order.
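Why the timestamp-first layout yields chronological order without sorting in memory can be shown in a few lines. This is a sketch under assumptions: non-negative timestamps are written as 8 big-endian bytes right after `e!`, and a `TreeMap` with an unsigned byte comparator stands in for the order in which HBase returns a row's columns. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical helper for the e!eventTimestamp?eventId?eventInfoKey layout. */
final class TimestampFirstKeys {
  private TimestampFirstKeys() {}

  /** e! + 8-byte big-endian timestamp + ?eventId?infoKey (timestamp >= 0). */
  static byte[] qualifier(long timestamp, String eventId, String infoKey) {
    byte[] prefix = "e!".getBytes(StandardCharsets.UTF_8);
    byte[] rest = ("?" + eventId + "?" + infoKey).getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(prefix.length + Long.BYTES + rest.length)
        .put(prefix).putLong(timestamp).put(rest).array();
  }

  /** Unsigned lexicographic byte comparison, as HBase orders qualifiers. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /** Returns event IDs in the order a sorted column scan would visit them. */
  static List<String> scanOrder(long[] timestamps, String[] eventIds) {
    TreeMap<byte[], String> columns =
        new TreeMap<byte[], String>(TimestampFirstKeys::compareUnsigned);
    for (int i = 0; i < timestamps.length; i++) {
      columns.put(qualifier(timestamps[i], eventIds[i], "info"), eventIds[i]);
    }
    return new ArrayList<>(columns.values());
  }
}
```

Events written out of order (for example FINISHED at 300, CREATED at 100, KILL at 200) come back as CREATED, KILL, FINISHED purely from the key encoding.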

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-27 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643113#comment-14643113
 ] 

Zhijie Shen commented on YARN-3908:
---

[~vrushalic], thanks for fixing the problem. W.R.T. the column key, shall we use:
{code}
e!eventId?eventTimestamp?eventInfoKey : eventInfoValue 
{code}

Imagine we have two KILL events: one at TS1 and the other at TS2. IMHO, we want 
to scan through the two events' columns one by one instead of in an interleaved 
manner. This will make it much easier for the reader to parse multiple events 
and encapsulate them one after the other. It will be even more useful in the 
future if we want to retrieve just part of the events of a big job (e.g. within 
a given time window or the most recent events). Thoughts?
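The interleaving argument can be checked with a small sketch that sorts the qualifiers of two KILL events (TS1 = 1 and TS2 = 2, each with two info keys) under both layouts. To keep it readable, qualifiers are strings with zero-padded timestamps, an assumption standing in for fixed-width binary encoding, and a `TreeSet` stands in for HBase's sorted columns. The `e!` prefix on the existing event_id?info_key?timestamp layout and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Compares two event column layouts by how a sorted scan orders them. */
final class EventKeyLayouts {
  private EventKeyLayouts() {}

  // Zero-padding keeps string order equal to numeric order for values >= 0.
  static String ts(long timestamp) {
    return String.format("%019d", timestamp);
  }

  /** Existing layout: e!eventId?infoKey?timestamp. */
  static String infoKeyFirst(String eventId, String infoKey, long timestamp) {
    return "e!" + eventId + "?" + infoKey + "?" + ts(timestamp);
  }

  /** Proposed layout: e!eventId?timestamp?infoKey. */
  static String timestampSecond(String eventId, long timestamp, String infoKey) {
    return "e!" + eventId + "?" + ts(timestamp) + "?" + infoKey;
  }

  /** Returns qualifiers in the order a sorted column scan would return them. */
  static List<String> sorted(List<String> qualifiers) {
    return new ArrayList<>(new TreeSet<>(qualifiers));
  }
}
```

With the existing layout the scan alternates TS1, TS2, TS1, TS2; with the proposed one, both TS1 columns come before both TS2 columns, so each event can be assembled as soon as its group ends.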

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Commented] (YARN-3981) support timeline clients not associated with an application

2015-07-27 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643056#comment-14643056
 ] 

Zhijie Shen commented on YARN-3981:
---

Thanks for filing the jira. I'm going to pick this up.

> support timeline clients not associated with an application
> ---
>
> Key: YARN-3981
> URL: https://issues.apache.org/jira/browse/YARN-3981
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>
> In the current v.2 design, all timeline writes must belong in a 
> flow/application context (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an 
> application. One such example is a higher level client (e.g. tez client or 
> hive/oozie/cascading client) writing flow-level data that spans multiple 
> applications. We need to find a way to support them.




[jira] [Assigned] (YARN-3981) support timeline clients not associated with an application

2015-07-27 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-3981:
-

Assignee: Zhijie Shen

> support timeline clients not associated with an application
> ---
>
> Key: YARN-3981
> URL: https://issues.apache.org/jira/browse/YARN-3981
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
>
> In the current v.2 design, all timeline writes must belong in a 
> flow/application context (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an 
> application. One such example is a higher level client (e.g. tez client or 
> hive/oozie/cascading client) writing flow-level data that spans multiple 
> applications. We need to find a way to support them.




[jira] [Commented] (YARN-3949) ensure timely flush of timeline writes

2015-07-24 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640910#comment-14640910
 ] 

Zhijie Shen commented on YARN-3949:
---

IMHO, given write + flush, it's not necessary to have both sync and async write 
APIs at the writer level, since we already have the analogous distinction at the 
app collector level. The app collector knows better than the writer whether it 
should flush after one write, two writes, or more.

The current approach seems good for now. I propose we go with it and unblock 
viewing the app timeline data after the app finishes. Thoughts? From my point 
of view, async write involves more than flushing (e.g., queueing the entities in 
the collector, combining updates to the same entity, etc.)

For the patch details:

1. "writer.flush.interval.seconds" -> "writer.flush-interval-seconds". The YARN 
convention is to use "." to separate namespaces (sub-components) and "-" to 
concatenate words. Please also move the default to YarnConfiguration, which is 
part of the API, and add this config to yarn-default.xml.

2. Shall we use shutdown() and then awaitTermination() to gracefully stop the 
service? Otherwise, if a scheduled flush task is running while the manager 
invokes writer.close(), will it cause any problem? Or is it thread safe, such 
that we don't need to worry about it?
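A minimal sketch of points 1 and 2, assuming a hypothetical `FlushableWriter` interface (not the actual timeline writer API) and a hypothetical key constant that only illustrates the "." vs "-" naming convention. The flush runs on a `ScheduledExecutorService`; on stop, `shutdown()` cancels future periodic runs and `awaitTermination()` waits for an in-flight flush, so `close()` never races with it.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical writer abstraction; the real timeline writer interface may differ.
interface FlushableWriter {
    void flush() throws IOException;
    void close() throws IOException;
}

class PeriodicFlushManager {
    // Hypothetical key name: "." separates namespaces, "-" joins words.
    static final String FLUSH_INTERVAL_KEY = "writer.flush-interval-seconds";

    private final FlushableWriter writer;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicFlushManager(FlushableWriter writer) {
        this.writer = writer;
    }

    void start(long intervalSeconds) {
        executor.scheduleAtFixedRate(() -> {
            try {
                writer.flush();
            } catch (IOException e) {
                // Caught so the periodic task is not silently cancelled.
                System.err.println("flush failed: " + e);
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() throws IOException, InterruptedException {
        // Cancel future periodic runs and wait for a running flush to finish...
        executor.shutdown();
        if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
        // ...then do a final flush and close without any concurrent flush.
        writer.flush();
        writer.close();
    }
}
```

In a real service the interval would be read from configuration, with the default defined in YarnConfiguration as suggested in point 1.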

> ensure timely flush of timeline writes
> --
>
> Key: YARN-3949
> URL: https://issues.apache.org/jira/browse/YARN-3949
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-3949-YARN-2928.001.patch, 
> YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.002.patch
>
>
> Currently flushing of timeline writes is not really handled. For example, 
> {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch 
> and write puts asynchronously. However, {{BufferedMutator}} may not flush 
> them to HBase unless the internal buffer fills up.
> We do need a flush functionality first to ensure that data are written in a 
> reasonably timely manner, and to be able to ensure some critical writes are 
> done synchronously (e.g. key lifecycle events).




[jira] [Commented] (YARN-3949) ensure timely flush of timeline writes

2015-07-22 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638190#comment-14638190
 ] 

Zhijie Shen commented on YARN-3949:
---

The proposal looks good to me for now. We may need to revisit it if we'd like 
to support retrieving real-time data later.

One question about the buffer: if the app collector crashes for some reason, 
will data that has been written but not yet flushed be lost?

> ensure timely flush of timeline writes
> --
>
> Key: YARN-3949
> URL: https://issues.apache.org/jira/browse/YARN-3949
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-3949-YARN-2928.001.patch
>
>
> Currently flushing of timeline writes is not really handled. For example, 
> {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch 
> and write puts asynchronously. However, {{BufferedMutator}} may not flush 
> them to HBase unless the internal buffer fills up.
> We do need a flush functionality first to ensure that data are written in a 
> reasonably timely manner, and to be able to ensure some critical writes are 
> done synchronously (e.g. key lifecycle events).




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-22 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637300#comment-14637300
 ] 

Zhijie Shen commented on YARN-3908:
---

bq. Is it the event id + timestamp? How about the event type? If you look at 
the equals() and the hashCode() implementations of TimelineEvent, it uses the 
timestamp, the event type, and even the info as a whole, but the id is not used 
for equality. How does that square with the stated intent that the event id and 
the timestamp form the identity?

There's no event type now. In v1 it's called type, but in v2 it has been renamed 
to id. We want to use id + ts to identify an event object uniquely, to support 
the case where an event happens multiple times. This also lets us avoid 
composite IDs like "container_allocation_13421543243". Does this make sense?

bq. Is pretty much the only access pattern "give me all the events that belong 
to this entity"?

Yes: get the events of one entity in chronological order, or just part of them 
via filtering.

bq. Two TimelineEvents are equal only if the timestamp is equal AND the type is 
equal AND the entire info maps are equal. What would we query by event type, 
timestamp and event info key? Do users always have to specify the timestamp?

There's no type, only ID. In the current reader API we cannot do sub-entity 
filtering, but in the future we can try to support, for example, getting the 
events in a given time window. If two events have the same , but different 
info, we may consider them the same event carrying different information. The 
later put will append more k/v pairs or update the existing ones.

bq. Do we need to store only the latest event for each timestamp, or all of 
them? It would almost sound like the key should be type and timestamp, but what 
about the entire event info map?

In the DB, I think the proper logic is: if we put  and , we should have two 
separate records persisted; and if we put  and  again, we should update the 
same record and let k1=v1'.
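A minimal sketch of the id + ts identity described above, with an in-memory map standing in for the store (an assumption for illustration; `EventKey` and `EventStore` are hypothetical, not classes from the patch). A put with a new timestamp creates a separate record; a put with an existing (id, ts) merges its info map into the existing record, overwriting keys already present. Note that the info map is deliberately not part of identity here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Identity of an event occurrence: id + timestamp only.
final class EventKey {
    final String id;
    final long timestamp;

    EventKey(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EventKey)) return false;
        EventKey other = (EventKey) o;
        return timestamp == other.timestamp && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, timestamp);
    }
}

// In-memory stand-in for the storage: one record per (id, timestamp).
class EventStore {
    private final Map<EventKey, Map<String, String>> records = new HashMap<>();

    void put(String id, long ts, Map<String, String> info) {
        // Same (id, ts): append new k/v pairs and update existing ones.
        records.computeIfAbsent(new EventKey(id, ts), k -> new HashMap<>()).putAll(info);
    }

    Map<String, String> get(String id, long ts) {
        return records.get(new EventKey(id, ts));
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        store.put("KILL", 1L, Map.of("k1", "v1"));
        store.put("KILL", 2L, Map.of("k1", "v1"));  // different ts -> separate record
        store.put("KILL", 1L, Map.of("k1", "v1'")); // same (id, ts) -> update in place
        System.out.println(store.size() + " " + store.get("KILL", 1L)); // 2 {k1=v1'}
    }
}
```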



> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-20 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634352#comment-14634352
 ] 

Zhijie Shen commented on YARN-3049:
---

[~sjlee0], yeah, for POC purposes I temporarily flush upon each put. I suspect 
this will significantly impact write performance. We may need to sync up on 
this issue.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-20 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634200#comment-14634200
 ] 

Zhijie Shen commented on YARN-3908:
---

I set up hbase-1.0.1.1 as a single-node cluster on the local FS and submitted an 
MR job. After the job finished, I used the REST API (YARN-3049) to read the 
entity -> NOT FOUND, and I used the hbase shell to scan through the entity table 
-> NOT FOUND as well.

We may want to rethink the buffering policy. It is not a good user experience 
if the entity is still not available to users after the app has finished.
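A toy model (not HBase code) of why the entity can be NOT FOUND after the job finishes: writes sit in a client-side buffer and only reach the store when the buffer fills or an explicit flush happens, which is the behavior YARN-3949 describes for HBase's BufferedMutator. `BufferedWriterModel` and its threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of a client-side write buffer in front of a store.
class BufferedWriterModel {
    private final int maxBuffered;
    private final List<String> buffer = new ArrayList<>();
    private final Set<String> store = new HashSet<>(); // what readers can see

    BufferedWriterModel(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    void write(String entityId) {
        buffer.add(entityId);
        if (buffer.size() >= maxBuffered) {
            flush(); // only a full buffer triggers an implicit flush
        }
    }

    void flush() {
        store.addAll(buffer);
        buffer.clear();
    }

    boolean isVisible(String entityId) {
        return store.contains(entityId);
    }

    public static void main(String[] args) {
        BufferedWriterModel w = new BufferedWriterModel(1000);
        w.write("job_1");
        System.out.println("before flush: " + w.isVisible("job_1")); // false
        w.flush(); // e.g., when the app finishes, or on a timer
        System.out.println("after flush: " + w.isVisible("job_1"));  // true
    }
}
```

A small job produces far fewer writes than the buffer holds, so without a timed or end-of-app flush nothing becomes visible.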

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-20 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-WIP.3.patch

Uploaded a new WIP patch with some bug fixes, including the two mentioned in 
YARN-3908.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
> YARN-3049-WIP.3.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-20 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633857#comment-14633857
 ] 

Zhijie Shen commented on YARN-3908:
---

I found two more issues while debugging the reader POC:

1. The events have been written into the metrics column family.

2. The entity is not accessible immediately after a single put operation. 

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-16 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-WIP.2.patch

[~sjlee0] and [~gtCarrera9], thanks for reviewing the patch. I'm currently 
targeting an E2E reader POC, and I'll try to address your comments a bit later. 
I've uploaded a new WIP patch, which basically makes the reader work E2E, though 
there are still a couple of bugs. I'll spend some more time fixing them.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628708#comment-14628708
 ] 

Zhijie Shen commented on YARN-3814:
---

I didn't go beyond the current reader interface. You're safe:-)

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>





[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628647#comment-14628647
 ] 

Zhijie Shen commented on YARN-3814:
---

[~varun_saxena], thanks for posting the patch. It seems that we have duplicated 
some work (I'm working on a reader POC (YARN-3049) which contains some REST API 
hooks too). I'll upload a POC patch a bit later. Let's consolidate them.

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>





[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-13 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625663#comment-14625663
 ] 

Zhijie Shen commented on YARN-3116:
---

Congrats on your first patch, [~giovanni.fumarola]!

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Fix For: 2.8.0
>
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether the container is an AM container 
> from the context in the NM (we can do it on the RM). This information is 
> missing, so we worked around it by considering the container with ID "_01" 
> as the AM container. Unfortunately, this is neither a necessary nor a 
> sufficient condition. We need a way to determine whether a container is an 
> AM container on the NM. We can add a flag to the container object or create 
> an API to make the judgement. Perhaps the distributed AM information may 
> also be useful to YARN-2877.
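A sketch of the "add a flag to the container object" option, assuming a hypothetical `ContainerInfo` that carries an explicit type set when the container is launched, so the NM no longer infers AM-ness from the container ID. `ContainerKind`, `ContainerInfo` and `CollectorStarter` are illustrative names only; they are not the classes introduced by the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative container type; values are assumptions for this sketch.
enum ContainerKind { APPLICATION_MASTER, TASK }

// Hypothetical NM-side view of a container, carrying the type explicitly
// instead of inferring it from the container ID.
final class ContainerInfo {
    final String containerId;
    final ContainerKind kind;

    ContainerInfo(String containerId, ContainerKind kind) {
        this.containerId = containerId;
        this.kind = kind;
    }

    boolean isAmContainer() {
        return kind == ContainerKind.APPLICATION_MASTER;
    }
}

class CollectorStarter {
    // Returns the IDs of containers for which a per-app collector would start.
    static List<String> collectorsToStart(List<ContainerInfo> started) {
        List<String> result = new ArrayList<>();
        for (ContainerInfo c : started) {
            if (c.isAmContainer()) {
                result.add(c.containerId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ContainerInfo> started = List.of(
                new ContainerInfo("container_A", ContainerKind.APPLICATION_MASTER),
                new ContainerInfo("container_B", ContainerKind.TASK));
        System.out.println(collectorsToStart(started)); // [container_A]
    }
}
```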




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-13 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625397#comment-14625397
 ] 

Zhijie Shen commented on YARN-3908:
---

Yes, but the approach based on the number of metric values is not guaranteed to 
be correct. Are we okay with that?

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-13 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625369#comment-14625369
 ] 

Zhijie Shen commented on YARN-3908:
---

[~vrushalic] and [~sjlee0], thanks for helping fix the problems. I have two 
questions:

1. In fact, I'm wondering if we should put info and events into a separate 
column family, like what we did for configs/metrics?

2. We don't want to store the metric type, do we?

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-13 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-WIP.1.patch

Attached a WIP patch so that the community can take a look, though I still need 
to add the app->flow mapping and some missing fields.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch
>
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-10 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623176#comment-14623176
 ] 

Zhijie Shen commented on YARN-3116:
---

+1 for the latest patch. Will commit it after Jenkins comments.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v10.patch, 
> YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, 
> YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, 
> YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether the container is an AM container 
> from the context in the NM (we can do it on the RM). This information is 
> missing, so we worked around it by considering the container with ID "_01" 
> as the AM container. Unfortunately, this is neither a necessary nor a 
> sufficient condition. We need a way to determine whether a container is an 
> AM container on the NM. We can add a flag to the container object or create 
> an API to make the judgement. Perhaps the distributed AM information may 
> also be useful to YARN-2877.




[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2015-07-10 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623170#comment-14623170
 ] 

Zhijie Shen commented on YARN-3914:
---

This will not block the implementation of getEntities (YARN-3049), but the 
performance will be bad without it, especially when the number of entities per 
type per app becomes huge, e.g., for a big job.

> Entity created time should be part of the row key of entity table
> -
>
> Key: YARN-3914
> URL: https://issues.apache.org/jira/browse/YARN-3914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Entity created time should be part of the row key of the entity table, 
> between entity type and entity Id. The reason to have it is to index the 
> entities. Though we cannot index the entities by every kind of information, 
> indexing them by created time is really necessary. Without it, every query 
> for the latest entities that belong to an application and a type will scan 
> through all the entities that belong to them, for example, if we want to list 
> the 100 most recently started containers in a YARN app.




[jira] [Created] (YARN-3914) Entity created time should be part of the row key of entity table

2015-07-10 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-3914:
-

 Summary: Entity created time should be part of the row key of 
entity table
 Key: YARN-3914
 URL: https://issues.apache.org/jira/browse/YARN-3914
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Entity created time should be part of the row key of the entity table, between 
entity type and entity Id. The reason to have it is to index the entities. 
Though we cannot index the entities by every kind of information, indexing them 
by created time is really necessary. Without it, every query for the latest 
entities that belong to an application and a type will scan through all the 
entities that belong to them, for example, if we want to list the 100 most 
recently started containers in a YARN app.
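A sketch of a row key with the created time between entity type and entity id. Assumptions not stated in the issue: `!` separators, a simplified prefix string, and the created time stored inverted (`Long.MAX_VALUE - createdTime`) as 8 big-endian bytes so that an ascending scan returns the newest entities first. The inversion is a common HBase technique, used here only for illustration; `EntityRowKeySketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EntityRowKeySketch {
    // Hypothetical layout: prefix!entityType!<inverted created time, 8 bytes>entityId
    static byte[] rowKey(String prefix, String entityType, long createdTime, String entityId) {
        byte[] head = (prefix + "!" + entityType + "!").getBytes(StandardCharsets.UTF_8);
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(head.length + Long.BYTES + id.length)
                .put(head)
                .putLong(Long.MAX_VALUE - createdTime) // newer => smaller => sorts first
                .put(id)
                .array();
    }

    // Unsigned lexicographic byte comparison, the ordering HBase uses for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Extracts the entity id from a key built by rowKey() with the same prefix/type.
    static String entityIdOf(byte[] key, String prefix, String entityType) {
        int offset = (prefix + "!" + entityType + "!").getBytes(StandardCharsets.UTF_8).length
                + Long.BYTES;
        return new String(key, offset, key.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String prefix = "cluster!user!flow!run!app";
        List<byte[]> rows = new ArrayList<>();
        rows.add(rowKey(prefix, "CONTAINER", 100L, "c_old"));
        rows.add(rowKey(prefix, "CONTAINER", 300L, "c_new"));
        rows.add(rowKey(prefix, "CONTAINER", 200L, "c_mid"));
        rows.sort(EntityRowKeySketch::compareUnsigned);
        // Reading only the first N rows yields the N most recently created containers.
        for (byte[] r : rows) {
            System.out.println(entityIdOf(r, prefix, "CONTAINER")); // c_new, c_mid, c_old
        }
    }
}
```

With this layout, "the 100 most recently started containers" becomes a prefix scan over app + type that stops after 100 rows, instead of a scan over every entity of that type.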




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-10 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622998#comment-14622998
 ] 

Zhijie Shen commented on YARN-3116:
---

one nit: can we move ContainerType to server/api?

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, 
> YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether the container is an AM container 
> from the context in the NM (we can do it on the RM). This information is 
> missing, so we worked around it by considering the container with ID "_01" 
> as the AM container. Unfortunately, this is neither a necessary nor a 
> sufficient condition. We need a way to determine whether a container is an 
> AM container on the NM. We can add a flag to the container object or create 
> an API to make the judgement. Perhaps the distributed AM information may 
> also be useful to YARN-2877.




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-10 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622859#comment-14622859
 ] 

Zhijie Shen commented on YARN-3116:
---

Sure, I'll review the latest patch this afternoon.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, 
> YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether the container is an AM container 
> from the context in the NM (we can do it on the RM). This information is 
> missing, so we worked around it by considering the container with ID "_01" 
> as the AM container. Unfortunately, this is neither a necessary nor a 
> sufficient condition. We need a way to determine whether a container is an 
> AM container on the NM. We can add a flag to the container object or create 
> an API to make the judgement. Perhaps the distributed AM information may 
> also be useful to YARN-2877.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621592#comment-14621592
 ] 

Zhijie Shen commented on YARN-3908:
---

1. TimelineEvent has a timestamp associated with it, which tells us when the 
event happened. We should have this information persisted, but unfortunately it 
doesn't seem to be.

2. Metric doesn't have a timestamp because the timestamp is associated with 
each individual value.

3. I also realized that the metric type is not persisted either. For now, the 
reader implementation just assumes size(metric) > 1 => time series, else => 
single value, but this is not guaranteed to hold.
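A sketch of why the size-based guess in point 3 is unreliable, assuming a simplified metric modeled as a map from timestamp to value (consistent with point 2, where each value carries its own timestamp). A time series that has reported only one data point so far looks exactly like a single-value metric unless the type is persisted explicitly. `MetricKind` and `StoredMetric` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

enum MetricKind { SINGLE_VALUE, TIME_SERIES }

final class StoredMetric {
    final String id;
    final MetricKind kind;               // persisted explicitly in this sketch
    final TreeMap<Long, Number> values;  // timestamp -> value

    StoredMetric(String id, MetricKind kind, Map<Long, ? extends Number> values) {
        this.id = id;
        this.kind = kind;
        this.values = new TreeMap<>(values);
    }

    // The reader-side heuristic: more than one value => time series.
    MetricKind inferredKind() {
        return values.size() > 1 ? MetricKind.TIME_SERIES : MetricKind.SINGLE_VALUE;
    }

    public static void main(String[] args) {
        // A time series that has reported only one data point so far.
        StoredMetric m = new StoredMetric("MEMORY", MetricKind.TIME_SERIES, Map.of(1000L, 512));
        System.out.println("persisted=" + m.kind + ", inferred=" + m.inferredKind());
        // prints: persisted=TIME_SERIES, inferred=SINGLE_VALUE
    }
}
```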


> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621581#comment-14621581
 ] 

Zhijie Shen commented on YARN-3116:
---

[~kkaranasos], I haven't looked into the details of YARN-2884, but it seems to 
be an API change that needs to be exposed to users. In that case, a user-facing 
object, i.e., ContainerLaunchContext, is the better choice for you.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, 
> YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine whether the container is an AM container 
> from the context in the NM (we can do it on the RM). This information is 
> missing, so we worked around it by considering the container with ID "_01" 
> as the AM container. Unfortunately, this is neither a necessary nor a 
> sufficient condition. We need a way to determine whether a container is an 
> AM container on the NM. We can add a flag to the container object or create 
> an API to make the judgement. Perhaps the distributed AM information may 
> also be useful to YARN-2877.




[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621545#comment-14621545
 ] 

Zhijie Shen commented on YARN-3836:
---

+1 LGTM

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch, 
> YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch, 
> YARN-3836-YARN-2928.004.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.




[jira] [Comment Edited] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621480#comment-14621480
 ] 

Zhijie Shen edited comment on YARN-3908 at 7/10/15 12:23 AM:
-

It's blocking the reader interface implementation now.

Assigning it to [~vrushalic] by default. Please feel free to rebalance the 
workload.


was (Author: zjshen):
Assigning it to [~vrushalic] by default. Please feel free to rebalance the 
workload.

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621480#comment-14621480
 ] 

Zhijie Shen commented on YARN-3908:
---

Assigning it to [~vrushalic] by default. Please feel free to rebalance the 
workload.

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Updated] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-09 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3908:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2928

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.




[jira] [Created] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-09 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-3908:
-

 Summary: Bugs in HBaseTimelineWriterImpl
 Key: YARN-3908
 URL: https://issues.apache.org/jira/browse/YARN-3908
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C


1. In HBaseTimelineWriterImpl, the info column family contains the basic fields 
of a timeline entity plus events. However, entity#info map is not stored at all.

2. event#timestamp is also not persisted.
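A minimal sketch of one way event#timestamp could be persisted, assuming the timestamp is folded into the event column qualifier. The layout (event id, a '!' separator, then the timestamp as an 8-byte big-endian long) and the class name EventQualifierSketch are hypothetical, not the HBaseTimelineWriterImpl code. A fixed-width big-endian encoding keeps unsigned byte order equal to numeric order for non-negative timestamps, whereas a decimal string would sort "10" before "9".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of HBaseTimelineWriterImpl): encodes an event
// column qualifier as <eventId UTF-8 bytes> '!' <timestamp as 8 big-endian bytes>.
final class EventQualifierSketch {
  private static final byte SEPARATOR = (byte) '!';

  static byte[] qualifier(String eventId, long timestamp) {
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(id.length + 1 + Long.BYTES);
    // ByteBuffer writes multi-byte values in big-endian order by default.
    buf.put(id).put(SEPARATOR).putLong(timestamp);
    return buf.array();
  }

  // The timestamp is always the fixed-width 8-byte suffix, so it can be
  // decoded without parsing the variable-length event id.
  static long timestampOf(byte[] qualifier) {
    return ByteBuffer.wrap(qualifier, qualifier.length - Long.BYTES, Long.BYTES).getLong();
  }

  // Unsigned lexicographic byte comparison, the order HBase uses for qualifiers.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }
}
```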




[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621209#comment-14621209
 ] 

Zhijie Shen commented on YARN-3836:
---

bq. l.550: It sounds like now the type takes precedence over the created time 
in the sort order in this version. Is this intended? If not (timestamp is 
supposed to be first), it might be a good idea to have Identifier implement 
Comparable as well and use that in TimelineEntity.compareTo().

Currently getEntities only supports returning entities of a single entity 
type, so the ordering among them won't be affected by the entity type. 
In general, it seems more natural to put entities of the same type close 
to each other. For example, we can merge the collections of entities returned 
from multiple getEntities queries to imitate fetching entities of multiple 
entity types. If we have a specific use case (e.g., we want to 
order entities globally across types), it should be fine, and not expensive, to 
define a customized comparator to do it.
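For the cross-type case, a customized comparator could look like the sketch below. EntitySketch is a hypothetical stand-in for TimelineEntity with only the fields needed for sorting, and ordering newest-first with type and id as tie-breakers is an assumption, not the patch's actual compareTo().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, minimal stand-in for TimelineEntity: only the fields used for sorting.
final class EntitySketch {
  final String type;
  final String id;
  final long createdTime;

  EntitySketch(String type, String id, long createdTime) {
    this.type = type;
    this.id = id;
    this.createdTime = createdTime;
  }

  // Global order across types: newest created time first, then type and id as tie-breakers.
  static final Comparator<EntitySketch> BY_CREATED_TIME_ACROSS_TYPES =
      Comparator.comparingLong((EntitySketch e) -> e.createdTime).reversed()
          .thenComparing(e -> e.type)
          .thenComparing(e -> e.id);

  // Merges the results of several single-type queries and sorts them globally.
  static List<EntitySketch> mergeAndSort(List<List<EntitySketch>> perTypeResults) {
    List<EntitySketch> merged = new ArrayList<>();
    for (List<EntitySketch> result : perTypeResults) {
      merged.addAll(result);
    }
    merged.sort(BY_CREATED_TIME_ACROSS_TYPES);
    return merged;
  }
}
```

Because the comparator lives outside the entity's natural ordering, the default same-type ordering is left untouched for callers that don't need it.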

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch, 
> YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.




[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621051#comment-14621051
 ] 

Zhijie Shen commented on YARN-3901:
---

Yeah, the reader depends on this table. If nobody is working on this 
table, I can take care of it.

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing this jira to track the creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per-flow-run information aggregated across applications, plus the flow version
> - RM’s collector writes to it on app creation and app completion
> - The per-app collector writes to it for metric updates at a slower frequency 
> than the metric updates to the application table
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application levels keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time, the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value among all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be unset (0), or can 
> indicate running (1) or complete (2). For metrics, only 
> completed app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay.
> 
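The min_start_time behavior above can be modeled in plain Java as a rough sketch. This is not an HBase coprocessor; MinStartTimeSketch and its Cell class are hypothetical, with an empty string standing in for the empty tag. max_end_time would be the same with max in place of min.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the min_start_time column: each RM write is one cell
// tagged with its application id; the collapsed cell carries an empty tag.
final class MinStartTimeSketch {
  static final class Cell {
    final String appIdTag; // "" stands for the empty tag
    final long value;

    Cell(String appIdTag, long value) {
      this.appIdTag = appIdTag;
      this.value = value;
    }
  }

  // Read path: return the minimum over all written cells.
  static long readMin(List<Cell> cells) {
    long min = Long.MAX_VALUE;
    for (Cell c : cells) {
      min = Math.min(min, c.value);
    }
    return min;
  }

  // Flush/compaction path: keep a single untagged cell holding the minimum
  // and discard the per-application cells.
  static List<Cell> collapse(List<Cell> cells) {
    List<Cell> out = new ArrayList<>();
    if (!cells.isEmpty()) {
      out.add(new Cell("", readMin(cells)));
    }
    return out;
  }
}
```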




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-09 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621034#comment-14621034
 ] 

Zhijie Shen commented on YARN-3116:
---

[~kkaranasos], thanks for notifying us of YARN-2882. I took a quick look at the 
jira. The approaches seem similar, but we're on parallel 
tracks. YARN-2882 defines two container types in the container-related APIs so 
as to differentiate container requests to the RM or the NM, whereas labeling a 
container here aims to let the NM know whether the container hosts the AM. This is 
completely internal information: users are blind to this type and are not 
able to set or change it. That is why we propose to pass this information via 
ContainerTokenIdentifier instead of ContainerLaunchContext. Thoughts?

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, 
> YARN-3116.v7.patch, YARN-3116.v8.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, this is neither a necessary nor a sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add a flag to the container object or create an API to 
> make the judgement. The distributed AM information may also be useful 
> to YARN-2877.




[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-07-08 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619759#comment-14619759
 ] 

Zhijie Shen commented on YARN-3901:
---

[~vrushalic], just want to confirm with you that this jira won't cover the app_flow 
table, right?

I need the app-to-flow mapping to implement the reader APIs against the HBase backend. 
If it's not covered here, I can help implement it in the scope of YARN-3049.
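A hypothetical sketch of what the reader needs from an app-to-flow mapping: given an application id, recover the flow context and build the flow_run row key (cluster ! user ! flow ! flow run id, as in the schema proposal). AppFlowSketch is an in-memory stand-in for the app_flow table, and the key is rendered as a String only for readability; an actual HBase row key would be a byte[].

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the app_flow table: application id -> flow context.
final class AppFlowSketch {
  static final class FlowContext {
    final String cluster;
    final String user;
    final String flow;
    final long flowRunId;

    FlowContext(String cluster, String user, String flow, long flowRunId) {
      this.cluster = cluster;
      this.user = user;
      this.flow = flow;
      this.flowRunId = flowRunId;
    }

    // flow_run row key layout from the proposal: cluster ! user ! flow ! flow run id.
    String flowRunRowKey() {
      return cluster + "!" + user + "!" + flow + "!" + flowRunId;
    }
  }

  private final Map<String, FlowContext> appToFlow = new HashMap<>();

  void put(String appId, FlowContext context) {
    appToFlow.put(appId, context);
  }

  // Empty when no flow mapping has been recorded for the application.
  Optional<FlowContext> lookup(String appId) {
    return Optional.ofNullable(appToFlow.get(appId));
  }
}
```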

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing this jira to track the creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per-flow-run information aggregated across applications, plus the flow version
> - RM’s collector writes to it on app creation and app completion
> - The per-app collector writes to it for metric updates at a slower frequency 
> than the metric updates to the application table
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application levels keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time, the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value among all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be unset (0), or can 
> indicate running (1) or complete (2). For metrics, only 
> completed app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay.
> 




[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-07-08 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619729#comment-14619729
 ] 

Zhijie Shen commented on YARN-3836:
---

bq. I see that we're implementing the Comparable interface for all 3 types. I'm 
wondering if it makes sense for them. What would it mean to order 
TimelineEntity instances? Does it mean much? Where would it be useful? Do we 
need to implement it? The same questions go for the other 2 types...

For example, compareTo of TimelineEntity is used to order the entities in the 
result set of a getEntities query. It would be better to return the entities 
ordered by timestamp rather than in random order.

bq. This is an open question. Is the id alone the identity, or do the id and timestamp 
together form the identity? Do we expect users of TimelineEvent to always be able 
to provide the timestamp? Honestly I'm not 100% sure what the contract is, and 
we probably want to make it explicit (and add it to the javadoc). Thoughts?

In ATS v1, we actually use id + timestamp to uniquely identify an event. One 
merit of doing this is to let the app put the same event multiple times. For 
example, a job can request resources many times. Each time, it can put a 
RESOURCE_REQUEST event with a unique timestamp and fill in the resource 
information.
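Treating id + timestamp as the event identity maps directly onto equals() and hashCode(), which is what matters for the HashSet case in this jira. EventSketch below is a hypothetical minimal class, not the YARN TimelineEvent API.

```java
import java.util.Objects;

// Hypothetical minimal event (not the YARN TimelineEvent API): identity is
// id + timestamp, following the ATS v1 convention described above.
final class EventSketch {
  final String id;
  final long timestamp;

  EventSketch(String id, long timestamp) {
    this.id = Objects.requireNonNull(id, "id");
    this.timestamp = timestamp;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof EventSketch)) {
      return false;
    }
    EventSketch other = (EventSketch) o;
    return timestamp == other.timestamp && id.equals(other.id);
  }

  // Uses the same fields as equals() so equal events hash to the same bucket.
  @Override
  public int hashCode() {
    return Objects.hash(id, timestamp);
  }
}
```

With this contract, two RESOURCE_REQUEST events at different timestamps are distinct members of a HashSet, while putting the same id and timestamp again is deduplicated.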

> add equals and hashCode to TimelineEntity and other classes in the data model
> -
>
> Key: YARN-3836
> URL: https://issues.apache.org/jira/browse/YARN-3836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: YARN-3836-YARN-2928.001.patch
>
>
> Classes in the data model API (e.g. {{TimelineEntity}}, 
> {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
> {{hashCode()}}. This can cause problems when these objects are used in a 
> collection such as a {{HashSet}}. We should implement these methods wherever 
> appropriate.




[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3116:
--
Attachment: YARN-3116.v8.patch

Fixed TestAppRunnability as well in the new patch.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, 
> YARN-3116.v7.patch, YARN-3116.v8.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, this is neither a necessary nor a sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add a flag to the container object or create an API to 
> make the judgement. The distributed AM information may also be useful 
> to YARN-2877.




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619564#comment-14619564
 ] 

Zhijie Shen commented on YARN-3116:
---

Xuan, thanks for your comment. I think this is a good point. To be forward 
compatible, it's better to use an enum here instead of a boolean flag. That 
way, we can add more enum values, such as SystemContainer, in the 
future without adding new flags and breaking compatibility. 
[~giovanni.fumarola], [~subru], what do you think?
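A rough sketch of why an enum is more forward compatible than a boolean: a new kind of container becomes a new constant, while callers that only care about the AM keep the same check. ContainerTypeSketch, NmContainerSketch, and the SYSTEM_CONTAINER constant are hypothetical names for illustration.

```java
// Hypothetical enum for the container type; SYSTEM_CONTAINER stands in for
// the "SystemContainer" idea and is illustrative only.
enum ContainerTypeSketch {
  APPLICATION_MASTER,
  TASK,
  SYSTEM_CONTAINER
}

// Hypothetical NM-side view of a container, carrying the type set by the RM.
final class NmContainerSketch {
  final String containerId;
  final ContainerTypeSketch type;

  NmContainerSketch(String containerId, ContainerTypeSketch type) {
    this.containerId = containerId;
    this.type = type;
  }

  // Decide from the explicit type rather than from an "_01" container id suffix.
  boolean shouldStartPerAppCollector() {
    return type == ContainerTypeSketch.APPLICATION_MASTER;
  }
}
```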

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, this is neither a necessary nor a sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add a flag to the container object or create an API to 
> make the judgement. The distributed AM information may also be useful 
> to YARN-2877.




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-08 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619518#comment-14619518
 ] 

Zhijie Shen commented on YARN-3116:
---

Is the TestAppRunnability failure related to this patch? The normal practice is to 
check whether the test failure is related to the code change in this jira. If not, 
you can go ahead and file a separate jira to tackle it.

Thanks for fixing TestPrivilegedOperationExecutor. The fix seems 
straightforward, so let's keep it here.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, this is neither a necessary nor a sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add a flag to the container object or create an API to 
> make the judgement. The distributed AM information may also be useful 
> to YARN-2877.




[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-08 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619440#comment-14619440
 ] 

Zhijie Shen commented on YARN-3047:
---

Thanks for kicking off another Jenkins build. In any case, the patch looks good to me.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.




[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-08 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619347#comment-14619347
 ] 

Zhijie Shen commented on YARN-3049:
---

Updated the title accordingly to describe the scope of this jira more 
accurately.

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-07-08 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Summary: [Storage Implementation] Implement storage reader interface to 
fetch raw data from HBase backend  (was: [Storage Implementation] Implement the 
storage reader interface to fetch raw data)

> [Storage Implementation] Implement storage reader interface to fetch raw data 
> from HBase backend
> 
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3049) [Storage Implementation] Implement the storage reader interface to fetch raw data

2015-07-08 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Summary: [Storage Implementation] Implement the storage reader interface to 
fetch raw data  (was: [Compatiblity] Implement existing ATS queries in the new 
ATS design)

> [Storage Implementation] Implement the storage reader interface to fetch raw 
> data
> -
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
>
> Implement existing ATS queries with the new ATS reader design.




[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-07 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3116:
--
Attachment: YARN-3116.v6.patch

Fixed the test failure in the new patch. Otherwise, the previous patch looks 
good to me. Since I also touched the patch, I need a second committer to take a 
look. [~jianhe], would you mind doing me a favor?

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, this is neither a necessary nor a sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add a flag to the container object or create an API to 
> make the judgement. The distributed AM information may also be useful 
> to YARN-2877.




[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-07 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617553#comment-14617553
 ] 

Zhijie Shen commented on YARN-3047:
---

It looks good to me overall, except for the config. Please let me know if I've 
missed something: the new configuration name and the old configuration's 
default value are used together. Why do we want this combination?
{code}
if (YarnConfiguration.useHttps(conf)) {
  return conf.get(YarnConfiguration.TIMELINE_READER_WEBAPP_HTTPS_ADDRESS,
      YarnConfiguration.DEFAULT_TIMELINE_SERVICE_WEBAPP_HTTPS_ADDRESS);
} else {
  return conf.get(YarnConfiguration.TIMELINE_READER_WEBAPP_ADDRESS,
      YarnConfiguration.DEFAULT_TIMELINE_SERVICE_WEBAPP_ADDRESS);
}
{code}

Can't we just reuse the existing config "timeline_service_webapp" instead of 
creating a new one? In fact, users are blind to the reader/writer split; they just 
know that timeline_service_webapp is where they can access the data.
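A sketch of the suggested alternative, where the reader simply reuses the existing timeline-service webapp keys so that each key is always paired with its own default. A Map stands in for Hadoop's Configuration and ReaderAddressSketch is hypothetical; the key strings and default addresses are assumed to mirror YarnConfiguration's timeline-service webapp settings and should be checked against the actual constants.

```java
import java.util.Map;

// Hypothetical sketch: a Map stands in for Hadoop's Configuration, and the key
// names and defaults are assumptions mirroring the timeline-service webapp settings.
final class ReaderAddressSketch {
  static final String WEBAPP_ADDRESS = "yarn.timeline-service.webapp.address";
  static final String WEBAPP_HTTPS_ADDRESS = "yarn.timeline-service.webapp.https.address";
  static final String DEFAULT_WEBAPP_ADDRESS = "0.0.0.0:8188";
  static final String DEFAULT_WEBAPP_HTTPS_ADDRESS = "0.0.0.0:8190";

  // Reuses the existing keys, so a key and its default always come from the same pair.
  static String readerWebAppAddress(Map<String, String> conf, boolean useHttps) {
    if (useHttps) {
      return conf.getOrDefault(WEBAPP_HTTPS_ADDRESS, DEFAULT_WEBAPP_HTTPS_ADDRESS);
    }
    return conf.getOrDefault(WEBAPP_ADDRESS, DEFAULT_WEBAPP_ADDRESS);
  }
}
```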

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047.001.patch, YARN-3047.003.patch, 
> YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, 
> YARN-3047.02.patch, YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.




[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-07 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617307#comment-14617307
 ] 

Zhijie Shen commented on YARN-3047:
---

Would you please hold off for a while? I plan to take a look this afternoon.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047-YARN-2928.12.patch, YARN-3047.001.patch, YARN-3047.003.patch, 
> YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, 
> YARN-3047.02.patch, YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.




[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616031#comment-14616031
 ] 

Zhijie Shen commented on YARN-3047:
---

YARN-3051 has been committed. Would you please update the jira?

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.




[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-07-06 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3047:
--
Labels:   (was: BB2015-05-TBR)

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: Timeline_Reader(draft).pdf, 
> YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, 
> YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, 
> YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, 
> YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, 
> YARN-3047.04.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.




[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-07-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615930#comment-14615930
 ] 

Zhijie Shen commented on YARN-3051:
---

I will commit the patch later today if there are no more comments.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, 
> YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, 
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, 
> YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, 
> YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.




[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-07-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615839#comment-14615839
 ] 

Zhijie Shen commented on YARN-3051:
---

Okay, then it seems fine. I didn't notice that the mapping file is per 
cluster.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, 
> YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, 
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, 
> YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, 
> YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.




[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-07-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615792#comment-14615792
 ] 

Zhijie Shen commented on YARN-3051:
---

bq. The current FS implementation had cluster as part of the path. So there 
will be an app_flow_mapping.csv for each cluster. So in a way it is part of the 
primary key even though it's not in app_flow_mapping.csv. 
I hope that is what your concern was.

The problem is about the write path. Suppose we unfortunately have a duplicate 
appId: one is clusterId1/appId and the other is clusterId2/appId. When the 
former entity is written, you add appId to the mapping file. How do you write 
the mapping file for clusterId2/appId? Overwrite the row for appId? Append 
another row for appId? Either will cause trouble when finding the right flow 
info for a query that relies on default values.
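The collision can be reproduced with a minimal in-memory model of the mapping. This sketch assumes a `HashMap` in place of the CSV file and uses hypothetical names (`AppFlowMappingSketch`, `AppKey`); it contrasts keying the mapping by appId alone with keying it by clusterId + appId.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class AppFlowMappingSketch {
    // Hypothetical composite key: the cluster ID disambiguates duplicate app IDs.
    static final class AppKey {
        final String clusterId;
        final String appId;

        AppKey(String clusterId, String appId) {
            this.clusterId = clusterId;
            this.appId = appId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AppKey)) return false;
            AppKey k = (AppKey) o;
            return clusterId.equals(k.clusterId) && appId.equals(k.appId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(clusterId, appId);
        }
    }

    public static void main(String[] args) {
        String appId = "application_1435000000000_0001";
        Map<String, String> byAppId = new HashMap<>();
        Map<AppKey, String> byClusterAndApp = new HashMap<>();

        // Two clusters happen to produce the same appId for different flows.
        byAppId.put(appId, "flow_a");
        byAppId.put(appId, "flow_b");   // overwrites cluster1's row
        byClusterAndApp.put(new AppKey("cluster1", appId), "flow_a");
        byClusterAndApp.put(new AppKey("cluster2", appId), "flow_b");

        System.out.println(byAppId.get(appId));                                  // flow_b
        System.out.println(byClusterAndApp.get(new AppKey("cluster1", appId)));  // flow_a
    }
}
```

With appId alone, the second write silently replaces the first; with the composite key, both flows remain retrievable.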

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, 
> YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, 
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, 
> YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, 
> YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.




[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-07-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615747#comment-14615747
 ] 

Zhijie Shen commented on YARN-3051:
---

Hi Varun, thanks for updating the patch. I have only one remaining issue with 
this patch:

According to 
https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf,
it seems that we have chosen clusterId + appId to globally identify a unique 
flow run. I think we should do something similar here by adding clusterId, 
which is a mandatory field. /cc [~sjlee0].

Some other improvements will be required in the future for robustness and 
performance. Let's make sure we have a jira to improve the reader later.

1. Maybe we want to cache the mapping instead of reading it from the file for 
every query.
2. limit should be pushed down into the for loop. If we just want to retrieve 
10 entities, it's unnecessary to go through 1000 qualified candidates and 
finally pick the top 10.
3. We'd better avoid hard-coding "/" as the path separator, and we should use 
the FileSystem interface to operate on the files, so that the impl can also 
work with HDFS.
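Points 1 and 2 can be sketched together. This is a hypothetical in-memory illustration, not the FS reader code: `loader` stands in for reading the mapping file, the mapping is loaded once and then reused, and the limit check sits inside the loop so the scan stops as soon as enough matches are found. The early exit assumes candidates are already iterated in the desired order; otherwise every candidate must still be examined to pick the top N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class CachedLimitedReaderSketch {
    private final Supplier<Map<String, String>> loader; // hypothetical: reads the mapping file
    private Map<String, String> cachedMapping;           // loaded lazily, reused across queries
    int loadCount = 0;                                   // how many times the file was read

    CachedLimitedReaderSketch(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Point 1: read the mapping once instead of on every query.
    synchronized Map<String, String> mapping() {
        if (cachedMapping == null) {
            cachedMapping = loader.get();
            loadCount++;
        }
        return cachedMapping;
    }

    // Point 2: apply the limit inside the loop and stop early.
    static <T> List<T> filterWithLimit(Iterable<T> candidates, Predicate<T> matches, int limit) {
        List<T> result = new ArrayList<>();
        for (T candidate : candidates) {
            if (result.size() >= limit) {
                break;                  // remaining candidates are never examined
            }
            if (matches.test(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

A real cache would also need a refresh or invalidation policy, since new apps keep being added to the mapping.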

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, 
> YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, 
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, 
> YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, 
> YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.




[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-07-06 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615387#comment-14615387
 ] 

Zhijie Shen commented on YARN-3051:
---

How about using the Apache Commons CSV library to handle the lookup file?

http://commons.apache.org/proper/commons-csv/index.html

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, 
> YARN-3051-YARN-2928.07.patch, YARN-3051.Reader_API.patch, 
> YARN-3051.Reader_API_1.patch, YARN-3051.Reader_API_2.patch, 
> YARN-3051.Reader_API_3.patch, YARN-3051.Reader_API_4.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.




[jira] [Commented] (YARN-3881) Writing RM cluster-level metrics

2015-07-02 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612638#comment-14612638
 ] 

Zhijie Shen commented on YARN-3881:
---

Once the metrics are ready, we can build a built-in YARN/timeline service web UI 
to show this information, as well as expose it via an API so that third-party 
monitoring tools like Ambari can integrate with it. I think it should be quite 
flexible.

> Writing RM cluster-level metrics
> 
>
> Key: YARN-3881
> URL: https://issues.apache.org/jira/browse/YARN-3881
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: metrics.json
>
>
> RM has a bunch of metrics that we may want to write into the timeline 
> backend. I attached metrics.json, which I crawled via 
> {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to 
> three groups of metrics:
> 1. QueueMetrics
> 2. JvmMetrics
> 3. ClusterMetrics
> The problem is that, unlike other metrics, which belong to a single 
> application, these belong to the RM or are cluster-wide. Therefore, the 
> current write path is not going to work for these metrics because they don't 
> have the associated user/flow/app context info. We need to rethink how to 
> model cross-app metrics and the API to handle them.




[jira] [Commented] (YARN-3881) Writing RM cluster-level metrics

2015-07-02 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612429#comment-14612429
 ] 

Zhijie Shen commented on YARN-3881:
---

IMHO, we need to add an additional API to directly write the cross-app metrics 
(or already-aggregated metrics, if you think of them as the aggregated data of 
the individual apps, such as the counters of submitted/pending/running apps) to 
the backend, into separate tables such as cluster/queue/user tables. These data 
don't need to be aggregated any more.
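To make the proposal concrete, here is one hypothetical shape for such an API. None of these names (`AggregationScope`, `CrossAppMetricsWriter`, `writeAggregated`) exist in the timeline service; the in-memory implementation only illustrates that each metric goes straight into a per-scope table keyed by the cluster, queue, or user name, with no further aggregation step.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

public class CrossAppMetricsSketch {
    // Hypothetical aggregation levels, each backed by its own table.
    enum AggregationScope { CLUSTER, QUEUE, USER }

    // Hypothetical API: RM-side code hands over metrics that are already
    // aggregated (e.g. counts of submitted/pending/running apps).
    interface CrossAppMetricsWriter {
        void writeAggregated(AggregationScope scope, String scopeId, Map<String, Long> metrics);
    }

    // In-memory stand-in for the cluster/queue/user tables.
    static class InMemoryWriter implements CrossAppMetricsWriter {
        // scope -> scope ID (cluster, queue, or user name) -> metric name -> value
        final Map<AggregationScope, Map<String, Map<String, Long>>> tables =
            new EnumMap<>(AggregationScope.class);

        @Override
        public void writeAggregated(AggregationScope scope, String scopeId, Map<String, Long> metrics) {
            // Stored as-is: unlike per-app metrics, no roll-up step runs afterwards.
            tables.computeIfAbsent(scope, s -> new HashMap<>())
                  .computeIfAbsent(scopeId, id -> new HashMap<>())
                  .putAll(metrics);
        }
    }
}
```

This sketch keeps only the latest value per metric; a real schema would also have to decide how time is represented.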

> Writing RM cluster-level metrics
> 
>
> Key: YARN-3881
> URL: https://issues.apache.org/jira/browse/YARN-3881
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: metrics.json
>
>
> RM has a bunch of metrics that we may want to write into the timeline 
> backend. I attached metrics.json, which I crawled via 
> {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to 
> three groups of metrics:
> 1. QueueMetrics
> 2. JvmMetrics
> 3. ClusterMetrics
> The problem is that, unlike other metrics, which belong to a single 
> application, these belong to the RM or are cluster-wide. Therefore, the 
> current write path is not going to work for these metrics because they don't 
> have the associated user/flow/app context info. We need to rethink how to 
> model cross-app metrics and the API to handle them.




[jira] [Updated] (YARN-3881) Writing RM cluster-level metrics

2015-07-02 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3881:
--
Attachment: metrics.json

> Writing RM cluster-level metrics
> 
>
> Key: YARN-3881
> URL: https://issues.apache.org/jira/browse/YARN-3881
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: metrics.json
>
>
> RM has a bunch of metrics that we may want to write into the timeline 
> backend. I attached metrics.json, which I crawled via 
> {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to 
> three groups of metrics:
> 1. QueueMetrics
> 2. JvmMetrics
> 3. ClusterMetrics
> The problem is that, unlike other metrics, which belong to a single 
> application, these belong to the RM or are cluster-wide. Therefore, the 
> current write path is not going to work for these metrics because they don't 
> have the associated user/flow/app context info. We need to rethink how to 
> model cross-app metrics and the API to handle them.




[jira] [Created] (YARN-3881) Writing RM cluster-level metrics

2015-07-02 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-3881:
-

 Summary: Writing RM cluster-level metrics
 Key: YARN-3881
 URL: https://issues.apache.org/jira/browse/YARN-3881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


RM has a bunch of metrics that we may want to write into the timeline backend. 
I attached metrics.json, which I crawled via 
{{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to 
three groups of metrics:

1. QueueMetrics
2. JvmMetrics
3. ClusterMetrics

The problem is that, unlike other metrics, which belong to a single application, 
these belong to the RM or are cluster-wide. Therefore, the current write path is 
not going to work for these metrics because they don't have the associated 
user/flow/app context info. We need to rethink how to model cross-app metrics 
and the API to handle them.




[jira] [Created] (YARN-3880) Writing more RM side app-level metrics

2015-07-02 Thread Zhijie Shen (JIRA)

Zhijie Shen created YARN-3880:
-

 Summary: Writing more RM side app-level metrics
 Key: YARN-3880
 URL: https://issues.apache.org/jira/browse/YARN-3880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


In YARN-3044, we implemented an analog of the metrics publisher for ATS v1. While 
it helps to write app/attempt/container life cycle events, it doesn't write 
many of the app-level system metrics that the RM now has. These are the metrics 
that I found missing:

* runningContainers
* memorySeconds
* vcoreSeconds
* preemptedResourceMB
* preemptedResourceVCores
* numNonAMContainerPreempted
* numAMContainerPreempted

Please feel free to add more to the list if you find anything that's not covered.
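Two of these metrics are resource-time integrals: the RM describes memorySeconds as the amount of memory (in MB) an application has allocated multiplied by the number of seconds it has been running, accumulated over its containers, and vcoreSeconds likewise for virtual cores. The sketch below computes both from hypothetical container records; `ContainerUsage` is illustrative, not a YARN class.

```java
import java.util.Arrays;
import java.util.List;

public class ResourceSecondsSketch {
    // Hypothetical record of one container's allocation and lifetime.
    static final class ContainerUsage {
        final long memoryMb;
        final int vcores;
        final long startMillis;
        final long finishMillis;

        ContainerUsage(long memoryMb, int vcores, long startMillis, long finishMillis) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }
    }

    // Sum over containers of allocated MB x lifetime in seconds.
    static long memorySeconds(List<ContainerUsage> containers) {
        long total = 0;
        for (ContainerUsage c : containers) {
            long usedMillis = c.finishMillis - c.startMillis;
            total += c.memoryMb * usedMillis / 1000;
        }
        return total;
    }

    // Sum over containers of allocated vcores x lifetime in seconds.
    static long vcoreSeconds(List<ContainerUsage> containers) {
        long total = 0;
        for (ContainerUsage c : containers) {
            long usedMillis = c.finishMillis - c.startMillis;
            total += c.vcores * usedMillis / 1000;
        }
        return total;
    }

    public static void main(String[] args) {
        List<ContainerUsage> containers = Arrays.asList(
            new ContainerUsage(1024, 1, 0, 60_000),        // 1 GB, 1 vcore, 60 s
            new ContainerUsage(2048, 2, 10_000, 40_000));  // 2 GB, 2 vcores, 30 s
        // 1024*60 + 2048*30 = 122880 MB-seconds; 1*60 + 2*30 = 120 vcore-seconds
        System.out.println(memorySeconds(containers) + " " + vcoreSeconds(containers));
    }
}
```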




[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-07-02 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612113#comment-14612113
 ] 

Zhijie Shen commented on YARN-3051:
---

2. I meant we store  in a CSV file. Thoughts?

3. I think FS-impl-related config shouldn't be put in the API, as the impl is 
not supposed to be used by the public, only for test purposes.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, 
> YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, 
> YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, 
> YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, 
> YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.




[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-07-01 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611445#comment-14611445
 ] 

Zhijie Shen commented on YARN-3116:
---

1. I think normal execution won't have a null attempt, but the tests have 
omitted it. You probably want to fix the test code instead, e.g., mock the 
currentAttempt and fix Application#submit to add the attempt to rmapp.

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, 
> YARN-3116.v4.patch
>
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine if the container is an AM container or not 
> from the context in NM (we can do it on RM). This information is missing, 
> so we worked around it by considering the container with ID "_01" as 
> the AM container. Unfortunately, this is neither a necessary nor a sufficient 
> condition. We need to have a way to determine if a container is an AM 
> container on NM. We can add a flag to the container object or create an API 
> to make the determination. Perhaps the distributed AM information may also be 
> useful to YARN-2877.
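The difference between the workaround and the proposed fix can be sketched as follows. `ContainerContext` and its `isAmContainer` flag are hypothetical stand-ins for information the RM would pass to the NM. The heuristic assumes the workaround inspects the trailing sequence number of a container ID string such as `container_1410901177871_0001_01_000001` and treats 1 as the AM; that inference from the ID format is what the description calls neither necessary nor sufficient.

```java
public class AmContainerCheckSketch {
    // Hypothetical context the NM could receive with a container launch.
    static final class ContainerContext {
        final String containerId;
        final boolean isAmContainer;   // set by the RM, which knows which container is the AM

        ContainerContext(String containerId, boolean isAmContainer) {
            this.containerId = containerId;
            this.isAmContainer = isAmContainer;
        }
    }

    // Workaround: infer "AM" from the trailing sequence number of the ID.
    static boolean looksLikeAmContainer(String containerId) {
        int idx = containerId.lastIndexOf('_');
        long seq = Long.parseLong(containerId.substring(idx + 1));
        return seq == 1;
    }

    // Proposed approach: trust an explicit flag instead of the ID format.
    static boolean isAmContainer(ContainerContext ctx) {
        return ctx.isAmContainer;
    }

    public static void main(String[] args) {
        // Constructed example where the heuristic and the flag disagree.
        ContainerContext ctx =
            new ContainerContext("container_1410901177871_0001_01_000001", false);
        System.out.println(looksLikeAmContainer(ctx.containerId) + " " + isAmContainer(ctx));
    }
}
```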


