[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522766#comment-14522766
 ] 

Vinod Kumar Vavilapalli commented on YARN-3481:
---

> Vinod Kumar Vavilapalli, it looks like YARN-2965 is very similar to this.
> Actually, this also looks like a clone of YARN-1012. Anyway, from what I
> understand, those JIRAs want to send utilization metrics in the heartbeat
> and that's pretty much what I'm targeting here. My current prototype
> extends ContainersMonitorImpl and puts this information into the
> NodeHealthStatus. I think I could do that in any of those JIRAs.

Okay, I am going to assign YARN-1012 to you and close this as a dup. Will also
make YARN-3534 a sub-task of YARN-1011.

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-3481
> URL: https://issues.apache.org/jira/browse/YARN-3481
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Inigo Goiri
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> To allow the RM to make better scheduling decisions, it should be aware of the 
> actual utilization of the containers. The NM would aggregate the 
> ContainerMetrics and report it in every heartbeat.
> Related to YARN-1012 but aggregated to reduce the heartbeat overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521936#comment-14521936
 ] 

Karthik Kambatla commented on YARN-3481:


I have been working with [~rgrandl] on YARN-2965 (he shared his Tetris code 
privately). YARN-2965 aims to expose more than just CPU and memory - disk 
in/out bandwidth and network in/out bandwidth. I think it is okay to capture 
CPU and memory here, and add the remaining items in the context of that JIRA.





[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-29 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520922#comment-14520922
 ] 

Inigo Goiri commented on YARN-3481:
---

[~vinodkv], it looks like YARN-2965 is very similar to this. Actually, this 
also looks like a clone of YARN-1012. Anyway, from what I understand, those 
JIRAs want to send utilization metrics in the heartbeat and that's pretty much 
what I'm targeting here. My current prototype extends ContainersMonitorImpl and 
puts this information into the NodeHealthStatus. I think I could do that in any 
of those JIRAs. 

For now, I'm pushing the implementation of NodeResourceMonitor (YARN-3534),
which will add a ResourceUtilization entity, so I'm not making progress here
yet. Once I'm done with that one, I can move to either of the other two JIRAs
instead of continuing with this one.

Please let me know where you guys think it'd be better to push for this (even
YARN-3332 is a possibility).



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520825#comment-14520825
 ] 

Vinod Kumar Vavilapalli commented on YARN-3481:
---

Sorry, didn't realize before. This is a dup of YARN-2965?

In any case, as I commented on YARN-2965, this is getting increasingly messy:
we keep re-measuring the same thing in multiple places - ContainersMonitorImpl
(existing code), Timeline Service Next-gen (YARN-2928), and this JIRA.

[~elgoiri], can you please look at my doc at YARN-3332?

[~vvasudev], please look at this - this is relevant to some of the efforts you
are doing to expose metrics (e.g. YARN-3503). I think we need to lay the
groundwork for unified stats collection to avoid these repeated measurements.



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-23 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510492#comment-14510492
 ] 

Inigo Goiri commented on YARN-3481:
---

It requires the implementation of ResourceUtilization.



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-23 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510490#comment-14510490
 ] 

Inigo Goiri commented on YARN-3481:
---

YARN-3534 will be the patch providing ResourceUtilization.



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-20 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503648#comment-14503648
 ] 

Inigo Goiri commented on YARN-3481:
---

[~vinodkc] thanks for fixing the JIRA; these are my first tries and I'm not
aware of what the best practice is.

Regarding the approach to send information, I agree that creating a new type
(ResourceUtilization is the proposal) might be the best way to go. I started
implementing it this way and it's basically a copy of Resource for now, with a
float for the VCores. I'll try to keep both in sync but it might be
challenging. Is there anything we can do to keep them coupled? What about
ResourceUtilization extending Resource? It doesn't look clean to me, so I hope
there might be a better way.



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-20 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503384#comment-14503384
 ] 

Carlo Curino commented on YARN-3481:


I agree with [~vinodkv]... let's go for the new ResourceUtilization record; it
allows you to evolve independently of whatever policing is done for
Resource. Try to keep the code structure/style very consistent, as the two are
closely related.



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1450#comment-1450
 ] 

Vinod Kumar Vavilapalli commented on YARN-3481:
---

We should decouple the records exchanged between servers from the ones used by 
the users. Tying them together inhibits decoupled evolution.



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-16 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499137#comment-14499137
 ] 

Inigo Goiri commented on YARN-3481:
---

I could cover the first one (creating a ResourceUtilization) in this JIRA. It
replicates some of the code of Resource but I guess that is fine.

For the second one, I think we should tackle this in a different JIRA as it has
a deeper impact on other parts of the code. To minimize this impact, we could
keep the current interfaces (with ints), change the stored type from int to
double, and add the interfaces with doubles. Obviously, this is not super clean
but it minimizes possible problems.

I'm open to implementing either one.
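
The compromise described above for the second option (keep the int interfaces,
store a double, add interfaces with doubles) might look roughly like the
following sketch. The class is a hypothetical stand-in for Resource, not the
Hadoop class; the *AsDouble method names and the choice to round up in the int
getter are assumptions made only for illustration.

```java
// Illustrative sketch only: a hypothetical stand-in for Resource, not the
// Hadoop class. The *AsDouble methods and the round-up rule are assumptions.
final class DoubleBackedVCoresSketch {

  // The stored type changes from int to double.
  private double vcores;

  /** Existing-style int setter, unchanged for current callers. */
  void setVirtualCores(int vcores) {
    this.vcores = vcores;
  }

  /** Existing-style int getter; fractional values are rounded up (assumption). */
  int getVirtualCores() {
    return (int) Math.ceil(vcores);
  }

  /** Added setter for callers that need fractional VCores. */
  void setVirtualCoresAsDouble(double vcores) {
    this.vcores = vcores;
  }

  /** Added getter that returns the stored value without rounding. */
  double getVirtualCoresAsDouble() {
    return vcores;
  }

  public static void main(String[] args) {
    DoubleBackedVCoresSketch r = new DoubleBackedVCoresSketch();
    r.setVirtualCores(4); // an existing int-based caller keeps working
    System.out.println(r.getVirtualCores());
    r.setVirtualCoresAsDouble(0.1); // a utilization-style caller
    System.out.println(r.getVirtualCores() + " / " + r.getVirtualCoresAsDouble());
  }
}
```

Whatever rounding rule the int getter uses, int-based callers would see a
coarser value than the one stored, which is part of why this is "not super
clean".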

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-3481
> URL: https://issues.apache.org/jira/browse/YARN-3481
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Inigo Goiri
>Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> To allow the RM to make better scheduling decisions, it should be aware of the 
> actual utilization of the containers. The NM would aggregate the 
> ContainerMetrics and report it in every heartbeat.
> Related to YARN-1012 but aggregated to reduce the heartbeat overhead.





[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499037#comment-14499037
 ] 

Carlo Curino commented on YARN-3481:


Creating a new ResourceUtilization class would, I think, be particularly
relevant if we start tracking more resources than YARN enforces. I.e., if YARN
only enforces CPU and memory and we care to monitor disk queues, disk bandwidth
for writes/reads, disk IOPS, network bandwidth, CPU IO-wait time, etc., then a
new object is probably a good way to go.

If the set of resources we will monitor and enforce is the same, I would vote
for evolving Resource to express everything as doubles (also RAM). I stumbled
on limitations of Resources in the context of the "reservation" work, where I
was tracking cpu-over-time and running out of range of Integer (e.g., counting
memory over time for large reservations). This would allow us to simplify that
code too (removing local classes used only to handle integral resources).
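
The range problem mentioned above can be reproduced with plain Java
arithmetic. This is only a sketch with made-up numbers (a 100 GB reservation
held for one day); the class and method names are hypothetical and are not
taken from the reservation code.

```java
// Illustrative sketch only: hypothetical helpers, not the reservation code.
final class ResourceOverTimeOverflowSketch {

  /** Memory-over-time as an int product; silently wraps when out of range. */
  static int memorySecondsAsInt(int memoryMB, int seconds) {
    return memoryMB * seconds;
  }

  /** The same quantity as a double, which has enough range for this case. */
  static double memorySecondsAsDouble(int memoryMB, int seconds) {
    return (double) memoryMB * seconds;
  }

  /** True if the int product does not fit in an int. */
  static boolean overflowsInt(int memoryMB, int seconds) {
    try {
      Math.multiplyExact(memoryMB, seconds);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    int memoryMB = 100 * 1024;  // 100 GB expressed in MB (made-up value)
    int seconds = 24 * 60 * 60; // one day
    System.out.println("overflows int: " + overflowsInt(memoryMB, seconds));
    System.out.println("as int:        " + memorySecondsAsInt(memoryMB, seconds));
    System.out.println("as double:     " + memorySecondsAsDouble(memoryMB, seconds));
  }
}
```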



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-16 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498760#comment-14498760
 ] 

Inigo Goiri commented on YARN-3481:
---

I started implementing this and I hit a philosophical issue: how to report the 
utilization?

Initially, I thought about using the Resource class (with the ResourceProto for
transfer) but it has an issue with accuracy. If we have one container using 10%
of a CPU, Resource only allows us to specify 0 or 1 for the VCores.

There are multiple possible approaches to solve this issue:
* Modify Resource to define VCores as a double. The problem with this is that
  we'd need to change many interfaces.
* Modify Resource to store milliVCores internally and keep all the public
  interfaces with VCores.
* Create a new type called ResourceUtilization that would have a float instead
  of an int. We would use this new type to send utilization data. This new
  class would also be suitable to send other utilizations like disk queue
  length, etc.
* Keep using Resource as is but, when working with utilization, put
  milliVCores there. In this case, we would have weird semantics for Resource,
  where sometimes we send milliVCores and other times we send VCores.
* Define 1 VCore as 0.001 CPUs in the cluster. The problem with this is that
  applications would have to change how many VCores they ask for.

Note that YARN-3122 is storing a metric called milliVCores for this.

I would like to see what people think is the best option. Ideas?
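
To make the accuracy problem and two of the options above concrete, here is a
minimal Java sketch. It does not use the Hadoop classes: UtilizationRecord,
asWholeVCores and asMilliVCores are hypothetical names chosen for
illustration, and rounding to nearest is an assumption.

```java
// Illustrative sketch only: these are hypothetical stand-ins, not the Hadoop
// Resource or ResourceUtilization classes.
final class UtilizationPrecisionSketch {

  /** What an int VCores field can hold (rounding to nearest is an assumption). */
  static int asWholeVCores(double cpusUsed) {
    return (int) Math.round(cpusUsed);
  }

  /** The same usage expressed in milliVCores (thousandths of a VCore). */
  static int asMilliVCores(double cpusUsed) {
    return (int) Math.round(cpusUsed * 1000.0);
  }

  /** Hypothetical utilization record that keeps VCores as a float. */
  static final class UtilizationRecord {
    final int memoryMB;
    final float vcores;

    UtilizationRecord(int memoryMB, float vcores) {
      this.memoryMB = memoryMB;
      this.vcores = vcores;
    }
  }

  public static void main(String[] args) {
    double cpusUsed = 0.10; // one container using 10% of a CPU
    System.out.println("whole VCores: " + asWholeVCores(cpusUsed));
    System.out.println("milliVCores:  " + asMilliVCores(cpusUsed));
    UtilizationRecord r = new UtilizationRecord(512, (float) cpusUsed);
    System.out.println("float VCores: " + r.vcores);
  }
}
```

The int field loses the 10% entirely, while both the milliVCores encoding and
the float field keep it; the options differ mainly in which interfaces have to
change.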



[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-13 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493172#comment-14493172
 ] 

Inigo Goiri commented on YARN-3481:
---

To make YARN aware of the actual use of resources on the machines, and not only
of their allocation, the RM should track the actual utilization. Right now, the
ContainerMonitor collects the CPU and the memory utilization for each container
(YARN-3122 and YARN-2984). We want to collect this information in the
NodeStatusUpdater and report it to the RM in the NodeStatus. This would add a
Resource (availableResources) to the heartbeat.
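
As a rough illustration of the aggregation step, the sketch below sums
per-container samples into a single node-level value, so that one record per
heartbeat is enough. ContainerUsage and aggregate() are hypothetical names and
not part of the NodeManager code; how the result would be attached to the
NodeStatus is left out.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: ContainerUsage and aggregate() are hypothetical
// names, not part of the NodeManager code.
final class NodeUtilizationAggregationSketch {

  /** Hypothetical per-container sample: memory in MB, CPU in fractional VCores. */
  static final class ContainerUsage {
    final long memoryMB;
    final float vcores;

    ContainerUsage(long memoryMB, float vcores) {
      this.memoryMB = memoryMB;
      this.vcores = vcores;
    }
  }

  /** Sums the per-container samples into one node-level value. */
  static ContainerUsage aggregate(List<ContainerUsage> containers) {
    long memoryMB = 0;
    float vcores = 0f;
    for (ContainerUsage c : containers) {
      memoryMB += c.memoryMB;
      vcores += c.vcores;
    }
    return new ContainerUsage(memoryMB, vcores);
  }

  public static void main(String[] args) {
    List<ContainerUsage> samples = Arrays.asList(
        new ContainerUsage(1024, 0.5f),
        new ContainerUsage(2048, 0.25f));
    ContainerUsage node = aggregate(samples);
    System.out.println(node.memoryMB + " MB, " + node.vcores + " VCores");
  }
}
```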

Thoughts?
