[jira] [Commented] (YUNIKORN-14) Add rest API to retrieve app/container history info

2020-04-03 Thread Adam Antal (Jira)


[https://issues.apache.org/jira/browse/YUNIKORN-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074331#comment-17074331]

Adam Antal commented on YUNIKORN-14:


Thanks for the reviews!

> Add rest API to retrieve app/container history info
> ---
>
> Key: YUNIKORN-14
> URL: https://issues.apache.org/jira/browse/YUNIKORN-14
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Adam Antal
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.8
>
> Attachments: Yunikorn_UI.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As part of the web UI we can show application and container history.
> The current pages are mocked up and do not show the real history. Before the 
> changes can be made on the web UI side we need to provide the history via a 
> REST interface so it can be consumed by the UI.
> All web service code is located in package 
> [https://github.com/apache/incubator-yunikorn-core/tree/master/pkg/webservice].
>  When running the scheduler locally (from K8shim using "make run"), the REST 
> APIs can be accessed via
>  * [http://localhost:9080/ws/v1/apps]
>  * [http://localhost:9080/ws/v1/queues]
>  * [http://localhost:9080/ws/v1/nodes]
> We need to add another endpoint to provide data to yunikorn-web to render the 
> app/container history page. Please check with [~akhilpb] for the desired data 
> format, etc. That issue is tracked via YUNIKORN-8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-14) Add rest API to retrieve app/container history info

2020-03-27 Thread Adam Antal (Jira)


[https://issues.apache.org/jira/browse/YUNIKORN-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068549#comment-17068549]

Adam Antal commented on YUNIKORN-14:


Discussed this issue offline with [~wilfreds].

The proposed solution is the following:
- Keep the current {{HistoricalClusterInfo}} object to store the data.
- Keep the {{HistoricalPartitionInfoUpdater}} object, but change its 
implementation to pull the metrics from the existing Prometheus metrics 
endpoints instead of directly from the scheduler. That way the web UI shows 
the same data a user would see in Grafana using the Prometheus endpoints.
- The settings can be kept the same, but the classes might be moved from the 
cache package to some other package.
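Assuming the updater reads the standard Prometheus text exposition format, pulling one unlabeled gauge could be sketched as below. The metric name {{example_running_apps}} and the {{fetchMetrics}}/{{sampleValue}} helpers are hypothetical, and labeled samples are deliberately skipped; a real implementation could instead rely on an existing Prometheus parser.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// fetchMetrics performs a plain GET against a Prometheus text endpoint.
func fetchMetrics(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// sampleValue returns the value of the first unlabeled sample called name.
// Comment lines (# HELP / # TYPE) and labeled samples such as name{a="b"}
// are skipped by this simplified parser.
func sampleValue(payload, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line) // name, value, optional timestamp
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		return v, err == nil
	}
	return 0, false
}

func main() {
	const payload = "# HELP example_running_apps Hypothetical gauge.\n" +
		"# TYPE example_running_apps gauge\n" +
		"example_running_apps 7\n"
	// Stand-in for the scheduler's metrics endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, payload)
	}))
	defer srv.Close()

	body, err := fetchMetrics(srv.URL + "/metrics")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	v, ok := sampleValue(body, "example_running_apps")
	fmt.Println(v, ok) // 7 true
}
```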

> Add rest API to retrieve app/container history info
> ---
>
> Key: YUNIKORN-14
> URL: https://issues.apache.org/jira/browse/YUNIKORN-14
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Adam Antal
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: Yunikorn_UI.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (YUNIKORN-14) Add rest API to retrieve app/container history info

2020-03-25 Thread Wilfred Spiegelenburg (Jira)


[https://issues.apache.org/jira/browse/YUNIKORN-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067390#comment-17067390]

Wilfred Spiegelenburg commented on YUNIKORN-14:

Providing two sets of metrics that are not in sync, or that can show very 
different numbers, is a bad idea.
We'll get questions and new jiras raised about the web UI being out of sync 
with the collected metrics.

We should either leverage the metrics implementation or not provide metrics in 
the web UI. Doing both is asking for problems.

> Add rest API to retrieve app/container history info
> ---
>
> Key: YUNIKORN-14
> URL: https://issues.apache.org/jira/browse/YUNIKORN-14
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Adam Antal
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yunikorn_UI.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (YUNIKORN-14) Add rest API to retrieve app/container history info

2020-03-25 Thread Weiwei Yang (Jira)


[https://issues.apache.org/jira/browse/YUNIKORN-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067350#comment-17067350]

Weiwei Yang commented on YUNIKORN-14:

Hi [~wilfreds]

Thanks for the comments.

The history info here just provides very basic info about the cluster, e.g. the 
number of containers/apps in the last 12h. I think we can use this simple 
solution to give users a basic impression. For comprehensive metrics we have 
the Prometheus integration, so we can push that to its store for persistence. 
Here we just need a small time-bound cache like the one [~adam.antal] has 
implemented.

It is a pull model, but that's fine. We pull once per minute (or maybe every 
30s), and since the data is cached, no matter how many requests come from the 
web UI they will not lock the partition or hurt scheduler performance. When a 
write happens, it simply reads the data from the partition without any 
calculation, so the impact is trivial.

The push model is covered by the Prometheus metrics, which we already have, so 
I don't think we need to build anything similar.


> Add rest API to retrieve app/container history info
> ---
>
> Key: YUNIKORN-14
> URL: https://issues.apache.org/jira/browse/YUNIKORN-14
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Adam Antal
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yunikorn_UI.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (YUNIKORN-14) Add rest API to retrieve app/container history info

2020-03-25 Thread Wilfred Spiegelenburg (Jira)


[https://issues.apache.org/jira/browse/YUNIKORN-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067342#comment-17067342]

Wilfred Spiegelenburg commented on YUNIKORN-14:

I looked at the PR and want to propose a different approach, as I see a number 
of issues.
I mention tracking application details below, but I am not sure that is 
needed in the first instance. It would still fit in the design if we want to 
add it in a second step.

I think history should be part of {{common}} or the {{scheduler}}, not the 
{{cache}}. I would expect us to have multiple generic collectors that can 
collect history data. One generic collector is started per partition, like the 
{{PartitionManager}}, in its own goroutine. History and all tracking are always 
per partition and will never go above that level.

The current implementation uses a pull mechanism to collect the data from the 
partition. That requires locking the partition on retrieval (the locks are 
currently missing from the solution) and could thus impact scheduling 
performance if the web interface gets lots of requests. Retrieving the history 
should not impact the partition. The data should be kept in the collector and 
retrieved from there.

A change going deeper: why is the history only getting top-level partition 
data? Getting info out for queues or nodes is just as important going forward. 
I also see an omission here: we lose the history data as soon as we remove the 
partition. It will thus not show the real history for a time period, just the 
history of the current state going back a fixed time. That becomes even more 
important when we look at queues, nodes or applications. Going forward we need 
to be able to track and maintain history data for a period of time, 
independent of the removal of the partition/node/queue/application.

Tracking history should not be limited by the number of entries but by the 
time range that we need to keep (24 hours, for example). We need at least one 
history entry per minute, maybe even a 30 or 15 second split. Longer periods 
mean we could too easily miss short-running containers or applications. The 
alternative would be for the different tracked objects to push into a channel 
that is read by the history collector. That way we do not miss any info, but 
the implementation becomes a bit trickier. We can still sum up to give stats 
per time range, and that becomes easier to manage for small intervals. It 
would also not be "on demand" but based on an internal timer of the history 
collector.
All changes for the things we need to track already run through the partition 
info, so we would only need to instrument one object to keep track of all of 
them.

Thoughts?

> Add rest API to retrieve app/container history info
> ---
>
> Key: YUNIKORN-14
> URL: https://issues.apache.org/jira/browse/YUNIKORN-14
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: core - scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Adam Antal
>Priority: Major
>  Labels: pull-request-available
> Attachments: Yunikorn_UI.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


