[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-12-21 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252668#comment-17252668
 ] 

Piotr Nowojski commented on FLINK-17328:


I'm currently taking over the parent issue and investigating how it could be 
implemented. After the investigation I will either assign the existing tickets 
to myself or modify them/create new ones.

> Expose network metric for job vertex in rest api
> 
>
> Key: FLINK-17328
> URL: https://issues.apache.org/jira/browse/FLINK-17328
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Metrics, Runtime / REST
> Reporter: lining
> Priority: Major
> Labels: pull-request-available
>
> JobDetailsHandler
>  * pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg, 
> inputFloatingBuffersUsageAvg
>  * back-pressured: shows whether the vertex is back-pressured (merged across 
> all its subtasks)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-12-18 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251665#comment-17251665
 ] 

Till Rohrmann commented on FLINK-17328:
---

[~pnowojski] is this still relevant? What is the state of this ticket?



[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-09-14 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195554#comment-17195554
 ] 

Piotr Nowojski commented on FLINK-17328:


Sorry [~chesnay], I think I've misunderstood you in that case. 

Regarding the pool usages, they are helpful for two things:
# Input pool usage is useful for finding a task which is back-pressured but is 
not back-pressuring upstream tasks. That's a temporary situation, but it can 
happen, especially if the task is emitting some large chunk of buffered data, 
like {{WindowOperator}} after firing a timer.
# The combination of average pool usage with the "is back-pressured" state can 
be used to distinguish between a case where only a couple of channels are 
back-pressured (data skew) and one where all of the channels are. It's not as 
important as the "is back-pressured" fact, but still useful and hard to digest 
without the job graph.
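
The second point can be sketched roughly as follows. This is only an 
illustration, not how Flink implements it: the function name, the labels it 
returns and the 0.8 threshold are all hypothetical, and the average pool usage 
is assumed to be a value between 0 and 1.

```python
def classify_back_pressure(avg_pool_usage: float,
                           is_back_pressured: bool,
                           full_threshold: float = 0.8) -> str:
    """Combine the "is back-pressured" flag with the average pool usage.

    avg_pool_usage is assumed to be in [0, 1]; full_threshold is an
    arbitrary cut-off chosen for this sketch, not a Flink default.
    """
    if not is_back_pressured:
        return "not-back-pressured"
    if avg_pool_usage >= full_threshold:
        # Back-pressured and the pools are full on average:
        # most or all channels are likely affected.
        return "all-channels"
    # Back-pressured although the pools are mostly empty on average:
    # only a few channels are likely full, which hints at data skew.
    return "possible-data-skew"


print(classify_back_pressure(0.95, True))
print(classify_back_pressure(0.30, True))
```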

> Expose network metric for job vertex in rest api
> 
>
> Key: FLINK-17328
> URL: https://issues.apache.org/jira/browse/FLINK-17328
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Metrics, Runtime / REST
> Reporter: lining
> Assignee: lining
> Priority: Major
> Labels: pull-request-available
>
> JobDetailsHandler
>  * pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg, 
> inputFloatingBuffersUsageAvg
>  * back-pressured: shows whether the vertex is back-pressured (merged across 
> all its subtasks)





[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-09-14 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195478#comment-17195478
 ] 

Chesnay Schepler commented on FLINK-17328:
--

I never said displaying it in the UI is not more convenient than doing the 
matching by hand.

What I'm questioning is why we expose the pool usages through the REST API when 
all you really need is "backpressure between subtask A of Task 1 and subtask B 
of Task 2" or "this edge has 20% more data than other edges".



[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-09-14 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195466#comment-17195466
 ] 

Piotr Nowojski commented on FLINK-17328:


What I meant by difficult is that if you have ~100 tasks (with hundreds of 
parallel subtasks each), it's really hard to understand what's happening with 
the job without visualising the data in the shape of the job graph. Have you 
tried doing it [~chesnay]? :) In textual form, you are forced to look at the 
tasks (or subtasks, for data skew) one by one. Grafana or other metrics 
visualisers do not help much with that.

Now compare this to looking at a graph with green, yellow or red dots, and with 
some similar marker for the average state of the buffer pools. One quick 
glance and it becomes immediately obvious:
* what is back-pressured and what is not
* whether there is some data skew involved, and on which edges

Moreover, just for the sake of the sanity of people using Flink or answering 
users' problems, it's really good to have some basic functionality built into 
the system that allows them to understand what's happening.
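
As a rough illustration of the green/yellow/red idea, a per-task back-pressure 
ratio could be mapped to a colour like this. The function names, the example 
task names and the 0.10 / 0.50 thresholds are assumptions made for this sketch, 
not values taken from this ticket.

```python
from typing import Dict


def back_pressure_color(ratio: float) -> str:
    """Map a back-pressure ratio in [0, 1] to a traffic-light colour.

    The 0.10 and 0.50 cut-offs are hypothetical thresholds for this sketch.
    """
    if ratio <= 0.10:
        return "green"
    if ratio <= 0.50:
        return "yellow"
    return "red"


def color_job_graph(ratios: Dict[str, float]) -> Dict[str, str]:
    """Colour every task of a job graph by its back-pressure ratio."""
    return {task: back_pressure_color(ratio) for task, ratio in ratios.items()}


# Hypothetical three-task job: the sink is fine, the tasks before it are not.
colors = color_job_graph({"source": 0.8, "window": 0.3, "sink": 0.0})
print(colors)
```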



[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-09-14 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195454#comment-17195454
 ] 

Chesnay Schepler commented on FLINK-17328:
--

If the matching of such metrics to the JobGraph is painful, then I don't see 
how exposing these in the UI solves anything. I would think that a better goal 
would be to have the REST API provide a more high-level take on where 
back-pressure is, instead of exposing a bunch of low-level metrics and doing 
matching in the UI.



[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-09-14 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195451#comment-17195451
 ] 

Piotr Nowojski commented on FLINK-17328:


I would tend to agree with [~lining]. When monitoring backpressure or data 
skew (for which the state of the buffer pools can be used), it's important to 
know the topology of the job. Although most of this information is currently 
available in one way or another via metrics, correlating the buffer usage of 
the subtasks/tasks with the job graph is a very painful and manual process, 
while the UI can present it easily in a very readable form.



[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-09-14 Thread lining (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195262#comment-17195262
 ] 

lining commented on FLINK-17328:


The WebUI already has backpressure monitoring. But to judge whether the current 
task is the source of the backpressure, users need to know the network metrics 
of both the current task and its upstream tasks. Right now users have to record 
the relevant information themselves. This is just an improvement to the 
existing feature.
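
A minimal sketch of that judgement, assuming a common rule of thumb: a task 
that is not back-pressured itself while at least one of its direct upstream 
tasks is back-pressured is a likely source. The function name and the input 
shapes are hypothetical and not part of Flink's REST API.

```python
from typing import Dict, List


def likely_back_pressure_sources(upstreams: Dict[str, List[str]],
                                 back_pressured: Dict[str, bool]) -> List[str]:
    """Return tasks that are probably causing back-pressure.

    upstreams maps each task to its direct upstream tasks;
    back_pressured maps each task to its "is back-pressured" flag.
    Tasks missing from back_pressured are treated as not back-pressured.
    """
    sources = []
    for task, ups in upstreams.items():
        if back_pressured.get(task, False):
            continue  # a back-pressured task is a victim, not the source
        if any(back_pressured.get(u, False) for u in ups):
            sources.append(task)
    return sorted(sources)


# Hypothetical linear job: source -> map -> sink, where the sink is slow.
upstreams = {"source": [], "map": ["source"], "sink": ["map"]}
flags = {"source": True, "map": True, "sink": False}
print(likely_back_pressure_sources(upstreams, flags))
```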



[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-07-03 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150858#comment-17150858
 ] 

Chesnay Schepler commented on FLINK-17328:
--

I'm not convinced this is necessary. Not only are these metrics fairly 
low-level, but there are already metric REST endpoints for aggregating metrics 
across subtasks.

 

As usual, the WebUI is meant to serve basic functionality, not replace an 
entire monitoring stack.
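
For reference, a request to the aggregating endpoint mentioned above could be 
built along these lines. This is a sketch under assumptions: the path and the 
`get`/`agg` query parameters are written as I recall them from the Flink REST 
documentation and should be checked against the Flink version in use; the 
metric names, host, job ID and vertex ID are placeholders.

```python
from typing import Sequence
from urllib.parse import urlencode


def aggregated_subtask_metrics_url(base_url: str,
                                   job_id: str,
                                   vertex_id: str,
                                   metrics: Sequence[str],
                                   aggs: Sequence[str] = ("avg",)) -> str:
    """Build a URL that asks for metrics aggregated across subtasks.

    Assumed endpoint shape:
    /jobs/<jobid>/vertices/<vertexid>/subtasks/metrics?get=...&agg=...
    """
    # safe="," keeps the comma-separated lists readable in the query string.
    query = urlencode({"get": ",".join(metrics), "agg": ",".join(aggs)}, safe=",")
    return (f"{base_url.rstrip('/')}/jobs/{job_id}"
            f"/vertices/{vertex_id}/subtasks/metrics?{query}")


# Placeholder IDs and metric names, for illustration only.
url = aggregated_subtask_metrics_url(
    "http://localhost:8081/", "j1", "v1",
    ["buffers.outPoolUsage", "buffers.inPoolUsage"])
print(url)
```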



[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-04-24 Thread Gary Yao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091368#comment-17091368
 ] 

Gary Yao commented on FLINK-17328:
--

I assigned you but I cannot promise a timely review at the moment.

> Expose network metric for job vertex in rest api
> 
>
> Key: FLINK-17328
> URL: https://issues.apache.org/jira/browse/FLINK-17328
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Metrics, Runtime / REST
> Reporter: lining
> Assignee: lining
> Priority: Major
>
> JobVertexDetailsHandler
>  * pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg, 
> inputFloatingBuffersUsageAvg
>  * back-pressured: shows whether the vertex is back-pressured (merged across 
> all its subtasks)





[jira] [Commented] (FLINK-17328) Expose network metric for job vertex in rest api

2020-04-22 Thread lining (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090203#comment-17090203
 ] 

lining commented on FLINK-17328:


[~gary] could you assign it to me?

> Expose network metric for job vertex in rest api
> 
>
> Key: FLINK-17328
> URL: https://issues.apache.org/jira/browse/FLINK-17328
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Metrics, Runtime / REST
> Reporter: lining
> Priority: Major
>
> JobVertexDetailsHandler
>  * pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg, 
> inputFloatingBuffersUsageAvg
>  * back-pressured: shows whether the vertex is back-pressured (merged across 
> all its subtasks)


