[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-23 Thread zentol
Github user zentol closed the pull request at:

https://github.com/apache/flink/pull/1947


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-22 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220850557
  
I did a pass over the code and committed the result:

Manually merged in 003ce18efc0249fae874e56c3df6acf19f5f2429 and 
707606ac40dbbbd497fcbbb5442870fec5468bf3


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-19 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220319335
  
Let me grab the token for this one. There are a few things still, like 
resource leaks, in this code.
I'll pass you back the token as soon as I am done...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-18 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220112639
  
renaming the module is a good idea


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-18 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220099057
  
How about renaming the maven module from `flink-metrics` to 
`flink-metric-reporters`? The metrics systems itself is part of `flink-core` 
after all...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-18 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220087815
  
I've had an offline chat with @StephanEwen and as a result updated the PR.

List of changes:
* dropped Meters, Histograms and Timers
 * we weren't convinced that the performance of the implementation used is 
efficient enough for our use-cases
 * removed Reservoir/Snapshot and various wrapper classes
* moved ScheduledDropWizardReporter into a new flink-metrics-dropwizard 
module within flink-metrics
 * removes Dropwizard usage entirely from flink-core
* the MetricRegistry no longer maintains maps of all metrics; reporters are 
from now on responsible for doing this
 * Listener interface was removed
 * AbstractReporter class was added that implements this behaviour
 * will make it easier to support multiple reporters in the future
* Counter and Gauge no longer implement the DropWizard interface
 * added Counter-/GaugeWrapper classes for DropWizard reporters


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-18 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220015012
  
I would like to take an iteration on this, make some changes on top of 
this, and open a new pull request afterwards.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-18 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220014866
  
Few followups are actually needed before merging this:

  1. We need to remove the example metrics

  2. Conflicting metrics names should not result in failures. Metrics are 
tooling, and problems in the tooling should not fail the core programs.

  3. I think we should limit the available metrics types to Gauge and 
Counter for now. I looked at Timers, Meters, and Histograms - they are very 
high overhead each. As a follow-up, I would like to see if we can construct 
simple Meters as views over counters. That way, the runtime code as no overhead 
for the metering (it just maintains counters and gauges) and the registry code 
needs to turn them into Meters asynchronously.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-18 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219964437
  
Hmm, seeing some local test failures after the re-base. Have to look into 
this.
Typical message is `java.lang.IllegalArgumentException: This group already 
contains a metric named KeyCount`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-18 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219959102
  
Tests were good!
Rebased again, re-running tests, will merge after that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-17 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219864781
  
I rebased the branch, waiting for Travis to give a green light...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-17 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219849631
  
The code in this PR is the most up-to-date.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-17 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219836974
  
This looks pretty good now.
I would like to get this in soon, now that the test are passing.
Let's iterate over it on the master.

This needs a rebase to master, though,

Also saw that you have a "metrics_v4" branch now. Is that one newer than 
the pull request?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-13 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-218983234
  
The tests should now pass.

I've fixed the following issues:
* several examples had non-transient Metric fields
* removed one failing example (CollectionInputFormat; accessed Context in 
constructor)
* IOMetrics are not granular, effectively gathering the same metrics we 
gather currently
* Resolved several test issues
 * General:
   * Introduced several DummyMetricGroups for use in tests
* mocked Context/Environments now return DummyMetricGroups
 * Core:
   * RuntimeUDFContext constructor modified to take a MetricGroup argument
* Integrated metrics into the CollectionExecutor
 * Table API: asterisks (introduced by select *) were not properly removed 
by the JMXReporter
 * Streaming: StreamingConfig ChainIndex default is now 0 instead of -1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-10 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-218153465
  
I'm currently aware of the following issues:
* quite a few tests in flink-runtime fail; this is mostly due to a missing 
integration into the MockEnvironments. Fix coming up
* there is an issue in the IO metrics for operations with multiple outputs. 
This one is a bit more tricky, but a not-so-pretty solution is on the way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-10 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-218146940
  
Looks nice, I am trying it out now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-09 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217972596
  
I have updates this PR; it now contains a new feature: configurable system 
scope. It essentially allows users to decide how Flink entities(Taskmanager, 
jobs, tasks and operators) are represented in metric names.

Users can configure a different format strings for every entity type in the 
flink-conf.yaml; which format is applied to a metric depends to which entity it 
is bound to.

For example, users can do the following:
* omit taskmanager information for jobs/tasks/operators
* re-order properties, like having the job properties ahead of the 
taskmanager
* decided whether they want to use names, ID's or a mix of both!

I've updated the main post to include more information.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-07 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217636194
  
@ankitcha you could write your own reporter that simply wraps the reporters 
you want to use, simply forwarding the calls to all of them. this should 
actually be really easy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-06 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217389510
  
Okay. I think we need to support multiple instances of the same job on a 
TaskManager.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-06 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217368732
  
@rmetzger you are correct that you're job failed since the previous one 
wasn't cleaned up yet. Should you try to run 2 identical jobs in parallel it 
will fail, since 2 jobs would use the same metrics due to name clashes. Note 
that in this version this also occurs when 2 operators have the same name. I 
have some additional functionality coming up that would allow you to circumvent 
this issue.

@ankitcha The problem with multiple reporters is our configuration, it only 
supports single-line key-value pairs, and you need to know the exact key to 
access it. In order to configure multiple reporters you would either need a 
nested structure (which is not supportet), or index the configuration keys 
(metrics.reporter.1.class) and add a new parameter containing the indices to 
use (e.g. metrics.reporter: 0, 1), which isn't particularly user-friendly. The 
metric system itself could deal with multiple reporters with minor 
modifications; it's all about the configuration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-05 Thread ankitcha
Github user ankitcha commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217240628
  
Guys, Thanks you for this awesome addition to Flink. 

I was just wondering if there is a way to configure multiple reporters as 
well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-05 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217214295
  
I'm just assuming its the missing "close()" call's I've commented already 
causing this issue.

I just tried adding some custom metrics:
```java
new RichFlatMapFunction() {
public Counter output;
public Counter el;
public String lastOut;

@Override
public void open(Configuration 
parameters) throws Exception {
super.open(parameters);
MetricGroup mg = 
getRuntimeContext().getMetricGroup();

this.el = 
mg.counter("elements");

MetricGroup detailedGroup = 
mg.addGroup("detailed");
this.output = 
detailedGroup.counter("output");
detailedGroup.gauge("lastOut", 
new Gauge() {
@Override
public String 
getValue() {
return lastOut;
}
});
}

@Override
public void flatMap(String value, 
Collector out) {
el.inc();
for (String word : 
value.split("\\s")) {
lastOut  = word;
out.collect(new 
WordWithCount(word, 1L));
output.inc();
}
}
}
```

and it works amazingly well


![image](https://cloud.githubusercontent.com/assets/89049/15050706/9ecc6fda-12f5-11e6-8c7d-cc7cb3657ecd.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-05 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217213101
  
Is it possible that submitting the same job two times doesn't work?

```
./bin/flink run ./examples/streaming/SocketWindowWordCount.jar --port 54323

05/05/2016 19:09:08 Job execution switched to status RUNNING.
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
SCHEDULED 
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
DEPLOYING 
05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of 
WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
switched to SCHEDULED 
05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of 
WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
switched to DEPLOYING 
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
RUNNING 
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
FAILED 
java.lang.IllegalArgumentException: This group ([key0, localhost, Actor, 
TaskManager, TaskManager, 964566cfcf032710aff86614010fce21, Category, Tasks, 
Job, "Socket Window WordCount", Operator, Flat Map, SubTask, 0, ChannelType, 
OutputChannel, Index, 0]) already contains a metric named numBytesOut
at org.apache.flink.metrics.MetricGroup.addMetric(MetricGroup.java:246)
at org.apache.flink.metrics.MetricGroup.counter(MetricGroup.java:123)
at 
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.setMetrics(SpanningRecordSerializer.java:206)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.setMetrics(RecordWriter.java:216)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:294)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:94)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:188)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:745)

05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of 
WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
switched to RUNNING 
05/05/2016 19:09:08 Job execution switched to status FAILING.

```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---