[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol closed the pull request at: https://github.com/apache/flink/pull/1947 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-220850557 I did a pass over the code and committed the result: Manually merged in 003ce18efc0249fae874e56c3df6acf19f5f2429 and 707606ac40dbbbd497fcbbb5442870fec5468bf3 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-220319335 Let me grab the token for this one. There are a few things still, like resource leaks, in this code. I'll pass you back the token as soon as I am done... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-220112639 renaming the module is a good idea --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-220099057 How about renaming the maven module from `flink-metrics` to `flink-metric-reporters`? The metrics systems itself is part of `flink-core` after all... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-220087815 I've had an offline chat with @StephanEwen and as a result updated the PR. List of changes: * dropped Meters, Histograms and Timers * we weren't convinced that the performance of the implementation used is efficient enough for our use-cases * removed Reservoir/Snapshot and various wrapper classes * moved ScheduledDropWizardReporter into a new flink-metrics-dropwizard module within flink-metrics * removes Dropwizard usage entirely from flink-core * the MetricRegistry no longer maintains maps of all metrics; reporters are from now on responsible for doing this * Listener interface was removed * AbstractReporter class was added that implements this behaviour * will make it easier to support multiple reporters in the future * Counter and Gauge no longer implement the DropWizard interface * added Counter-/GaugeWrapper classes for DropWizard reporters --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-220015012 I would like to take an iteration on this, make some changes on top of this, and open a new pull request afterwards. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-220014866 Few followups are actually needed before merging this: 1. We need to remove the example metrics 2. Conflicting metrics names should not result in failures. Metrics are tooling, and problems in the tooling should not fail the core programs. 3. I think we should limit the available metrics types to Gauge and Counter for now. I looked at Timers, Meters, and Histograms - they are very high overhead each. As a follow-up, I would like to see if we can construct simple Meters as views over counters. That way, the runtime code as no overhead for the metering (it just maintains counters and gauges) and the registry code needs to turn them into Meters asynchronously. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-219964437 Hmm, seeing some local test failures after the re-base. Have to look into this. Typical message is `java.lang.IllegalArgumentException: This group already contains a metric named KeyCount` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-219959102 Tests were good! Rebased again, re-running tests, will merge after that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-219864781 I rebased the branch, waiting for Travis to give a green light... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-219849631 The code in this PR is the most up-to-date. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-219836974 This looks pretty good now. I would like to get this in soon, now that the test are passing. Let's iterate over it on the master. This needs a rebase to master, though, Also saw that you have a "metrics_v4" branch now. Is that one newer than the pull request? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-218983234 The tests should now pass. I've fixed the following issues: * several examples had non-transient Metric fields * removed one failing example (CollectionInputFormat; accessed Context in constructor) * IOMetrics are not granular, effectively gathering the same metrics we gather currently * Resolved several test issues * General: * Introduced several DummyMetricGroups for use in tests * mocked Context/Environments now return DummyMetricGroups * Core: * RuntimeUDFContext constructor modified to take a MetricGroup argument * Integrated metrics into the CollectionExecutor * Table API: asterisks (introduced by select *) were not properly removed by the JMXReporter * Streaming: StreamingConfig ChainIndex default is now 0 instead of -1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-218153465 I'm currently aware of the following issues: * quite a few tests in flink-runtime fail; this is mostly due to a missing integration into the MockEnvironments. Fix coming up * there is an issue in the IO metrics for operations with multiple outputs. This one is a bit more tricky, but a not-so-pretty solution is on the way. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-218146940 Looks nice, I am trying it out now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-217972596 I have updates this PR; it now contains a new feature: configurable system scope. It essentially allows users to decide how Flink entities(Taskmanager, jobs, tasks and operators) are represented in metric names. Users can configure a different format strings for every entity type in the flink-conf.yaml; which format is applied to a metric depends to which entity it is bound to. For example, users can do the following: * omit taskmanager information for jobs/tasks/operators * re-order properties, like having the job properties ahead of the taskmanager * decided whether they want to use names, ID's or a mix of both! I've updated the main post to include more information. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-217636194 @ankitcha you could write your own reporter that simply wraps the reporters you want to use, simply forwarding the calls to all of them. this should actually be really easy. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-217389510 Okay. I think we need to support multiple instances of the same job on a TaskManager. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-217368732 @rmetzger you are correct that you're job failed since the previous one wasn't cleaned up yet. Should you try to run 2 identical jobs in parallel it will fail, since 2 jobs would use the same metrics due to name clashes. Note that in this version this also occurs when 2 operators have the same name. I have some additional functionality coming up that would allow you to circumvent this issue. @ankitcha The problem with multiple reporters is our configuration, it only supports single-line key-value pairs, and you need to know the exact key to access it. In order to configure multiple reporters you would either need a nested structure (which is not supportet), or index the configuration keys (metrics.reporter.1.class) and add a new parameter containing the indices to use (e.g. metrics.reporter: 0, 1), which isn't particularly user-friendly. The metric system itself could deal with multiple reporters with minor modifications; it's all about the configuration. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user ankitcha commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-217240628 Guys, Thanks you for this awesome addition to Flink. I was just wondering if there is a way to configure multiple reporters as well? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-217214295 I'm just assuming its the missing "close()" call's I've commented already causing this issue. I just tried adding some custom metrics: ```java new RichFlatMapFunction() { public Counter output; public Counter el; public String lastOut; @Override public void open(Configuration parameters) throws Exception { super.open(parameters); MetricGroup mg = getRuntimeContext().getMetricGroup(); this.el = mg.counter("elements"); MetricGroup detailedGroup = mg.addGroup("detailed"); this.output = detailedGroup.counter("output"); detailedGroup.gauge("lastOut", new Gauge() { @Override public String getValue() { return lastOut; } }); } @Override public void flatMap(String value, Collector out) { el.inc(); for (String word : value.split("\\s")) { lastOut = word; out.collect(new WordWithCount(word, 1L)); output.inc(); } } } ``` and it works amazingly well ![image](https://cloud.githubusercontent.com/assets/89049/15050706/9ecc6fda-12f5-11e6-8c7d-cc7cb3657ecd.png) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1947#issuecomment-217213101 Is it possible that submitting the same job two times doesn't work? ``` ./bin/flink run ./examples/streaming/SocketWindowWordCount.jar --port 54323 05/05/2016 19:09:08 Job execution switched to status RUNNING. 05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to SCHEDULED 05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to DEPLOYING 05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) switched to SCHEDULED 05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) switched to DEPLOYING 05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to RUNNING 05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to FAILED java.lang.IllegalArgumentException: This group ([key0, localhost, Actor, TaskManager, TaskManager, 964566cfcf032710aff86614010fce21, Category, Tasks, Job, "Socket Window WordCount", Operator, Flat Map, SubTask, 0, ChannelType, OutputChannel, Index, 0]) already contains a metric named numBytesOut at org.apache.flink.metrics.MetricGroup.addMetric(MetricGroup.java:246) at org.apache.flink.metrics.MetricGroup.counter(MetricGroup.java:123) at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.setMetrics(SpanningRecordSerializer.java:206) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.setMetrics(RecordWriter.java:216) at org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:294) at org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:94) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:188) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) at java.lang.Thread.run(Thread.java:745) 05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) switched to RUNNING 05/05/2016 19:09:08 Job execution switched to status FAILING. ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---