[GitHub] flink pull request #4801: [FLINK-7812] Log system resources metrics
Github user pnowojski commented on a diff in the pull request: https://github.com/apache/flink/pull/4801#discussion_r156010369 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunnerITCase.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.taskexecutor; + +import org.apache.flink.configuration.Configuration; +import org.apache.flink.metrics.Gauge; +import org.apache.flink.metrics.MetricConfig; +import org.apache.flink.metrics.reporter.AbstractReporter; +import org.apache.flink.runtime.minicluster.MiniCluster; +import org.apache.flink.runtime.minicluster.MiniClusterConfiguration; + +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Collection; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; + +import static org.apache.flink.configuration.MetricOptions.REPORTERS_LIST; +import static org.apache.flink.configuration.TaskManagerOptions.ADDITIONAL_LOGGING; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.fail; + +/** + * Integration tests for proper initialization in the {@link TaskManagerRunner}. + */ +public class TaskManagerRunnerITCase { --- End diff -- Renamed to `SystemResourcesMetricsITCase`. I have added this test after some larger rebasing conflicts, because otherwise I wasn't sure that everything is starting up correctly. I think that there would have to be a handful unit tests replacing this one `ITCase` and they would be more prone to fail/cause problems during refactoring. ---
[GitHub] flink pull request #4801: [FLINK-7812] Log system resources metrics
Github user pnowojski commented on a diff in the pull request: https://github.com/apache/flink/pull/4801#discussion_r156008613 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/utils/SystemResourcesCounter.java --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.taskexecutor.utils; + +import org.apache.flink.api.common.time.Time; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import oshi.SystemInfo; +import oshi.hardware.CentralProcessor; +import oshi.hardware.CentralProcessor.TickType; +import oshi.hardware.HardwareAbstractionLayer; +import oshi.hardware.NetworkIF; + +import javax.annotation.concurrent.ThreadSafe; + +import java.util.concurrent.atomic.AtomicLongArray; +import java.util.concurrent.atomic.AtomicReferenceArray; + +import static org.apache.flink.util.Preconditions.checkState; + +/** + * Daemon thread logging system resources. + * + * To accurately and consistently report CPU and network usage we have to periodically probe + * CPU ticks and network sent/received bytes and then convert those values to CPU usage and + * send/receive byte rates. + */ +@ThreadSafe --- End diff -- Hmmm :) This annotation is about public getters, but maybe indeed it doesn't make a lot of sense. On the other hand, is it harmful? ---
[GitHub] flink pull request #4801: [FLINK-7812] Log system resources metrics
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/4801#discussion_r155503740 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/util/MetricUtils.java --- @@ -102,37 +103,16 @@ public static void instantiateStatusMetrics( private static void instantiateNetworkMetrics( MetricGroup metrics, final NetworkEnvironment network) { - metrics.gauge("TotalMemorySegments", new Gauge () { - @Override - public Long getValue() { - return (long) network.getNetworkBufferPool().getTotalNumberOfMemorySegments(); - } - }); - metrics. gauge("AvailableMemorySegments", new Gauge () { - @Override - public Long getValue() { - return (long) network.getNetworkBufferPool().getNumberOfAvailableMemorySegments(); - } - }); + final NetworkBufferPool networkBufferPool = network.getNetworkBufferPool(); + metrics. gauge("TotalMemorySegments", () -> (long) networkBufferPool.getTotalNumberOfMemorySegments()); --- End diff -- Replace with "Integer" Gauge and change to method reference? ---
[GitHub] flink pull request #4801: [FLINK-7812] Log system resources metrics
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/4801#discussion_r155503385 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/utils/SystemResourcesCounter.java --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.taskexecutor.utils; + +import org.apache.flink.api.common.time.Time; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import oshi.SystemInfo; +import oshi.hardware.CentralProcessor; +import oshi.hardware.CentralProcessor.TickType; +import oshi.hardware.HardwareAbstractionLayer; +import oshi.hardware.NetworkIF; + +import javax.annotation.concurrent.ThreadSafe; + +import java.util.concurrent.atomic.AtomicLongArray; +import java.util.concurrent.atomic.AtomicReferenceArray; + +import static org.apache.flink.util.Preconditions.checkState; + +/** + * Daemon thread logging system resources. + * + * To accurately and consistently report CPU and network usage we have to periodically probe + * CPU ticks and network sent/received bytes and then convert those values to CPU usage and + * send/receive byte rates. + */ +@ThreadSafe --- End diff -- A Thread is ThreadSafe? ;-) ---
[GitHub] flink pull request #4801: [FLINK-7812] Log system resources metrics
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/4801#discussion_r155501129 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunnerITCase.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.taskexecutor; + +import org.apache.flink.configuration.Configuration; +import org.apache.flink.metrics.Gauge; +import org.apache.flink.metrics.MetricConfig; +import org.apache.flink.metrics.reporter.AbstractReporter; +import org.apache.flink.runtime.minicluster.MiniCluster; +import org.apache.flink.runtime.minicluster.MiniClusterConfiguration; + +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Collection; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; + +import static org.apache.flink.configuration.MetricOptions.REPORTERS_LIST; +import static org.apache.flink.configuration.TaskManagerOptions.ADDITIONAL_LOGGING; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.fail; + +/** + * Integration tests for proper initialization in the {@link TaskManagerRunner}. + */ +public class TaskManagerRunnerITCase { --- End diff -- This test seems very specific to this logging, but is named as a generic TaskManagerRunner test. Give it a differnet name? Separate question: Does it have to be an IT case that fully starts the TM, or can it be a unit test that checks config propagation? ---
[GitHub] flink pull request #4801: [FLINK-7812] Log system resources metrics
GitHub user pnowojski opened a pull request: https://github.com/apache/flink/pull/4801 [FLINK-7812] Log system resources metrics ## What is the purpose of the change This PR adds various system resources metrics, useful for analysing issues on machines/clusters for which there are no detailed external resources logging systems. ## Verifying this change This change was manually tested, since it's difficult to write automated tests for this feature. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (**yes** / no) It adds `ohci` dependency. ## Documentation - Does this pull request introduce a new feature? (**yes** / no) - If yes, how is the feature documented? (not applicable / **docs** / **JavaDocs** / not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/pnowojski/flink resources Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/4801.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4801 commit d337e322477486daad19fad37747e6a8898bf619 Author: Piotr NowojskiDate: 2017-10-10T15:24:21Z [hotfix][metrics] Replace anonymous classes with lamdas commit 37eaf16bb315be35559d8ac7f5a98c38cfc1e9a4 Author: Piotr Nowojski Date: 2017-10-11T10:06:01Z [hotfix][metrics] Remove redundant TaskManager metrics initialization commit 1b2420746a613a2af73486197d4fa3aeb63f7236 Author: Piotr Nowojski Date: 2017-10-11T07:10:49Z [FLINK-7812][metrics] Add system resources metrics ---