[jira] [Comment Edited] (HDFS-11789) Maintain Short-Circuit Read Statistics
[ https://issues.apache.org/jira/browse/HDFS-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052519#comment-16052519 ] Arpit Agarwal edited comment on HDFS-11789 at 6/21/17 6:25 PM: --- Thanks for updating the patch [~hanishakoneru]. A few more comments: # In BlockReaderLocal constructor, metrics is being passed to BlockReaderIoProvider before it is potentially initialized. {code} this.blockReaderIoProvider = new BlockReaderIoProvider( builder.shortCircuitConf, metrics, timer); if (builder.shortCircuitConf.isScrMetricsEnabled()) { metricsInitializationLock.lock(); if (!isMetricsEnabled) { metrics = BlockReaderLocalMetrics.create(); isMetricsEnabled = true; } {code} # Typo - {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}. # Looks like we can eliminate the {{BlockReaderLocal#isMetricsEnabled}}, instead just check whether metrics is null. # Thanks for adding TestBlockReaderIoProvider. -One more related test - ensure {{addShortCircuitReadLatency}} is not invoked when the read takes less than the threshold.- # Also I think we can make BlockReaderIoProvider be a static utility class since it doesn't maintain any mutable state. But this approach is also fine since the allocated object is lightweight. TestBlockReaderLocalMetrics is a nicely written unit test! Edit: Sorry about the confusion, there is no threshold for addShortCircuitReadLatency. was (Author: arpitagarwal): Thanks for updating the patch [~hanishakoneru]. A few more comments: # In BlockReaderLocal constructor, metrics is being passed to BlockReaderIoProvider before it is potentially initialized. {code} this.blockReaderIoProvider = new BlockReaderIoProvider( builder.shortCircuitConf, metrics, timer); if (builder.shortCircuitConf.isScrMetricsEnabled()) { metricsInitializationLock.lock(); if (!isMetricsEnabled) { metrics = BlockReaderLocalMetrics.create(); isMetricsEnabled = true; } {code} # Typo - {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}. # Looks like we can eliminate the {{BlockReaderLocal#isMetricsEnabled}}, instead just check whether metrics is null. # Thanks for adding TestBlockReaderIoProvider. One more related test - ensure {{addShortCircuitReadLatency}} is not invoked when the read takes less than the threshold. # Also I think we can make BlockReaderIoProvider be a static utility class since it doesn't maintain any mutable state. But this approach is also fine since the allocated object is lightweight. TestBlockReaderLocalMetrics is a nicely written unit test! > Maintain Short-Circuit Read Statistics > -- > > Key: HDFS-11789 > URL: https://issues.apache.org/jira/browse/HDFS-11789 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11789.001.patch, HDFS-11789.002.patch, > HDFS-11789.003.patch > > > If a disk or controller hardware is faulty then short-circuit read requests > can stall indefinitely while reading from the file descriptor. Currently > there is no way to detect when short-circuit read requests are slow or > blocked. > This Jira proposes that each BlockReaderLocal maintain read statistics while > it is active by measuring the time taken for a pre-determined fraction of > read requests. These per-reader stats can be aggregated into global stats > when the reader is closed. The aggregate statistics can be exposed via JMX. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11789) Maintain Short-Circuit Read Statistics
[ https://issues.apache.org/jira/browse/HDFS-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048517#comment-16048517 ] Arpit Agarwal edited comment on HDFS-11789 at 6/13/17 11:37 PM: Thanks for updating the patch [~hanishakoneru]. A few comments: # Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}. # The test-only method {{getShortCircuitReadRollingAverages}} should be tagged with {{\@VisibleForTesting}}. Also it should be package-private if possible. # I think we can eliminate the METRICS_ENABLED config key. We can enable the metric when the sampling percentage is > 0. # Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue is that adding the overhead of a MutableRollingAverages object (and its rates roller thread) for all clients, most of which may will never enable SCR statistics. I think your original approach of initializing it within a lock on construction was fine, the overhead of that lock compared to the overhead of cloning a file descriptor via system calls should be minimal. Still reviewing the tests. was (Author: arpitagarwal): Thanks for updating the patch [~hanishakoneru]. A few comments: # Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}. # The test-only method {{getShortCircuitReadRollingAverages}} should be tagged with {{\@VisibleForTesting}}. Also it should be package-private if possible. # I think we can eliminate the METRICS_SAMPLING_PERCENTAGE_KEY config key. We can enable the metric when the sampling percentage is > 0. # Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue is that adding the overhead of a MutableRollingAverages object (and its rates roller thread) for all clients, most of which may will never enable SCR statistics. I think your original approach of initializing it within a lock on construction was fine, the overhead of that lock compared to the overhead of cloning a file descriptor via system calls should be minimal. Still reviewing the tests. > Maintain Short-Circuit Read Statistics > -- > > Key: HDFS-11789 > URL: https://issues.apache.org/jira/browse/HDFS-11789 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11789.001.patch, HDFS-11789.002.patch > > > If a disk or controller hardware is faulty then short-circuit read requests > can stall indefinitely while reading from the file descriptor. Currently > there is no way to detect when short-circuit read requests are slow or > blocked. > This Jira proposes that each BlockReaderLocal maintain read statistics while > it is active by measuring the time taken for a pre-determined fraction of > read requests. These per-reader stats can be aggregated into global stats > when the reader is closed. The aggregate statistics can be exposed via JMX. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11789) Maintain Short-Circuit Read Statistics
[ https://issues.apache.org/jira/browse/HDFS-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048517#comment-16048517 ] Arpit Agarwal edited comment on HDFS-11789 at 6/13/17 11:35 PM: Thanks for updating the patch [~hanishakoneru]. A few comments: # Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}. # The test-only method {{getShortCircuitReadRollingAverages}} should be tagged with {{\@VisibleForTesting}}. Also it should be package-private if possible. # I think we can eliminate the METRICS_SAMPLING_PERCENTAGE_KEY config key. We can enable the metric when the sampling percentage is > 0. # Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue is that adding the overhead of a MutableRollingAverages object (and its rates roller thread) for all clients, most of which may will never enable SCR statistics. I think your original approach of initializing it within a lock on construction was fine, the overhead of that lock compared to the overhead of cloning a file descriptor via system calls should be minimal. Still reviewing the tests. was (Author: arpitagarwal): Thanks for updating the patch [~hanishakoneru]. A few comments: # Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}. # The test-only method {{getShortCircuitReadRollingAverages}} should be tagged with {{\@VisibleForTesting}}. # I think we can eliminate the METRICS_SAMPLING_PERCENTAGE_KEY config key. We can enable the metric when the sampling percentage is > 0. # Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue is that adding the overhead of a MutableRollingAverages object (and its rates roller thread) for all clients, most of which may will never enable SCR statistics. I think your original approach of initializing it within a lock on construction was fine, the overhead of that lock compared to the overhead of cloning a file descriptor via system calls should be minimal. Still reviewing the tests. > Maintain Short-Circuit Read Statistics > -- > > Key: HDFS-11789 > URL: https://issues.apache.org/jira/browse/HDFS-11789 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11789.001.patch, HDFS-11789.002.patch > > > If a disk or controller hardware is faulty then short-circuit read requests > can stall indefinitely while reading from the file descriptor. Currently > there is no way to detect when short-circuit read requests are slow or > blocked. > This Jira proposes that each BlockReaderLocal maintain read statistics while > it is active by measuring the time taken for a pre-determined fraction of > read requests. These per-reader stats can be aggregated into global stats > when the reader is closed. The aggregate statistics can be exposed via JMX. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org