[jira] [Comment Edited] (HDFS-11789) Maintain Short-Circuit Read Statistics

2017-06-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052519#comment-16052519
 ] 

Arpit Agarwal edited comment on HDFS-11789 at 6/21/17 6:25 PM:
---

Thanks for updating the patch [~hanishakoneru]. A few more comments:
# In BlockReaderLocal constructor, metrics is being passed to 
BlockReaderIoProvider before it is potentially initialized.
{code}
this.blockReaderIoProvider = new BlockReaderIoProvider(
builder.shortCircuitConf, metrics, timer);

if (builder.shortCircuitConf.isScrMetricsEnabled()) {
  metricsInitializationLock.lock();
  if (!isMetricsEnabled) {
metrics = BlockReaderLocalMetrics.create();
isMetricsEnabled = true;
  }
{code}
# Typo - {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}.
# Looks like we can eliminate the {{BlockReaderLocal#isMetricsEnabled}}, 
instead just check whether metrics is null.
# Thanks for adding TestBlockReaderIoProvider. -One more related test - ensure 
{{addShortCircuitReadLatency}} is not invoked when the read takes less than the 
threshold.-
# Also I think we can make BlockReaderIoProvider be a static utility class 
since it doesn't maintain any mutable state. But this approach is also fine 
since the allocated object is lightweight.

TestBlockReaderLocalMetrics is a nicely written unit test!

Edit: Sorry about the confusion, there is no threshold for 
addShortCircuitReadLatency.


was (Author: arpitagarwal):
Thanks for updating the patch [~hanishakoneru]. A few more comments:
# In BlockReaderLocal constructor, metrics is being passed to 
BlockReaderIoProvider before it is potentially initialized.
{code}
this.blockReaderIoProvider = new BlockReaderIoProvider(
builder.shortCircuitConf, metrics, timer);

if (builder.shortCircuitConf.isScrMetricsEnabled()) {
  metricsInitializationLock.lock();
  if (!isMetricsEnabled) {
metrics = BlockReaderLocalMetrics.create();
isMetricsEnabled = true;
  }
{code}
# Typo - {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}.
# Looks like we can eliminate the {{BlockReaderLocal#isMetricsEnabled}}, 
instead just check whether metrics is null.
# Thanks for adding TestBlockReaderIoProvider. One more related test - ensure 
{{addShortCircuitReadLatency}} is not invoked when the read takes less than the 
threshold.
# Also I think we can make BlockReaderIoProvider be a static utility class 
since it doesn't maintain any mutable state. But this approach is also fine 
since the allocated object is lightweight.

TestBlockReaderLocalMetrics is a nicely written unit test!

> Maintain Short-Circuit Read Statistics
> --
>
> Key: HDFS-11789
> URL: https://issues.apache.org/jira/browse/HDFS-11789
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-11789.001.patch, HDFS-11789.002.patch, 
> HDFS-11789.003.patch
>
>
> If a disk or controller hardware is faulty then short-circuit read requests 
> can stall indefinitely while reading from the file descriptor. Currently 
> there is no way to detect when short-circuit read requests are slow or 
> blocked. 
> This Jira proposes that each BlockReaderLocal maintain read statistics while 
> it is active by measuring the time taken for a pre-determined fraction of 
> read requests. These per-reader stats can be aggregated into global stats 
> when the reader is closed. The aggregate statistics can be exposed via JMX.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11789) Maintain Short-Circuit Read Statistics

2017-06-13 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048517#comment-16048517
 ] 

Arpit Agarwal edited comment on HDFS-11789 at 6/13/17 11:37 PM:


Thanks for updating the patch [~hanishakoneru]. A few comments:
# Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}.
# The test-only method {{getShortCircuitReadRollingAverages}} should be tagged 
with {{\@VisibleForTesting}}. Also it should be package-private if possible.
# I think we can eliminate the METRICS_ENABLED config key. We can enable the 
metric when the sampling percentage is > 0.
# Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue 
is that adding the overhead of a MutableRollingAverages object (and its rates 
roller thread) for all clients, most of which may will never enable SCR 
statistics. I think your original approach of initializing it within a lock on 
construction was fine, the overhead of that lock compared to the overhead of 
cloning a file descriptor via system calls should be minimal.

Still reviewing the tests.


was (Author: arpitagarwal):
Thanks for updating the patch [~hanishakoneru]. A few comments:
# Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}.
# The test-only method {{getShortCircuitReadRollingAverages}} should be tagged 
with {{\@VisibleForTesting}}. Also it should be package-private if possible.
# I think we can eliminate the METRICS_SAMPLING_PERCENTAGE_KEY config key. We 
can enable the metric when the sampling percentage is > 0.
# Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue 
is that adding the overhead of a MutableRollingAverages object (and its rates 
roller thread) for all clients, most of which may will never enable SCR 
statistics. I think your original approach of initializing it within a lock on 
construction was fine, the overhead of that lock compared to the overhead of 
cloning a file descriptor via system calls should be minimal.

Still reviewing the tests.

> Maintain Short-Circuit Read Statistics
> --
>
> Key: HDFS-11789
> URL: https://issues.apache.org/jira/browse/HDFS-11789
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-11789.001.patch, HDFS-11789.002.patch
>
>
> If a disk or controller hardware is faulty then short-circuit read requests 
> can stall indefinitely while reading from the file descriptor. Currently 
> there is no way to detect when short-circuit read requests are slow or 
> blocked. 
> This Jira proposes that each BlockReaderLocal maintain read statistics while 
> it is active by measuring the time taken for a pre-determined fraction of 
> read requests. These per-reader stats can be aggregated into global stats 
> when the reader is closed. The aggregate statistics can be exposed via JMX.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11789) Maintain Short-Circuit Read Statistics

2017-06-13 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048517#comment-16048517
 ] 

Arpit Agarwal edited comment on HDFS-11789 at 6/13/17 11:35 PM:


Thanks for updating the patch [~hanishakoneru]. A few comments:
# Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}.
# The test-only method {{getShortCircuitReadRollingAverages}} should be tagged 
with {{\@VisibleForTesting}}. Also it should be package-private if possible.
# I think we can eliminate the METRICS_SAMPLING_PERCENTAGE_KEY config key. We 
can enable the metric when the sampling percentage is > 0.
# Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue 
is that adding the overhead of a MutableRollingAverages object (and its rates 
roller thread) for all clients, most of which may will never enable SCR 
statistics. I think your original approach of initializing it within a lock on 
construction was fine, the overhead of that lock compared to the overhead of 
cloning a file descriptor via system calls should be minimal.

Still reviewing the tests.


was (Author: arpitagarwal):
Thanks for updating the patch [~hanishakoneru]. A few comments:
# Typos in {{SHORT_CIRCUIT_READ_LATECNY_METRIC_REGISTERD_NAME}}.
# The test-only method {{getShortCircuitReadRollingAverages}} should be tagged 
with {{\@VisibleForTesting}}.
# I think we can eliminate the METRICS_SAMPLING_PERCENTAGE_KEY config key. We 
can enable the metric when the sampling percentage is > 0.
# Let's avoid static initialization of {{BlockReaderLocal#metrics}}. One issue 
is that adding the overhead of a MutableRollingAverages object (and its rates 
roller thread) for all clients, most of which may will never enable SCR 
statistics. I think your original approach of initializing it within a lock on 
construction was fine, the overhead of that lock compared to the overhead of 
cloning a file descriptor via system calls should be minimal.

Still reviewing the tests.

> Maintain Short-Circuit Read Statistics
> --
>
> Key: HDFS-11789
> URL: https://issues.apache.org/jira/browse/HDFS-11789
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-11789.001.patch, HDFS-11789.002.patch
>
>
> If a disk or controller hardware is faulty then short-circuit read requests 
> can stall indefinitely while reading from the file descriptor. Currently 
> there is no way to detect when short-circuit read requests are slow or 
> blocked. 
> This Jira proposes that each BlockReaderLocal maintain read statistics while 
> it is active by measuring the time taken for a pre-determined fraction of 
> read requests. These per-reader stats can be aggregated into global stats 
> when the reader is closed. The aggregate statistics can be exposed via JMX.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org