[jira] [Created] (HADOOP-14993) AliyunOSS: Override listFiles and listLocatedStatus

2017-10-29 Thread Genmao Yu (JIRA)
Genmao Yu created HADOOP-14993:
--

 Summary: AliyunOSS: Override listFiles and listLocatedStatus 
 Key: HADOOP-14993
 URL: https://issues.apache.org/jira/browse/HADOOP-14993
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/oss
Affects Versions: 3.0.0-beta1
Reporter: Genmao Yu
Assignee: Genmao Yu


Do a bulk listing of all entries under a path in a single operation; there is 
no need to recursively walk the directory tree.
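The idea can be sketched against a flat object-store namespace (editor's illustration with made-up keys, not the actual OSS SDK calls): because the store keys every object by its full path, one prefix scan returns every descendant, with no per-directory round trips.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class FlatListing {
    // A flat object-store "bucket": every object is keyed by its full path.
    static final TreeSet<String> KEYS = new TreeSet<>(List.of(
            "logs/2017/10/29/part-0000",
            "logs/2017/10/29/part-0001",
            "logs/2017/10/30/part-0000",
            "tmp/_SUCCESS"));

    // Recursive listFiles(path) as one bulk prefix scan: no tree walk needed.
    static List<String> listFiles(String path) {
        String prefix = path.endsWith("/") ? path : path + "/";
        return KEYS.tailSet(prefix).stream()
                .filter(k -> k.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(listFiles("logs")); // every descendant in one scan
    }
}
```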



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-10768) Optimize Hadoop RPC encryption performance

2017-10-29 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224316#comment-16224316
 ] 

Dian Fu commented on HADOOP-10768:
--

Hi [~daryn], [~atm], very sorry for the late response. I'm afraid I currently 
have no bandwidth to continue the work on this ticket, and it would be great 
if someone could take it over.

> Optimize Hadoop RPC encryption performance
> --
>
> Key: HADOOP-10768
> URL: https://issues.apache.org/jira/browse/HADOOP-10768
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: performance, security
>Affects Versions: 3.0.0-alpha1
>Reporter: Yi Liu
>Assignee: Dian Fu
> Attachments: HADOOP-10768.001.patch, HADOOP-10768.002.patch, 
> HADOOP-10768.003.patch, Optimize Hadoop RPC encryption performance.pdf
>
>
> Hadoop RPC encryption is enabled by setting {{hadoop.rpc.protection}} to 
> "privacy". It utilizes the SASL {{GSSAPI}} and {{DIGEST-MD5}} mechanisms for 
> secure authentication and data protection. Although {{GSSAPI}} supports AES, 
> it does not use AES-NI by default, so the encryption is slow and can become 
> a bottleneck.
> After discussing with [~atm], [~tucu00] and [~umamaheswararao], we can do the 
> same optimization as in HDFS-6606: use AES-NI for a more than *20x* speedup.
> On the other hand, RPC messages are small but frequent, and there may be many 
> RPC calls in one connection, so we need to set up a benchmark to measure the 
> real improvement and then make a trade-off. 
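For context, a minimal JCE round-trip sketch (editor's illustration, not the actual HDFS-6606 patch): the optimization replaces the SASL privacy layer's cipher with AES/CTR via the standard JCE API, which the JVM accelerates with AES-NI intrinsics on supporting CPUs.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AesCtrRoundTrip {
    // AES/CTR payload round trip; on AES-NI capable CPUs the JVM's AES
    // intrinsics make this far faster than DIGEST-MD5's privacy ciphers.
    static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // demo only: never use a fixed key/IV in production
        byte[] iv = new byte[16];
        byte[] msg = "rpc-payload".getBytes(StandardCharsets.UTF_8);
        byte[] enc = crypt(Cipher.ENCRYPT_MODE, key, iv, msg);
        byte[] dec = crypt(Cipher.DECRYPT_MODE, key, iv, enc);
        System.out.println(Arrays.equals(msg, dec)); // true
    }
}
```

In a real RPC, the key and IV would be negotiated per connection; the benchmark question in the description is then about per-call cipher overhead on small messages.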






[jira] [Commented] (HADOOP-14984) [JDK9] Fail to run yarn application after building hadoop pkg with jdk9 in jdk9 env

2017-10-29 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224308#comment-16224308
 ] 

liyunzhang commented on HADOOP-14984:
-

[~stevel]: thanks for triggering the test. I have one question: is there any 
option we can set to pass JAVA_OPTS to the YARN container (other than the 
following way)?
{code}
@Override
public void command(List<String> command) {
  // Join the original container launch command into a single string.
  String tmp = StringUtils.join(" ", command);
  // In order to pass on JDK 9, see HADOOP-14984: resolve all platform modules.
  String newCommand = tmp.replace("bin/java", "bin/java --add-modules=ALL-SYSTEM");
  line("exec /bin/bash -c \"", newCommand, "\"");
}
{code}

> [JDK9] Fail to run yarn application after building hadoop pkg with jdk9 in 
> jdk9 env
> ---
>
> Key: HADOOP-14984
> URL: https://issues.apache.org/jira/browse/HADOOP-14984
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: liyunzhang
> Attachments: HADOOP-14984.patch
>
>
> After building the latest code with JDK 9 (patches HADOOP-12760.03.patch and 
> HDFS-11610.001.patch) and starting the HDFS and YARN services (HADOOP-14978) 
> successfully, I hit an exception when running TestDFSIO:
> {code}
> hadoop jar 
> $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.0-SNAPSHOT-tests.jar
>  TestDFSIO -write -nrFiles 8 -fileSize 1MB -resFile ./write.1MB.8
> {code}
> the exception
> {code}
> 1) Error injecting constructor, java.lang.NoClassDefFoundError: javax/activation/DataSource
>   at org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver.(JAXBContextResolver.java:72)
>   at org.apache.hadoop.mapreduce.v2.app.webapp.AMWebApp.setup(AMWebApp.java:33)
>   while locating org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver
>
> 1 error
>         at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)
>         at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
>         at com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory$GuiceInstantiatedComponentProvider.getInstance(GuiceComponentProviderFactory.java:345)
>         at com.sun.jersey.core.spi.component.ioc.IoCProviderFactory$ManagedSingleton.(IoCProviderFactory.java:202)
>         at com.sun.jersey.core.spi.component.ioc.IoCProviderFactory.wrap(IoCProviderFactory.java:123)
>         at com.sun.jersey.core.spi.component.ioc.IoCProviderFactory._getComponentProvider(IoCProviderFactory.java:116)
>         at com.sun.jersey.core.spi.component.ProviderFactory.getComponentProvider(ProviderFactory.java:153)
>         at com.sun.jersey.core.spi.component.ProviderServices.getComponent(ProviderServices.java:278)
>         at com.sun.jersey.core.spi.component.ProviderServices.getProviders(ProviderServices.java:151)
>         at com.sun.jersey.core.spi.factory.ContextResolverFactory.init(ContextResolverFactory.java:83)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._initiate(WebApplicationImpl.java:1332)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.access$700(WebApplicationImpl.java:180)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:799)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:795)
>         at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:795)
>         at com.sun.jersey.guice.spi.container.servlet.GuiceContainer.initiate(GuiceContainer.java:121)
>         at com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.initiate(ServletContainer.java:339)
>         at com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:605)
>         at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:207)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:394)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:744)
> {code}






[jira] [Comment Edited] (HADOOP-12956) Inevitable Log4j2 migration via slf4j

2017-10-29 Thread Ralph Goers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224234#comment-16224234
 ] 

Ralph Goers edited comment on HADOOP-12956 at 10/29/17 10:29 PM:
-

The only thing that supports the log4j 1 properties files is Log4j 1.x.  That 
was declared EOL 2 years ago. The last release of Log4j 1 was 5 1/2 years ago. 
It doesn't run in Java 9 without hacking it.

At some point you are going to have to get off of Log4j 1.

The Log4j team started an effort to create a properties file converter, but it 
can only convert Appenders & Layouts that are part of Log4j 1 itself. It works 
to some degree but is still considered experimental. User-created Appenders 
and Layouts could not be migrated, as we would have no way to convert them to 
Log4j 2 plugins.

That said, we welcome any ideas or contributions to make the migration easier.

I should also point out that SLF4J isn't really an answer to this problem 
either: Logback doesn't support Log4j 1 configurations, and its migration tool 
can't handle custom Appenders or Layouts.


was (Author: ralph.go...@dslextreme.com):
The only thing that supports the log4j 1 properties files is Log4j 1.x.  That 
was declared EOL 2 years ago. The last release of Log4j 1 was 5 1/2 years ago. 
It doesn't run in Java 9 without hacking it.

At some point you are going to have to get off of Log4j 1.

The log4j team started an effort to create a properties file converter but it 
would only be able to convert Appenders & Layouts that are part of Log4j 1 
itself. That is working to some degree but is still considered experimental. 
Any user created Appenders and Layouts would not be able to be migrated. As we 
would not be able to convert them to a Log4j 2 plugin.

That said, we welcome any ideas or contributions anyone wants to contribute to 
make the migration easier.

> Inevitable Log4j2 migration via slf4j
> -
>
> Key: HADOOP-12956
> URL: https://issues.apache.org/jira/browse/HADOOP-12956
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Haohui Mai
>
> {{5 August 2015 --The Apache Logging Services™ Project Management Committee 
> (PMC) has announced that the Log4j™ 1.x logging framework has reached its end 
> of life (EOL) and is no longer officially supported.}}
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
> A whole framework log4j2 upgrade has to be synchronized, partly for improved 
> performance brought about by log4j2.
> https://logging.apache.org/log4j/2.x/manual/async.html#Performance






[jira] [Commented] (HADOOP-12956) Inevitable Log4j2 migration via slf4j

2017-10-29 Thread Ralph Goers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224234#comment-16224234
 ] 

Ralph Goers commented on HADOOP-12956:
--

The only thing that supports the log4j 1 properties files is Log4j 1.x.  That 
was declared EOL 2 years ago. The last release of Log4j 1 was 5 1/2 years ago. 
It doesn't run in Java 9 without hacking it.

At some point you are going to have to get off of Log4j 1.

The Log4j team started an effort to create a properties file converter, but it 
can only convert Appenders & Layouts that are part of Log4j 1 itself. It works 
to some degree but is still considered experimental. User-created Appenders 
and Layouts could not be migrated, as we would have no way to convert them to 
Log4j 2 plugins.

> Inevitable Log4j2 migration via slf4j
> -
>
> Key: HADOOP-12956
> URL: https://issues.apache.org/jira/browse/HADOOP-12956
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Haohui Mai
>
> {{5 August 2015 --The Apache Logging Services™ Project Management Committee 
> (PMC) has announced that the Log4j™ 1.x logging framework has reached its end 
> of life (EOL) and is no longer officially supported.}}
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
> A whole framework log4j2 upgrade has to be synchronized, partly for improved 
> performance brought about by log4j2.
> https://logging.apache.org/log4j/2.x/manual/async.html#Performance






[jira] [Commented] (HADOOP-12956) Inevitable Log4j2 migration via slf4j

2017-10-29 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224226#comment-16224226
 ] 

Sean Busbey commented on HADOOP-12956:
--

We need to keep working with runtime deployments that rely on the log4j v1 
properties files. To date that hasn't been possible with Log4j v2 as far as I 
know.
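For concreteness (editor's illustration, not from this thread): the two configuration schemas are entirely different, which is why a deployment that ships a Log4j 1 properties file cannot simply be pointed at Log4j 2. A minimal console-logging setup in each, assuming the standard pattern-layout syntax:

```properties
# --- Log4j 1.x ---
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

# --- Log4j 2.x equivalent (different schema, same intent) ---
appender.stdout.type=Console
appender.stdout.name=stdout
appender.stdout.layout.type=PatternLayout
appender.stdout.layout.pattern=%d{ISO8601} %p %c: %m%n
rootLogger.level=INFO
rootLogger.appenderRef.stdout.ref=stdout
```

This is the kind of mechanical translation a converter can attempt for built-in appenders; a custom appender class has no such mapping.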

> Inevitable Log4j2 migration via slf4j
> -
>
> Key: HADOOP-12956
> URL: https://issues.apache.org/jira/browse/HADOOP-12956
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Haohui Mai
>
> {{5 August 2015 --The Apache Logging Services™ Project Management Committee 
> (PMC) has announced that the Log4j™ 1.x logging framework has reached its end 
> of life (EOL) and is no longer officially supported.}}
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
> A whole framework log4j2 upgrade has to be synchronized, partly for improved 
> performance brought about by log4j2.
> https://logging.apache.org/log4j/2.x/manual/async.html#Performance






[jira] [Commented] (HADOOP-12956) Inevitable Log4j2 migration via slf4j

2017-10-29 Thread Ralph Goers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224215#comment-16224215
 ] 

Ralph Goers commented on HADOOP-12956:
--

I have to question why you would move to SLF4J as the API when you can 
accomplish the same thing using the Log4j 2 API.

> Inevitable Log4j2 migration via slf4j
> -
>
> Key: HADOOP-12956
> URL: https://issues.apache.org/jira/browse/HADOOP-12956
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Haohui Mai
>
> {{5 August 2015 --The Apache Logging Services™ Project Management Committee 
> (PMC) has announced that the Log4j™ 1.x logging framework has reached its end 
> of life (EOL) and is no longer officially supported.}}
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
> A whole framework log4j2 upgrade has to be synchronized, partly for improved 
> performance brought about by log4j2.
> https://logging.apache.org/log4j/2.x/manual/async.html#Performance






[jira] [Comment Edited] (HADOOP-12956) Inevitable Log4j2 migration via slf4j

2017-10-29 Thread Ralph Goers (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224215#comment-16224215
 ] 

Ralph Goers edited comment on HADOOP-12956 at 10/29/17 9:33 PM:


I have to question why you would move to SLF4J as the API when you can 
accomplish the same thing using the Log4j 2 API, which provides many more 
features than SLF4J.


was (Author: ralph.go...@dslextreme.com):
I have to question why you would move to SLF4J as the API when you can 
accomplish the same thing using the Log4j 2 API.

> Inevitable Log4j2 migration via slf4j
> -
>
> Key: HADOOP-12956
> URL: https://issues.apache.org/jira/browse/HADOOP-12956
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Haohui Mai
>
> {{5 August 2015 --The Apache Logging Services™ Project Management Committee 
> (PMC) has announced that the Log4j™ 1.x logging framework has reached its end 
> of life (EOL) and is no longer officially supported.}}
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
> A whole framework log4j2 upgrade has to be synchronized, partly for improved 
> performance brought about by log4j2.
> https://logging.apache.org/log4j/2.x/manual/async.html#Performance






[jira] [Updated] (HADOOP-14875) Create end user documentation from the compatibility guidelines

2017-10-29 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HADOOP-14875:
--
Attachment: HADOOP-14875.001.patch

First pass.  It borrows heavily from HADOOP-14876 in some parts.

> Create end user documentation from the compatibility guidelines
> ---
>
> Key: HADOOP-14875
> URL: https://issues.apache.org/jira/browse/HADOOP-14875
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.0.0-beta1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HADOOP-14875.001.patch
>
>







[jira] [Updated] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines

2017-10-29 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HADOOP-14876:
--
Attachment: HADOOP-14876.003.patch

This should resolve all your comments, [~rkanter].  I also corrected all the 
other spelling mistakes you missed. :)

> Create downstream developer docs from the compatibility guidelines
> --
>
> Key: HADOOP-14876
> URL: https://issues.apache.org/jira/browse/HADOOP-14876
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.0.0-beta1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: Compatibility.pdf, DownstreamDev.pdf, 
> HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch
>
>







[jira] [Commented] (HADOOP-14989) Multiple metrics2 sinks (incl JMX) result in inconsistent Mutable(Stat|Rate) values

2017-10-29 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223883#comment-16223883
 ] 

Eric Yang commented on HADOOP-14989:


Hi [~xkrogen],

Can we keep both sinks at the same refresh rate, e.g. 10 seconds? I would not 
recommend different refresh rates: that compares data samples taken at 
different frequencies, and the resulting graphs will not look the same. A 
total is a high watermark and will eventually overflow. This is why the 
Hadoop community favored a gauge system: it minimizes computation, and 
real-time metrics are mostly of interest during the development phase.

If we want to produce high-fidelity data samples, then the timestamp, previous 
count, current count, and time elapsed since the last sample (or refresh rate) 
are the essential information to record, but post-processing is more 
expensive. Gauges and averages are only good for measuring the velocity of a 
metric at a point in time. Most monitoring systems can only handle time 
precision at second or minute scale; hence {{MutableRate}} depends heavily on 
the time precision the downstream consumer can handle. One important 
limitation is that the JMX cache reset requires the JMX sink to be the last 
one in the chain, with the slowest refresh rate, to avoid the accuracy problem 
you described. The JMX sink should not refresh more frequently than the 
FileSink, to avoid destroying samples before the data is sent.


> Multiple metrics2 sinks (incl JMX) result in inconsistent Mutable(Stat|Rate) 
> values
> ---
>
> Key: HADOOP-14989
> URL: https://issues.apache.org/jira/browse/HADOOP-14989
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.6.5
>Reporter: Erik Krogen
>Priority: Critical
>
> While doing some digging in the metrics2 system recently, we noticed that the 
> way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it 
> is based on {{MutableStat}}) means that each configured sink (including 
> JMX) receives only a portion of the average information.
> {{MutableStat}}, to compute its average value, maintains a total value since 
> last snapshot, as well as operation count since last snapshot. Upon 
> snapshotting, the average is calculated as (total / opCount) and placed into 
> a gauge metric, and total / operation count are cleared. So the average value 
> represents the average since the last snapshot. If only a single sink ever 
> snapshots, this would result in the expected behavior that the value is the 
> average over the reporting period. However, if multiple sinks are configured, 
> or if the JMX cache is refreshed, this is another snapshot operation. So, for 
> example, if you have a FileSink configured at a 60 second interval and your 
> JMX cache refreshes itself 1 second before the FileSink period fires, the 
> values emitted to your FileSink only represent averages _over the last one 
> second_.
> A few ways to solve this issue:
> * From an operator perspective, ensure only one sink is configured. This is 
> not realistic given that the JMX cache exhibits the same behavior.
> * Make {{MutableRate}} manage its own average refresh, similar to 
> {{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
> last quantile values that it will serve up until the next refresh. Given how 
> many {{MutableRate}} metrics there are, a thread per metric is not really 
> feasible, but could be done on e.g. a per-source basis. This has some 
> downsides: if multiple sinks are configured with different periods, what is 
> the right refresh period for the {{MutableRate}}? 
> * Make {{MutableRate}} emit two counters, one for total and one for operation 
> count, rather than an average gauge and an operation count counter. The 
> average could then be calculated downstream from this information. This is 
> cumbersome for operators and not backwards compatible. To improve on both of 
> those downsides, we could have it keep the current behavior but 
> _additionally_ emit the total as a counter. The snapshotted average is 
> probably sufficient in the common case (we've been using it for years), and 
> when more guaranteed accuracy is required, the average could be derived from 
> the total and operation count.
> Open to suggestions & input here.
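The snapshot-reset behavior described above can be simulated with a toy stat (editor's sketch, not the real org.apache.hadoop.metrics2 classes): a second consumer snapshotting just before the FileSink destroys most of the interval's data.

```java
public class SnapshotDemo {
    // Toy version of MutableStat: snapshot() returns the average since the
    // previous snapshot, then clears the accumulators.
    static class ToyStat {
        private long total, ops;
        void add(long v) { total += v; ops++; }
        double snapshot() {
            double avg = ops == 0 ? 0 : (double) total / ops;
            total = 0;
            ops = 0;
            return avg;
        }
    }

    public static void main(String[] args) {
        ToyStat stat = new ToyStat();
        for (int i = 0; i < 59; i++) stat.add(100); // 59s of 100ms operations
        double jmx = stat.snapshot();               // JMX cache refresh: sees 100.0
        stat.add(500);                              // one more op before FileSink fires
        double fileSink = stat.snapshot();          // FileSink sees only the last second
        System.out.println(jmx + " " + fileSink);   // 100.0 500.0
    }
}
```

Emitting the running total as an extra counter, as the third bullet suggests, would let downstream consumers derive an accurate average regardless of who snapshots when.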


