[jira] [Created] (HADOOP-13777) Trim configuration values in `rumen`
Tianyin Xu created HADOOP-13777:
---
Summary: Trim configuration values in `rumen`
Key: HADOOP-13777
URL: https://issues.apache.org/jira/browse/HADOOP-13777
Project: Hadoop Common
Issue Type: Bug
Components: tools
Affects Versions: 3.0.0-alpha1
Reporter: Tianyin Xu
Priority: Minor

The current implementation of {{ClassName.java}} in {{rumen}} does not follow the practice of trimming configuration values. This leads to silent, hard-to-diagnose errors if users set values containing spaces or newlines: classes that are supposed to be anonymized will not be.

See the following previous commits as reference (just to list a few):
HADOOP-6578. Configuration should trim whitespace around a lot of value types
HADOOP-6534. Trim whitespace from directory lists initializing
HDFS-9708. FSNamesystem.initAuditLoggers() doesn't trim classnames
HDFS-2799. Trim fs.checkpoint.dir values.
YARN-3395. FairScheduler: Trim whitespaces when using username for queuename.
YARN-2869. CapacityScheduler should trim sub queue names when parse configuration.

A patch is available against trunk (tested):
{code:title=ClassName.java|borderStyle=solid}
@@ -43,15 +43,13 @@ protected String getPrefix() {

   @Override
   protected boolean needsAnonymization(Configuration conf) {
-    String[] preserves = conf.getStrings(CLASSNAME_PRESERVE_CONFIG);
-    if (preserves != null) {
-      // do a simple starts with check
-      for (String p : preserves) {
-        if (className.startsWith(p)) {
-          return false;
-        }
+    String[] preserves = conf.getTrimmedStrings(CLASSNAME_PRESERVE_CONFIG);
+    // do a simple starts with check
+    for (String p : preserves) {
+      if (className.startsWith(p)) {
+        return false;
       }
     }
     return true;
   }
{code}
(The NULL check is no longer needed because {{getTrimmedStrings}} returns an empty array if nothing is set.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
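A minimal sketch (plain Java, no Hadoop dependency) of why untrimmed values break the prefix check above. The helper methods {{splitRaw}} and {{splitTrimmed}} are hypothetical stand-ins mimicking the behavior of {{Configuration.getStrings}} and {{Configuration.getTrimmedStrings}}:

```java
public class TrimDemo {
    // Raw comma split, as Configuration.getStrings() would return it.
    static String[] splitRaw(String value) {
        return value.split(",");
    }

    // Split plus trim, as Configuration.getTrimmedStrings() would return it.
    static String[] splitTrimmed(String value) {
        String[] parts = value.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // The "simple starts with check" from needsAnonymization().
    static boolean startsWithAny(String className, String[] prefixes) {
        for (String p : prefixes) {
            if (className.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A user adds a space after the comma -- a very common habit.
        String configured = "java.lang., org.apache.";
        String className = "org.apache.hadoop.examples.WordCount";

        // Untrimmed: the second prefix is " org.apache." and never matches,
        // so the class is silently anonymized despite being preserved.
        System.out.println(startsWithAny(className, splitRaw(configured)));
        // Trimmed: the prefix matches as intended.
        System.out.println(startsWithAny(className, splitTrimmed(configured)));
    }
}
```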
[jira] [Resolved] (HADOOP-12676) Inconsistent assumptions of the default keytab file of Kerberos
[ https://issues.apache.org/jira/browse/HADOOP-12676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyin Xu resolved HADOOP-12676.
---
Resolution: Invalid

> Inconsistent assumptions of the default keytab file of Kerberos
> ---
>
> Key: HADOOP-12676
> URL: https://issues.apache.org/jira/browse/HADOOP-12676
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Affects Versions: 2.7.1, 2.6.2
> Reporter: Tianyin Xu
> Assignee: Tianyin Xu
> Priority: Minor
>
> The current implementation of {{SecurityUtil}} does not consider the default keytab file of Kerberos (which is {{/etc/krb5.keytab}} in the [MIT Kerberos defaults|http://web.mit.edu/kerberos/krb5-1.13/doc/mitK5defaults.html#paths]). If the user does not set the keytab file, an {{IOException}} is thrown.
> {code:title=SecurityUtil.java|borderStyle=solid}
> 230 public static void login(final Configuration conf,
> 231     final String keytabFileKey, final String userNameKey, String hostname)
> 232     throws IOException {
> ...
> 237   String keytabFilename = conf.get(keytabFileKey);
> 238   if (keytabFilename == null || keytabFilename.length() == 0) {
> 239     throw new IOException("Running in secure mode, but config doesn't have a keytab");
> 240   }
> {code}
> However, some callers assume the default keytab location. For example, in [{{yarn-default.xml}}|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml]:
> ||property||default||
> |yarn.resourcemanager.keytab|/etc/krb5.keytab|
> |yarn.nodemanager.keytab|/etc/krb5.keytab|
> |yarn.timeline-service.keytab|/etc/krb5.keytab|
> These callers call the {{SecurityUtil.login}} method directly, so the documented defaults are incorrect: the actual defaults are {{null}} (as we do not have a default)...
> {code:title=NodeManager.java|borderStyle=solid}
> protected void doSecureLogin() throws IOException {
>   SecurityUtil.login(getConfig(), YarnConfiguration.NM_KEYTAB,
>       YarnConfiguration.NM_PRINCIPAL);
> }
> {code}
> I don't know if we should make {{/etc/krb5.keytab}} the default in {{SecurityUtil}}, or ask the callers to correct their assumptions. I post this here as a minor issue.
> Thanks!
[jira] [Created] (HADOOP-12676) Consider the default keytab file of Kerberos
Tianyin Xu created HADOOP-12676:
---
Summary: Consider the default keytab file of Kerberos
Key: HADOOP-12676
URL: https://issues.apache.org/jira/browse/HADOOP-12676
Project: Hadoop Common
Issue Type: Improvement
Components: security
Affects Versions: 2.6.2, 2.7.1
Reporter: Tianyin Xu
Priority: Minor

The current implementation of {{SecurityUtil}} does not consider the default keytab file of Kerberos (which is {{/etc/krb5.keytab}} in the [MIT Kerberos defaults|http://web.mit.edu/kerberos/krb5-1.13/doc/mitK5defaults.html#paths]). If the user does not set the keytab file, an {{IOException}} is thrown.

{code:title=SecurityUtil.java|borderStyle=solid}
230 public static void login(final Configuration conf,
231     final String keytabFileKey, final String userNameKey, String hostname)
232     throws IOException {
...
237   String keytabFilename = conf.get(keytabFileKey);
238   if (keytabFilename == null || keytabFilename.length() == 0) {
239     throw new IOException("Running in secure mode, but config doesn't have a keytab");
240   }
{code}

However, some callers assume the default keytab location. For example, in [{{yarn-default.xml}}|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml], the defaults of {{yarn.resourcemanager.keytab}}, {{yarn.nodemanager.keytab}}, and {{yarn.timeline-service.keytab}} all point to {{/etc/krb5.keytab}}. These callers call the {{SecurityUtil.login}} method directly, so the documented defaults are incorrect: the actual defaults are {{null}} (as we do not have a default)...

{code:title=NodeManager.java|borderStyle=solid}
protected void doSecureLogin() throws IOException {
  SecurityUtil.login(getConfig(), YarnConfiguration.NM_KEYTAB,
      YarnConfiguration.NM_PRINCIPAL);
}
{code}

I don't know if we should make {{/etc/krb5.keytab}} the default in {{SecurityUtil}}, or ask the callers to correct their assumptions. I post this here as a potential improvement.

Thanks!
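A sketch of the fallback this issue suggests: if the configured keytab key is unset, fall back to the MIT default {{/etc/krb5.keytab}} instead of throwing. This is a proposal sketch, not the actual {{SecurityUtil}} code; {{resolveKeytab}} is a hypothetical name, and the {{Map}} plays the role of Hadoop's {{Configuration}}:

```java
import java.util.HashMap;
import java.util.Map;

public class KeytabDemo {
    static final String DEFAULT_KEYTAB = "/etc/krb5.keytab";

    // Stand-in for the keytab lookup inside SecurityUtil.login().
    static String resolveKeytab(Map<String, String> conf, String keytabFileKey) {
        String keytab = conf.get(keytabFileKey);
        if (keytab == null || keytab.isEmpty()) {
            // Instead of: throw new IOException("Running in secure mode, but
            // config doesn't have a keytab") -- fall back to the MIT default.
            return DEFAULT_KEYTAB;
        }
        return keytab;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Unset key: the fallback matches what yarn-default.xml documents.
        System.out.println(resolveKeytab(conf, "yarn.nodemanager.keytab"));
        // Explicitly configured key: the user's value wins.
        conf.put("yarn.nodemanager.keytab", "/opt/keytabs/nm.keytab");
        System.out.println(resolveKeytab(conf, "yarn.nodemanager.keytab"));
    }
}
```

The alternative fix (correcting the docs to say there is no default) keeps {{SecurityUtil}} unchanged; the sketch only illustrates the first option.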
[jira] [Created] (HADOOP-12671) Inconsistent configuration values and incorrect comments
Tianyin Xu created HADOOP-12671:
---
Summary: Inconsistent configuration values and incorrect comments
Key: HADOOP-12671
URL: https://issues.apache.org/jira/browse/HADOOP-12671
Project: Hadoop Common
Issue Type: Bug
Components: conf, documentation, fs/s3
Affects Versions: 2.6.2, 2.7.1
Reporter: Tianyin Xu

The following values in [core-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/core-default.xml] are wrong:
{{fs.s3a.multipart.purge.age}}
{{fs.s3a.connection.timeout}}
{{fs.s3a.connection.establish.timeout}}

*1. {{fs.s3a.multipart.purge.age}}* (in both {{2.6.2}} and {{2.7.1}})

In [core-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/core-default.xml], the value is {{86400}} ({{24}} hours), while in the code it is {{14400}} ({{4}} hours).

*2. {{fs.s3a.connection.timeout}}* (only appears in {{2.6.2}})

In [core-default.xml (2.6.2)|https://hadoop.apache.org/docs/r2.6.2/hadoop-project-dist/hadoop-common/core-default.xml], the value is {{5000}}, while in the code it is {{5}}.
{code}
// seconds until we give up on a connection to s3
public static final String SOCKET_TIMEOUT = "fs.s3a.connection.timeout";
public static final int DEFAULT_SOCKET_TIMEOUT = 5;
{code}

*3. {{fs.s3a.connection.establish.timeout}}* (only appears in {{2.7.1}})

In [core-default.xml (2.7.1)|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/core-default.xml], the value is {{5000}}, while in the code it is {{5}}.
{code}
// seconds until we give up trying to establish a connection to s3
public static final String ESTABLISH_TIMEOUT = "fs.s3a.connection.establish.timeout";
public static final int DEFAULT_ESTABLISH_TIMEOUT = 5;
{code}

By the way, the code comments are also wrong: the two parameters are in units of *milliseconds*, not *seconds*...
{code}
- // seconds until we give up on a connection to s3
+ // milliseconds until we give up on a connection to s3
...
- // seconds until we give up trying to establish a connection to s3
+ // milliseconds until we give up trying to establish a connection to s3
{code}
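Quick arithmetic on the mismatch: since the parameter unit is actually milliseconds, the hard-coded default of {{5}} is 1000x shorter than the {{5000}} ms that core-default.xml documents. The constant values below are copied from the issue; the class and method names are our own illustration:

```java
public class TimeoutMismatch {
    public static final int CODE_DEFAULT = 5;        // hard-coded default, in ms
    public static final int DOCUMENTED_DEFAULT = 5000; // core-default.xml value, in ms

    // Factor by which the shipped default is shorter than the documented one.
    public static int mismatchFactor() {
        return DOCUMENTED_DEFAULT / CODE_DEFAULT;
    }

    public static void main(String[] args) {
        // A 5 ms connect/establish timeout makes nearly every attempt to
        // reach S3 fail, which is why this doc/code drift matters.
        System.out.println(mismatchFactor());
    }
}
```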
[jira] [Created] (HADOOP-12659) Incorrect usage of config parameters in token manager of KMS
Tianyin Xu created HADOOP-12659:
---
Summary: Incorrect usage of config parameters in token manager of KMS
Key: HADOOP-12659
URL: https://issues.apache.org/jira/browse/HADOOP-12659
Project: Hadoop Common
Issue Type: Bug
Components: security
Affects Versions: 2.6.2, 2.7.1
Reporter: Tianyin Xu

Hi, the usage of the following configs of the Key Management Server (KMS) is problematic:
{{hadoop.kms.authentication.delegation-token.renew-interval.sec}}
{{hadoop.kms.authentication.delegation-token.removal-scan-interval.sec}}

The names indicate that the unit is {{sec}}, and the online doc shows that the default values are {{86400}} and {{3600}}, respectively:
https://hadoop.apache.org/docs/stable/hadoop-kms/index.html

These defaults are also defined in
{code:title=DelegationTokenManager.java|borderStyle=solid}
55 public static final String RENEW_INTERVAL = PREFIX + "renew-interval.sec";
56 public static final long RENEW_INTERVAL_DEFAULT = 24 * 60 * 60;
...
58 public static final String REMOVAL_SCAN_INTERVAL = PREFIX +
59     "removal-scan-interval.sec";
60 public static final long REMOVAL_SCAN_INTERVAL_DEFAULT = 60 * 60;
{code}

However, in {{DelegationTokenManager.java}} and {{ZKDelegationTokenSecretManager.java}}, these two parameters are used incorrectly.

1. *{{DelegationTokenManager.java}}*
{code}
70     conf.getLong(RENEW_INTERVAL, RENEW_INTERVAL_DEFAULT) * 1000,
71     conf.getLong(REMOVAL_SCAN_INTERVAL,
72         REMOVAL_SCAN_INTERVAL_DEFAULT * 1000));
{code}
Apparently, at Line 72, {{REMOVAL_SCAN_INTERVAL}} should be used in the same way as {{RENEW_INTERVAL}}:
{code}
72c72
< REMOVAL_SCAN_INTERVAL_DEFAULT * 1000));
---
> REMOVAL_SCAN_INTERVAL_DEFAULT) * 1000);
{code}
Currently, the effective unit of {{hadoop.kms.authentication.delegation-token.removal-scan-interval.sec}} is not {{sec}} but {{millisec}}.

2. *{{ZKDelegationTokenSecretManager.java}}*
{code}
142     conf.getLong(DelegationTokenManager.RENEW_INTERVAL,
143         DelegationTokenManager.RENEW_INTERVAL_DEFAULT * 1000),
144     conf.getLong(DelegationTokenManager.REMOVAL_SCAN_INTERVAL,
145         DelegationTokenManager.REMOVAL_SCAN_INTERVAL_DEFAULT) * 1000);
{code}
The situation is the opposite in this class: {{hadoop.kms.authentication.delegation-token.renew-interval.sec}} is wrong but the other is correct... A patch should look like
{code}
143c143
< DelegationTokenManager.RENEW_INTERVAL_DEFAULT * 1000),
---
> DelegationTokenManager.RENEW_INTERVAL_DEFAULT) * 1000,
{code}

Thanks!
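A minimal demonstration of why the parenthesis placement matters. The {{getLong}} helper below mimics {{Configuration.getLong}} (value if set, default otherwise); the {{Map}}-based conf is a hypothetical stand-in, not Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

public class IntervalDemo {
    // Mimics Configuration.getLong(key, default).
    static long getLong(Map<String, Long> conf, String key, long defaultValue) {
        Long v = conf.get(key);
        return v != null ? v : defaultValue;
    }

    public static void main(String[] args) {
        final String KEY =
            "hadoop.kms.authentication.delegation-token.removal-scan-interval.sec";
        final long DEFAULT_SEC = 60 * 60; // 3600, as in DelegationTokenManager

        Map<String, Long> conf = new HashMap<>();
        conf.put(KEY, 1800L); // user asks for a 1800-second (30 min) interval

        // Buggy form: only the *default* is scaled to milliseconds, so the
        // user-supplied 1800 is silently treated as 1800 ms.
        long buggy = getLong(conf, KEY, DEFAULT_SEC * 1000);
        // Fixed form: whichever value wins is scaled to milliseconds.
        long fixed = getLong(conf, KEY, DEFAULT_SEC) * 1000;

        System.out.println(buggy); // 1800
        System.out.println(fixed); // 1800000
    }
}
```

When the key is unset, both forms happen to agree (3600000 ms), which is why the bug only bites users who actually set the parameter.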
[jira] [Created] (HADOOP-11328) ZKFailoverController.java does not log Exception and causes latent problems during failover
Tianyin Xu created HADOOP-11328:
---
Summary: ZKFailoverController.java does not log Exception and causes latent problems during failover
Key: HADOOP-11328
URL: https://issues.apache.org/jira/browse/HADOOP-11328
Project: Hadoop Common
Issue Type: Bug
Components: ha
Affects Versions: 2.5.1
Reporter: Tianyin Xu

In _ZKFailoverController.java_, the _Exception_ caught by the _run()_ method is not logged at all. This causes latent problems that only manifest during failover.

h5. The problem we encountered

An _Exception_ is thrown from the _doRun()_ method during _initHM()_ (caused by a configuration error). To reproduce, set _ha.health-monitor.connect-retry-interval.ms_ to any nonsensical value.

{code:title=ZKFailoverController.java|borderStyle=solid}
private int doRun(String[] args)
  ...
  initRPC();
  initHM();
  startRPC();
}
{code}

The Exception is caught in the _run()_ method, as follows:

{code:title=ZKFailoverController.java|borderStyle=solid}
public int run(final String[] args) throws Exception {
  ...
  try {
    ...
      @Override
      public Integer run() {
        try {
          return doRun(args);
        } catch (Exception t) {
          throw new RuntimeException(t);
        } finally {
          if (elector != null) {
            elector.terminateConnection();
          }
        }
      }
    });
  } catch (RuntimeException rte) {
    throw (Exception)rte.getCause();
  }
}
{code}

Unfortunately, the Exception (which causes the shutdown of the process) is *not logged at all*. This causes latent errors which only manifest during failover (because ZKFC is dead). The tricky thing is that everything looks perfectly fine: the _jps_ command shows a running DFSZKFailoverController process, and the two NameNodes (active and standby) work fine.

h5. Patch

We strongly suggest adding an error log to report the caught error, such as:

--- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java (revision 1641307)
+++ hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java (working copy)
{code:title=@@ -178,6 +178,7 @@|borderStyle=solid}
     }
   });
 } catch (RuntimeException rte) {
+  LOG.fatal("The failover controller encountered a runtime error: " + rte);
   throw (Exception)rte.getCause();
 }
{code}

Thanks!
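A self-contained sketch of the catch-and-rethrow path and the suggested one-line fix. It uses {{java.util.logging}} as a stand-in for Hadoop's commons-logging {{LOG.fatal}}, and a deliberately failing {{doRun()}} as a stand-in for {{initHM()}} choking on a bad config value:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class FailoverDemo {
    private static final Logger LOG =
        Logger.getLogger(FailoverDemo.class.getName());

    // Stand-in for initHM() failing on a nonsensical config value.
    static int doRun() {
        throw new IllegalArgumentException(
            "bad value for ha.health-monitor.connect-retry-interval.ms");
    }

    public static int run() throws Exception {
        try {
            try {
                return doRun();
            } catch (Exception t) {
                // Same wrapping as ZKFailoverController's inner run().
                throw new RuntimeException(t);
            }
        } catch (RuntimeException rte) {
            // The suggested addition: without this line, the process dies
            // with no trace in the logs.
            LOG.log(Level.SEVERE,
                "The failover controller encountered a runtime error", rte);
            throw (Exception) rte.getCause();
        }
    }

    public static void main(String[] args) {
        try {
            run();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
```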