[jira] [Commented] (IMPALA-3189) Address scalability issue with N^2 KDC requests on cluster startup
    [ https://issues.apache.org/jira/browse/IMPALA-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990098#comment-16990098 ]

Michael Ho commented on IMPALA-3189:
------------------------------------

Hi [~tlipcon], we still saw this in the cold-startup case even with KRPC at a large enough scale (e.g. 300+ nodes). It manifests as some sort of negotiation error, and we had to increase the timeout or something similar to work around it (see IMPALA-5901).

> Address scalability issue with N^2 KDC requests on cluster startup
> ------------------------------------------------------------------
>
>                 Key: IMPALA-3189
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3189
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec, Security
>    Affects Versions: Impala 2.5.0
>            Reporter: Henry Robinson
>            Priority: Critical
>              Labels: kerberos, scalability
>
> When Impala runs a query that shuffles data amongst all nodes in a
> Kerberos-secured cluster, every node will need to acquire a TGS for every
> other node. In a cluster of 100 nodes or more, this can overwhelm the KDC,
> and queries can exit with an error ("Could not contact KDC for realm").
> A simple workaround is to run a warm-up query until it succeeds (which can
> take a few minutes after cluster startup). The KDC can also be scaled (e.g.
> with secondary KDC nodes).
> Impala can also consider either forcing a TGS request on start-up in a
> staggered fashion, or we can move to recommending SSL + client certificates
> for server<->server communication.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-3189) Address scalability issue with N^2 KDC requests on cluster startup
    [ https://issues.apache.org/jira/browse/IMPALA-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990091#comment-16990091 ]

Todd Lipcon commented on IMPALA-3189:
-------------------------------------

This should be largely better with KRPC since we maintain long-running connections between nodes. Do people still see this issue on the first query after startup?
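
For a rough sense of the scale in the issue description quoted in this thread, here is a minimal sketch of the arithmetic behind the "N^2" in the title, plus one possible way to stagger start-up requests. It assumes one ticket request per ordered pair of nodes, which is an upper bound and not a measurement. The function names and the slot-plus-jitter scheme are hypothetical illustrations, not Impala's implementation.

```python
import random


def tgs_request_count(num_nodes):
    """Ticket requests if every node contacts every other node once.

    Assumption: one request per ordered (source, destination) pair.
    """
    return num_nodes * (num_nodes - 1)


def staggered_offsets(num_nodes, window_secs, seed=0):
    """Hypothetical start-up delay (in seconds) for each node.

    The window is split into one slot per node; node i starts at a
    random point inside slot i, so requests are spread over the window
    instead of arriving at the KDC in a single burst.
    """
    rng = random.Random(seed)
    slot = window_secs / num_nodes
    return [i * slot + rng.uniform(0, slot) for i in range(num_nodes)]


if __name__ == "__main__":
    for n in (100, 300):
        print(n, "nodes:", tgs_request_count(n), "ticket requests")
    offsets = staggered_offsets(300, window_secs=60.0)
    print("nodes scheduled:", len(offsets))
```

With these assumptions, 100 nodes give 9,900 requests and 300 nodes give 89,700, which is why the cold-start burst grows so quickly with cluster size.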