[jira] [Created] (HADOOP-17638) Checksum type is always forced to be crc32
Daryn Sharp created HADOOP-17638:

Summary: Checksum type is always forced to be crc32
Key: HADOOP-17638
URL: https://issues.apache.org/jira/browse/HADOOP-17638
Project: Hadoop Common
Issue Type: Bug
Components: common
Reporter: Daryn Sharp

HADOOP-14405 changed non-direct byte buffer input to use native crc32. In doing so it forces all non-native byte buffers to use crc32. This is the root cause of the problem in HDFS-14582.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
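Why forcing crc32 matters can be seen with the JDK's own checksum classes: CRC32 and CRC32C use different polynomials, so a checksum computed with one will never verify against the other. A minimal illustration (not Hadoop code; `java.util.zip.CRC32C` requires Java 9+):

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;

public class ChecksumMismatch {
    public static void main(String[] args) {
        byte[] data = "hadoop".getBytes();

        CRC32 crc32 = new CRC32();    // the "gzip" CRC polynomial
        CRC32C crc32c = new CRC32C(); // the Castagnoli polynomial

        crc32.update(data);
        crc32c.update(data);

        // Different polynomials => different values for the same bytes,
        // so silently substituting one type for the other breaks verification.
        System.out.println(crc32.getValue() != crc32c.getValue()); // true
    }
}
```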
[jira] [Resolved] (HADOOP-16521) Subject has a contradiction between proxy user and real user
[ https://issues.apache.org/jira/browse/HADOOP-16521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp resolved HADOOP-16521.
Resolution: Invalid

> Subject has a contradiction between proxy user and real user
>
> Key: HADOOP-16521
> URL: https://issues.apache.org/jira/browse/HADOOP-16521
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Yicong Cai
> Priority: Major
>
> In the method UserGroupInformation#loginUserFromSubject, if you specify the proxy user with HADOOP_PROXY_USER and create a proxy UGI instance, the valid credentials are included in the User's private credentials. The UGI structure is as follows:
>
> {code:java}
> proxyUGI
> |
> |--subject 1
> |  |
> |  |--principals
> |  |  |
> |  |  |--user
> |  |  |
> |  |  --real user
> |  |
> |  --privCredentials(all cred)
> |
> --proxy user
> {code}
>
> If you instead first log in the real user and then use UserGroupInformation#createProxyUser to create a proxy UGI, the valid credentials are included in the real user's subject's private credentials. The UGI structure is as follows:
>
> {code:java}
> proxyUGI
> |
> |--subject 1
> |  |
> |  |--principals
> |  |  |
> |  |  |--user
> |  |  |
> |  |  --real user
> |  |     |
> |  |     --subject 2
> |  |        |
> |  |        --privCredentials(all cred)
> |  |
> |  --privCredentials(empty)
> |
> --proxy user
> {code}
>
> The proxy user is used in the HDFS FileSystem to perform token-related operations; however, in the RPC Client Connection the token in the real user is used for SaslRpcClient#saslConnect. So the central question is: should the proxy user's real credentials be placed in the proxy UGI's subject, or in the real user's subject?
[jira] [Resolved] (HADOOP-16291) HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()
[ https://issues.apache.org/jira/browse/HADOOP-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp resolved HADOOP-16291.
Resolution: Not A Problem

> HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()
>
> Key: HADOOP-16291
> URL: https://issues.apache.org/jira/browse/HADOOP-16291
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation
> Reporter: Aaron Fabbri
> Priority: Minor
> Labels: newbie
>
> Fix some errors in the HDFS Permissions doc. Noticed this when reviewing HADOOP-16251. The FS Permissions [documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html] seems to mark a lot of permissions as Not Applicable (N/A) when that is not the case. In particular, getFileInfo (getFileStatus) checks the READ permission bit [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204], as it should.
[jira] [Created] (HADOOP-15980) Enable TLS in RPC client/server
Daryn Sharp created HADOOP-15980:

Summary: Enable TLS in RPC client/server
Key: HADOOP-15980
URL: https://issues.apache.org/jira/browse/HADOOP-15980
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Once the RPC client and server can be configured to use Netty, the TLS engine can be added to the channel pipeline. The server should allow QoS-like functionality to determine if TLS is mandatory or optional for a client.
[jira] [Created] (HADOOP-15981) Add mutual TLS support for RPC
Daryn Sharp created HADOOP-15981:

Summary: Add mutual TLS support for RPC
Key: HADOOP-15981
URL: https://issues.apache.org/jira/browse/HADOOP-15981
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc, security
Reporter: Daryn Sharp
Assignee: Daryn Sharp

The RPC server should optionally allow enabling mutual TLS as a first-class authentication method. If enabled, a client cert may provide the user's identity, or the client may fall back to kerberos or token. Essentially, the placeholder CERTIFICATE authentication method will be implemented and offered as an authentication method during connection negotiation.
[jira] [Created] (HADOOP-15979) Add Netty support to the RPC client
Daryn Sharp created HADOOP-15979:

Summary: Add Netty support to the RPC client
Key: HADOOP-15979
URL: https://issues.apache.org/jira/browse/HADOOP-15979
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc, security
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Adding Netty will later allow using a native TLS transport layer with much better performance than that offered by Java's SSLEngine.
[jira] [Created] (HADOOP-15978) Add Netty support to the RPC server
Daryn Sharp created HADOOP-15978:

Summary: Add Netty support to the RPC server
Key: HADOOP-15978
URL: https://issues.apache.org/jira/browse/HADOOP-15978
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc, security
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Adding Netty will later allow using a native TLS transport layer with much better performance than that offered by Java's SSLEngine.
[jira] [Created] (HADOOP-15977) RPC support for TLS
Daryn Sharp created HADOOP-15977:

Summary: RPC support for TLS
Key: HADOOP-15977
URL: https://issues.apache.org/jira/browse/HADOOP-15977
Project: Hadoop Common
Issue Type: Improvement
Components: ipc, security
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Umbrella ticket to track adding TLS and mutual TLS support to RPC.
[jira] [Created] (HADOOP-15897) Port range binding fails due to socket bind race condition
Daryn Sharp created HADOOP-15897:

Summary: Port range binding fails due to socket bind race condition
Key: HADOOP-15897
URL: https://issues.apache.org/jira/browse/HADOOP-15897
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.0.2-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Java's {{ServerSocket#bind}} does both a bind and listen. At a system level, multiple processes may bind to the same port but only one may listen. Java sockets are left in an unrecoverable state when a process loses the race to listen first. Servers that compete over a listening port range (ex. App Master) will fail the entire range after a collision. The IPC layer should make a better effort to recover from failed binds.
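The recovery strategy the report implies can be sketched as follows: because a socket that loses the bind/listen race is unusable, discard it and retry the rest of the range with a fresh socket rather than failing outright. This is an illustrative sketch, not the Hadoop IPC implementation; the class and method names are invented:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortRangeBind {
    // Try each port in [lo, hi]. ServerSocket#bind does bind + listen in one
    // call; on failure the socket is left unusable, so close it and retry
    // the next port with a brand-new socket.
    static ServerSocket bindInRange(int lo, int hi) throws IOException {
        for (int port = lo; port <= hi; port++) {
            ServerSocket ss = new ServerSocket(); // unbound socket
            try {
                ss.bind(new InetSocketAddress(port));
                return ss;
            } catch (BindException e) {
                ss.close(); // unrecoverable after a failed bind; discard
            }
        }
        throw new BindException("no free port in range " + lo + "-" + hi);
    }

    public static void main(String[] args) throws IOException {
        // Two servers competing over the same range land on different ports.
        ServerSocket a = bindInRange(52000, 52020);
        ServerSocket b = bindInRange(52000, 52020);
        System.out.println(a.getLocalPort() != b.getLocalPort());
        a.close();
        b.close();
    }
}
```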
[jira] [Created] (HADOOP-15813) Enable more reliable SSL connection reuse
Daryn Sharp created HADOOP-15813:

Summary: Enable more reliable SSL connection reuse
Key: HADOOP-15813
URL: https://issues.apache.org/jira/browse/HADOOP-15813
Project: Hadoop Common
Issue Type: Bug
Components: common
Affects Versions: 2.6.0
Reporter: Daryn Sharp

The Java keep-alive cache relies on instance equivalence of the SSL socket factory. In many Java versions, SSLContext#getSocketFactory always returns a new instance, which completely breaks the cache. Clients flood a service with lingering per-request connections, which can lead to port exhaustion. The hadoop SSLFactory should cache the socket factory associated with the context.
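The proposed fix reduces to memoizing one socket factory per context, since the keep-alive cache keys on factory instance identity. A minimal sketch under that assumption; the class is illustrative, not Hadoop's SSLFactory:

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

public class CachingSslFactory {
    private final SSLContext context;
    private volatile SSLSocketFactory cached;

    CachingSslFactory(SSLContext context) {
        this.context = context;
    }

    // Return one stable factory instance per context. Calling
    // context.getSocketFactory() on every request may hand back a new
    // instance on some JDKs, defeating the identity-keyed keep-alive cache.
    SSLSocketFactory getSocketFactory() {
        SSLSocketFactory f = cached;
        if (f == null) {
            synchronized (this) {
                if (cached == null) {
                    cached = context.getSocketFactory();
                }
                f = cached;
            }
        }
        return f;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        CachingSslFactory factory = new CachingSslFactory(SSLContext.getDefault());
        // Same instance every time => keep-alive cache can match connections.
        System.out.println(factory.getSocketFactory() == factory.getSocketFactory());
    }
}
```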
[jira] [Created] (HADOOP-15811) Optimizations for Java's TLS performance
Daryn Sharp created HADOOP-15811:

Summary: Optimizations for Java's TLS performance
Key: HADOOP-15811
URL: https://issues.apache.org/jira/browse/HADOOP-15811
Project: Hadoop Common
Issue Type: Bug
Components: common
Affects Versions: 1.0.0
Reporter: Daryn Sharp

Java defaults to using /dev/random and disables intrinsic methods used in hot code paths. Both cause highly synchronized implementations to be used that significantly degrade performance.

* -Djava.security.egd=file:/dev/urandom
* -XX:+UseMontgomerySquareIntrinsic
* -XX:+UseMontgomeryMultiplyIntrinsic
* -XX:+UseSquareToLenIntrinsic
* -XX:+UseMultiplyToLenIntrinsic

These settings significantly boost KMS server performance. Under load, threads are not jammed in the SSLEngine.
[jira] [Created] (HADOOP-15743) Jetty and SSL tunings to stabilize KMS performance
Daryn Sharp created HADOOP-15743:

Summary: Jetty and SSL tunings to stabilize KMS performance
Key: HADOOP-15743
URL: https://issues.apache.org/jira/browse/HADOOP-15743
Project: Hadoop Common
Issue Type: Bug
Components: kms
Affects Versions: 2.8.0
Reporter: Daryn Sharp

The KMS has very low throughput with high client failure rates. The following config options will "stabilize" the KMS under load:

# Disable ECDH algos because java's SSL engine is inexplicably HORRIBLE.
# Reduce SSL session cache size (unlimited) and ttl (24h). The memory cache has very poor performance and causes extreme GC pressure. Load balancing diminishes the effectiveness of the cache to 1/N-hosts anyway.
** -Djavax.net.ssl.sessionCacheSize=1000
** -Djavax.net.ssl.sessionCacheTimeout=6
# Completely disable the thread LowResourceMonitor to stop jetty from immediately closing incoming connections during connection bursts. Client retries cause jetty to remain in a low-resource state until many clients fail and thousands of sockets linger in various close-related states.
# Set min/max threads to 4x processors. Jetty recommends only 50 to 500 threads. Java's SSL engine has excessive synchronization that limits performance anyway.
# Set https idle timeout to 6s.
# Significantly increase max fds to at least 128k. Recommend using a VIP load balancer with a lower limit.
[jira] [Created] (HADOOP-15402) Prevent double logout of UGI's LoginContext
Daryn Sharp created HADOOP-15402:

Summary: Prevent double logout of UGI's LoginContext
Key: HADOOP-15402
URL: https://issues.apache.org/jira/browse/HADOOP-15402
Project: Hadoop Common
Issue Type: Bug
Components: security
Affects Versions: 3.1.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

HADOOP-15294 worked around a LoginContext NPE resulting from a double logout by peering into the Subject. A cleaner fix is tracking whether the LoginContext is logged in.
[jira] [Created] (HADOOP-15212) Add independent secret manager method for logging expired tokens
Daryn Sharp created HADOOP-15212:

Summary: Add independent secret manager method for logging expired tokens
Key: HADOOP-15212
URL: https://issues.apache.org/jira/browse/HADOOP-15212
Project: Hadoop Common
Issue Type: Improvement
Components: security
Affects Versions: 2.7.6
Reporter: Daryn Sharp
Assignee: Daryn Sharp

{{AbstractDelegationTokenSecretManager#removeExpiredToken}} has two phases. The first phase synchronizes to collect expired tokens. The second phase loops over the collected tokens to log them while not holding the monitor. HDFS-13112 needs to acquire the namesystem lock during the second logging phase, which requires splitting the method apart to allow a method override.
[jira] [Created] (HADOOP-14912) FairCallQueue may defer servicing calls
Daryn Sharp created HADOOP-14912:

Summary: FairCallQueue may defer servicing calls
Key: HADOOP-14912
URL: https://issues.apache.org/jira/browse/HADOOP-14912
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.8.2
Reporter: Daryn Sharp
Assignee: Daryn Sharp

HADOOP-14033 switched a lock to a semaphore to allow concurrency for producers and consumers of the underlying queues. A race condition was created that may cause a consumer to acquire a permit but not extract an element, leaving the semaphore with fewer permits than queued elements. This causes a minimum number of calls to always be present in the call queue.
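The invariant at stake can be sketched with a toy semaphore-guarded multi-queue (illustrative only, not the FairCallQueue code): the producer must make the element visible in a sub-queue *before* releasing its permit, and a consumer holding a permit must rescan until it finds an element. If a consumer ever gives up after acquiring a permit, the permit count drifts below the number of queued elements, which is exactly the deferred-servicing symptom described above.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

public class PermitQueue<E> {
    private final ConcurrentLinkedQueue<E>[] queues; // priority sub-queues
    private final Semaphore permits = new Semaphore(0);

    @SuppressWarnings("unchecked")
    PermitQueue(int numQueues) {
        queues = new ConcurrentLinkedQueue[numQueues];
        for (int i = 0; i < numQueues; i++) {
            queues[i] = new ConcurrentLinkedQueue<>();
        }
    }

    void put(int priority, E e) {
        queues[priority].add(e); // 1. enqueue first so the element is visible
        permits.release();       // 2. only then publish the permit
    }

    E take() throws InterruptedException {
        permits.acquire(); // one permit => at least one queued element exists
        for (;;) {
            // Rescan until found: the element is guaranteed to be in some
            // sub-queue, but another consumer's scan may race past it once.
            for (ConcurrentLinkedQueue<E> q : queues) {
                E e = q.poll();
                if (e != null) {
                    return e;
                }
            }
        }
    }
}
```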
[jira] [Resolved] (HADOOP-9856) Avoid Krb5LoginModule.logout issue
[ https://issues.apache.org/jira/browse/HADOOP-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp resolved HADOOP-9856.
Resolution: Duplicate
Release Note: Incorporated into parent jira.

> Avoid Krb5LoginModule.logout issue
>
> Key: HADOOP-9856
> URL: https://issues.apache.org/jira/browse/HADOOP-9856
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: security
> Affects Versions: 2.0.0-alpha, 3.0.0-alpha1
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Attachments: HADOOP-9856.patch
>
> The kerberos login module's logout method arguably has a bug. {{Subject#getPrivateCredentials()}} returns a synchronized set. Iterating the set requires explicitly locking the set. {{Krb5LoginModule#logout()}} iterates and modifies the set without a lock. This may lead to a {{ConcurrentModificationException}}, which is what led to {{UGI.getCurrentUser()}} being unnecessarily synchronized.
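The locking rule the quoted description relies on is documented for `Collections.synchronizedSet`: callers must hold the set's own monitor while iterating. A minimal demonstration of the safe pattern (plain JDK code, not the login module itself):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class SyncSetIteration {
    public static void main(String[] args) {
        Set<Object> creds = Collections.synchronizedSet(new HashSet<>());
        creds.add("ticket");
        creds.add("key");

        // The synchronized wrapper only guards individual calls; iteration
        // spans many calls, so the caller must hold the set's lock for the
        // whole loop. Mutating without it risks ConcurrentModificationException
        // under concurrent access -- the failure mode described above.
        synchronized (creds) {
            for (Iterator<Object> it = creds.iterator(); it.hasNext(); ) {
                it.next();
                it.remove(); // safe only while the lock is held
            }
        }
        System.out.println(creds.isEmpty()); // true
    }
}
```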
[jira] [Resolved] (HADOOP-9852) UGI login user keytab and principal should not be static
[ https://issues.apache.org/jira/browse/HADOOP-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp resolved HADOOP-9852.
Resolution: Duplicate

Incorporated into parent jira.

> UGI login user keytab and principal should not be static
>
> Key: HADOOP-9852
> URL: https://issues.apache.org/jira/browse/HADOOP-9852
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: security
> Affects Versions: 2.0.0-alpha, 3.0.0-alpha1
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Attachments: HADOOP-9852.patch
>
> The static keytab and principal for the login user are problematic. The login conf explicitly references these statics. As a result, loginUserFromKeytabAndReturnUGI is unnecessarily synch'ed on the class to swap out the login user's keytab and principal, login, then restore the keytab/principal. This method's synch blocks further de-synching of other methods.
[jira] [Created] (HADOOP-14687) AuthenticatedURL will reuse bad/expired session cookies
Daryn Sharp created HADOOP-14687:

Summary: AuthenticatedURL will reuse bad/expired session cookies
Key: HADOOP-14687
URL: https://issues.apache.org/jira/browse/HADOOP-14687
Project: Hadoop Common
Issue Type: Bug
Components: common
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical

AuthenticatedURL with kerberos was designed to perform spnego, then use a session cookie to avoid renegotiation overhead. Unfortunately the client will continue to use a cookie after it expires. Every request elicits a 401, the connection closes (despite keepalive, because a 401 is an "error"), a TGS is obtained, the connection is re-opened, the request is retried with the TGS, and the cycle repeats. This places a strain on the kdc and creates lots of time_wait sockets.

The main problem is that, unbeknownst to the auth url, the JDK transparently does spnego. The server issues a new cookie, but the auth url doesn't scrape the cookie from the response because it doesn't know the JDK re-authenticated.
[jira] [Created] (HADOOP-14659) UGI getShortUserName does not need to search the Subject
Daryn Sharp created HADOOP-14659:

Summary: UGI getShortUserName does not need to search the Subject
Key: HADOOP-14659
URL: https://issues.apache.org/jira/browse/HADOOP-14659
Project: Hadoop Common
Issue Type: Improvement
Components: common
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp

{{UGI#getShortUserName}} searches the subject for the {{User}} instance. It's not cheap to iterate a synchronized set, copy matches into a new set, then iterate that set. The UGI ctor already set the {{User}} into a final field...
[jira] [Reopened] (HADOOP-14146) KerberosAuthenticationHandler should authenticate with SPN in AP-REQ
[ https://issues.apache.org/jira/browse/HADOOP-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp reopened HADOOP-14146:

Failing with java 7 due to java 8's Iterator having a default implementation of remove. Will post an addendum shortly.

> KerberosAuthenticationHandler should authenticate with SPN in AP-REQ
>
> Key: HADOOP-14146
> URL: https://issues.apache.org/jira/browse/HADOOP-14146
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Affects Versions: 2.5.0
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
> Attachments: HADOOP-14146.1.patch, HADOOP-14146.2.patch, HADOOP-14146.3.patch, HADOOP-14146.patch
>
> Many attempts (HADOOP-10158, HADOOP-11628, HADOOP-13565) have tried to add multiple SPN host and/or realm support to spnego authentication. The basic problem is the server tries to guess and/or brute force what SPN the client used. The server should just decode the SPN from the AP-REQ.
[jira] [Created] (HADOOP-14578) Bind IPC connections to kerberos UPN host for proxy users
Daryn Sharp created HADOOP-14578:

Summary: Bind IPC connections to kerberos UPN host for proxy users
Key: HADOOP-14578
URL: https://issues.apache.org/jira/browse/HADOOP-14578
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp

The IPC client will only bind to a kerberos UPN host for the effective user. For proxy users, it does not bind to the authenticating real user's UPN host. This is inconsistent and prevents strict host checking of connections.
[jira] [Resolved] (HADOOP-14031) Reduce fair call queue performance impact
[ https://issues.apache.org/jira/browse/HADOOP-14031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp resolved HADOOP-14031.
Resolution: Fixed

All subtasks are complete.

> Reduce fair call queue performance impact
>
> Key: HADOOP-14031
> URL: https://issues.apache.org/jira/browse/HADOOP-14031
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Affects Versions: 2.7.0
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
>
> The fair call queue has performance deficits that create an illusion of reasonable performance under heavy load. However, there is excessive lock contention, priority inversion, and pushback/reconnect issues that combine to create an artificial rate-limiting on the ingress. The result is server metrics look good and the call queue looks low, yet clients experience dismal latencies.
[jira] [Resolved] (HADOOP-9749) Remove synchronization for UGI.getCurrentUser
[ https://issues.apache.org/jira/browse/HADOOP-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp resolved HADOOP-9749.
Resolution: Duplicate

> Remove synchronization for UGI.getCurrentUser
>
> Key: HADOOP-9749
> URL: https://issues.apache.org/jira/browse/HADOOP-9749
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: security
> Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Priority: Critical
> Attachments: HADOOP-9749.branch-2.patch, HADOOP-9749.trunk.2.patch, HADOOP-9749.trunk.patch, HADOOP-9749.trunk.patch
>
> HADOOP-7854 added synchronization to {{getCurrentUser}} due to {{ConcurrentModificationExceptions}}. This degrades NN call handler performance. The problem was not well understood at the time, but it's caused by a collision between relogin and {{getCurrentUser}} due to a bug in {{Krb5LoginModule}}. Avoiding the collision will allow removal of the synchronization.
[jira] [Created] (HADOOP-14146) KerberosAuthenticationHandler should authenticate with SPN in AP-REQ
Daryn Sharp created HADOOP-14146:

Summary: KerberosAuthenticationHandler should authenticate with SPN in AP-REQ
Key: HADOOP-14146
URL: https://issues.apache.org/jira/browse/HADOOP-14146
Project: Hadoop Common
Issue Type: Bug
Components: security
Affects Versions: 2.5.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Many attempts (HADOOP-10158, HADOOP-11628, HADOOP-13565) have tried to add multiple SPN host and/or realm support to spnego authentication. The basic problem is the server tries to guess and/or brute force what SPN the client used. The server should just decode the SPN from the AP-REQ.
[jira] [Created] (HADOOP-14035) Reduce fair call queue backoff's impact on clients
Daryn Sharp created HADOOP-14035:

Summary: Reduce fair call queue backoff's impact on clients
Key: HADOOP-14035
URL: https://issues.apache.org/jira/browse/HADOOP-14035
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

When fcq backoff is enabled and an abusive client overflows the call queue, its connection is closed, as well as subsequent good client connections. Disconnects are very disruptive, especially to multi-threaded clients with multiple outstanding requests, or clients without a retry proxy (ex. datanodes). Until the abusive user is downgraded to a lower priority queue, disconnect/reconnect mayhem occurs which significantly degrades performance. Server metrics look good despite horrible client latency. The fcq should utilize selective ipc disconnects to avoid pushback disconnecting good clients.
[jira] [Created] (HADOOP-14034) Allow ipc layer exceptions to selectively close connections
Daryn Sharp created HADOOP-14034:

Summary: Allow ipc layer exceptions to selectively close connections
Key: HADOOP-14034
URL: https://issues.apache.org/jira/browse/HADOOP-14034
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

IPC layer exceptions generated in the readers are translated into fatal errors, resulting in connection closure. For example, RetriableExceptions from call queue pushback. Always closing the connection degrades performance for all clients, since a disconnected client will immediately reconnect on retry. Readers become overwhelmed servicing new connections and re-authentications from bad clients instead of servicing calls from good clients. The call queues run dry. Exceptions originating in the readers should be able to indicate whether the exception is an error or fatal, so connections can remain open.
[jira] [Created] (HADOOP-14033) Reduce fair call queue lock contention
Daryn Sharp created HADOOP-14033:

Summary: Reduce fair call queue lock contention
Key: HADOOP-14033
URL: https://issues.apache.org/jira/browse/HADOOP-14033
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Under heavy load the call queue may run dry yet clients experience high latency. The fcq requires producers and consumers to sync via a shared lock. Polling consumers hold the lock while scanning all sub-queues. Consumers are serialized despite the sub-queues being thread-safe blocking queues. The effect is to cause other producers/consumers to frequently park.

The lock is unfair, so producers/consumers attempt to barge in on the lock. The outnumbered producers tend to remain blocked for an extended time. As load increases and the queues fill, the barging consumers drain the queues faster than the producers can fill them. Server metrics provide an illusion of healthy throughput, response time, and call queue length due to starvation on the ingress. Often as the load gets worse, the server looks better.
[jira] [Created] (HADOOP-14032) Reduce fair call queue priority inversion
Daryn Sharp created HADOOP-14032:

Summary: Reduce fair call queue priority inversion
Key: HADOOP-14032
URL: https://issues.apache.org/jira/browse/HADOOP-14032
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

The fcq's round robin multiplexer actually rewards abusive users. Queue consumers scan for a call from the roving multiplexer index down to the lowest priority ring before wrapping around to the higher priority rings. Consider an fcq with 4 priority rings, with multiplexer shares per index of 8, 4, 2, 1. All well-behaved clients are operating in ring 0. A bad client floods the server and drops to the lowest priority ring. Unfortunately, the service order gives 8 shares to the good clients, followed by 4+2+1=7 shares to the bad client.
[jira] [Created] (HADOOP-14031) Reduce fair call queue performance impact
Daryn Sharp created HADOOP-14031:

Summary: Reduce fair call queue performance impact
Key: HADOOP-14031
URL: https://issues.apache.org/jira/browse/HADOOP-14031
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

The fair call queue has performance deficits that create an illusion of reasonable performance under heavy load. However, there is excessive lock contention, priority inversion, and pushback/reconnect issues that combine to create an artificial rate-limiting on the ingress. The result is server metrics look good and the call queue looks low, yet clients experience dismal latencies.
[jira] [Created] (HADOOP-13549) Eliminate intermediate buffer for server-side PB encoding
Daryn Sharp created HADOOP-13549:

Summary: Eliminate intermediate buffer for server-side PB encoding
Key: HADOOP-13549
URL: https://issues.apache.org/jira/browse/HADOOP-13549
Project: Hadoop Common
Issue Type: Sub-task
Components: ipc
Reporter: Daryn Sharp
Assignee: Daryn Sharp

HADOOP-13426 improved encoding and added framed buffers. Upon further review, the intermediate response buffer is completely unnecessary for PB responses, since the size can be pre-computed, unlike Writables.
[jira] [Created] (HADOOP-13547) Optimize IPC client protobuf decoding
Daryn Sharp created HADOOP-13547:

Summary: Optimize IPC client protobuf decoding
Key: HADOOP-13547
URL: https://issues.apache.org/jira/browse/HADOOP-13547
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Counterpart to HADOOP-13438.
[jira] [Created] (HADOOP-13537) Support external calls in the RPC call queue
Daryn Sharp created HADOOP-13537:

Summary: Support external calls in the RPC call queue
Key: HADOOP-13537
URL: https://issues.apache.org/jira/browse/HADOOP-13537
Project: Hadoop Common
Issue Type: Improvement
Components: ipc
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Leveraging HADOOP-13465 will allow non-rpc calls to be added to the call queue. This is intended to support routing webhdfs calls through the call queue to provide a unified and protocol-independent QoS.
[jira] [Created] (HADOOP-13465) Design Server.Call to be extensible for unified call queue
Daryn Sharp created HADOOP-13465: Summary: Design Server.Call to be extensible for unified call queue Key: HADOOP-13465 URL: https://issues.apache.org/jira/browse/HADOOP-13465 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Daryn Sharp Assignee: Daryn Sharp The RPC layer supports QoS but other protocols, e.g. webhdfs, are completely unconstrained. Generalizing {{Server.Call}} to be extensible with simple changes to the handlers will enable unifying the call queue for multiple protocols.
[jira] [Created] (HADOOP-13442) Optimize UGI group lookups
Daryn Sharp created HADOOP-13442: Summary: Optimize UGI group lookups Key: HADOOP-13442 URL: https://issues.apache.org/jira/browse/HADOOP-13442 Project: Hadoop Common Issue Type: Improvement Reporter: Daryn Sharp Assignee: Daryn Sharp {{UGI#getGroups}} and its usage are inefficient. The list is unnecessarily converted between multiple collections. For _every_ invocation, the {{List}} from the group provider is converted into a {{LinkedHashSet}} (to de-dup), then back to a {{String[]}}. Then callers testing for group membership convert back to a {{List}}. This should be done once to reduce allocations.
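The conversion chain above can be collapsed to a single pass. A minimal sketch (hypothetical class name, not the actual Hadoop patch): de-dup the provider's list once and hand out one immutable {{List}}, instead of round-tripping through a {{LinkedHashSet}} and a {{String[]}} on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: cache one de-duplicated, immutable group list
// instead of reconverting collections on every getGroups call.
final class CachedGroups {
  private final List<String> groups;

  CachedGroups(List<String> fromProvider) {
    // Single conversion: LinkedHashSet de-dups while preserving order.
    this.groups = Collections.unmodifiableList(
        new ArrayList<>(new LinkedHashSet<>(fromProvider)));
  }

  List<String> getGroups() {
    return groups; // no per-call allocation
  }

  boolean isMemberOf(String group) {
    return groups.contains(group); // callers no longer rebuild a List
  }
}
```

Membership tests and enumeration then share the same cached list, so the allocations happen once per lookup refresh rather than once per call.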
[jira] [Created] (HADOOP-13438) Optimize IPC server protobuf decoding
Daryn Sharp created HADOOP-13438: Summary: Optimize IPC server protobuf decoding Key: HADOOP-13438 URL: https://issues.apache.org/jira/browse/HADOOP-13438 Project: Hadoop Common Issue Type: Sub-task Reporter: Daryn Sharp Assignee: Daryn Sharp The current use of the protobuf API uses an expensive code path. The builder uses the parser to instantiate a message, then copies the message into the builder. The parser is creating multi-layered internally buffering streams that cause excessive byte[] allocations. Using the parser directly with a coded input stream backed by the byte[] from the wire will take a fast-path straight to the pb message's ctor. Substantially less garbage is generated.
[jira] [Created] (HADOOP-13429) Dispose of unnecessary SASL servers
Daryn Sharp created HADOOP-13429: Summary: Dispose of unnecessary SASL servers Key: HADOOP-13429 URL: https://issues.apache.org/jira/browse/HADOOP-13429 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Daryn Sharp Assignee: Daryn Sharp The IPC server retains a per-connection SASL server for the duration of the connection. This causes many unnecessary objects to be promoted to old gen. The SASL server should be disposed of unless required for subsequent encryption.
[jira] [Created] (HADOOP-13426) More efficiently build IPC responses
Daryn Sharp created HADOOP-13426: Summary: More efficiently build IPC responses Key: HADOOP-13426 URL: https://issues.apache.org/jira/browse/HADOOP-13426 Project: Hadoop Common Issue Type: Sub-task Reporter: Daryn Sharp Assignee: Daryn Sharp The call response buffer is allowed to dynamically grow until a max size is reached. Often, the full size of the response can be known in advance, which avoids copies. This is very advantageous for large responses. Automatic framing of the response buffer will also prevent unnecessary allocations and copies when the size is/isn't known.
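When the payload length is known up front (true for protobuf, whose serialized size is computable), the framed response can be built in one exact-size allocation. A minimal sketch, not the actual Hadoop code:

```java
import java.nio.ByteBuffer;

// Sketch of pre-sized framing: allocate exactly frame-header + payload
// once, with no growable intermediate buffer or final copy.
final class ResponseFraming {
  static ByteBuffer frame(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(4 + payload.length); // exact size
    buf.putInt(payload.length); // length-prefix frame header
    buf.put(payload);
    buf.flip(); // ready for a single channel write
    return buf;
  }
}
```

A dynamically growing buffer would instead allocate, fill, overflow, reallocate, and copy; knowing the size collapses that to a single allocation and a single write.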
[jira] [Created] (HADOOP-13425) IPC layer optimizations
Daryn Sharp created HADOOP-13425: Summary: IPC layer optimizations Key: HADOOP-13425 URL: https://issues.apache.org/jira/browse/HADOOP-13425 Project: Hadoop Common Issue Type: Improvement Reporter: Daryn Sharp Assignee: Daryn Sharp Umbrella jira for y! optimizations to reduce object allocations, use protobuf APIs more efficiently, unify the ipc and webhdfs callq to enable QoS, etc.
[jira] [Created] (HADOOP-13052) ChecksumFileSystem mishandles crc file permissions
Daryn Sharp created HADOOP-13052: Summary: ChecksumFileSystem mishandles crc file permissions Key: HADOOP-13052 URL: https://issues.apache.org/jira/browse/HADOOP-13052 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Daryn Sharp Assignee: Daryn Sharp ChecksumFileSystem does not override permission-related calls to apply those operations to the hidden crc files. Clients may be unable to read the crcs if the file is created with strict permissions and then relaxed. The checksum fs is designed to work with or w/o crcs present, so it silently ignores FNF exceptions. The java file stream apis unfortunately may only throw FNF, so permission denied becomes FNF, resulting in this bug going silently unnoticed. (Problem discovered via the public localizer: files are downloaded as user-readonly and then relaxed to all-read, but the crc remains user-readonly.)
[jira] [Resolved] (HADOOP-7733) Mapreduce jobs are failing when JT has hadoop.security.token.service.use_ip=false and client has hadoop.security.token.service.use_ip=true
[ https://issues.apache.org/jira/browse/HADOOP-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved HADOOP-7733. - Resolution: Won't Fix The configuration mismatch isn't something that can be worked around. The submitter and tasks used conflicting confs which prevented the token selector from finding the token. (note: this was filed by y! many years ago and it's no longer an issue here) > Mapreduce jobs are failing when JT has > hadoop.security.token.service.use_ip=false and client has > hadoop.security.token.service.use_ip=true > -- > > Key: HADOOP-7733 > URL: https://issues.apache.org/jira/browse/HADOOP-7733 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 0.20.205.0 >Reporter: Rajit Saha >Assignee: Daryn Sharp > > I have added following property in core-site.xml of all the nodes in cluster > and restarted > > hadoop.security.token.service.use_ip > false > desc > > > Then ran a randomwriter, distcp jobs, they are all failing > $HADOOP_HOME/bin/hadoop --config $HADOOP_CONFIG_DIR jar > $HADOOP_HOME/hadoop-examples.jar randomwriter > -Dtest.randomwrite.bytes_per_map=256000 input_1318325953 > Running 140 maps. 
> Job started: Tue Oct 11 09:48:09 UTC 2011 > 11/10/11 09:48:09 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 14 > for on :8020 > 11/10/11 09:48:09 INFO security.TokenCache: Got dt for > hdfs:// Hostname>/user//.staging/job_201110110946_0001;uri= IP>:8020;t.service=:8020 > 11/10/11 09:48:09 INFO mapred.JobClient: Cleaning up the staging area > hdfs:///user//.staging/job_201110110946_0001 > org.apache.hadoop.ipc.RemoteException: java.io.IOException: > java.io.IOException: Call to > /:8020 failed on local exception: > java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos tgt)] > at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3943) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) > Caused by: java.io.IOException: Call to / IP>:8020 failed on local exception: > java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed > [Caused by GSSException: No valid > credentials provided (Mechanism level: Failed to find any Kerberos tgt)] > at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103) > at org.apache.hadoop.ipc.Client.call(Client.java:1071) > at 
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) > at $Proxy7.getProtocolVersion(Unknown Source) > at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) > at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379) > at > org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:118) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:222) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:187) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1328) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:65) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1346) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) > at > org.apache.hadoop.mapred.JobInProgress$2.run(JobInProgress.java:401) > at > org.apache.hadoop.mapred.JobInProgress$2.run(JobInProgress.java:399) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) > at >
[jira] [Created] (HADOOP-12861) RPC client fails too quickly when server connection limit is reached
Daryn Sharp created HADOOP-12861: Summary: RPC client fails too quickly when server connection limit is reached Key: HADOOP-12861 URL: https://issues.apache.org/jira/browse/HADOOP-12861 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.7.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The NN's rpc server immediately closes new client connections when a connection limit is reached. The client rapidly retries a small number of times with no delay, which causes clients to fail quickly. If the connection is refused or timed out, the connection retry policy retries with backoff. Clients should treat a reset connection as a connection failure so the connection retry policy is used.
[jira] [Created] (HADOOP-12858) Reduce UGI getGroups overhead
Daryn Sharp created HADOOP-12858: Summary: Reduce UGI getGroups overhead Key: HADOOP-12858 URL: https://issues.apache.org/jira/browse/HADOOP-12858 Project: Hadoop Common Issue Type: Improvement Components: performance Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Group lookup generates excessive garbage with multiple conversions between collections and arrays.
[jira] [Created] (HADOOP-12483) Maintain wrapped SASL ordering for postponed IPC responses
Daryn Sharp created HADOOP-12483: Summary: Maintain wrapped SASL ordering for postponed IPC responses Key: HADOOP-12483 URL: https://issues.apache.org/jira/browse/HADOOP-12483 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.8.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical A SASL encryption algorithm (wrapping) may have a required ordering for encrypted responses. The IPC layer encrypts when the response is set, based on the assumption that it is sent immediately. Postponed responses violate that assumption.
[jira] [Resolved] (HADOOP-11019) Queued IPC calls are not aborted if the connection drops
[ https://issues.apache.org/jira/browse/HADOOP-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved HADOOP-11019. -- Resolution: Duplicate Queued IPC calls are not aborted if the connection drops Key: HADOOP-11019 URL: https://issues.apache.org/jira/browse/HADOOP-11019 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Once a call is read from the wire and queued, it will be executed even if the connection has already dropped. If the client closes the connection due to timeout, perhaps because the server is overloaded, the client's retry will only exacerbate the problem. One specific example is DNs with large block reports overwhelming an already unhealthy NN. Ideally calls should be cancelled when the connection is dropped and/or connection state should be checked when the call is extracted from the callq, prior to decoding and invoking the call.
[jira] [Resolved] (HADOOP-11888) bootstrapStandby command broken in JDK1.8 with kerberos
[ https://issues.apache.org/jira/browse/HADOOP-11888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved HADOOP-11888. -- Resolution: Duplicate The hostname must be canonicalized, not just resolved. This is a dup of HADOOP-11628. bootstrapStandby command broken in JDK1.8 with kerberos --- Key: HADOOP-11888 URL: https://issues.apache.org/jira/browse/HADOOP-11888 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.6.0 Environment: Suse 11 Sp3 java = 1.8.0_40 Reporter: Bibin A Chundatt Assignee: surendra singh lilhore Priority: Blocker Attachments: HADOOP-11888.patch bootstrapStandby is failing in case of JDK1.8 with kerberos ./hdfs namenode -bootstrapStandby {code} Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: org.apache.hadoop.security.authentication.client.AuthenticationException: Invalid SPNEGO sequence, status code: 403 at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:335) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:206) at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215) at org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:162) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.doGetUrl(TransferFsImage.java:403) ... 
16 more Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Invalid SPNEGO sequence, status code: 403 at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.readToken(KerberosAuthenticator.java:370) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.access$300(KerberosAuthenticator.java:55) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:320) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:288) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:288) ... 20 more {code}
[jira] [Created] (HADOOP-11780) Prevent IPC reader thread death
Daryn Sharp created HADOOP-11780: Summary: Prevent IPC reader thread death Key: HADOOP-11780 URL: https://issues.apache.org/jira/browse/HADOOP-11780 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Priority: Critical Reader threads can die due to a race condition with the responder thread. If the server's ipc handler cannot send a response in one write, it delegates sending the rest of the response to the responder thread. The race occurs when the responder thread has an exception writing to the socket. The responder closes the socket. This wakes up the reader polling on the socket. If a {{CancelledKeyException}} is thrown, which is a runtime exception, the reader dies. All connections serviced by that reader are now in limbo until the client possibly times out. New connections play roulette as to whether they are assigned to a defunct reader.
[jira] [Created] (HADOOP-11628) SPNEGO auth does not work with CNAMEs in JDK8
Daryn Sharp created HADOOP-11628: Summary: SPNEGO auth does not work with CNAMEs in JDK8 Key: HADOOP-11628 URL: https://issues.apache.org/jira/browse/HADOOP-11628 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Pre-JDK8, GSSName auto-canonicalized the hostname when constructing the principal for SPNEGO. JDK8 no longer does this which breaks the use of user-friendly CNAMEs for services.
[jira] [Created] (HADOOP-11019) Queued IPC calls are not aborted if the connection drops
Daryn Sharp created HADOOP-11019: Summary: Queued IPC calls are not aborted if the connection drops Key: HADOOP-11019 URL: https://issues.apache.org/jira/browse/HADOOP-11019 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Once a call is read from the wire and queued, it will be executed even if the connection has already dropped. If the client closes the connection due to timeout, perhaps because the server is overloaded, the client's retry will only exacerbate the problem. One specific example is DNs with large block reports overwhelming an already unhealthy NN. Ideally calls should be cancelled when the connection is dropped and/or connection state should be checked when the call is extracted from the callq, prior to decoding and invoking the call.
[jira] [Created] (HADOOP-10944) AbstractFileSystem may not return correct home directory
Daryn Sharp created HADOOP-10944: Summary: AbstractFileSystem may not return correct home directory Key: HADOOP-10944 URL: https://issues.apache.org/jira/browse/HADOOP-10944 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp {{AbstractFileSystem#getHomeDirectory}} uses the property user.name instead of the current UGI user. This is erroneous for proxy users, for users with a TGT principal not matching the local system user, and for custom ugis created via {{UGI.createRemoteUser}}.
[jira] [Created] (HADOOP-10941) Proxy user verification NPEs if remote host is unresolvable
Daryn Sharp created HADOOP-10941: Summary: Proxy user verification NPEs if remote host is unresolvable Key: HADOOP-10941 URL: https://issues.apache.org/jira/browse/HADOOP-10941 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.5.0 Reporter: Daryn Sharp Priority: Critical A null is passed to the impersonation providers for the remote address if it is unresolvable. {{DefaultImpersonationProvider}} will NPE, ipc will close the connection immediately (correct behavior for such unexpected exceptions), and the client fails with {{EOFException}}.
[jira] [Created] (HADOOP-10942) Globbing optimizations and regression fix
Daryn Sharp created HADOOP-10942: Summary: Globbing optimizations and regression fix Key: HADOOP-10942 URL: https://issues.apache.org/jira/browse/HADOOP-10942 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.1.0-beta, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical When globbing was commonized to support both filesystem and filecontext, it regressed a fix that prevents an intermediate glob that matches a file from throwing a confusing permissions exception. The hdfs traverse check requires the exec bit which a file does not have. Additional optimizations to reduce rpcs actually increase them if directories contain 1 item.
[jira] [Created] (HADOOP-10940) RPC client does poor bounds checking of responses
Daryn Sharp created HADOOP-10940: Summary: RPC client does poor bounds checking of responses Key: HADOOP-10940 URL: https://issues.apache.org/jira/browse/HADOOP-10940 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical The rpc client does no bounds checking of server responses. In the case of communicating with an older and incompatible RPC server, this may lead to OOM issues and leaked resources.
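The defensive pattern is to validate a declared frame length against a sane cap before allocating. A minimal sketch assuming a simple length-prefixed wire format (not the actual Hadoop RPC framing):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of bounds-checked response reading: a garbage or incompatible
// response that declares a huge length fails fast instead of triggering
// a giant allocation and an OOM.
final class BoundedFrameReader {
  static byte[] readFrame(InputStream raw, int maxLen) throws IOException {
    DataInputStream in = new DataInputStream(raw);
    int len = in.readInt();
    if (len < 0 || len > maxLen) {
      throw new IOException("Invalid response length: " + len);
    }
    byte[] buf = new byte[len]; // safe: len is bounded
    in.readFully(buf);
    return buf;
  }
}
```

An incompatible peer's bytes reinterpreted as a length field can easily decode to a multi-gigabyte value; the cap turns that into a clean protocol error.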
[jira] [Created] (HADOOP-10716) Cannot use more than 1 har filesystem
Daryn Sharp created HADOOP-10716: Summary: Cannot use more than 1 har filesystem Key: HADOOP-10716 URL: https://issues.apache.org/jira/browse/HADOOP-10716 Project: Hadoop Common Issue Type: Bug Components: conf, fs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Priority: Critical Filesystems are cached purely on scheme + authority. Har filesystems actually need further differentiation based on the path to the har file itself. For this reason, the fs cache used to be explicitly disabled for har via fs.har.impl.cache.disable in core-default.xml.
[jira] [Created] (HADOOP-10585) Retry policies ignore interrupted exceptions
Daryn Sharp created HADOOP-10585: Summary: Retry policies ignore interrupted exceptions Key: HADOOP-10585 URL: https://issues.apache.org/jira/browse/HADOOP-10585 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Retry policies should not use {{ThreadUtil.sleepAtLeastIgnoreInterrupts}}. Doing so prevents {{FsShell}} commands from being aborted during retries. It also causes orphaned webhdfs DN DFSClients to keep running after the webhdfs client closes the connection. Jetty goes into a loop constantly sending interrupts to the handler thread. Webhdfs retries cause multiple nodes to have these orphaned clients. The DN cannot shut down until orphaned clients complete.
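The core of the bug is the sleep style used between retries. A minimal sketch with hypothetical names (not the actual ThreadUtil code): a sleep that swallows interrupts keeps the retry loop alive even while the caller is trying to abort it, whereas an interruptible sleep lets the abort propagate.

```java
// Hypothetical contrast of the two sleep styles used between retries.
final class RetrySleep {
  // Mirrors the swallow-interrupts style: always sleeps the full duration,
  // so an aborting caller cannot stop the retry loop.
  static void sleepIgnoringInterrupts(long millis) {
    long end = System.currentTimeMillis() + millis;
    long remaining;
    while ((remaining = end - System.currentTimeMillis()) > 0) {
      try {
        Thread.sleep(remaining);
      } catch (InterruptedException ignored) {
        // interrupt swallowed: the abort is lost and the wait continues
      }
    }
  }

  // Preferred for retry policies: let the interrupt abort the wait.
  static void sleepInterruptibly(long millis) throws InterruptedException {
    Thread.sleep(millis);
  }
}
```

With the interruptible form, Jetty's interrupts (or a Ctrl-C in FsShell) terminate the retry loop instead of being silently absorbed until the full backoff elapses.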
[jira] [Resolved] (HADOOP-8490) Add Configuration to FileSystem cache key
[ https://issues.apache.org/jira/browse/HADOOP-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved HADOOP-8490. - Resolution: Won't Fix The specific issue that prompted the bug was fixed in the NM long ago. Add Configuration to FileSystem cache key - Key: HADOOP-8490 URL: https://issues.apache.org/jira/browse/HADOOP-8490 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 0.23.0, 0.24.0, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp {{FileSystem#get(URI, Configuration)}} does not take the given {{Configuration}} into consideration before returning an existing fs instance from the cache with a possibly different conf.
[jira] [Created] (HADOOP-10504) Document proxy server support
Daryn Sharp created HADOOP-10504: Summary: Document proxy server support Key: HADOOP-10504 URL: https://issues.apache.org/jira/browse/HADOOP-10504 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 3.0.0, 2.5.0 Reporter: Daryn Sharp Document http proxy support introduced by HADOOP-10498.
[jira] [Created] (HADOOP-10498) Add support for proxy server
Daryn Sharp created HADOOP-10498: Summary: Add support for proxy server Key: HADOOP-10498 URL: https://issues.apache.org/jira/browse/HADOOP-10498 Project: Hadoop Common Issue Type: New Feature Components: util Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp HDFS-6218 and HDFS-6219 require support for configurable proxy servers.
[jira] [Resolved] (HADOOP-10269) SaslException is completely ignored
[ https://issues.apache.org/jira/browse/HADOOP-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved HADOOP-10269. -- Resolution: Not A Problem SaslException is completely ignored --- Key: HADOOP-10269 URL: https://issues.apache.org/jira/browse/HADOOP-10269 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.2.0 Reporter: Ding Yuan In org/apache/hadoop/security/SaslOutputStream.java, there is the following code pattern: {noformat}
172    try {
173      if (saslServer != null) { // using saslServer
174        saslToken = saslServer.wrap(inBuf, off, len);
175      } else { // using saslClient
176        saslToken = saslClient.wrap(inBuf, off, len);
177      }
178    } catch (SaslException se) {
179      try {
180        disposeSasl();
181      } catch (SaslException ignored) {
182      }
183      throw se;
184    }
{noformat} On line 181, the exception thrown by disposeSasl(), which can be from SaslServer.dispose() or SaslClient.dispose(), is ignored completely without even being logged. Maybe at least log it? Ding
[jira] [Created] (HADOOP-10332) HttpServer's jetty audit log always logs 200 OK
Daryn Sharp created HADOOP-10332: Summary: HttpServer's jetty audit log always logs 200 OK Key: HADOOP-10332 URL: https://issues.apache.org/jira/browse/HADOOP-10332 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp HttpServer inserts the audit logger handler _before_ the actual servlet handlers, so the default 200 is always logged even if the operation fails. {code} handlers.setHandlers(new Handler[] {requestLogHandler, contexts}); {code}
[jira] [Created] (HADOOP-10301) AuthenticationFilter should return Forbidden for failed authentication
Daryn Sharp created HADOOP-10301: Summary: AuthenticationFilter should return Forbidden for failed authentication Key: HADOOP-10301 URL: https://issues.apache.org/jira/browse/HADOOP-10301 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker The hadoop-auth AuthenticationFilter returns a 401 Unauthorized without a WWW-Authenticate header. This is illegal per the HTTP RFC and causes an NPE in HttpUrlConnection. This is half of a fix that affects webhdfs. See HDFS-4564.
[jira] [Created] (HADOOP-10233) RPC lacks output flow control
Daryn Sharp created HADOOP-10233: Summary: RPC lacks output flow control Key: HADOOP-10233 URL: https://issues.apache.org/jira/browse/HADOOP-10233 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Priority: Critical The RPC layer has input flow control via the callq; however, it lacks any output flow control. A handler will try to directly send the response. If the full response is not sent, it is queued for the background responder thread. The RPC layer may end up queuing so many buffers that it locks up in GC.
[jira] [Created] (HADOOP-10173) Remove UGI from DIGEST-MD5 SASL server creation
Daryn Sharp created HADOOP-10173: Summary: Remove UGI from DIGEST-MD5 SASL server creation Key: HADOOP-10173 URL: https://issues.apache.org/jira/browse/HADOOP-10173 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 0.23.0, 3.0.0, 2.4.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Instantiation of SASL server instances within the reader threads is performed within a {{UGI.getCurrentUser().doAs}}. {{getCurrentUser}} is synchronized, and doAs also degrades performance. GSSAPI (kerberos) requires instantiation within a doAs, but DIGEST-MD5 (token) does not.
[jira] [Created] (HADOOP-10174) Run RPC server within the Subject that instantiated it
Daryn Sharp created HADOOP-10174: Summary: Run RPC server within the Subject that instantiated it Key: HADOOP-10174 URL: https://issues.apache.org/jira/browse/HADOOP-10174 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp RPC servers would not require as many doAs blocks if the server threads run within the access control context that instantiates the server.
[jira] [Created] (HADOOP-10172) Cache SASL server factories
Daryn Sharp created HADOOP-10172: Summary: Cache SASL server factories Key: HADOOP-10172 URL: https://issues.apache.org/jira/browse/HADOOP-10172 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Performance for SASL server creation is _atrocious_. {{Sasl.createSaslServer}} does not cache the provider resolution for the factories. Factory resolution and server instantiation have 3 major contention points. During bursts of connections, one reader accepting a connection stalls other readers accepting connections, in turn stalling all existing connections handled by those readers. I benched 5 threads at 187 instances/s - total, not per thread. With this and another change, I've boosted it to 33K instances/s.
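The caching idea can be sketched with the standard {{javax.security.sasl}} API. This is a hypothetical illustration of the approach, not the actual Hadoop patch: scan the installed providers once per mechanism and reuse the resolved factories for every subsequent connection, instead of letting each {{Sasl.createSaslServer}} call re-resolve them.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslServerFactory;

// Hypothetical sketch: cache SASL server factory resolution per mechanism
// so provider scanning happens once, not on every connection.
final class SaslFactoryCache {
  private final Map<String, List<SaslServerFactory>> byMechanism = new HashMap<>();

  synchronized List<SaslServerFactory> factoriesFor(String mechanism) {
    return byMechanism.computeIfAbsent(mechanism, mech -> {
      List<SaslServerFactory> found = new ArrayList<>();
      for (Enumeration<SaslServerFactory> e = Sasl.getSaslServerFactories();
           e.hasMoreElements(); ) {
        SaslServerFactory factory = e.nextElement();
        for (String m : factory.getMechanismNames(null)) {
          if (m.equals(mech)) {
            found.add(factory); // resolved once, reused for every connection
            break;
          }
        }
      }
      return found;
    });
  }
}
```

A server then calls {{createSaslServer}} on a cached factory directly, bypassing the repeated provider scan that made bursts of connections serialize on factory resolution.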
[jira] [Created] (HADOOP-10146) Workaround JDK7 Process fd close bug
Daryn Sharp created HADOOP-10146: Summary: Workaround JDK7 Process fd close bug Key: HADOOP-10146 URL: https://issues.apache.org/jira/browse/HADOOP-10146 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical JDK7's {{Process}} output streams have an async fd-close race bug. This manifests as commands run via o.a.h.u.Shell causing threads to hang, OOM, or exhibit other bizarre behavior. The NM is likely to encounter the bug under heavy load. Specifically, {{ProcessBuilder}}'s {{UNIXProcess}} starts a thread to reap the process and drain stdout/stderr to avoid a lingering zombie process. A race occurs if the thread using the stream closes it and the underlying fd is recycled/reopened while the reaper is still draining it. {{ProcessPipeInputStream.drainInputStream}} will OOM allocating an array if {{in.available()}} returns a huge number, or may wreak havoc by incorrectly draining the fd.
[jira] [Created] (HADOOP-10129) Distcp may succeed when it fails
Daryn Sharp created HADOOP-10129: Summary: Distcp may succeed when it fails Key: HADOOP-10129 URL: https://issues.apache.org/jira/browse/HADOOP-10129 Project: Hadoop Common Issue Type: Bug Components: tools/distcp Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Distcp uses {{IOUtils.cleanup}} to close its output streams w/o first attempting an explicit close. {{IOUtils.cleanup}} swallows exceptions from close, including those from the implicit flush on close. As a result, distcp may silently skip files when a partial file listing is generated, and/or appear to succeed when individual copies fail.
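The distinction can be sketched with plain streams. In this hypothetical helper (not the distcp code), {{quietClose}} mimics the cleanup style by swallowing close exceptions; a correct writer must call close() explicitly first, so a failed flush-on-close fails the copy, with quiet cleanup reserved for the error path.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: explicit close surfaces flush-on-close failures,
// while a cleanup-style quiet close deliberately swallows them.
final class CopyFinish {
  static void quietClose(OutputStream out) {
    try {
      out.close();
    } catch (IOException swallowed) {
      // best-effort cleanup only: errors are intentionally dropped
    }
  }

  // Returns true only if the stream (including its final flush) closed cleanly.
  static boolean finishCopy(OutputStream out) {
    try {
      out.close(); // explicit close: buffered-write failures surface here
      return true;
    } catch (IOException e) {
      quietClose(out); // cleanup may now safely swallow
      return false;
    }
  }
}
```

Relying on the quiet variant alone is exactly the bug described above: a copy whose final flush failed still reports success.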
[jira] [Created] (HADOOP-10099) Reduce chance for RPC denial of service
Daryn Sharp created HADOOP-10099: Summary: Reduce chance for RPC denial of service Key: HADOOP-10099 URL: https://issues.apache.org/jira/browse/HADOOP-10099 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Priority: Minor A RPC server may accept an unlimited number of connections unless indirectly bounded by a blocking operation in the RPC handler threads. The NN's namespace locking happens to cause this blocking, but other RPC servers such as yarn's generate async events which allow unbridled connection acceptance. -- This message was sent by Atlassian JIRA (v6.1#6144)
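One common way to bound connection acceptance, sketched here as a hypothetical helper (the class, method names, and limit are illustrative, not Hadoop's RPC server API): gate the listener on a semaphore so it refuses or defers new connections at capacity instead of accepting without limit.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: bound concurrently accepted connections so a flood
// of clients cannot exhaust server memory and file descriptors.
class BoundedAccept {
    private final Semaphore slots;

    BoundedAccept(int maxConnections) {
        this.slots = new Semaphore(maxConnections);
    }

    // Called by the listener before handing a connection to a reader;
    // returns false when the server is at capacity and should refuse.
    boolean tryAccept() {
        return slots.tryAcquire();
    }

    // Called when a connection is closed, freeing a slot.
    void release() {
        slots.release();
    }
}
```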
[jira] [Created] (HADOOP-10013) FileSystem checkPath accepts invalid paths with an authority but no scheme
Daryn Sharp created HADOOP-10013: Summary: FileSystem checkPath accepts invalid paths with an authority but no scheme Key: HADOOP-10013 URL: https://issues.apache.org/jira/browse/HADOOP-10013 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp {{FileSystem#checkPath}} will consider paths of the form //junk/path as being valid for the given fs. The problem is {{checkPath}} shorts out if the path contains no scheme - assuming it must be a relative or absolute path for the given fs - whereas the condition should be no scheme _and_ no authority. This causes {{DistributedFileSystem#getPathName}} to convert //junk/path into /path, which silently hides the use of invalid paths. -- This message was sent by Atlassian JIRA (v6.1#6144)
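The scheme-vs-authority distinction is easy to see with {{java.net.URI}}: //junk/path parses with a null scheme but a non-null authority, so a scheme-only check wrongly treats it as a local path. A minimal sketch of the corrected condition (the method names are illustrative, not {{FileSystem#checkPath}} itself):

```java
import java.net.URI;
import java.net.URISyntaxException;

class CheckPathSketch {
    // Helper so callers need not handle the checked URISyntaxException.
    static URI parse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The corrected condition: a path may be assumed relative/absolute for
    // "this" fs only if it has neither a scheme nor an authority.
    static boolean belongsToThisFs(URI uri) {
        return uri.getScheme() == null && uri.getAuthority() == null;
    }
}
```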
[jira] [Created] (HADOOP-10014) Symlink resolution is fundamentally broken for multi-layered filesystems
Daryn Sharp created HADOOP-10014: Summary: Symlink resolution is fundamentally broken for multi-layered filesystems Key: HADOOP-10014 URL: https://issues.apache.org/jira/browse/HADOOP-10014 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Priority: Critical Symlink resolution is performed on a per-filesystem basis. In a multi-layered filesystem, the symlinks need to be resolved relative to the highest level filesystem in the stack. Otherwise, fs implementations like viewfs and chroot fs behave incorrectly. Absolute symlinks may violate the base of the chroot. Links that should have crossed viewfs mount points are again incorrectly resolved relative to the base filesystem. Symlink resolution has to occur above the level of any individual fs to allow a multi-layered fs stack to work correctly, such as via a symlink-aware {{FilteredFileSystem}} that wraps any arbitrary fs to ensure links are resolved from the top of the stack down. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HADOOP-9953) Improve RPC server throughput
Daryn Sharp created HADOOP-9953: --- Summary: Improve RPC server throughput Key: HADOOP-9953 URL: https://issues.apache.org/jira/browse/HADOOP-9953 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Bottlenecks in the RPC layer are in part holding back the performance of the NN. Even under very heavy load, the NN usually can't saturate more than a few cores even with load patterns dominated by read ops. This will be an umbrella for issues discovered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9955) RPC idle connection closing is extremely inefficient
Daryn Sharp created HADOOP-9955: --- Summary: RPC idle connection closing is extremely inefficient Key: HADOOP-9955 URL: https://issues.apache.org/jira/browse/HADOOP-9955 Project: Hadoop Common Issue Type: Sub-task Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The RPC server listener loops accepting connections, distributing the new connections to socket readers, and then periodically performs a scan for idle connections. The idle scan chooses a _random index range_ to scan in a _synchronized linked list_. With 20k+ connections, walking the range of indices in the linked list is extremely expensive. During the sweep, other threads (socket responder and readers) that want to close connections are blocked, and no new connections are being accepted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
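Why the index-range scan is so expensive: {{LinkedList.get(i)}} walks from the head each call, so scanning a range of indices is quadratic, while a single iterator pass visits each node once. A minimal illustration under assumed names (this is not the server's actual code):

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class IdleScan {
    // The pattern described above: indexed gets over a synchronized linked
    // list -- each get(i) is O(n), so the whole sweep is O(n^2).
    static int scanByIndex(List<Long> lastActivity, int from, int to, long cutoff) {
        int idle = 0;
        synchronized (lastActivity) {   // whole sweep holds the lock
            for (int i = from; i < to; i++) {
                if (lastActivity.get(i) < cutoff) idle++;
            }
        }
        return idle;
    }

    // Same result in one linear pass with an iterator; still locked, but
    // O(n) total instead of O(n^2).
    static int scanByIterator(List<Long> lastActivity, long cutoff) {
        int idle = 0;
        synchronized (lastActivity) {
            for (Iterator<Long> it = lastActivity.iterator(); it.hasNext(); ) {
                if (it.next() < cutoff) idle++;
            }
        }
        return idle;
    }
}
```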
[jira] [Created] (HADOOP-9930) Desync AbstractDelegationTokenSecretManager
Daryn Sharp created HADOOP-9930: --- Summary: Desync AbstractDelegationTokenSecretManager Key: HADOOP-9930 URL: https://issues.apache.org/jira/browse/HADOOP-9930 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The ADTSM is heavily synchronized. The result is that verifying, creating, renewing, and canceling tokens are all unnecessarily serialized. The only operations that should be serialized are per-token renew and cancel operations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
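The desired granularity can be sketched with a concurrent map, where verification is a lock-free read and renew/cancel serialize only on the affected token's entry. This is a toy model under assumed names, not the ADTSM's actual structure:

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy token store: no global lock; per-entry atomicity only.
class TokenStore {
    private final ConcurrentHashMap<String, Long> expiry = new ConcurrentHashMap<>();

    void create(String tokenId, long expiresAt) {
        expiry.put(tokenId, expiresAt);
    }

    // verify: a concurrent read, never blocked by other tokens' operations
    boolean verify(String tokenId, long now) {
        Long e = expiry.get(tokenId);
        return e != null && e > now;
    }

    // renew: atomic on this token's entry only
    boolean renew(String tokenId, long newExpiry) {
        return expiry.computeIfPresent(tokenId, (id, old) -> Math.max(old, newExpiry)) != null;
    }

    // cancel: likewise atomic per entry
    boolean cancel(String tokenId) {
        return expiry.remove(tokenId) != null;
    }
}
```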
[jira] [Created] (HADOOP-9868) Server must not advertise kerberos realm
Daryn Sharp created HADOOP-9868: --- Summary: Server must not advertise kerberos realm Key: HADOOP-9868 URL: https://issues.apache.org/jira/browse/HADOOP-9868 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 3.0.0, 2.1.1-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker HADOOP-9789 broke kerberos authentication by making the RPC server advertise the kerberos service principal realm. SASL clients and servers do not support specifying a realm, so it must be removed from the advertisement. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-9789) Support server advertised kerberos principals
[ https://issues.apache.org/jira/browse/HADOOP-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved HADOOP-9789. - Resolution: Fixed Will be fixed by HADOOP-9868. Support server advertised kerberos principals - Key: HADOOP-9789 URL: https://issues.apache.org/jira/browse/HADOOP-9789 Project: Hadoop Common Issue Type: New Feature Components: ipc, security Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 3.0.0, 2.1.1-beta Attachments: HADOOP-9789.2.patch, HADOOP-9789.patch, HADOOP-9789.patch, hadoop-ojoshi-datanode-HW10351.local.log, hadoop-ojoshi-namenode-HW10351.local.log The RPC client currently constructs the kerberos principal based on a config value, usually with an _HOST substitution. This means the service principal must match the hostname the client is using to connect. This causes problems: * Prevents using HA with IP failover when the servers have distinct principals from the failover hostname * Prevents clients from being able to access a service bound to multiple interfaces. Only the interface that matches the server's principal may be used. The client should be able to use the SASL advertised principal (HADOOP-9698), with appropriate safeguards, to acquire the correct service ticket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9856) Avoid Krb5LoginModule.logout issue
Daryn Sharp created HADOOP-9856: --- Summary: Avoid Krb5LoginModule.logout issue Key: HADOOP-9856 URL: https://issues.apache.org/jira/browse/HADOOP-9856 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The kerberos login module's logout method arguably has a bug. {{Subject#getPrivateCredentials()}} returns a synchronized set. Iterating the set requires explicitly locking the set. The {{Krb5LoginModule#logout()}} is iterating and modifying the set w/o a lock. This may lead to a {{ConcurrentModificationException}} which is what led to {{UGI.getCurrentUser()}} being unnecessarily synchronized. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
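The rule {{Krb5LoginModule#logout()}} violates is the {{Collections.synchronizedSet}} contract: iteration is only safe while holding the set's own monitor. A minimal illustration of the correct pattern (not the JDK's code):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SyncSetIteration {
    // Iterating a synchronizedSet requires manually synchronizing on it;
    // only individual operations (add, remove, contains) are auto-locked.
    static int countSafely(Set<String> syncSet) {
        int n = 0;
        synchronized (syncSet) {   // required by the synchronizedSet contract
            for (String cred : syncSet) {
                n++;
            }
        }
        return n;
    }
}
```

Iterating (or iterating while modifying) without that lock is exactly what risks the {{ConcurrentModificationException}} described above.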
[jira] [Created] (HADOOP-9852) UGI login user keytab and principal should not be static
Daryn Sharp created HADOOP-9852: --- Summary: UGI login user keytab and principal should not be static Key: HADOOP-9852 URL: https://issues.apache.org/jira/browse/HADOOP-9852 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The static keytab and principal for the login user is problematic. The login conf explicitly references these statics. As a result, loginUserFromKeytabAndReturnUGI is unnecessarily synch'ed on the class to swap out the login user's keytab and principal, login, then restore the keytab/principal. This method's synch blocks further de-synching of other methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9850) RPC kerberos errors don't trigger relogin
Daryn Sharp created HADOOP-9850: --- Summary: RPC kerberos errors don't trigger relogin Key: HADOOP-9850 URL: https://issues.apache.org/jira/browse/HADOOP-9850 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Hadoop auto-renews a ticket cache TGT. However, a TGT acquired via keytab is just allowed to expire. To compensate, any exception during a kerberos RPC connection triggers a relogin. Prior to HADOOP-9698, the RPC client knew the SASL client was attempting authMethod kerberos. Now the SASL client negotiates and returns the authMethod to the RPC Client. When an exception occurs, such as TGT expired, the Client doesn't know what the SASL client was attempting so no relogin is attempted. After 24 hours, keytab based services that act as clients (ex. RM for token renewal) go dead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9832) Add RPC header to client ping
Daryn Sharp created HADOOP-9832: --- Summary: Add RPC header to client ping Key: HADOOP-9832 URL: https://issues.apache.org/jira/browse/HADOOP-9832 Project: Hadoop Common Issue Type: Sub-task Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Splitting out the ping part of the umbrella jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9816) RPC Sasl QOP is broken
Daryn Sharp created HADOOP-9816: --- Summary: RPC Sasl QOP is broken Key: HADOOP-9816 URL: https://issues.apache.org/jira/browse/HADOOP-9816 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta, 2.3.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker HADOOP-9421 broke the handling of SASL wrapping for RPC QOP integrity and privacy options. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9820) RPCv9 wire protocol is insufficient to support multiplexing
Daryn Sharp created HADOOP-9820: --- Summary: RPCv9 wire protocol is insufficient to support multiplexing Key: HADOOP-9820 URL: https://issues.apache.org/jira/browse/HADOOP-9820 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker RPCv9 is intended to allow future support of multiplexing. This requires all wire messages to be tagged with a RPC header so a demux can decode and route the messages accordingly. RPC ping packets and SASL QOP wrapped data is known to not be tagged with a header. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9789) Support server advertised kerberos principals
Daryn Sharp created HADOOP-9789: --- Summary: Support server advertised kerberos principals Key: HADOOP-9789 URL: https://issues.apache.org/jira/browse/HADOOP-9789 Project: Hadoop Common Issue Type: New Feature Components: ipc, security Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical The RPC client currently constructs the kerberos principal based on a config value, usually with an _HOST substitution. This means the service principal must match the hostname the client is using to connect. This causes problems: * Prevents using HA with IP failover when the servers have distinct principals from the failover hostname * Prevents clients from being able to access a service bound to multiple interfaces. Only the interface that matches the server's principal may be used. The client should be able to use the SASL advertised principal (HADOOP-9698), with appropriate safeguards, to acquire the correct service ticket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9747) Reduce unnecessary UGI synchronization
Daryn Sharp created HADOOP-9747: --- Summary: Reduce unnecessary UGI synchronization Key: HADOOP-9747 URL: https://issues.apache.org/jira/browse/HADOOP-9747 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Jstacks of heavily loaded NNs show up to dozens of threads blocking in the UGI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9748) Reduce blocking on UGI.ensureInitialized
Daryn Sharp created HADOOP-9748: --- Summary: Reduce blocking on UGI.ensureInitialized Key: HADOOP-9748 URL: https://issues.apache.org/jira/browse/HADOOP-9748 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical EnsureInitialized is always sync'ed on the class, when it should only sync if it actually has to initialize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
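A common shape for the fix described above is double-checked initialization: test a volatile flag first and take the class lock only when initialization is actually needed, so the steady-state fast path is lock-free. This is a generic sketch, not UGI's actual code:

```java
class LazyInit {
    private static volatile boolean initialized = false;
    static int initCount = 0;   // instrumentation for this illustration only

    static void ensureInitialized() {
        if (!initialized) {                    // lock-free fast path
            synchronized (LazyInit.class) {
                if (!initialized) {            // re-check under the lock
                    initCount++;               // real code would load config here
                    initialized = true;
                }
            }
        }
    }
}
```

The volatile flag is what makes the unlocked first check safe: once a thread observes {{initialized == true}}, the writes made before setting it are visible.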
[jira] [Created] (HADOOP-9749) Remove synchronization for UGI.getCurrentUser
Daryn Sharp created HADOOP-9749: --- Summary: Remove synchronization for UGI.getCurrentUser Key: HADOOP-9749 URL: https://issues.apache.org/jira/browse/HADOOP-9749 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical HADOOP-7854 added synchronization to {{getCurrentUser}} due to {{ConcurrentModificationExceptions}}. This degrades NN call handler performance. The problem was not well understood at the time, but it's caused by a collision between relogin and {{getCurrentUser}} due to a bug in {{Krb5LoginModule}}. Avoiding the collision will allow removal of the synchronization. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9698) RPCv9 client must honor server's SASL negotiate response
Daryn Sharp created HADOOP-9698: --- Summary: RPCv9 client must honor server's SASL negotiate response Key: HADOOP-9698 URL: https://issues.apache.org/jira/browse/HADOOP-9698 Project: Hadoop Common Issue Type: Sub-task Components: ipc Affects Versions: 3.0.0, 2.1.0-beta Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical As of HADOOP-9421, a RPCv9 server will advertise its authentication methods. This is meant to support features such as IP failover, better token selection, and interoperability in a heterogeneous security environment. Currently the client ignores the negotiate response and just blindly attempts to authenticate instead of choosing a mutually agreeable auth method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9645) KerberosAuthenticator NPEs on connect error
Daryn Sharp created HADOOP-9645: --- Summary: KerberosAuthenticator NPEs on connect error Key: HADOOP-9645 URL: https://issues.apache.org/jira/browse/HADOOP-9645 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.5-alpha Reporter: Daryn Sharp Priority: Critical A NPE occurs if there's a kerberos error during initial connect. In this case, the NN was using a HTTP service principal with a stale kvno. It causes webhdfs to fail in a non-user friendly manner by masking the real error from the user. {noformat} java.lang.RuntimeException: java.lang.NullPointerException at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241) at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2713) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:477) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.isNegotiate(KerberosAuthenticator.java:164) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:140) at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:217) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.openHttpUrlConnection(WebHdfsFileSystem.java:364) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-9507) LocalFileSystem rename() is broken in some cases when destination exists
[ https://issues.apache.org/jira/browse/HADOOP-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved HADOOP-9507. - Resolution: Invalid LocalFileSystem rename() is broken in some cases when destination exists Key: HADOOP-9507 URL: https://issues.apache.org/jira/browse/HADOOP-9507 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Mostafa Elhemali Assignee: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9507.branch-1-win.patch The rename() method in RawLocalFileSystem uses FileUtil.copy() without realizing that FileUtil.copy() has a special behavior that if you're copying /foo to /bar and /bar exists and is a directory, it'll copy /foo inside /bar instead of overwriting it, which is not what rename() wants. So you end up with weird behaviors like in this repro: {code} c: cd \ md Foo md Bar md Foo\X md Bar\X hadoop fs -mv file:///c:/Foo file:///c:/Bar {code} At the end of this, you would expect to find only Bar\X, but you instead find Bar\X\X. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9516) Enable spnego filters only if kerberos is enabled
Daryn Sharp created HADOOP-9516: --- Summary: Enable spnego filters only if kerberos is enabled Key: HADOOP-9516 URL: https://issues.apache.org/jira/browse/HADOOP-9516 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Spnego filters are currently enabled if security is enabled - which is predicated on security=kerberos. With the advent of the PLAIN authentication method, the filters should only be enabled if kerberos is enabled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9363) AuthenticatedURL will NPE if server closes connection
Daryn Sharp created HADOOP-9363: --- Summary: AuthenticatedURL will NPE if server closes connection Key: HADOOP-9363 URL: https://issues.apache.org/jira/browse/HADOOP-9363 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp A NPE occurs if the server unexpectedly closes the connection for an {{AuthenticatedURL}} w/o sending a response. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9366) AuthenticatedURL.Token has a mutable hashCode
Daryn Sharp created HADOOP-9366: --- Summary: AuthenticatedURL.Token has a mutable hashCode Key: HADOOP-9366 URL: https://issues.apache.org/jira/browse/HADOOP-9366 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Hash codes must be immutable, but {{AuthenticatedURL.Token#hashCode}} is not. It will return 0 if the token is not set, else the token's hash code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
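Why a mutable hash code matters: hash collections record an element's hash at insertion, so if the hash changes afterward the collection can no longer find its own element. A minimal stand-in for the bug (not the actual {{AuthenticatedURL.Token}} class):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// hashCode depends on a mutable field, mirroring the bug described above.
class MutableToken {
    String value;   // null until the token is set

    @Override public int hashCode() {
        return value == null ? 0 : value.hashCode();
    }

    @Override public boolean equals(Object o) {
        return o instanceof MutableToken
                && Objects.equals(value, ((MutableToken) o).value);
    }
}
```

Setting the token after the object is stored in a {{HashSet}} effectively "loses" it: lookups compute the new hash while the stored entry sits under the old one.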
[jira] [Created] (HADOOP-9339) IPC.Server incorrectly sets UGI auth type
Daryn Sharp created HADOOP-9339: --- Summary: IPC.Server incorrectly sets UGI auth type Key: HADOOP-9339 URL: https://issues.apache.org/jira/browse/HADOOP-9339 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp For non-secure servers, {{IPC.Server#processConnectionContext}} will explicitly set the UGI's auth type to SIMPLE. However the auth type has already been set by this point, and this explicit set causes proxy UGIs to be SIMPLE/SIMPLE instead of PROXY/SIMPLE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9341) Secret Managers should allow explicit purging of tokens and secret keys
Daryn Sharp created HADOOP-9341: --- Summary: Secret Managers should allow explicit purging of tokens and secret keys Key: HADOOP-9341 URL: https://issues.apache.org/jira/browse/HADOOP-9341 Project: Hadoop Common Issue Type: New Feature Components: security Affects Versions: 2.0.0-alpha, 3.0.0, 0.23.7 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Per HDFS-4477, the fsimage retains all secret keys and uncanceled tokens forever. There should be a way to explicitly purge a secret manager of expired items w/o starting its threads. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9336) Allow UGI of current connection to be queried
Daryn Sharp created HADOOP-9336: --- Summary: Allow UGI of current connection to be queried Key: HADOOP-9336 URL: https://issues.apache.org/jira/browse/HADOOP-9336 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Querying {{UGI.getCurrentUser}} is synch'ed and inefficient for short-lived RPC requests. Since the connection already contains the UGI, there should be a means to query it directly and avoid a call to {{UGI.getCurrentUser}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9317) User cannot specify a kerberos keytab for commands
Daryn Sharp created HADOOP-9317: --- Summary: User cannot specify a kerberos keytab for commands Key: HADOOP-9317 URL: https://issues.apache.org/jira/browse/HADOOP-9317 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical {{UserGroupInformation}} only allows kerberos users to be logged in via the ticket cache when running hadoop commands. {{UGI}} allows a keytab to be used, but it's only exposed programmatically. This forces keytab-based users running hadoop commands to periodically issue a kinit from the keytab. A race condition exists during the kinit when the ticket cache is deleted and re-created. Hadoop commands will fail when the ticket cache does not momentarily exist. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9289) FsShell rm -f fails for non-matching globs
Daryn Sharp created HADOOP-9289: --- Summary: FsShell rm -f fails for non-matching globs Key: HADOOP-9289 URL: https://issues.apache.org/jira/browse/HADOOP-9289 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Rm -f isn't supposed to error for paths that don't exist. It works as expected for exact paths, but fails for non-matching globs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9284) Authentication method is wrong if no TGT is present
Daryn Sharp created HADOOP-9284: --- Summary: Authentication method is wrong if no TGT is present Key: HADOOP-9284 URL: https://issues.apache.org/jira/browse/HADOOP-9284 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp If security is enabled, {{UGI.getLoginUser()}} will attempt an os-specific login followed by looking for a TGT in the ticket cache. If no TGT is found, the UGI's authentication method is still set as KERBEROS instead of SIMPLE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9238) FsShell -put from stdin auto-creates paths
Daryn Sharp created HADOOP-9238: --- Summary: FsShell -put from stdin auto-creates paths Key: HADOOP-9238 URL: https://issues.apache.org/jira/browse/HADOOP-9238 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp FsShell put is no longer supposed to auto-create paths. There's an inconsistency where a put from stdin will still auto-create paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9105) FsShell -moveFromLocal erroneously fails
Daryn Sharp created HADOOP-9105: --- Summary: FsShell -moveFromLocal erroneously fails Key: HADOOP-9105 URL: https://issues.apache.org/jira/browse/HADOOP-9105 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The move successfully completes, but then reports an error when trying to delete the local source directory even though the delete succeeded. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9070) Kerberos SASL server cannot find kerberos key
Daryn Sharp created HADOOP-9070: --- Summary: Kerberos SASL server cannot find kerberos key Key: HADOOP-9070 URL: https://issues.apache.org/jira/browse/HADOOP-9070 Project: Hadoop Common Issue Type: Sub-task Components: ipc Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker HADOOP-9015 inadvertently removed a {{doAs}} block around instantiation of the sasl server which renders a server incapable of accepting kerberized connections. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9034) SASL negotiation is insufficient to support all types
Daryn Sharp created HADOOP-9034: --- Summary: SASL negotiation is insufficient to support all types Key: HADOOP-9034 URL: https://issues.apache.org/jira/browse/HADOOP-9034 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp A SASL negotiation requires a series of 1 or more challenge/responses. The current server-side RPC SASL implementation may respond with another challenge, an exception, or a switch to simple method. The server does not reply when the authentication handshake is complete. For SASL mechanisms that require multiple exchanges before the client believes the authentication is complete, the client has an opportunity to read the exception or switch to simple. However some mechanisms, ex. PLAIN, consider the exchange complete as soon as it sends the initial response. The following proxy call will read the SASL response and throw an incomplete protobuf exception. The same issue may manifest when a client sends the final response for a multi-exchange mechanism and the server returns an exception. Fixing the problem requires breaking RPC compatibility. We should consider having the SASL server always return success when authentication is complete. HADOOP-8999 added a short-term workaround to send a success response only for PLAIN, and for the client to always read at least one RPC response to ensure PLAIN will work. Another complication is a SASL server returns non-null when initiating another challenge and null when authentication is established. However, the current RPC exchange does not allow a zero-byte response (client, you initiate the exchange) to be differentiated from a null (client, we're authenticated!). We should consider using a different RPC status to indicate SASL authentication is in progress, so a zero-byte RPC success is interpreted as authentication is complete. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira