Hello Lin,

Great catch!  Yes, there is an implicit assumption that 
dfs.block.access.token.enable would have to be set to true.

The main purpose of this property is to activate block access tokens for 
client-DataNode interactions.  For example, upon opening a file, the user 
authenticates to the NameNode, the NameNode issues block access tokens 
declaring that the user is authorized to access the file's blocks, and the 
client then presents those tokens to the DataNodes to authorize access to the 
blocks.

The implicit dependency for data transfer protocol encryption arises because 
that feature relies on the same NameNode infrastructure for managing 
encryption keys.  Without the property enabled, the NameNode doesn't activate 
its secret management infrastructure, and therefore no encryption key is 
available for data transfer protocol encryption.
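
Concretely, that means a non-Kerberos deployment needs both properties enabled 
in hdfs-site.xml on the NameNode and DataNodes, along these lines:

<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<property>
  <!-- implicit prerequisite: activates the NameNode's secret manager,
       which supplies the key used for data transfer encryption -->
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>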

Most fully secured deployments set dfs.block.access.token.enable=true as part 
of the full security enablement procedure.  Since you're not fully enabling 
security, yours is the first case where I've seen this happen.

--Chris Nauroth

From: Lin Zhao <[email protected]>
Date: Wednesday, April 6, 2016 at 7:48 PM
To: Chris Nauroth <[email protected]>, [email protected], 
[email protected]
Subject: Re: Is it possible to turn on data node encryption without kerberos?

Chris,

I followed the TestEncryptedTransfer test case and set 
dfs.block.access.token.enable to true, and the issue is resolved. I can't find 
documentation saying this property is mandatory for encrypted data transfer. 
What does this property do?

From: Chris Nauroth <[email protected]>
Date: Wednesday, April 6, 2016 at 4:02 PM
To: [email protected], Lin Zhao <[email protected]>, 
[email protected]
Subject: Re: Is it possible to turn on data node encryption without kerberos?

It is possible to turn on data transfer protocol encryption without enabling 
Kerberos authentication.  We have a test suite in the Hadoop codebase named 
TestEncryptedTransfer that configures data transfer encryption, but not 
Kerberos, and those tests are passing.

The hadoop.rpc.protection setting is unrelated to the data transfer protocol.  
Instead, it controls the SASL quality of protection for the RPC connections 
used by many Hadoop client/server interactions.  It won't have any effect 
unless Kerberos authentication is enabled, though.
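
For reference, hadoop.rpc.protection accepts the standard SASL quality of 
protection values: authentication (the default), integrity, or privacy.  A 
typical core-site.xml entry looks like this:

<property>
  <name>hadoop.rpc.protection</name>
  <!-- one of: authentication | integrity | privacy -->
  <value>privacy</value>
</property>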

Please note that even though it's possible to enable data transfer protocol 
encryption without using Kerberos authentication in the cluster, the benefit of 
that is questionable in a production deployment.  Without Kerberos 
authentication, it's very easy for an unauthenticated user to spoof another 
user and access their HDFS files.  Whether or not the data is encrypted in 
transit becomes irrelevant at that point.

--Chris Nauroth

From: Musty Rehmani <[email protected]>
Reply-To: [email protected]
Date: Wednesday, April 6, 2016 at 2:54 PM
To: Lin Zhao <[email protected]>, [email protected]
Subject: Re: Is it possible to turn on data node encryption without kerberos?


Kerberos is used to authenticate a user or service principal and grant access 
to the cluster. It doesn't encrypt data blocks moving in and out of the cluster.

On Wed, Apr 6, 2016 at 4:36 PM, Lin Zhao <[email protected]> wrote:
I've been trying to secure block data transferred by HDFS. I added the 
properties below to hdfs-site.xml and core-site.xml on the DataNode and 
NameNode, and restarted both.


<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>


When I try to put a file from the hdfs command-line shell, the operation fails 
with "connection is reset", and I see the following in the DataNode log:

"org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to read expected 
encryption handshake from client at /172.31.36.56:48271. Perhaps the client is 
running an older version of Hadoop which does not support encryption"


I am able to reproduce this on two different deployments. I was following 
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Authentication,
 but didn't turn on Kerberos authentication; authentication isn't enabled in 
my environment. Could this be the reason the handshake fails?

Any help is appreciated.

Thanks,

Lin Zhao
