Re: Secure Hadoop - invalid Kerberos principal errors

2016-10-20 Thread Mark Selby
Thanks for the suggestion.

We are an Ubuntu shop, which means that _HOST expands to the short name, not the
FQDN, which is why we cannot use the _HOST macro.

When I originally tried using the _HOST macro, nothing worked, because all of our
Kerberos principals use the FQDN, so there was a legitimate mismatch.

The more I look at this, the more it seems I am missing something subtle.
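
For what it is worth, here is the quick check I run on each node to see why
_HOST comes out short. This is just a sketch; the 127.0.1.1 entry is the stock
Ubuntu installer convention, and your /etc/hosts may differ:

# Short name vs. canonical name as the resolver sees them:
hostname
hostname -f
getent hosts $(hostname)
# Stock Ubuntu maps the host name to 127.0.1.1. If only the short name is
# listed there, reverse lookup (and hence Java canonicalization) returns the
# short name, e.g. an entry like:
#   127.0.1.1  aw1hdnn001
grep 127.0.1.1 /etc/hosts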


From: Wei-Chiu Chuang <weic...@cloudera.com>
Date: Thursday, October 20, 2016 at 10:41 AM
To: Mark Selby <mse...@pandora.com>
Cc: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Secure Hadoop - invalid Kerberos principal errors

Instead of specifying the host name of the server principal,
have you tried using hdfs/_h...@tnbsound.com?

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html#Kerberos_principals_for_Hadoop_Daemons

dfs.journalnode.kerberos.principal

hdfs/aw1hdnn001.tnbsound.com@tnbsound.com
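
i.e., something like this instead, assuming your realm really is tnbsound.com
as your klist output suggests:

dfs.journalnode.kerberos.principal

hdfs/_HOST@tnbsound.com

Each daemon then substitutes its own host name when it logs in, and clients
substitute the host they are connecting to, so the one value works on every
node.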

Wei-Chiu Chuang
A very happy Clouderan

On Oct 20, 2016, at 10:19 AM, Mark Selby <mse...@pandora.com> wrote:


Secure Hadoop - invalid Kerberos principal errors

2016-10-20 Thread Mark Selby
We have an existing CDH 5.5.1 cluster with simple authentication and no
authorization. We are building out a new cluster and plan to move to CDH 5.8.2
with Kerberos-based authentication. We have an existing MIT Kerberos
infrastructure which we successfully use for a variety of services
(ssh, apache, postfix).

I am very confident that our /etc/krb5.conf and name resolution are working. I
have even used HadoopDNSVerifier-1.0.jar to verify that Java sees the same name
canonicalization that we see.
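
For reference, these are the non-Java checks I ran on each host (a rough
sketch; adjust names and addresses for your own setup):

# Forward and reverse resolution should agree on the FQDN:
hostname -f
getent hosts aw1hdnn001.tnbsound.com
getent hosts 10.132.8.19
# The lookup order here determines what Java canonicalizes to:
grep '^hosts:' /etc/nsswitch.conf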

I have built a test cluster and closely followed the instructions in the
secure Hadoop install doc on the Cloudera site, making sure that all the conf
files are properly edited and all the Kerberos keytabs contain the correct
principals and have the correct permissions.
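
For example, to spot-check a keytab on each node (the keytab path below is
just where ours happen to live):

# List every principal (and key version) in the HDFS keytab:
klist -kt /etc/hadoop/conf/hdfs.keytab
# Prove the keytab keys actually match the KDC:
kinit -kt /etc/hadoop/conf/hdfs.keytab hdfs/aw1hdnn001.tnbsound.com@tnbsound.com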

We are using HA namenodes with Quorum-based journal managers.

I am running into a persistent problem with many Hadoop components when they
need to talk securely to remote servers. The two examples that I post here are
the namenode needing to talk to remote journalnodes and the command-line hdfs
client needing to speak to a remote namenode. Both give the same error:

Server has invalid Kerberos principal:
hdfs/aw1hdnn002.tnbsound.com@tnbsound.com; Host Details : local host is:
"aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is:
"aw1hdnn002.tnbsound.com":8020;

There is not much on the interwebs about this, and the error that is showing up
is leading me to believe that the issue is around the Kerberos realm being used
in one place and not the other.
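
In case it matters, here is roughly what I have been cross-checking on every
host (paths reflect our layout):

# Every *.kerberos.principal value should use the same host form and realm:
grep -A1 'kerberos.principal' /etc/hadoop/conf/hdfs-site.xml
# And the domain_realm mapping should cover .tnbsound.com:
grep -A3 'domain_realm' /etc/krb5.conf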

I just cannot seem to figure out what is going on here, as I know these are
valid principals. I have added a snippet at the end where I have enabled
Kerberos debugging to see if that helps at all.

The weird part is that this error applies only to remote daemons. The local
namenode and journalnode do not have the issue. We can “speak” locally but
not remotely.

Any and all help is greatly appreciated.

#
# This is me with hdfs Kerberos credentials trying to run hdfs dfsadmin
-refreshServiceAcl
#

hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 53$ klist
Ticket cache: FILE:/tmp/krb5cc_115
Default principal: hdfs/aw1hdnn001.tnbsound.com@tnbsound.com
Valid starting Expires Service principal
10/20/2016 15:34:49 10/21/2016 15:34:49 krbtgt/tnbsound.com@tnbsound.com
renew until 10/27/2016 15:34:49

hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 54$ hdfs dfsadmin -refreshServiceAcl
Refresh service acl successful for aw1hdnn001.tnbsound.com/10.132.8.19:8020
refreshServiceAcl: Failed on local exception: java.io.IOException:
java.lang.IllegalArgumentException: Server has invalid Kerberos principal:
hdfs/aw1hdnn002.tnbsound.com@tnbsound.com; Host Details : local host is:
"aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is:
"aw1hdnn002.tnbsound.com":8020;

#
# This is the namenode trying to start up and contact an off-server journalnode
#
2016-10-20 16:51:40,703 WARN org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as:hdfs/aw1hdnn001.tnbsound.com@tnbsound.com
(auth:KERBEROS) cause:java.io.IOException: java.lang.IllegalArgumentException:
Server has invalid Kerberos principal: hdfs/aw1hdrm001.tnbsound.com@tnbsound.com
10.132.8.21:8485: Failed on local exception: java.io.IOException:
java.lang.IllegalArgumentException: Server has invalid Kerberos principal:
hdfs/aw1hdrm001.tnbsound.com@tnbsound.com; Host Details : local host is:
"aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is:
"aw1hdrm001.tnbsound.com":8485;

#
# This is me with hdfs Kerberos credentials trying to run hdfs dfsadmin
-refreshServiceAcl with debug info
#
hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 46$ 
HADOOP_OPTS="-Dsun.security.krb5.debug=true" hdfs dfsadmin -refreshServiceAcl
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_115
>>>DEBUG  client principal is hdfs/aw1hdnn001.tnbsound.com@tnbsound.com
>>>DEBUG  server principal is krbtgt/tnbsound.com@tnbsound.com
>>>DEBUG  key type: 18
>>>DEBUG  auth time: Thu Oct 20 16:55:42 UTC 2016
>>>DEBUG  start time: Thu Oct 20 16:55:42 UTC 2016
>>>DEBUG  end time: Fri Oct 21 16:55:42 UTC 2016
>>>DEBUG  renew_till time: Thu Oct 27 16:55:42 UTC 2016
>>> CCacheInputStream: readFlags() FORWARDABLE; PROXIABLE; RENEWABLE; INITIAL; PRE_AUTH;
>>>DEBUG  client principal is hdfs/aw1hdnn001.tnbsound.com@tnbsound.com
>>>DEBUG  server principal is X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/tnbsound.com@tnbsound.com
>>>DEBUG  key type: 0
>>>DEBUG  auth time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG  start time: null
>>>DEBUG  end time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG  renew_till time: null
>>> CCacheInputStream: readFlags()
>>>DEBUG  client principal is hdfs/aw1hdnn001.tnbsound.com@tnbsound.com
>>>DEBUG  server principal is X-CACHECONF:/krb5_ccache_conf_data/pa_type/krbtgt/tnbsound.com@tnbsound.com
>>>DEBUG  key type: 0
>>>DEBUG  auth time: Thu Jan 01 00:00:00 UTC 

hadoop distcp and hbase ExportSnapshot hdfs replication factor question.

2016-02-24 Thread Mark Selby
I have a primary Hadoop cluster (2.6.0) running MapReduce and HBase. I
am backing up to a remote data center that has many fewer machines with
a higher per-disk density.


The default HDFS replication factor on the primary is 3.
The default HDFS replication factor on the backup is 2.

When I run distcp on the primary cluster, specifying the remote as the
destination, and I DO NOT specify preserving the replication factor as an
argument, I still get 3 replicas on the remote.


All my HBase snapshots that are copied from the primary to the backup
also end up with HFiles that have a replication factor of 3.


As a test, I ran distcp on the backup, pulling from the primary, and this
did result in a replication factor of 2. But I have many fewer resources
on the backup and think that it would be faster to perform the large copy
with the larger number of machines.


Also, I cannot pull HBase snapshots from the backup cluster; the
ExportSnapshot utility does not support this.


Does anyone know if it is possible to distcp to another cluster that has
a smaller replication factor and have that take effect?
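
What I was hoping for is something along these lines (untested; primary-nn
and backup-nn stand in for the real namenodes, and this assumes distcp takes
dfs.replication from the client-side job configuration when the
preserve-replication option is not given):

# Push from the primary but request the backup's replication factor:
hadoop distcp -Ddfs.replication=2 \
  hdfs://primary-nn:8020/data hdfs://backup-nn:8020/data

# Fallback that I know works, run on the backup after the copy
# (recursive for directories; -w waits for re-replication to finish):
hdfs dfs -setrep -w 2 /data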


Thanks!
