Re: Secure Hadoop - invalid Kerberos principal errors
Thanks for the suggestion. We are an Ubuntu shop, which means that _HOST expands to the short name rather than the FQDN, which is why we cannot use the _HOST macro. When I originally tried the _HOST macro, nothing worked, because all of our Kerberos principals use FQDNs and there was then a legitimate mismatch. The more I look at this, the more I suspect I am missing something subtle.

From: Wei-Chiu Chuang <weic...@cloudera.com>
Date: Thursday, October 20, 2016 at 10:41 AM
To: Mark Selby <mse...@pandora.com>
Cc: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Secure Hadoop - invalid Kerberos principal errors

Instead of specifying the server's host name in the principal, have you tried using hdfs/_h...@tnbsound.com?
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html#Kerberos_principals_for_Hadoop_Daemons

dfs.journalnode.kerberos.principal
hdfs/aw1hdnn001.tnbsound@tnbsound.com

Wei-Chiu Chuang
A very happy Clouderan

On Oct 20, 2016, at 10:19 AM, Mark Selby <mse...@pandora.com> wrote:
[quoted text trimmed; the original message appears in full below]
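The mismatch Mark describes can be sketched as a simple substitution (host names and realm below are illustrative, not taken from a real config): the client replaces _HOST in the configured principal pattern with whatever it canonicalizes the peer's host name to, so a short-name canonicalization yields a principal that will never match an FQDN keytab entry.

```shell
# Sketch of _HOST expansion (illustrative values): the client substitutes
# the canonicalized peer host name into the configured principal pattern.
pattern="hdfs/_HOST@TNBSOUND.COM"
canonical="aw1hdnn002.tnbsound.com"   # FQDN canonicalization
short="aw1hdnn002"                    # short-name canonicalization (the Ubuntu pitfall)
fqdn_principal=$(printf '%s\n' "$pattern" | sed "s/_HOST/$canonical/")
short_principal=$(printf '%s\n' "$pattern" | sed "s/_HOST/$short/")
echo "$fqdn_principal"    # hdfs/aw1hdnn002.tnbsound.com@TNBSOUND.COM
echo "$short_principal"   # hdfs/aw1hdnn002@TNBSOUND.COM -- will not match an FQDN keytab entry
```

This is why both sides must agree on canonicalization: principals cut with FQDNs only work if every client resolves peers to FQDNs as well.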
Secure Hadoop - invalid Kerberos principal errors
We have an existing CDH 5.5.1 cluster with simple authentication and no authorization. We are building out a new cluster and plan to move to CDH 5.8.2 with Kerberos-based authentication. We have an existing MIT Kerberos infrastructure which we successfully use for a variety of services (ssh, apache, postfix). I am very confident that our /etc/krb5.conf and name resolution are working. I have even used HadoopDNSVerifier-1.0.jar to verify that Java sees the same name canonicalization that we see.

I have built a test cluster and closely followed the instructions in the secure Hadoop install doc on the Cloudera site, making sure that all the conf files are properly edited and all the Kerberos keytabs contain the correct principals and have the correct permissions. We are using HA namenodes with Quorum-based journal managers.

I am running into a persistent problem with many Hadoop components when they need to talk securely to remote servers. The two examples I post here are the namenode needing to talk to remote journalnodes and the command-line hdfs client needing to speak to a remote namenode. Both give the same error:

Server has invalid Kerberos principal: hdfs/aw1hdnn002.tnbsound@tnbsound.com; Host Details : local host is: "aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: "aw1hdnn002.tnbsound.com":8020;

There is not much on the inter-webs about this, and the error that is showing up leads me to believe that the issue is around the Kerberos realm being used in one place and not the other. I just cannot seem to figure out what is going on here, as I know these are valid principals. I have added a snippet at the end where I have enabled Kerberos debugging, in case that helps.

The weird part is that this error applies only to remote daemons. The local namenode and journalnode do not have the issue. We can “speak” locally but not remotely.
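A low-level sanity check for this class of error (a sketch with illustrative values; the klist/getent lines are the usual diagnostics and are shown as comments because their output is host-specific): split the service principal into its parts and confirm the host component is the FQDN both sides expect.

```shell
# Extract service, host, and realm from a principal (illustrative value).
principal="hdfs/aw1hdnn002.tnbsound.com@TNBSOUND.COM"
service=${principal%%/*}
hostpart=${principal#*/}; hostpart=${hostpart%@*}
realm=${principal##*@}
echo "$service | $hostpart | $realm"

# Host-specific diagnostics to run on each node (output varies per host;
# the keytab path is an assumption, not taken from the thread):
#   hostname -f                              # should print the FQDN
#   getent hosts aw1hdnn002.tnbsound.com     # forward lookup
#   klist -kt /etc/hadoop/conf/hdfs.keytab   # keytab should list the FQDN principal

case $hostpart in
  *.*) echo "host component is fully qualified" ;;
  *)   echo "host component is a short name" ;;
esac
```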
Any and all help is greatly appreciated.

#
# This is me with hdfs Kerberos credentials trying to run hdfs dfsadmin -refreshServiceAcl
#
hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 53$ klist
Ticket cache: FILE:/tmp/krb5cc_115
Default principal: hdfs/aw1hdnn001.tnbsound@tnbsound.com

Valid starting       Expires              Service principal
10/20/2016 15:34:49  10/21/2016 15:34:49  krbtgt/tnbsound@tnbsound.com
        renew until 10/27/2016 15:34:49

hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 54$ hdfs dfsadmin -refreshServiceAcl
Refresh service acl successful for aw1hdnn001.tnbsound.com/10.132.8.19:8020
refreshServiceAcl: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/aw1hdnn002.tnbsound@tnbsound.com; Host Details : local host is: "aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: "aw1hdnn002.tnbsound.com":8020;

#
# This is the namenode trying to start up and contact an off-server journalnode
#
2016-10-20 16:51:40,703 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/aw1hdnn001.tnbsound@tnbsound.com (auth:KERBEROS) cause:java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/aw1hdrm001.tnbsound@tnbsound.com
10.132.8.21:8485: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/aw1hdrm001.tnbsound@tnbsound.com; Host Details : local host is: "aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: "aw1hdrm001.tnbsound.com":8485;

#
# This is me with hdfs Kerberos credentials trying to run hdfs dfsadmin -refreshServiceAcl with debug info
#
hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 46$ HADOOP_OPTS="-Dsun.security.krb5.debug=true" hdfs dfsadmin -refreshServiceAcl
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_115
>>>DEBUG client principal is >>>hdfs/aw1hdnn001.tnbsound@tnbsound.com
>>>DEBUG server principal is >>>krbtgt/tnbsound@tnbsound.com
>>>DEBUG key type: 18
>>>DEBUG auth time: Thu Oct 20 16:55:42 UTC 2016
>>>DEBUG start time: Thu Oct 20 16:55:42 UTC 2016
>>>DEBUG end time: Fri Oct 21 16:55:42 UTC 2016
>>>DEBUG renew_till time: Thu Oct 27 16:55:42 UTC 2016
>>> CCacheInputStream: readFlags() FORWARDABLE; PROXIABLE; RENEWABLE; INITIAL; PRE_AUTH;
>>>DEBUG client principal is >>>hdfs/aw1hdnn001.tnbsound@tnbsound.com
>>>DEBUG server principal is >>>X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/tnbsound@tnbsound.com
>>>DEBUG key type: 0
>>>DEBUG auth time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG start time: null
>>>DEBUG end time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG renew_till time: null
>>> CCacheInputStream: readFlags()
>>>DEBUG client principal is >>>hdfs/aw1hdnn001.tnbsound@tnbsound.com
>>>DEBUG server principal is >>>X-CACHECONF:/krb5_ccache_conf_data/pa_type/krbtgt/tnbsound@tnbsound.com
>>>DEBUG key type: 0
>>>DEBUG auth time: Thu Jan 01 00:00:00 UTC
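When the krb5 debug output alone is not enough, a couple of additional standard debug knobs can be layered onto the same command (a sketch only; the flag names are the usual JDK system property and Hadoop logger settings, composed here as a string rather than run against a live cluster):

```shell
# Compose an hdfs invocation with extra debug output enabled
# (JDK SPNEGO tracing plus Hadoop DEBUG-level console logging).
opts="-Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true"
cmd="HADOOP_OPTS=\"$opts\" HADOOP_ROOT_LOGGER=DEBUG,console hdfs dfsadmin -refreshServiceAcl"
echo "$cmd"
```

The DEBUG-level Hadoop log shows which server principal the client computed before the comparison fails, which narrows down whether the short name or the realm is the part that differs.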
hadoop distcp and hbase ExportSnapshot hdfs replication factor question.
I have a primary Hadoop cluster (2.6.0) running MapReduce and HBase. I am backing up to a remote data center that has many fewer machines with a higher per-disk density. The default HDFS replication factor on the primary is 3. The default HDFS replication factor on the backup is 2.

When I run distcp on the primary cluster, specifying the remote as the destination, and I DO NOT specify preserve-replication-factor as an argument, I still get 3 replicas on the remote. All my HBase snapshots that are copied from the primary to the backup also end up with HFiles that have a replication factor of 3. As a test I ran distcp from the backup pulling from the primary, and this did result in a replication factor of 2.

I have many fewer resources on the backup and think it would be faster to perform the large copy with the larger number of machines. As well, I cannot pull HBase snapshots from the backup cluster; the ExportSnapshot utility does not support this.

Does anyone know if it is possible to distcp to another cluster that has a smaller replication factor and have that take effect? Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
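The behavior described above has a common explanation worth checking: when distcp is submitted from the primary, the copy tasks write with the submitting side's dfs.replication unless overridden, which would account for 3 replicas landing on the backup. A sketch of two workarounds (cluster URIs and paths are hypothetical; the commands are only composed as strings here, not run):

```shell
# Override replication for the distcp job itself via the generic -D option,
# so writes to the target use factor 2 regardless of the client default.
src="hdfs://aw1hdnn001.tnbsound.com:8020/hbase"
dst="hdfs://backupnn.example.com:8020/hbase"
cmd="hadoop distcp -D dfs.replication=2 $src $dst"
echo "$cmd"

# Files already copied with factor 3 can be thinned afterwards on the backup:
fixup="hdfs dfs -setrep -R 2 /hbase"
echo "$fixup"
```

Note that distcp's -p flag with r (-pr) does the opposite, preserving the source replication factor; omitting -p leaves the write-side client default in effect, which is why the override above is needed when pushing from the larger cluster.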