Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?
Hah! I've been deep into SGE (user, trainer, consultant) for years. Our setups are pretty similar, but I'm hoping to use the AWS cfnCluster stack (https://github.com/awslabs/cfncluster) because it is officially blessed by AWS, and since it's a CloudFormation template at the end of the day it's both easy to support and to extend. It also does all the hard work (auto-scaling etc.) that I don't want to have to code myself via ansible. Since I'm a consultant, I need to hand off something to my users that is easy for them to operate going forward without me.

Your experience using IPA in an HPC environment is very helpful. We also use ansible to automate "ipa-client-install --unattended ...", so scripting the install and remove commands should be pretty straightforward. I'm just trying to compare my appreciation for IPA with what I saw on the ground at massive HPC installations, where the operators jumped through hoops to remove any network service that could break user info or affect stability. I lost count of how many sites I saw dumping NIS maps and LDAP directories into plaintext files every 4-6 hours, which they'd then spread across the cluster simply to remove any chance that a failed NIS/LDAP query could mess up a node, a user or a job.

Thanks!
Chris

-- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project
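The "dump the directory into plaintext files" practice Chris describes could be sketched roughly like this. The field layout is the standard seven-field passwd format, but the sample records and the UID cutoff are made up for illustration; a real site would feed this from `getent passwd` output taken on an enrolled host:

```python
# Sketch of the HPC flat-file practice described above: snapshot directory
# entries into a static passwd-format file that can be pushed to every node.
# The sample records are hypothetical; a real site would feed this function
# from "getent passwd" output on a host bound to NIS/LDAP/IPA.

def snapshot_passwd(getent_lines):
    """Filter directory entries into static /etc/passwd-format lines."""
    out = []
    for line in getent_lines:
        name, pw, uid, gid, gecos, home, shell = line.strip().split(":")
        if int(uid) < 1000:  # skip system accounts (cutoff is an assumption)
            continue
        out.append(f"{name}:x:{uid}:{gid}:{gecos}:{home}:{shell}")
    return "\n".join(out) + "\n"

sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "alice:x:2001:2001:Alice:/home/alice:/bin/bash",
]
print(snapshot_passwd(sample), end="")
```

A cron job would then rsync the generated file across the fleet, so a failed NIS/LDAP query can never take out a node mid-job.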
Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?
On Thu, 2017-04-13 at 17:16 +0300, Alexander Bokovoy wrote:
> On Thu, 13 Apr 2017, Simo Sorce wrote:
> > On Thu, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:
> > > [...]
> > One option could also be to keep a (set of) keytab(s) you can copy on
> > the elastic hosts and preconfigure their sssd daemon. At boot you copy
> > the keytab to the host and start sssd, and everything should magically
> > work. They are all basically the same identity, so using the same key
> > for all of them may be acceptable.
> It would be better to avoid using Kerberos authentication here at all.
>
> Multiple hosts authenticating with the same key would cause a lot of
> updates in the LDAP entry representing this principal. This is going to
> break replication if this is the only key that is used by multiple hosts
> against multiple IPA masters.

If replication is an issue, we should probably mask those attributes from replication as well, just like we do for the attributes for failed auth.

Simo.

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?
On Thu, 13 Apr 2017, Simo Sorce wrote:
> On Thu, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:
> > [...]
> One option could also be to keep a (set of) keytab(s) you can copy on
> the elastic hosts and preconfigure their sssd daemon. At boot you copy
> the keytab to the host and start sssd, and everything should magically
> work. They are all basically the same identity, so using the same key
> for all of them may be acceptable.

It would be better to avoid using Kerberos authentication here at all.

Multiple hosts authenticating with the same key would cause a lot of updates in the LDAP entry representing this principal. This is going to break replication if this is the only key that is used by multiple hosts against multiple IPA masters.

--
/ Alexander Bokovoy
Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?
On Thu, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:
> Hi folks,
>
> I've got a high performance computing (HPC) use case that will need AD
> integration for user identity management. We've got a working IPA server
> in AWS that has 1-way trusts going to several remote AD forests and
> child domains. Works fine, but so far all of the enrolled clients are
> largely static/persistent boxes.
>
> The issue is that the HPC cluster footprint is going to be elastic by
> design. We'll likely keep 3-5 nodes in the grid online 24x7, but the vast
> majority of the compute node fleet (hundreds of nodes quite likely) will
> be fired up on demand as a mixture of spot, RI and hourly-rate EC2
> instances. The cluster will automatically shrink in size as well when
> needed.
>
> Trying to think of which method I should use for managing users (mainly
> UID and GID values) on the compute fleet:
>
> [Option 1] Script the enrollment and de-install actions via existing
> hooks we have for running scripts at "first boot" as well as
> "pre-termination". I think this seems technically pretty
> straightforward, but I'm not sure I really need to stuff our IPA server
> with host information for boxes that are considered anonymous and
> disposable. We don't really care about them and don't need to implement
> RBAC controls on them. I'm also slightly worried that a large-scale
> enrollment or uninstall action may bog down the server or (worse)
> perhaps only partially complete, leading to an HPC grid where jobs flow
> into a bad box and die en masse because "user does not exist..."
>
> [Option 2] Steal from the HPC ops playbook and minimize network
> services that can cause failures. Distribute static files to the worker
> fleet -- bind the 24x7 persistent systems to the IPA server and force
> all HPC users to provide a public SSH key. Then use commands like "id
> ..." so that we can manufacture static /etc/passwd, /etc/shadow and
> /etc/group files that can be pushed out to the compute node fleet.
> The main win here is that we can maintain consistent IPA-derived
> UID/GID/username/group data cluster-wide while totally removing the need
> for an elastic set of anonymous boxes to be individually enrolled and
> removed from IPA all the time.
>
> Right now I'm leaning towards Option #2 but would love to hear
> experiences regarding moderate-scale automatic enrollment and removal of
> clients!

One option could also be to keep a (set of) keytab(s) you can copy on the elastic hosts and preconfigure their sssd daemon. At boot you copy the keytab to the host and start sssd, and everything should magically work. They are all basically the same identity, so using the same key for all of them may be acceptable.

From the IPA side it will look like the same host suddenly has multiple IP addresses and is opening one connection from each of them, but that is OK.

Simo.

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
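A preconfigured client along the lines Simo suggests might carry an sssd.conf like the fragment below. The realm, hostnames and paths are placeholders, and this is a sketch of the shared-identity approach rather than a vetted configuration; the keytab would be copied into place by the boot hook before sssd is started:

```ini
# /etc/sssd/sssd.conf -- hypothetical sketch for a disposable HPC node.
# The shared /etc/krb5.keytab is dropped onto the host at first boot.
[sssd]
services = nss, pam, ssh
domains = example.test

[domain/example.test]
id_provider = ipa
auth_provider = ipa
ipa_domain = example.test
ipa_server = _srv_, ipa01.example.test
ipa_hostname = hpc-node.example.test
krb5_keytab = /etc/krb5.keytab
cache_credentials = True
```

Since every elastic node presents the same host principal, this avoids per-node enrollment entirely, at the cost of the shared-key concerns raised elsewhere in the thread.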
Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?
On Thursday, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:
> Right now I'm leaning towards Option #2 but would love to hear
> experiences regarding moderate-scale automatic enrollment and removal of
> clients!
>
> -Chris

Hi Chris,

we're facing a similar use case from day to day, but changed from AWS to another cloud provider. Our use case works on both, so I am referring to AWS.

We decided...
...to use SGE for our HPC infrastructure
...to recycle network ranges for 100 static IP addresses + 100 static hostnames
...to use scripts & cronjobs & ansible (depending on "qstat" and "qhost" output) on the cluster head node to determine how many additional cluster nodes have to be created as a reserve for "What if we need more nodes?" scenarios
...to create cluster nodes via ansible-playbook on AWS from a pre-defined image, do software installation & configuration via ansible-playbook, and do the IPA domain join via ansible-playbook ("ipa-client-install --domain= --mkhomedir --hostname=. --ip-address= -p -w --unattended")
...to destroy cluster nodes in two steps: 1) ansible-playbook "ipa-client-install --uninstall", 2) ansible-playbook to destroy the cluster node on AWS via the API

(Right now, I am working on a bulk-creation script for IPA users/groups to expand our single HPC cluster into several, where we have the same set of users (~65-100) with a differing suffix in the username, e.g. "it_ops01", "it_ops20", etc...)

We're using 2x IPA servers (ESXi VMs, 4GB RAM, 2 CPU) in replication with another 2x IPA servers (same dimensions) in our main physical datacenter. We didn't see much impact on the IPA servers during enrollment/removal of domain hosts. So far, after three months of operations, we had several "bad box" scenarios, all of them because of problems with SGE. We solved these problems manually, by removing/adding cluster nodes via SGE commands.
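The head-node scaling decision described above could be sketched like this. The slot count, reserve size and the inputs are hypothetical; a real cron job would derive them from actual "qstat"/"qhost" output, which needs more careful parsing:

```python
# Rough sketch of the head-node scaling check: compare pending jobs
# (from "qstat") against free slots (from "qhost") and keep a small
# reserve of spare nodes. All numbers here are illustrative assumptions.

SLOTS_PER_NODE = 16
RESERVE_NODES = 3  # "what if we need more nodes?" cushion

def nodes_to_create(pending_jobs, free_slots):
    """How many extra cluster nodes the cron job should spin up."""
    deficit = max(0, pending_jobs - free_slots)
    needed = -(-deficit // SLOTS_PER_NODE)  # ceiling division
    return needed + RESERVE_NODES if deficit > 0 else RESERVE_NODES

print(nodes_to_create(pending_jobs=40, free_slots=8))
```

The result would feed the ansible-playbook run that creates nodes from the pre-defined image and joins them to the IPA domain.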
As you can see, I tend towards [Option 1], since it does all the magic with pre-defined software commands (SGE, ansible, IPA CLI), instead of jumping around with additional scripts doing work which can be done by "built-in" commands. For us, this works best.

Regards,
Gerald

--
Gerald-Markus Zabos
Web: http://www.gmzgames.de
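The per-cluster naming scheme Gerald mentions ("it_ops01" ... "it_ops20") could be generated like this. The base name and count are taken from his example, but the helper itself is hypothetical; a real bulk script would feed the resulting names into "ipa user-add" calls:

```python
# Sketch of the bulk-creation naming scheme: the same base users
# replicated per cluster with a zero-padded numeric suffix.

def suffixed_users(base, count):
    """Generate numbered usernames like it_ops01 .. it_opsNN."""
    return [f"{base}{i:02d}" for i in range(1, count + 1)]

users = suffixed_users("it_ops", 20)
print(users[0], users[-1])  # it_ops01 it_ops20
```

Keeping the suffix zero-padded keeps sort order sane in "ipa user-find" output and in group membership listings.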