Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?

2017-04-13 Thread Chris Dagdigian
Hah! I've been deep into 
SGE (user, trainer, consultant) for years. 

Our setups are pretty similar but I'm hoping to use the AWS cfnCluster 
stack (https://github.com/awslabs/cfncluster) because it is officially 
blessed by AWS and since it's a cloudformation template at the end of 
the day it's both easy to support and extend. It also does all the hard 
work (auto-scaling etc.) that I don't want to have to code myself via 
ansible. Since I'm a consultant I need to hand off something to my users
 that is easy for them to operate moving forward without me. 

Your experience using IPA in an HPC environment is very helpful. We also
 use ansible to automate "ipa-client-install --unattended ..." so 
scripting the install and remove commands should be pretty 
straightforward. 
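Roughly what I have in mind for the "first boot" and "pre-termination"
hooks is below (a sketch only -- the domain name is made up, and the
one-time password from a prior "ipa host-add --random" is just one way
to avoid baking an admin password into the AMI):

    # first boot: unattended enroll against a pre-created host entry
    ipa-client-install --unattended --mkhomedir \
        --domain=hpc.example.com \
        --hostname="$(hostname -f)" \
        --password="$OTP_FROM_HOST_ADD"

    # pre-termination: clean unenroll before the instance dies
    # (the host entry on the server still needs an "ipa host-del")
    ipa-client-install --uninstall --unattended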
  
Just trying to compare my appreciation for IPA vs what I saw on the
ground at massive HPC installations where the operators jumped through
hoops to remove network services that could break user info or affect
stability. I lost count of how many sites I saw dumping NIS maps
and LDAP directories into plaintext files every 4-6 hours that they'd
spread across the cluster simply to remove any chance that a failed
NIS/LDAP query could mess up a node, user or job.
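The sssd-era equivalent of those dumps would be something like the
sketch below, run on an enrolled head node (file names hypothetical;
note sssd doesn't enumerate users by default, so each known account
gets pulled explicitly):

    # regenerate static maps from IPA on the enrolled head node
    : > passwd.cluster ; : > group.cluster
    while read -r u; do getent passwd "$u" >> passwd.cluster; done < hpc_users.txt
    while read -r g; do getent group "$g" >> group.cluster; done < hpc_groups.txt
    # then push the files out to the compute fleet via rsync/ansible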
  
Thanks!
  
Chris
  


 

Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?

2017-04-13 Thread Simo Sorce
On Thu, 2017-04-13 at 17:16 +0300, Alexander Bokovoy wrote:
> On Thu, 13 Apr 2017, Simo Sorce wrote:
> >On Thu, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:
> >> Hi folks,
> >>
> >> I've got a high performance computing (HPC) use case that will need AD
> >> integration for user identity management. We've got a working IPA server
> >> in AWS that has 1-way trusts going to several remote AD forests and
> >> child domains. Works fine but so far all of the enrolled clients are
> >> largely static/persistent boxes.
> >>
> >> The issue is that the HPC cluster footprint is going to be elastic by
> >> design. We'll likely keep 3-5 nodes in the grid online 24x7 but the vast
> >> majority of the compute node fleet (hundreds of nodes quite likely) will
> >> be fired up on demand as a mixture of spot, RI and hourly-rate EC2
> >> instances. The cluster will automatically shrink in size as well when
> >> needed.
> >>
> >> Trying to think of which method I should use for managing users  (mainly
> >> UID and GID values) on the compute fleet:
> >>
> >> [Option 1]  Script the enrollment and de-install actions via existing
> >> hooks we have for running scripts at "first boot" as well as
> >> "pre-termination".  I think this seems technically pretty
> >> straightforward but I'm not sure I really need to stuff our IPA server
> >> with host information for boxes that are considered anonymous and
> >> disposable. We don't care about them really and don't need to implement
> >> RBAC controls on them. Also slightly worried that a large-scale
> >> enrollment or uninstall action may bog down the server or (worse)
> >> perhaps only partially complete leading to an HPC grid where jobs flow
> >> into a bad box and die en masse because "user does not exist..."
> >>
> >> [Option 2]  Steal from the HPC ops playbook and minimize network
> >> services that can cause failures. Distribute static files to the worker
> >> fleet --  Bind the 24x7 persistent systems to the IPA server and force
> >> all HPC users to provide a public SSH key. Then use commands like "id
> >> <user>" so that we can manufacture static /etc/passwd, /etc/shadow and /etc/group
> >> files that can be pushed out to the compute node fleet. The main win
> >> here is that we can maintain consistent IPA-derived
> >> UID/GID/username/group data cluster wide while totally removing the need
> >> for an elastic set of anonymous boxes to be individually enrolled and
> >> removed from IPA all the time.
> >>
> >> Right now I'm leaning towards Option #2 but would love to hear
> >> experiences regarding moderate-scale automatic enrollment and removal of
> >> clients!
> >
> >One option could also be to keep a (set of) keytab(s) you can copy on
> >the elastic hosts and preconfigure their sssd daemon. At boot you copy
> >the keytab in the host and start sssd and everything should magically
> >work. They all are basically the same identity so using the same key for
> >all of them may be acceptable.
> It would be better to avoid using Kerberos authentication here at all.
> 
> Multiple hosts authenticating with the same key would cause a lot of
> updates in the LDAP entry representing this principal. This is going to
> break replication if this is the only key that is used by multiple hosts
> against multiple IPA masters.

If replication is an issue we should probably mask those attributes from
replication as well, just like we do for the failed-auth attributes.
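For reference, that masking is expressed through fractional replication
on the agreements; one way to inspect the current exclusion list on a
master (Directory Manager credentials assumed, agreement DNs vary per
deployment):

    ldapsearch -D "cn=Directory Manager" -W -b cn=config \
        "(objectclass=nsds5replicationagreement)" \
        nsDS5ReplicatedAttributeList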

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc




Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?

2017-04-13 Thread Alexander Bokovoy

On Thu, 13 Apr 2017, Simo Sorce wrote:

> On Thu, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:
>
> > Hi folks,
> >
> > I've got a high performance computing (HPC) use case that will need AD
> > integration for user identity management. We've got a working IPA server
> > in AWS that has 1-way trusts going to several remote AD forests and
> > child domains. Works fine but so far all of the enrolled clients are
> > largely static/persistent boxes.
> >
> > The issue is that the HPC cluster footprint is going to be elastic by
> > design. We'll likely keep 3-5 nodes in the grid online 24x7 but the vast
> > majority of the compute node fleet (hundreds of nodes quite likely) will
> > be fired up on demand as a mixture of spot, RI and hourly-rate EC2
> > instances. The cluster will automatically shrink in size as well when
> > needed.
> >
> > Trying to think of which method I should use for managing users (mainly
> > UID and GID values) on the compute fleet:
> >
> > [Option 1]  Script the enrollment and de-install actions via existing
> > hooks we have for running scripts at "first boot" as well as
> > "pre-termination".  I think this seems technically pretty
> > straightforward but I'm not sure I really need to stuff our IPA server
> > with host information for boxes that are considered anonymous and
> > disposable. We don't care about them really and don't need to implement
> > RBAC controls on them. Also slightly worried that a large-scale
> > enrollment or uninstall action may bog down the server or (worse)
> > perhaps only partially complete leading to an HPC grid where jobs flow
> > into a bad box and die en masse because "user does not exist..."
> >
> > [Option 2]  Steal from the HPC ops playbook and minimize network
> > services that can cause failures. Distribute static files to the worker
> > fleet -- Bind the 24x7 persistent systems to the IPA server and force
> > all HPC users to provide a public SSH key. Then use commands like "id
> > <user>" so that we can manufacture static /etc/passwd, /etc/shadow and /etc/group
> > files that can be pushed out to the compute node fleet. The main win
> > here is that we can maintain consistent IPA-derived
> > UID/GID/username/group data cluster wide while totally removing the need
> > for an elastic set of anonymous boxes to be individually enrolled and
> > removed from IPA all the time.
>
> One option could also be to keep a (set of) keytab(s) you can copy on
> the elastic hosts and preconfigure their sssd daemon. At boot you copy
> the keytab in the host and start sssd and everything should magically
> work. They all are basically the same identity so using the same key for
> all of them may be acceptable.

It would be better to avoid using Kerberos authentication here at all.

Multiple hosts authenticating with the same key would cause a lot of
updates in the LDAP entry representing this principal. This is going to
break replication if this is the only key that is used by multiple hosts
against multiple IPA masters.

--
/ Alexander Bokovoy



Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?

2017-04-13 Thread Simo Sorce
On Thu, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:
> Hi folks,
> 
> I've got a high performance computing (HPC) use case that will need AD 
> integration for user identity management. We've got a working IPA server 
> in AWS that has 1-way trusts going to several remote AD forests and 
> child domains. Works fine but so far all of the enrolled clients are 
> largely static/persistent boxes.
> 
> The issue is that the HPC cluster footprint is going to be elastic by 
> design. We'll likely keep 3-5 nodes in the grid online 24x7 but the vast 
> majority of the compute node fleet (hundreds of nodes quite likely) will 
> be fired up on demand as a mixture of spot, RI and hourly-rate EC2 
> instances. The cluster will automatically shrink in size as well when 
> needed.
> 
> Trying to think of which method I should use for managing users  (mainly 
> UID and GID values) on the compute fleet:
> 
> [Option 1]  Script the enrollment and de-install actions via existing 
> hooks we have for running scripts at "first boot" as well as 
> "pre-termination".  I think this seems technically pretty 
> straightforward but I'm not sure I really need to stuff our IPA server 
> with host information for boxes that are considered anonymous and 
> disposable. We don't care about them really and don't need to implement 
> RBAC controls on them. Also slightly worried that a large-scale 
> enrollment or uninstall action may bog down the server or (worse) 
> perhaps only partially complete leading to an HPC grid where jobs flow 
> into a bad box and die en masse because "user does not exist..."
> 
> [Option 2]  Steal from the HPC ops playbook and minimize network 
> services that can cause failures. Distribute static files to the worker 
> fleet --  Bind the 24x7 persistent systems to the IPA server and force 
> all HPC users to provide a public SSH key. Then use commands like "id
> <user>" so that we can manufacture static /etc/passwd, /etc/shadow and /etc/group
> files that can be pushed out to the compute node fleet. The main win 
> here is that we can maintain consistent IPA-derived 
> UID/GID/username/group data cluster wide while totally removing the need 
> for an elastic set of anonymous boxes to be individually enrolled and 
> removed from IPA all the time.
> 
> Right now I'm leaning towards Option #2 but would love to hear 
> experiences regarding moderate-scale automatic enrollment and removal of 
> clients!

One option could also be to keep a (set of) keytab(s) you can copy on
the elastic hosts and preconfigure their sssd daemon. At boot you copy
the keytab in the host and start sssd and everything should magically
work. They all are basically the same identity so using the same key for
all of them may be acceptable.
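A rough sketch of that pre-provisioning, with every name here (realm,
server, principal, paths) made up:

    # once, on an admin box: create a shared host entry and pull its key
    ipa host-add sharednode.hpc.example.com --force
    ipa-getkeytab -s ipa.hpc.example.com \
        -p host/sharednode.hpc.example.com -k shared.keytab

    # at boot on each elastic node: drop in the shared key, start sssd
    install -m 600 shared.keytab /etc/krb5.keytab
    systemctl start sssd

    # /etc/sssd/sssd.conf baked into the image (root-owned, mode 0600):
    [sssd]
    services = nss, pam
    domains = hpc.example.com

    [domain/hpc.example.com]
    id_provider = ipa
    ipa_domain = hpc.example.com
    ipa_server = ipa.hpc.example.com
    ipa_hostname = sharednode.hpc.example.com
    cache_credentials = True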

From the IPA side it will look like suddenly the same host has multiple
IP addresses and is opening one connection from each of them, but that
is ok.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc




Re: [Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?

2017-04-13 Thread Gerald-Markus Zabos
On Thu, 2017-04-13 at 08:05 -0400, Chris Dagdigian wrote:

> Right now I'm leaning towards Option #2 but would love to hear 
> experiences regarding moderate-scale automatic enrollment and removal of 
> clients!
> 
> -Chris

Hi Chris,

we're facing a similar use case day to day, but changed from AWS to
another cloud provider. Our setup works on both, so I am referring to
AWS here.

We decided...

...to use SGE for our HPC infrastructure
...to recycle network ranges for 100 static IP addresses + 100 static
hostnames
...to use scripts & cronjobs & ansible (depending on "qstat" and "qhost"
output) on the cluster head node to determine how many additional
cluster nodes have to be created as a reserve for
"What-if-we-need-more-nodes?" scenarios (see the sketch after this list)
...to create cluster nodes via ansible-playbook on AWS from a
pre-defined image, do software installation & configuration via
ansible-playbook, do the IPA domain join via ansible-playbook
("ipa-client-install --domain=<domain> --mkhomedir
--hostname=<hostname>.<domain> --ip-address=<ip-address>
-p <principal> -w <password> --unattended")
...to destroy cluster nodes in two steps: 1) ansible-playbook
"ipa-client-install --uninstall", 2) ansible-playbook destroy cluster
node on AWS via API
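
To make that concrete, roughly what the cronjob and the playbook shell
out to (hostnames, thresholds and variable names here are illustrative,
not our literal production values):

    # head node, from cron: how many jobs are pending ("qw")?
    pending=$(qstat -u '*' -s p | tail -n +3 | wc -l)
    # ...feed $pending into the node-count decision for ansible...

    # per new node: the join, as in the playbook above
    ipa-client-install --domain=hpc.example.com --mkhomedir \
        --hostname="${FQDN}" --ip-address="${IP}" \
        -p "${PRINCIPAL}" -w "${PASSWORD}" --unattended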

(Right now, I am working on a bulk-creation script for IPA users/groups
to expand our single HPC cluster into several, where we have the
same set of users (~65-100) with a differing suffix in the username,
e.g. "it_ops01", "it_ops20", etc...)

We're using 2x IPA servers (ESXi VMs, 4GB RAM, 2 CPUs) in replication
with another 2x IPA servers (same dimensions) in our main physical
datacenter. We didn't see much impact on the IPA servers during
enrollment/removal of domain hosts. In three months of operation so
far, we have had several "bad box" scenarios, all of them because of
problems with SGE. We solved these problems manually, by removing/adding
cluster nodes via SGE commands.

As you can see, I tend toward [Option 1], since it does all the magic with
pre-defined software commands (SGE, Ansible, IPA CLI), instead of jumping
around with additional scripts doing work that can be done by
"built-in" commands. For us, this works best.

Regards,

Gerald
-- 
Gerald-Markus Zabos 
Web: http://www.gmzgames.de

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project