Re: Gentoo for many servers (was: Re: [gentoo-user] executing commands on lots of servers at once)

2009-11-15 Thread Joshua Murphy
On Sat, Nov 14, 2009 at 5:09 PM, Alex Schuster wo...@wonkology.org wrote:
 Alan McKinnon writes:

 On Saturday 14 November 2009 19:36:06 Alex Schuster wrote:
 Alan McKinnon wrote:

 clusterssh will let you log into many machines at once and run emerge
 -avuND world everywhere
 This is way cool. I just started using it on eight Fedora servers I am
 administrating. Nice, now this is an improvement over my 'for h in
 $HOSTS; do ssh $h yum install foo; done' approach.

 I feel your pain :-)

 We used to have the same problem adding new admins to 87 machines. Now
 we have a bespoke provisioner that does it all.

 Sorry, I just do not get 'bespoke provisioner'. Some sort of software,
 like clusterssh? Or a person, one admin instead of many?


 What do you guys think about using Gentoo for servers? At the institute
 I partially work we chose Fedora. There is no special reason for that -
 we already had some Fedora machines, the setup seemed to work, the
 reputation was good, so we kept it. That was okay for me, why choose
 many different environments and learn everything again. I mentioned
 Gentoo, but did not really suggest to actually use it. Maybe I should
 have.

 I'm a huge fan of Gentoo

 Now who would have thought of that!

 and all my personal machines (except the new netbook) have run it for the
 last 5 years.

 But I will never install Gentoo on a production server at work.

 Why?

 Because it is too time consuming, because no two machines are set up the
 same, because I can't trust that other admins used the flags they should
 have. So updates become a case of logging into 80+ machines individually
 and doing emerge world by hand. Gentoo allows you to customize things to
 the nth degree - that is its strength - so people WILL use this one
 discriminating factor.

 If OTOH I had a server farm of 80+ machines, all identical, I'd put
 Gentoo on them in a flash. But I don't have that

 Of our 8 machines, 7 are essentially the same and differ only in hard
 drive space and CPU speed. The other machine is Intel, not AMD, and needs
 different IDE drivers. At the moment it has a different initrd (I set up a
 minimal fedora install to generate it after the cloned system did not
 boot), the rest is - apart from some config files - identical.

 So I would make sure that about everything is exactly the same, well,
 maybe except for hostnames, udev net-persistent-rules, ssh keys... what
 more?
 The last, a little different machine is a problem though. With optimized
 CFLAGS, this one would have to compile all stuff again, while for the
 others I could use binpkgs. Updating them all with clusterssh should not
 be much more work than updating a single one. Well, not completely true, I
 would have the double work, as I would upgrade one server first to test if
 there are problems, and then do it for the others. Maybe I could use the
 special machine to test stuff, and then update all the others.

 If they would differ, Gentoo would of course be too much work. I already
 have this problem now... there is my desktop machine, my notebook running
 a Gentoo VM, a second desktop machine at my other home, the living-room
 machine of my flat share, the machine of a friend I also administrate, the
 server of my flat share I need to set up again... and clusterssh is no
 option here.

My potentially ill informed thoughts on the above issues/ideas:

1) Pick one machine to host both your make.conf as well as your
portage tree and distfiles, potentially splitting them into separate
nfs mounts shared out for the rest of the hosts (having the portage
tree itself ro on all but its owning machine forces centralization of
syncing).

2) /etc/make.conf should simply be a symlink to the centrally located
copy. If you must use binpackages, set march to something that will
run on every machine involved, then set mcpu to whatever machine is
most common if you want to get just a bit more performance here or
there. If you don't mind compiling on every host, though, set portage
niceness to something friendly to your users and march to native (if
you plan to use distcc, this is a BAD idea, use the binpackages).

3) use a replaceable (otherwise identical to the others, and therefore
able to be brought back online by just cloning it over) system for
your testing and keep frequent scheduled backups of whichever system
plays host to your portage tree, binpackages, and distfiles.

4) build your kernel with built in drivers for every piece of
boot-time essential hardware in your systems. You'll still be on a far
cleaner setup than a mass produced distro provided kernel, you'll only
need to maintain one for all your systems, and you'll only have one
kernel to worry about building against if you need any out-of-kernel
modules as well.

5) script the changing of ssh host keys (or even redistribution of
them, if you ), removal of persistent net rules, and prompting for the
setting of host name and you'll have a nice, tiny, postinstall tool
for the rare 
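
A minimal sketch of what such a postinstall tool in point 5 might look like. Everything here is an assumption for illustration: the function names are hypothetical, the udev rules file name and the OpenRC-style /etc/conf.d/hostname format may differ on your systems, and using -f together with ssh-keygen -A needs a reasonably recent OpenSSH.

```shell
#!/bin/sh
# Sketch of a post-clone cleanup tool. ROOT is the mount point of the
# freshly cloned system; all file locations below are assumptions.

# Drop udev's remembered MAC-to-interface mapping so it is rebuilt on boot.
clear_net_rules() {   # usage: clear_net_rules ROOT
    rm -f "$1/etc/udev/rules.d/70-persistent-net.rules"
}

# Remove the cloned SSH host keys and generate fresh ones.
reset_host_keys() {   # usage: reset_host_keys ROOT
    rm -f "$1"/etc/ssh/ssh_host_*
    ssh-keygen -A -f "$1"    # -A creates every missing default host key type
}

# Write the new host name (OpenRC-style conf.d file assumed).
set_host_name() {     # usage: set_host_name ROOT NAME
    printf 'hostname="%s"\n' "$2" > "$1/etc/conf.d/hostname"
}

# Interactive use on a clone mounted at /mnt/gentoo (hypothetical path):
#   printf 'Host name: '; read -r name
#   clear_net_rules /mnt/gentoo
#   reset_host_keys /mnt/gentoo
#   set_host_name   /mnt/gentoo "$name"
```

Keeping each step in its own function means any one of them can be rerun on its own if a clone turns out to need only part of the cleanup.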

Gentoo for many servers (was: Re: [gentoo-user] executing commands on lots of servers at once)

2009-11-14 Thread Alex Schuster
Alan McKinnon wrote:

 clusterssh will let you log into many machines at once and run emerge
  -avuND world everywhere

This is way cool. I just started using it on eight Fedora servers I am 
administrating. Nice, now this is an improvement over my 'for h in 
$HOSTS; do ssh $h yum install foo; done' approach.
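
For reference, a slightly more careful version of that loop might look like the sketch below. The function name run_on_hosts and the host names are made up for illustration; note that the loop variable is written without the dollar sign in the "for" line.

```shell
#!/bin/sh
# Sketch: run one command on each host in turn and report failures.
run_on_hosts() {   # usage: run_on_hosts 'COMMAND' HOST...
    cmd=$1
    shift
    for h in "$@"; do
        # -n stops ssh from reading stdin, so the loop is safe inside pipelines
        ssh -n "$h" "$cmd" || echo "command failed on $h" >&2
    done
}

# Example (hypothetical host names):
#   run_on_hosts 'yum -y install foo' node1 node2 node3
```

Unlike clusterssh this runs the hosts one after another, so it only suits commands that need no interaction.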

What do you guys think about using Gentoo for servers? At the institute I 
partially work we chose Fedora. There is no special reason for that - we 
already had some Fedora machines, the setup seemed to work, the reputation 
was good, so we kept it. That was okay for me, why choose many different 
environments and learn everything again. I mentioned Gentoo, but did not 
really suggest to actually use it. Maybe I should have.

These 8 servers I mentioned are basically clones of the one I installed 
manually. Instead of doing this again, I boot a live-cd on a new one, 
create partitions, and extract tar files of the first server's partitions. 
Then I do some extra configuration, like hostname and network setup. Done.
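
Roughly, the tar step could be sketched like this. The function names and paths are hypothetical, and partitioning, mounting and bootloader setup are left out.

```shell
#!/bin/sh
# Sketch of the clone-by-tar step.

# Pack a mounted filesystem into a tarball without crossing mount points.
snapshot_fs() {   # usage: snapshot_fs /mnt/source /abs/path/root.tar.gz
    tar -czpf "$2" --numeric-owner --one-file-system -C "$1" .
}

# Unpack it onto a freshly created and mounted partition.
restore_fs() {    # usage: restore_fs /abs/path/root.tar.gz /mnt/target
    tar -xzpf "$1" --numeric-owner -C "$2"
}

# Example, run as root from the live CD (hypothetical paths):
#   snapshot_fs /mnt/server1-root /backup/root.tar.gz
#   restore_fs  /backup/root.tar.gz /mnt/newhost-root
```

The --numeric-owner flag matters when unpacking from a live CD, whose passwd file will usually not match the one on the server.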

My plan for updating them is to take the first server down, and upgrade 
the installation (if that works - I had some trouble with that before, so 
maybe it will be better to reinstall from scratch). Then I will create a 
snapshot of the new setup, transfer that to the other hosts, and unpack it 
in new logical volumes. I plan to script this so I do not have to do it 
manually every time - but that was before I knew ClusterSSH. When all is 
done and there is some time to take the servers down, I will reboot into 
the new system.

Now I am thinking about a Gentoo installation instead.

Pros:
 - Continuous updates, no downtime for upgrading, only when I decide to 
install a new kernel. This is really really cool. I fear the upgrade from 
Fedora 10 to 12 which has to be done soon.
 - Some improvement in speed. Those machines do A LOT of numbercrunching, 
with jobs often lasting for days, so even small improvements would be 
nice.
 - Easier debugging. When things do not work, I think it's easier to dig 
into the problem. No fancy, but sometimes buggy GUIs hiding basic 
functionality.
 - Heck, Gentoo is _cooler_ than typical distributions. And emerging with 
distcc on about 8*4 cores would be fun :)
 - I am probably the only one who can administrate them.

Cons:
 - If something will not work with this not so common (meta)distribution, 
people will say "always trouble with your Gentoo Schmentoo, it works fine 
in Fedora". Fedora is more mainstream, if something does not work there, 
then it's okay for the people to accept it.
 - I fear that big packages like Matlab are made for and tested on the 
typical distributions, and may have problems with the not-so-common 
Gentoo. I think someone here just had such a problem with Mathematica 
(which we do currently not use).
 - I am probably the only one who can administrate them. I think Gentoo is 
easier to maintain in the long run, but only when you take the time to 
learn it. With Fedora, you do not need much more than the 'yum install' 
command. There is no need to read complicated X.org upgrade guides and 
such.

I think I already made my decision, but I am still interested in your 
opinions, maybe some of you are in a similar position and like to share 
your experiences. Whether I will be allowed to use Gentoo is another 
question, I guess my boss will not like my idea at first, and I am not 
even sure if he is right. But maybe I can test-install Gentoo on one 
machine in a chroot, and see if things work fine.

Wonko



Re: Gentoo for many servers (was: Re: [gentoo-user] executing commands on lots of servers at once)

2009-11-14 Thread Alan McKinnon
On Saturday 14 November 2009 19:36:06 Alex Schuster wrote:
 Alan McKinnon wrote:
  clusterssh will let you log into many machines at once and run emerge
   -avuND world everywhere
 
 This is way cool. I just started using it on eight Fedora servers I am
 administrating. Nice, now this is an improvement over my 'for h in
 $HOSTS; do ssh $h yum install foo; done' approach.

I feel your pain :-)

We used to have the same problem adding new admins to 87 machines. Now we have 
a bespoke provisioner that does it all.

 What do you guys think about using Gentoo for servers? At the institute I
 partially work we chose Fedora. There is no special reason for that - we
 already had some Fedora machines, the setup seemed to work, the reputation
 was good, so we kept it. That was okay for me, why choose many different
 environments and learn everything again. I mentioned Gentoo, but did not
 really suggest to actually use it. Maybe I should have.

I'm a huge fan of Gentoo and all my personal machines (except the new netbook) 
have run it for the last 5 years.

But I will never install Gentoo on a production server at work.

Why?

Because it is too time consuming, because no two machines are set up the same, 
because I can't trust that other admins used the flags they should have. So 
updates become a case of logging into 80+ machines individually and doing 
emerge world by hand. Gentoo allows you to customize things to the nth degree 
- that is its strength - so people WILL use this one discriminating factor.

If OTOH I had a server farm of 80+ machines, all identical, I'd put Gentoo on 
them in a flash. But I don't have that
 
 These 8 servers I mentioned are basically clones of the one I installed
 manually. Instead of doing this again, I boot a live-cd on a new one,
 create partitions, and extract tar files of the first server's partitions.
 Then I do some extra configuration, like hostname and network setup. Done.
 
 My plan for updating them is to take the first server down, and upgrade
 the installation (if that works - I had some trouble with that before, so
 maybe it will be better to reinstall from scratch). Then I will create a
 snapshot of the new setup, transfer that to the other hosts, and unpack it
 in new logical volumes. I plan to script this so I do not have to do it
 manually every time - but that was before I knew ClusterSSH. When all is
 done and there is some time to take the servers down, I will reboot into
 the new system.
 
 Now I am thinking about a Gentoo installation instead.
 
 Pros:
  - Continuous updates, no downtime for upgrading, only when I decide to
 install a new kernel. This is really really cool. I fear the upgrade from
 Fedora 10 to 12 which has to be done soon.

Do not upgrade, especially not with a version jump of 2 or more. If you have a 
lot of machines, I assume you are a decent shop, and that you have some form 
of formal process for upgrades and changes.

What you do instead is a formal migration - copy the data off, reinstall, 
restore data. If you can't afford to do that every six or twelve months, then 
I have to ask - what the hell is the organization doing using a distro that is 
unsupported after 12 months?

  - Some improvement in speed. Those machines do A LOT of numbercrunching,
 with jobs often lasting for days, so even small improvements would be
 nice.

Don't fool yourself. Unless you need what Google needs, there is very little 
speed difference between Gentoo and Fedora. I/O improvements you need can be 
easily gotten by fiddling the kernel tuning knobs.

  - Easier debugging. When things do not work, I think it's easier to dig
 into the problem. No fancy, but sometimes buggy GUIs hiding basic
 functionality.

Emm, Fedora does not require a GUI :-)

  - Heck, Gentoo is _cooler_ than typical distributions. And emerging with
 distcc on about 8*4 cores would be fun :)

Can't argue with that.

But that is your ego talking and the machines do not belong to you but to the 
institute. Your ego has no place in that.

  - I am probably the only one who can administrate them.

This is not a benefit. It is a severe liability.

Where I work, I get fired for trying that :-(

 Cons:
  - If something will not work with this not so common (meta)distribution,
 people will say "always trouble with your Gentoo Schmentoo, it works fine
 in Fedora". Fedora is more mainstream, if something does not work there,
 then it's okay for the people to accept it.

Those same people are likely to say the same about linux vs windows.

  - I fear that big packages like Matlab are made for and tested on the
 typical distributions, and may have problems with the not-so-common
 Gentoo. I think someone here just had such a problem with Mathematica
 (which we do currently not use).

One or two persons had problems. Many many more replied that they had no 
problems at all. In Fedora-land, the ratio is the same.

  - I am probably the only one who can administrate them. I 

Re: Gentoo for many servers (was: Re: [gentoo-user] executing commands on lots of servers at once)

2009-11-14 Thread Alex Schuster
Alan McKinnon writes:

 On Saturday 14 November 2009 19:36:06 Alex Schuster wrote:
 Alan McKinnon wrote:

 clusterssh will let you log into many machines at once and run emerge
 -avuND world everywhere
 This is way cool. I just started using it on eight Fedora servers I am
 administrating. Nice, now this is an improvement over my 'for h in
 $HOSTS; do ssh $h yum install foo; done' approach.
 
 I feel your pain :-)
 
 We used to have the same problem adding new admins to 87 machines. Now
 we have a bespoke provisioner that does it all.

Sorry, I just do not get 'bespoke provisioner'. Some sort of software, 
like clusterssh? Or a person, one admin instead of many?


 What do you guys think about using Gentoo for servers? At the institute
 I partially work we chose Fedora. There is no special reason for that -
 we already had some Fedora machines, the setup seemed to work, the
 reputation was good, so we kept it. That was okay for me, why choose
 many different environments and learn everything again. I mentioned
 Gentoo, but did not really suggest to actually use it. Maybe I should
 have.
 
 I'm a huge fan of Gentoo

Now who would have thought of that!

 and all my personal machines (except the new netbook) have run it for the
 last 5 years.
 
 But I will never install Gentoo on a production server at work.
 
 Why?
 
 Because it is too time consuming, because no two machines are set up the 
 same, because I can't trust that other admins used the flags they should
 have. So updates become a case of logging into 80+ machines individually
 and doing emerge world by hand. Gentoo allows you to customize things to
 the nth degree - that is its strength - so people WILL use this one
 discriminating factor.
 
 If OTOH I had a server farm of 80+ machines, all identical, I'd put 
 Gentoo on them in a flash. But I don't have that

Of our 8 machines, 7 are essentially the same and differ only in hard 
drive space and CPU speed. The other machine is Intel, not AMD, and needs 
different IDE drivers. At the moment it has a different initrd (I set up a 
minimal fedora install to generate it after the cloned system did not 
boot), the rest is - apart from some config files - identical.

So I would make sure that about everything is exactly the same, well, 
maybe except for hostnames, udev net-persistent-rules, ssh keys... what 
more?
The last, a little different machine is a problem though. With optimized 
CFLAGS, this one would have to compile all stuff again, while for the 
others I could use binpkgs. Updating them all with clusterssh should not 
be much more work than updating a single one. Well, not completely true, I 
would have the double work, as I would upgrade one server first to test if 
there are problems, and then do it for the others. Maybe I could use the 
special machine to test stuff, and then update all the others.
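
One way to avoid recompiling on the odd machine out might be a common baseline for all eight hosts, so the same binpkgs run everywhere. The fragment below is only a sketch: it assumes all machines are 64-bit x86 and that the AMD boxes are K8-class, and the PKGDIR path is a placeholder for wherever the packages get shared from.

```shell
# /etc/make.conf on the build host -- a sketch, not a tested configuration.
# Assumption: every machine is 64-bit x86; the majority are AMD K8-class.
CFLAGS="-O2 -march=x86-64 -mtune=k8 -pipe"   # common ISA baseline, tuned for the majority
CXXFLAGS="${CFLAGS}"
FEATURES="buildpkg"                 # keep a binary package of everything emerged
PKGDIR="/usr/portage/packages"      # placeholder; export this to the other hosts

# On the other hosts, prefer the prebuilt packages:
# EMERGE_DEFAULT_OPTS="--usepkg"
```

The price is giving up CPU-specific -march optimisation on the seven identical machines, so whether this beats a separate build for the Intel box depends on how much the number crunching gains from it.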

If they would differ, Gentoo would of course be too much work. I already 
have this problem now... there is my desktop machine, my notebook running 
a Gentoo VM, a second desktop machine at my other home, the living-room 
machine of my flat share, the machine of a friend I also administrate, the 
server of my flat share I need to set up again... and clusterssh is no 
option here.


 Now I am thinking about a Gentoo installation instead.

 Pros:
  - Continuous updates, no downtime for upgrading, only when I decide to
 install a new kernel. This is really really cool. I fear the upgrade
 from Fedora 10 to 12 which has to be done soon.
 
 Do not upgrade, especially not with a version jump of 2 or more. If you 
 have a lot of machines, I assume you are a decent shop, and that you
 have some form of formal process for upgrades and changes.

Not really, I think. We are not very professional I must admit. We have 
two capable admins, but one is specialized in network stuff and Windows, 
the other has to do with our big Sun servers, huuge storage systems and 
such. They do not do much with the Linux cluster. Another user sometimes 
installs a package on a machine, but usually I do this. For me, it is not 
my main job, I work only about ten hours per week there, mostly being some 
100 km away.
We are a research institute. We do neurological research, PET and MRI 
tomography. The Linux servers do number crunching, and of course they 
should work and have good uptimes, but it is not as important as if we 
were an ISP.

 What you do instead is a formal migration - copy the data off, 
 reinstall, restore data. 

Advice noted. Yes, this sounds like the better idea, giving a cleaner 
setup. And if some things break I do not have to wonder if it was some 
strange side effect from the upgrade process.

 If you can't afford to do that every six or twelve months, then 
 I have to ask - what the hell is the organization doing using a distro 
 that is unsupported after 12 months?

Well, I do not think this was considered much. One machine was set up with 
Fedora for no specific reason, and we kept this distro then.