Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread Jörg Saßmannshausen
Dear Chris,

further to your email:

> - And if miracles occur and they do have expert level linux people then
> more often than not these people are overworked or stretched in many
> directions

This is exactly what happened to me at the old work place: I was pulled in 
too many different directions. 

I am a bit surprised about the ZFS experiences. Although I did not have 
petabytes of storage and I did not generate 300 TB per week, I did run a 
fairly large storage space on XFS and ext4 for backups and provisioning of 
file space. Some of it ran on old hardware (please sit down, I am talking 
about messing around with SCSI cables) and I gradually upgraded to newer kit. 
So I am not quite sure what went wrong with the ZFS storage here. 

However, there is a common trend, at least from what I observe here in the UK, 
to out-source problems: pass the buck to somebody else and pay for it. 
I am personally still more in favour of an in-house expert than an out-sourced 
person who may or may not understand what you are doing. 
I should add that I work in academia and know little about the commercial 
world here. Having said that, my friends in industry tell me that their 
companies like to outsource because it is 'cheaper'. 
I agree about the Linux expertise. I think I am one of only two Linux admins 
at the present work place. The official line is: we do not support Linux 
(but we teach it). 

Anyhow, I don't want to digress too much here. However, "...do HPC work in 
commercial environments where the skills simply don't exist onsite."
Are we a dying art?

My 1 shilling here from a still cold and dark London.

Jörg



Am Mittwoch, 2. Mai 2018, 16:19:48 BST schrieb Chris Dagdigian:
> Jeff White wrote:
> > I never used Bright.  Touched it and talked to a salesperson at a
> > conference but I wasn't impressed.
> > 
> > Unpopular opinion: I don't see a point in using "cluster managers"
> > unless you have a very tiny cluster and zero Linux experience.  These
> > are just Linux boxes with a couple applications (e.g. Slurm) running
> > on them.  Nothing special. xcat/Warewulf/Scyld/Rocks just get in the
> > way more than they help IMO.  They are mostly crappy wrappers around
> > free software (e.g. ISC's dhcpd) anyway.  When they aren't it's
> > proprietary trash.
> > 
> > I install CentOS nodes and use
> > Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and
> > software.  This also means I'm not stuck with "node images" and can
> > instead build everything as plain old text files (read: write
> > SaltStack states), update them at will, and push changes any time.  My
> > "base image" is CentOS and I need no "baby's first cluster" HPC
> > software to install/PXEboot it.  YMMV
> 
> Totally legit opinion and probably not unpopular at all given the user
> mix on this list!
> 
> The issue here is assuming a level of domain expertise with Linux,
> bare-metal provisioning, DevOps and (most importantly) HPC-specific
> configStuff that may be pervasive or easily available in your
> environment but is often not easily available in a
> commercial/industrial  environment where HPC or "scientific computing"
> is just another business area that a large central IT organization must
> support.
> 
> If you have that level of expertise available then the self-managed DIY
> method is best. It's also my preference.
> 
> But in the commercial world where HPC is becoming more and more
> important you run into stuff like:
> 
> - Central IT may not actually have anyone on staff who knows Linux (more
> common than you expect; I see this in Pharma/Biotech all the time)
> 
> - The HPC user base is not given budget or resource to self-support
> their own stack because of a drive to centralize IT ops and support
> 
> - And if they do have Linux people on staff they may be novice-level
> people or have zero experience with HPC schedulers, MPI fabric tweaking
> and app needs (the domain stuff)
> 
> - And if miracles occur and they do have expert level linux people then
> more often than not these people are overworked or stretched in many
> directions
> 
> 
> So what happens in these environments is that organizations will
> willingly (and happily) pay commercial pricing and adopt closed-source
> products if they can deliver a measurable reduction in administrative
> burden, operational effort or support burden.
> 
> This is where Bright, Univa etc. all come in -- you can buy stuff from
> them that dramatically reduces what onsite/local IT has to manage in terms
> of care and feeding.
> 
> Just having a vendor to call for support on Grid Engine oddities makes
> the cost of Univa licensing worthwhile. Just having a vendor like Bright
> be on the hook for "cluster operations" is a huge win for an overworked
> IT staff that does not have linux or HPC specialists on-staff or easily
> available.
> 
> My best example of "paying to reduce operational burden in HPC" comes
> from a massive well known genome 

Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread Jörg Saßmannshausen
Dear all,

here is at least something I can contribute: at the new work place, the small 
cluster I am looking after uses Bright Cluster Manager to manage the 20 nodes 
and the 10 or so GPU nodes.

I was not around when it all got installed, so I cannot comment on how quickly 
or easily that can be done. 

I used to do larger installations with up to 112 compute nodes which had 
different physical hardware, so I needed at least two images. I did all of 
that with a bit of scripting and not with a GUI. I did not use LDAP and 
authentication was done locally. It all made for a robust system. Maybe not as 
easy to manage as a system with a GUI which does it all for you, but on the 
flip side I knew exactly what the scripts were doing and what I needed to do 
if there was a problem. 

By and large I agree with what John Hearns said, for example. To be frank: I 
still consider the Bright Cluster Manager tool to be good for people who do 
not know much about HPC (I will stick to that for this argument), don't know 
much about Linux, etc. So in my personal opinion it is good for those whose 
day-to-day job is not HPC but something different, people who are coming from 
a GUI world (I don't mean that nastily), and for sites where it does not make 
sense to have dedicated support. For that it is fantastic: it works, and there 
is good support if things go wrong. 
We are using SLURM, and the only issue I had when I first started at the new 
place a year ago was that during a routine update SLURM got re-installed and 
all the configuration was gone. This could be because it was not installed 
properly in the first place, or it could be a bug; we don't know, as support 
did not manage to reproduce it. 
I am having some other minor issues with the authentication (we are 
authenticating against an external AD), but again that could be down to the 
way it was installed at the time. I don't know who did that. 

Having said all of that: I am personally more of a hands-on person, so I know 
what the system is doing. This usually gets obscured by a GUI which does 
things in the background that you may or may not want it to do. I had some 
problems at the old work place with ROCKS which led me to remove it and 
install Debian on the clusters. They ran rock solid, even on hardware which 
had issues with the ROCKS installation. 

So, for me the answer to the question is: it depends. If you have a capable 
HPC admin who is well networked and you have a larger, specialised cluster, 
you might be better off using the money to buy some additional compute nodes. 
For an installation where you do not have a dedicated admin, and you perhaps 
have a smaller, homogeneous installation, you might be better off with a 
cluster management tool like the one Bright is offering. 
If money is an issue, you need to balance the two carefully: a good HPC admin 
does more than install software; they also do user support, for example, and 
make sure users can work. If you are lucky, you get one who actually 
understands what the users are doing. 

I think that is basically what everybody here says in different words: your 
mileage will vary.

My two shillings from a rather cold London! :-)

Jörg

Am Dienstag, 1. Mai 2018, 16:57:40 BST schrieb Robert Taylor:
> Hi Beowulfers.
> Does anyone have any experience with Bright Cluster Manager?
> My boss has been looking into it, so I wanted to tap into the collective
> HPC consciousness and see
> what people think about it.
> It appears to do node management, monitoring, and provisioning, so we would
> still need a job scheduler like LSF, Slurm, etc., as well. Is that correct?
> 
> If you have experience with Bright, let me know. Feel free to contact me
> off list or on.

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread Chris Dagdigian

Jeff White wrote:


I never used Bright.  Touched it and talked to a salesperson at a 
conference but I wasn't impressed.


Unpopular opinion: I don't see a point in using "cluster managers" 
unless you have a very tiny cluster and zero Linux experience.  These 
are just Linux boxes with a couple applications (e.g. Slurm) running 
on them.  Nothing special. xcat/Warewulf/Scyld/Rocks just get in the 
way more than they help IMO.  They are mostly crappy wrappers around 
free software (e.g. ISC's dhcpd) anyway.  When they aren't it's 
proprietary trash.


I install CentOS nodes and use 
Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and 
software.  This also means I'm not stuck with "node images" and can 
instead build everything as plain old text files (read: write 
SaltStack states), update them at will, and push changes any time.  My 
"base image" is CentOS and I need no "baby's first cluster" HPC 
software to install/PXEboot it.  YMMV


Totally legit opinion and probably not unpopular at all given the user 
mix on this list!


The issue here is assuming a level of domain expertise with Linux, 
bare-metal provisioning, DevOps and (most importantly) HPC-specific 
configStuff that may be pervasive or easily available in your 
environment but is often not easily available in a 
commercial/industrial  environment where HPC or "scientific computing" 
is just another business area that a large central IT organization must 
support.


If you have that level of expertise available then the self-managed DIY 
method is best. It's also my preference.


But in the commercial world where HPC is becoming more and more 
important you run into stuff like:


- Central IT may not actually have anyone on staff who knows Linux (more 
common than you expect; I see this in Pharma/Biotech all the time)


- The HPC user base is not given budget or resource to self-support 
their own stack because of a drive to centralize IT ops and support


- And if they do have Linux people on staff they may be novice-level 
people or have zero experience with HPC schedulers, MPI fabric tweaking 
and app needs (the domain stuff)


- And if miracles occur and they do have expert level linux people then 
more often than not these people are overworked or stretched in many 
directions



So what happens in these environments is that organizations will 
willingly (and happily) pay commercial pricing and adopt closed-source 
products if they can deliver a measurable reduction in administrative 
burden, operational effort or support burden.


This is where Bright, Univa etc. all come in -- you can buy stuff from 
them that dramatically reduces what onsite/local IT has to manage in terms 
of care and feeding.


Just having a vendor to call for support on Grid Engine oddities makes 
the cost of Univa licensing worthwhile. Just having a vendor like Bright 
be on the hook for "cluster operations" is a huge win for an overworked 
IT staff that does not have linux or HPC specialists on-staff or easily 
available.


My best example of "paying to reduce operational burden in HPC" comes 
from a massive well known genome shop in the Cambridge, MA area. They 
often tell this story:


- 300 TB of new data generation per week (many years ago)
- One of the initial storage tiers was ZFS running on commodity server 
hardware
- Keeping the DIY ZFS appliances online and running took the FULL TIME 
efforts of FIVE STORAGE ENGINEERS


They realized that staff support was not scalable with DIY/ZFS at 
300TB/week of new data generation so they went out and bought a giant 
EMC Isilon scale-out NAS platform.


And you know what? After the Isilon NAS was deployed the management of 
*many* petabytes of single-namespace storage was now handled by the IT 
Director in his 'spare time' -- and the five engineers who used to do 
nothing but keep ZFS from falling over were re-assigned to more 
impactful and presumably more fun/interesting work.



They actually went on stage at several conferences and told the story of 
how Isilon allowed senior IT leadership to manage petabyte volumes of 
data "in their spare time" -- this was a huge deal and really resonated 
. Really reinforced for me how in some cases it's actually a good idea 
to pay $$$ for commercial stuff if it delivers gains in 
ops/support/management.



Sorry to digress! This is a topic near and dear to me. I often have to 
do HPC work in commercial environments where the skills simply don't 
exist onsite. Or more commonly -- they have budget to buy software or 
hardware but they are under a hiring freeze and are not allowed to bring 
in new Humans.


Quite a bit of my work on projects like this is helping people make 
sober decisions regarding "build" or "buy" -- and in those environments 
it's totally clear that for some things it makes sense for them to pay 
for an expensive commercially supported "thing" that they don't have to 
manage or support themselves.



My $.02 ...


Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread Jeff White
I never used Bright.  Touched it and talked to a salesperson at a 
conference but I wasn't impressed.


Unpopular opinion: I don't see a point in using "cluster managers" 
unless you have a very tiny cluster and zero Linux experience.  These 
are just Linux boxes with a couple applications (e.g. Slurm) running on 
them.  Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way 
more than they help IMO.  They are mostly crappy wrappers around free 
software (e.g. ISC's dhcpd) anyway.  When they aren't it's proprietary 
trash.


I install CentOS nodes and use 
Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and 
software.  This also means I'm not stuck with "node images" and can 
instead build everything as plain old text files (read: write SaltStack 
states), update them at will, and push changes any time.  My "base 
image" is CentOS and I need no "baby's first cluster" HPC software to 
install/PXEboot it.  YMMV
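
For illustration only, a minimal SaltStack state along those lines might look 
something like the sketch below. The package names are assumptions (roughly 
what EPEL ships for CentOS); adjust for whatever repos you actually use.

    # compute/slurm.sls -- minimal sketch, not a complete production state
    slurm-packages:
      pkg.installed:
        - pkgs:
          - munge
          - slurm
          - slurm-slurmd

    # managed copy of the cluster-wide slurm.conf, served from the Salt master
    /etc/slurm/slurm.conf:
      file.managed:
        - source: salt://compute/files/slurm.conf
        - user: root
        - group: root
        - mode: '0644'

    # keep slurmd running and restart it whenever the config file changes
    slurmd:
      service.running:
        - enable: True
        - watch:
          - file: /etc/slurm/slurm.conf

Apply it with something like "salt 'node*' state.apply compute.slurm" and the 
nodes converge to the desired config with no golden image involved.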



Jeff White

On 05/01/2018 01:57 PM, Robert Taylor wrote:

Hi Beowulfers.
Does anyone have any experience with Bright Cluster Manager?
My boss has been looking into it, so I wanted to tap into the 
collective HPC consciousness and see what people think about it.
It appears to do node management, monitoring, and provisioning, so we 
would still need a job scheduler like LSF, Slurm, etc., as well. Is that 
correct?


If you have experience with Bright, let me know. Feel free to contact 
me off list or on.




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread Andrew Holway
On 1 May 2018 at 22:57, Robert Taylor  wrote:

> Hi Beowulfers.
> Does anyone have any experience with Bright Cluster Manager?
>

I used to work for ClusterVision from which Bright Cluster Manager was
born. Although my experience is now quite some years out of date I would
still recommend it mainly because Martijn de Vries is still CTO after 8
years and they have a very stable team of gifted developers. The company
has a single focus and they have been at it for a long time.

Back in the day I was able to deploy a complete cluster within a couple of
hours using BCM. All the nodes would boot over PXE and perform an
interesting "pivot root" operation to switch to the freshly installed HDD
from the PXE target. The software supported roles which would integrate
with SLURM, allowing GPU node pools for instance. It was quite impressive
that people were able to get their code running so quickly.
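
For anyone who has not seen that trick: the snippet below is only a rough, 
generic sketch of what such a pivot-root hand-off from a PXE-booted initramfs 
looks like. It is not Bright's actual implementation, and the device name and 
paths are made up for the example.

    # Generic sketch only -- NOT Bright's code; /dev/sda2 and paths are assumptions.
    # Runs inside the PXE initramfs after the installer has written the OS to disk.
    mkdir -p /newroot
    mount /dev/sda2 /newroot          # the freshly installed local root filesystem
    cd /newroot
    mkdir -p old_root
    pivot_root . old_root             # the disk becomes / and the ramdisk moves to /old_root
    exec chroot . /sbin/init <dev/console >dev/console 2>&1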

I would say that, as a package, it's definitely worth the money unless you 
have a team of engineers kicking around. The CLI and API were a bit rough 
and ready, but it's been 6 years since I last used it.

They also managed to successfully integrate OpenStack, which is a bit of a 
feat in itself.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread John Hearns via Beowulf
Chris Samuel says:
>I've not used it, but I've heard from others that it can/does supply
> schedulers like Slurm, but (at least then) out of date versions.

Chris, this is true to some extent. When a new release of Slurm or, say,
Singularity is out, you need to wait for Bright to package it up and test
that it works with their setup.
This makes sense if you think about it - Bright is a supported product, and
no company worth their salt would rush out a bleeding-edge version of X
without testing.
I can say that the versions tend to be up to date but not bleeding edge - I
cannot give a specific example at the moment, sorry.

But as I say above, if it really matters to you, you can install your own
version on the master and in the node images and create a module file which
brings it into the users' PATH.
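
As a rough sketch of what that module file might look like (the install
prefix and version here are made-up examples, not something Bright ships;
adjust the path to wherever you installed it, somewhere the nodes can see):

    #%Module1.0
    ## Hypothetical module for a locally built Slurm -- paths are assumptions.
    set  base  /cm/shared/apps/slurm-local/17.11.5
    prepend-path  PATH             $base/bin
    prepend-path  MANPATH          $base/share/man
    prepend-path  LD_LIBRARY_PATH  $base/lib

Drop that somewhere on the MODULEPATH and 'module load' does the rest.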

On 2 May 2018 at 09:32, John Hearns  wrote:

> Robert,
> I have had a great deal of experience with Bright Cluster Manager and
> I am happy to share my thoughts.
>
>
> My experience with Bright has been as a system integrator in the UK, where
> I deployed Bright for a government defence client,
> for a university in London and on our in-house cluster for benchmarking
> and demos.
> I have a good relationship with the Bright employees in the UK and in
> Europe.
>
> Over the last year I have worked with a very big high tech company in the
> Netherlands, who use Bright to manage their clusters
> which run a whole range of applications.
>
> I would say that Bright is surprisingly easy to install - you should be
> going from bare metal to a functioning cluster within an hour.
> The node discovery mechanism is either to switch on each node in turn and
> confirm the name, or to note down which port in your Ethernet switch a
> node is connected to, and Bright will do a MAC address lookup on that port.
> Hint - do the Ethernet port mapping. Make a sensible choice of node to
> port numbering on each switch.
> You of course have to identify the switches also to Bright.
> But it is then a matter of switching all the nodes on at once, then go off
> for well deserved coffee. Happy days.
>
> Bright can cope with most network topologies, including booting over
> Infiniband.
> If you run into problems their support guys are pretty responsive and very
> clueful. If you get stuck they will schedule a Webex
> and get you out of whatever hole you have dug for yourself. There is even
> a reverse ssh tunnel built in to their software,
> so you can 'call home' and someone can log in to help diagnose your
> problem.
>
> I back up what Chris Dagdigian says.  You pays your money and you takes
> your choice.
>
> Regarding the job scheduler, Bright comes with pre-packaged and integrated
> Slurm, PBS Pro, Grid Engine and, I am sure, LSF.
> So right out of the box you have a default job scheduler set up. All you
> have to do is choose which one at install time.
> Bright rather likes Slurm, as do I. But I stress that it works
> perfectly well with PBSPro, as I have worked in that environment over the
> last year.
> Should you wish to install your own version of Slurm/PBSPro etc. you can
> do that, again I know this works.
>
> I also stress PBSPro - this is now on a dual support model, so it is open
> source if you don't need the formal support from Altair.
>
> Please ask some more questions - I will tune in later.
>
> Also it should be said that if you choose not to go with Bright a good
> open source alternative is OpenHPC.
> But that is a different beast, and takes a lot more care and feeding.
>
> On 2 May 2018 at 01:24, Christopher Samuel  wrote:
>
>> On 02/05/18 06:57, Robert Taylor wrote:
>>
>> It appears to do node management, monitoring, and provisioning, so we
>>> would still need a job scheduler like LSF, Slurm, etc., as well. Is
>>> that correct?
>>>
>>
>> I've not used it, but I've heard from others that it can/does supply
>> schedulers like Slurm, but (at least then) out of date versions.
>>
>> I've heard from people who like Bright and who don't, so YMMV. :-)
>>
>> --
>>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread John Hearns via Beowulf
Robert,
I have had a great deal of experience with Bright Cluster Manager and I
am happy to share my thoughts.


My experience with Bright has been as a system integrator in the UK, where
I deployed Bright for a government defence client,
for a university in London and on our in-house cluster for benchmarking and
demos.
I have a good relationship with the Bright employees in the UK and in
Europe.

Over the last year I have worked with a very big high tech company in the
Netherlands, who use Bright to manage their clusters
which run a whole range of applications.

I would say that Bright is surprisingly easy to install - you should be
going from bare metal to a functioning cluster within an hour.
The node discovery mechanism is either to switch on each node in turn and
confirm the name, or to note down which port in your Ethernet switch a node
is connected to, and Bright will do a MAC address lookup on that port.
Hint - do the Ethernet port mapping. Make a sensible choice of node to port
numbering on each switch.
You of course have to identify the switches also to Bright.
But it is then a matter of switching all the nodes on at once, then go off
for well deserved coffee. Happy days.

Bright can cope with most network topologies, including booting over
Infiniband.
If you run into problems their support guys are pretty responsive and very
clueful. If you get stuck they will schedule a Webex
and get you out of whatever hole you have dug for yourself. There is even a
reverse ssh tunnel built in to their software,
so you can 'call home' and someone can log in to help diagnose your problem.

I back up what Chris Dagdigian says.  You pays your money and you takes
your choice.

Regarding the job scheduler, Bright comes with pre-packaged and integrated
Slurm, PBS Pro, Grid Engine and, I am sure, LSF.
So right out of the box you have a default job scheduler set up. All you
have to do is choose which one at install time.
Bright rather likes Slurm, as do I. But I stress that it works
perfectly well with PBSPro, as I have worked in that environment over the
last year.
Should you wish to install your own version of Slurm/PBSPro etc. you can do
that, again I know this works.

I also stress PBSPro - this is now on a dual support model, so it is open
source if you don't need the formal support from Altair.

Please ask some more questions - I will tune in later.

Also it should be said that if you choose not to go with Bright a good open
source alternative is OpenHPC.
But that is a different beast, and takes a lot more care and feeding.

On 2 May 2018 at 01:24, Christopher Samuel  wrote:

> On 02/05/18 06:57, Robert Taylor wrote:
>
> It appears to do node management, monitoring, and provisioning, so we
>> would still need a job scheduler like LSF, Slurm, etc., as well. Is
>> that correct?
>>
>
> I've not used it, but I've heard from others that it can/does supply
> schedulers like Slurm, but (at least then) out of date versions.
>
> I've heard from people who like Bright and who don't, so YMMV. :-)
>
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf