Re: [Beowulf] Bright Cluster Manager
Dear Chris, further to your email:

> - And if miracles occur and they do have expert level linux people then
> more often than not these people are overworked or stretched in many
> directions

This is exactly what happened to me at the old work place: pulled in too many different directions.

I am a bit surprised about the ZFS experiences. Although I did not have petabytes of storage and I did not generate 300 TB per week, I did have a fairly large storage space running on xfs and ext4 for backups and provisioning of file space. Some of it was running on old hardware (please sit down, I am talking about me messing around with SCSI cables) and I gradually upgraded to newer kit. So I am not quite sure what went wrong with the ZFS storage here.

However, there is a common trend, at least from what I observe here in the UK, to outsource problems: pass the buck to somebody else and pay for it. I am personally still more of an in-house expert than an outsourced person who may or may not be able to understand what you are doing. I should add that I work in academia and know little about the commercial world here. Having said that, my friends in commerce tell me that companies like to outsource as it is 'cheaper'.

I agree about the Linux expertise. I think I am one of only two Linux admins at the present work place. The official line is: we do not support Linux (but we teach it). Anyhow, I don't want to digress here too much. However, "..do HPC work in commercial environments where the skills simply don't exist onsite." Are we a dying art?

My 1 shilling here from a still cold and dark London.

Jörg

On Wednesday, 2 May 2018 at 16:19:48 BST, Chris Dagdigian wrote:
> Jeff White wrote:
> > I never used Bright. Touched it and talked to a salesperson at a
> > conference but I wasn't impressed.
> >
> > Unpopular opinion: I don't see a point in using "cluster managers"
> > unless you have a very tiny cluster and zero Linux experience.
> [...]
Re: [Beowulf] Bright Cluster Manager
Dear all, at least something I can contribute here: at the new work place the small cluster I am looking after uses Bright Cluster Manager to manage the 20 nodes and the 10 or so GPU nodes. I was not around when it all got installed, so I cannot comment on how quickly or how easily it can be done.

I used to do larger installations with up to 112 compute nodes which had different physical hardware, so I needed at least 2 images. I did all of that with a bit of scripting and not with a GUI. I did not use LDAP and authentication was done locally. It all provided a robust system. Maybe not as easy to manage as a system with a GUI which does it all for you, but on the flip side I knew exactly what the scripts were doing and what I needed to do if there was a problem.

By and large I agree with what John Hearns said, for example. To be frank: I still consider the Bright Cluster Manager tool to be good for people who do not know about HPC (I stick to that for this argument), don't know much about Linux, etc. So in my personal opinion it is good for those whose day-to-day job is not HPC but something different; people who come from a GUI world (I don't mean that nastily); for situations where it does not pay to have dedicated support. For all of that it is fantastic: it works, and there is good support if things go wrong.

We are using SLURM, and the only issue I had when I first started at the new place a year ago was that during a routine update SLURM got re-installed and all the configuration was gone. This could be because it was not installed properly in the first place, or it could be a bug; we don't know, as the support did not manage to reproduce it. I am having some other minor issues with the authentication (we authenticate against an external AD), but again that could be down to the way it was installed at the time. I don't know who did that.

Having said all of that: I am personally more of a hands-on person, so I know what the system is doing.
This usually gets obscured by a GUI which does things in the background you may or may not want it to do. I had some problems at the old work place with ROCKS which led me to remove it and install Debian on the clusters. They were working rock solid, even on hardware which had issues with the ROCKS installation.

So, for me the answer to the question is: it depends. If you have a capable HPC admin who is well networked, and you have a larger, specialised cluster, you might be better off using the money to buy some additional compute nodes. For an installation where you do not have a dedicated admin, and you might have a smaller, homogeneous installation, you might be better off with a cluster management tool like the one Bright is offering.

If money is an issue, you need to carefully balance the two: a good HPC admin does more than install software; they do user support as well, for example, and make sure users can work. If you are lucky, you get one who actually understands what the users are doing.

I think that is basically what everybody here says in different words: your mileage will vary.

My two shillings from a rather cold London! :-)

Jörg

On Tuesday, 1 May 2018 at 16:57:40 BST, Robert Taylor wrote:
> Hi Beowulfers.
> Does anyone have any experience with Bright Cluster Manager?
> My boss has been looking into it, so I wanted to tap into the collective
> HPC consciousness and see what people think about it.
> It appears to do node management, monitoring, and provisioning, so we would
> still need a job scheduler like LSF, Slurm, etc. as well. Is that correct?
>
> If you have experience with Bright, let me know. Feel free to contact me
> off list or on.

___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Re: [Beowulf] Bright Cluster Manager
Jeff White wrote:

> I never used Bright. Touched it and talked to a salesperson at a conference but I wasn't impressed.
>
> Unpopular opinion: I don't see a point in using "cluster managers" unless you have a very tiny cluster and zero Linux experience. These are just Linux boxes with a couple of applications (e.g. Slurm) running on them. Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way more than they help IMO. They are mostly crappy wrappers around free software (e.g. ISC's dhcpd) anyway. When they aren't, it's proprietary trash.
>
> I install CentOS nodes and use Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and software. This also means I'm not stuck with "node images" and can instead build everything as plain old text files (read: write SaltStack states), update them at will, and push changes any time. My "base image" is CentOS and I need no "baby's first cluster" HPC software to install/PXEboot it. YMMV

Totally legit opinion and probably not unpopular at all given the user mix on this list!

The issue here is assuming a level of domain expertise with Linux, bare-metal provisioning, DevOps and (most importantly) HPC-specific config stuff that may be pervasive or easily available in your environment but is often not easily available in a commercial/industrial environment where HPC or "scientific computing" is just another business area that a large central IT organization must support.

If you have that level of expertise available then the self-managed DIY method is best.
It's also my preference.

But in the commercial world where HPC is becoming more and more important you run into stuff like:

- Central IT may not actually have anyone on staff who knows Linux (more common than you'd expect; I see this in Pharma/Biotech all the time)

- The HPC user base is not given budget or resource to self-support their own stack because of a drive to centralize IT ops and support

- And if they do have Linux people on staff they may be novice-level people or have zero experience with HPC schedulers, MPI fabric tweaking and app needs (the domain stuff)

- And if miracles occur and they do have expert-level Linux people then more often than not these people are overworked or stretched in many directions

So what happens in these environments is that organizations will willingly (and happily) pay commercial pricing and adopt closed-source products if they can deliver a measurable reduction in administrative burden, operational effort or support burden.

This is where Bright, Univa etc. all come in -- you can buy stuff from them that dramatically reduces what onsite/local IT has to manage in terms of care and feeding.

Just having a vendor to call for support on Grid Engine oddities makes the cost of Univa licensing worthwhile. Just having a vendor like Bright be on the hook for "cluster operations" is a huge win for an overworked IT staff that does not have Linux or HPC specialists on-staff or easily available.

My best example of "paying to reduce operational burden in HPC" comes from a massive well-known genome shop in the Cambridge, MA area.
They often tell this story:

- 300 TB of new data generation per week (many years ago)

- One of the initial storage tiers was ZFS running on commodity server hardware

- Keeping the DIY ZFS appliances online and running took the FULL TIME efforts of FIVE STORAGE ENGINEERS

They realized that staff support was not scalable with DIY/ZFS at 300 TB/week of new data generation, so they went out and bought a giant EMC Isilon scale-out NAS platform.

And you know what? After the Isilon NAS was deployed, the management of *many* petabytes of single-namespace storage was handled by the IT Director in his 'spare time' -- and the five engineers who used to do nothing but keep ZFS from falling over were re-assigned to more impactful and presumably more fun/interesting work.

They actually went on stage at several conferences and told the story of how Isilon allowed senior IT leadership to manage petabyte volumes of data "in their spare time" -- this was a huge deal and really resonated. It reinforced for me how in some cases it's actually a good idea to pay $$$ for commercial stuff if it delivers gains in ops/support/management.

Sorry to digress! This is a topic near and dear to me. I often have to do HPC work in commercial environments where the skills simply don't exist onsite. Or more commonly -- they have budget to buy software or hardware but they are under a hiring freeze and are not allowed to bring in new humans.

Quite a bit of my work on projects like this is helping people make sober decisions regarding "build" or "buy" -- and in those environments it's totally clear that for some things it makes sense to pay for an expensive commercially supported "thing" that they don't have to manage or support themselves.

My $.02 ...
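For readers less familiar with the "plain text files instead of node images" approach Jeff describes above, a minimal SaltStack state might look something like the sketch below. This is only an illustration under assumptions: the state IDs, package names and file paths are hypothetical, and a real slurm.conf would need to match your cluster.

```yaml
# /srv/salt/compute-node.sls -- hypothetical Salt state for an HPC compute node.
# Applied with e.g. `salt 'node*' state.apply compute-node`; no "node image" needed.

slurm-packages:
  pkg.installed:
    - pkgs:
      - slurm
      - slurm-slurmd

slurm-config:
  file.managed:
    - name: /etc/slurm/slurm.conf
    - source: salt://slurm/slurm.conf    # kept in version control as plain text
    - user: root
    - group: root
    - mode: '0644'

slurmd-service:
  service.running:
    - name: slurmd
    - enable: True
    - watch:
      - file: slurm-config    # restart slurmd whenever the config file changes
```

Because states like this live in version control as ordinary text, updating the cluster is an edit-and-apply cycle rather than rebuilding and redeploying an image.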
Re: [Beowulf] Bright Cluster Manager
I never used Bright. Touched it and talked to a salesperson at a conference but I wasn't impressed.

Unpopular opinion: I don't see a point in using "cluster managers" unless you have a very tiny cluster and zero Linux experience. These are just Linux boxes with a couple of applications (e.g. Slurm) running on them. Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way more than they help IMO. They are mostly crappy wrappers around free software (e.g. ISC's dhcpd) anyway. When they aren't, it's proprietary trash.

I install CentOS nodes and use Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and software. This also means I'm not stuck with "node images" and can instead build everything as plain old text files (read: write SaltStack states), update them at will, and push changes any time. My "base image" is CentOS and I need no "baby's first cluster" HPC software to install/PXEboot it. YMMV

Jeff White

On 05/01/2018 01:57 PM, Robert Taylor wrote:
> Hi Beowulfers.
> Does anyone have any experience with Bright Cluster Manager?
> My boss has been looking into it, so I wanted to tap into the collective
> HPC consciousness and see what people think about it.
> It appears to do node management, monitoring, and provisioning, so we would
> still need a job scheduler like LSF, Slurm, etc. as well. Is that correct?
>
> If you have experience with Bright, let me know. Feel free to contact me
> off list or on.
Re: [Beowulf] Bright Cluster Manager
On 1 May 2018 at 22:57, Robert Taylor wrote:
> Hi Beowulfers.
> Does anyone have any experience with Bright Cluster Manager?

I used to work for ClusterVision, from which Bright Cluster Manager was born. Although my experience is now quite some years out of date, I would still recommend it, mainly because Martijn de Vries is still CTO after 8 years and they have a very stable team of gifted developers. The company has a single focus and they have been at it for a long time.

Back in the day I was able to deploy a complete cluster within a couple of hours using BCM. All the nodes would boot over PXE and perform an interesting "pivot root" operation to switch to the freshly installed HDD from the PXE target. The software supported roles which would integrate with SLURM, allowing GPU node pools for instance. It was quite impressive that people were able to get their code running so quickly.

I would say that, as a package, it's definitely worth the money unless you have a team of engineers kicking around. The CLI and API were a bit rough and ready, but it's been 6 years since I last used it. They also managed to successfully integrate OpenStack, which is a bit of a feat in itself.
Re: [Beowulf] Bright Cluster Manager
Chris Samuel says:
> I've not used it, but I've heard from others that it can/does supply
> schedulers like Slurm, but (at least then) out of date versions.

Chris, this is true to some extent. When a new release of Slurm or, say, Singularity is out, you need to wait for Bright to package it up and test that it works with their setup. This makes sense if you think about it - Bright is a supported product, and no company worth their salt would rush out a bleeding-edge version of X without testing. I can say that the versions tend to be up to date but not bleeding edge - I cannot give a specific example at the moment, sorry. But as I say above, if it really matters to you, you can install your own version on the master and the node images and create a module file which brings it into the users' PATH.

On 2 May 2018 at 09:32, John Hearns wrote:
> Robert,
> I have had a great deal of experience with Bright Cluster Manager and
> I am happy to share my thoughts.
>
> My experience with Bright has been as a system integrator in the UK, where
> I deployed Bright for a government defence client,
> for a university in London and on our in-house cluster for benchmarking
> and demos.
> I have a good relationship with the Bright employees in the UK and in
> Europe.
>
> Over the last year I have worked with a very big high tech company in the
> Netherlands, who use Bright to manage their clusters
> which run a whole range of applications.
>
> I would say that Bright is surprisingly easy to install - you should be
> going from bare metal to a functioning cluster within an hour.
> The node discovery mechanism is either to switch on each node in turn and
> confirm the name,
> or to note down which port in your Ethernet switch a node is connected to
> and Bright will do a MAC address lookup on that port.
> Hint - do the Ethernet port mapping. Make a sensible choice of node to
> port numbering on each switch.
> You of course have to identify the switches to Bright as well.
> But it is then a matter of switching all the nodes on at once, then going off
> for a well-deserved coffee. Happy days.
>
> Bright can cope with most network topologies, including booting over
> Infiniband.
> If you run into problems their support guys are pretty responsive and very
> clueful. If you get stuck they will schedule a Webex
> and get you out of whatever hole you have dug for yourself. There is even
> a reverse ssh tunnel built in to their software,
> so you can 'call home' and someone can log in to help diagnose your
> problem.
>
> I back up what Chris Dagdigian says. You pays your money and you takes
> your choice.
>
> Regarding the job scheduler, Bright comes pre-packaged and integrated with
> Slurm, PBSPro, Gridengine and, I am sure, LSF.
> So right out of the box you have a default job scheduler set up. All you
> have to do is choose which one at install time.
> Bright rather likes Slurm, as I do also. But I stress that it works
> perfectly well with PBSPro, as I have worked in that environment over the
> last year.
> Should you wish to install your own version of Slurm/PBSPro etc. you can
> do that; again, I know this works.
>
> I also stress PBSPro - this is now on a dual support model, so it is open
> source if you don't need the formal support from Altair.
>
> Please ask some more questions - I will tune in later.
>
> Also it should be said that if you choose not to go with Bright, a good
> open source alternative is OpenHPC.
> But that is a different beast, and takes a lot more care and feeding.
>
> On 2 May 2018 at 01:24, Christopher Samuel wrote:
>
>> On 02/05/18 06:57, Robert Taylor wrote:
>>
>>> It appears to do node management, monitoring, and provisioning, so we
>>> would still need a job scheduler like LSF, Slurm, etc. as well. Is
>>> that correct?
>>
>> I've not used it, but I've heard from others that it can/does supply
>> schedulers like Slurm, but (at least then) out of date versions.
>>
>> I've heard from people who like Bright and who don't, so YMMV. :-)
>>
>> --
>> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
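To make the "install your own version and create a module file" route mentioned above concrete, here is a rough sketch of an environment-modules modulefile. The install prefix and version number are made up purely for illustration; a real deployment would point at wherever you installed your own build on the master and in the node images.

```tcl
#%Module1.0
## Hypothetical modulefile for a locally built Slurm, assumed to be
## installed under /opt/apps/slurm/17.11 (an illustrative path only).

proc ModulesHelp { } {
    puts stderr "Locally built Slurm, taking precedence over the packaged version"
}

set version 17.11
set prefix  /opt/apps/slurm/$version

# Prepending puts the local build ahead of the vendor-packaged one.
prepend-path PATH            $prefix/bin
prepend-path MANPATH         $prefix/share/man
prepend-path LD_LIBRARY_PATH $prefix/lib
```

Users would then run something like `module load slurm/17.11` to bring the local build into their PATH ahead of the packaged one.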
Re: [Beowulf] Bright Cluster Manager
Robert,
I have had a great deal of experience with Bright Cluster Manager and I am happy to share my thoughts.

My experience with Bright has been as a system integrator in the UK, where I deployed Bright for a government defence client, for a university in London, and on our in-house cluster for benchmarking and demos. I have a good relationship with the Bright employees in the UK and in Europe.

Over the last year I have worked with a very big high-tech company in the Netherlands, who use Bright to manage their clusters which run a whole range of applications.

I would say that Bright is surprisingly easy to install - you should be going from bare metal to a functioning cluster within an hour. The node discovery mechanism is either to switch on each node in turn and confirm the name, or to note down which port in your Ethernet switch a node is connected to and Bright will do a MAC address lookup on that port. Hint - do the Ethernet port mapping. Make a sensible choice of node-to-port numbering on each switch. You of course have to identify the switches to Bright as well. But it is then a matter of switching all the nodes on at once, then going off for a well-deserved coffee. Happy days.

Bright can cope with most network topologies, including booting over Infiniband. If you run into problems their support guys are pretty responsive and very clueful. If you get stuck they will schedule a Webex and get you out of whatever hole you have dug for yourself. There is even a reverse ssh tunnel built in to their software, so you can 'call home' and someone can log in to help diagnose your problem.

I back up what Chris Dagdigian says. You pays your money and you takes your choice.

Regarding the job scheduler, Bright comes pre-packaged and integrated with Slurm, PBSPro, Gridengine and, I am sure, LSF. So right out of the box you have a default job scheduler set up. All you have to do is choose which one at install time. Bright rather likes Slurm, as I do also.
But I stress that it works perfectly well with PBSPro, as I have worked in that environment over the last year. Should you wish to install your own version of Slurm/PBSPro etc. you can do that; again, I know this works.

I also stress PBSPro - this is now on a dual support model, so it is open source if you don't need the formal support from Altair.

Please ask some more questions - I will tune in later.

Also it should be said that if you choose not to go with Bright, a good open source alternative is OpenHPC. But that is a different beast, and takes a lot more care and feeding.

On 2 May 2018 at 01:24, Christopher Samuel wrote:

> On 02/05/18 06:57, Robert Taylor wrote:
>
>> It appears to do node management, monitoring, and provisioning, so we
>> would still need a job scheduler like LSF, Slurm, etc. as well. Is
>> that correct?
>
> I've not used it, but I've heard from others that it can/does supply
> schedulers like Slurm, but (at least then) out of date versions.
>
> I've heard from people who like Bright and who don't, so YMMV. :-)
>
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC