(I know this is the SLURM list, but many of the folks here use NHC with SLURM, so I'm hoping it's not a problem. If it is, please accept my humble apologies!)
To all users of Warewulf NHC:

[TL;DR: NHC is now its own project with a new name, new Git repository (GitHub AND Bitbucket), new mailing lists, and new real-time chat resources! New version 1.4.2 has been released as well. See below for details and URLs!]

Since its initial public release in early 2012, our work on the Node Health Check (NHC) tools has been performed and published under the umbrella of the Warewulf Project (http://warewulf.lbl.gov/). This was done for a number of reasons -- sharing of resources, mutual promotion, etc. However, this created one very large and unexpected downside: user confusion. You see, many users interpreted "Warewulf Node Health Check" to mean that it was only intended/suitable for use on cluster nodes managed by Warewulf, or that it required Warewulf in order to function properly, or any number of other misunderstandings. The end result was that potential users opted not to give it a try!

Our primary goal since the very start has been to build a community of users around NHC to share and exchange ideas, health checks, tools, and other code to maximize our collective ability to deliver excellent service and unsurpassed availability to our customers -- the scientists and researchers tasked with no less than literally changing and saving our world. Anything that gets in the way of that goal is a problem. A big problem.

So today I am thrilled to announce the creation of an independent project: Lawrence Berkeley National Laboratory (LBNL) Node Health Check. All development work on what used to be Warewulf NHC has now moved over to LBNL NHC and will continue under that identity going forward. As a result, NHC will no longer be making use of Warewulf project resources for its primary development activities.
As such, new discussion forums have been created, and users interested in following or discussing the ongoing development of LBNL Node Health Check will want to join these groups, either by subscribing to them as mailing lists or by using the online web forums. Users of NHC should join the Users' list ([email protected] or https://groups.google.com/a/lbl.gov/forum/?hl=en#!forum/nhc), and those interested in following development or contributing code should also join the Developers' list ([email protected] or https://groups.google.com/a/lbl.gov/forum/?hl=en#!forum/nhc-devel). The Users' list will likely be very low-traffic; the Developers' list receives development activity notifications and will therefore see a bit more traffic, but still no more than a handful of messages each day.

Perhaps the most significant and most user-visible change is that the source code repository has been moved over to GitHub. The front page for the repository (which also doubles as the project home page and documentation page) is now at https://github.com/mej/nhc. (For the repository URL, just add ".git" on the end, or use git+ssh://[email protected]/mej/nhc.git if you're a GitHub user.) For those who prefer Atlassian's Bitbucket instead, we have an equivalent site for NHC at https://bitbucket.org/mej0/nhc (git repo https://bitbucket.org/mej0/nhc.git or git+ssh://[email protected]:mej0/nhc.git).

This means that anyone wishing to contribute to NHC development -- by fixing bugs, adding new checks, updating documentation, writing unit tests, or even enhancing the default example configuration file -- can now do so quickly and easily using the facilities of Git and GitHub/Bitbucket rather than having to e-mail patches around everywhere. We can finally take advantage of modern development technologies and collaboration features such as Issues, Pull Requests, Forks, and more...and I hope many of you will!
Last, but certainly not least, we will be offering multiple options for those who like to use real-time chat for communications, Q&A, and troubleshooting assistance. For the old-school folks who prefer strictly text-based chat, traditional IRC will continue to be available; we have created the channel #lbnl-nhc on irc.freenode.net. Users may also elect to use Gitter instead; GitHub users can access https://gitter.im/mej/nhc with their GitHub credentials. For those familiar with Slack, invitations to the NHC Slack instance (at https://mej.slack.com/messages/nhc/) are available by contacting me privately. And finally, we are testing out Ryver (ryver.com) as a possible alternative to Slack, so if you're interested in using/helping test our Ryver instance (at https://lbnl.ryver.com/index.html#channel/4), let me know!

To top it all off, we've released version 1.4.2 of LBNL Node Health Check to kick things off right! This release offers a couple of new checks, new features for some of the existing checks, and of course completely updated/refactored Markdown-based documentation for the move to GitHub. As you might expect, the packages are now named "lbnl-nhc" instead of "warewulf-nhc". Triggers have been added to the RPMs which will seamlessly handle renaming the upstream script files (in /etc/nhc/scripts/*.nhc), but users installing from the tarball may need to rename some files by hand. To access the source tarballs and/or RPMs for this release, you can download them either from GitHub (https://github.com/mej/nhc/releases/tag/1.4.2) or from JFrog's Artifactory instance, Bintray (https://bintray.com/lbnl/nhc-src and https://bintray.com/lbnl/nhc-rpm).

Whew, that's a lot of information! If you have any questions, please feel free to reply to this e-mail or join one of the above resources to discuss!
And for those attending the Supercomputing 2015 conference in Austin next week, I'll be there as well and plan to attend the SLURM BoF, and I look forward to the opportunity to chat with you there!

Best regards,
Michael
LBNL NHC Project Lead Developer
Warewulf Project Developer

--
Michael Jennings <[email protected]>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615
