[Beowulf] ADMIN: Beowulf mailing list outage - now resolved

2023-09-15 Thread Chris Samuel

Hi all,

Unfortunately we had an unexpected outage for the Beowulf mailing list 
from last weekend through to today. As you may know this list was until 
a few years ago run by Don Becker at Penguin Computing and I took it 
over after he'd left and it had been running on autopilot.


It now runs on a VM I provide, and Penguin had kindly delegated the DNS 
to the hosting company I use, but it appears there was confusion over 
who owned the domain and its registration lapsed (I suspect over the 
weekend). I realised on Wednesday, and after some frantic emailing of 
people at both Penguin and in the community (thanks Doug, Lara!) 
contact was re-established. Penguin have kindly renewed the domain for 
5 years and we're back in business.


Apologies for this!  I should have realised earlier that I wasn't 
getting the usual spam that gets sent to the admin address, but I was 
out at the Slurm Users Group. :-D


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] xCAT closing up shop

2023-09-02 Thread Chris Samuel

Hi all,

Sad to hear that the xCAT developers are moving on and having to call it 
a day:


https://sourceforge.net/p/xcat/mailman/xcat-user/thread/MW4PR15MB51826CAD47B7E44D0D808F01F7E4A%40MW4PR15MB5182.namprd15.prod.outlook.com/#msg37890495


Mark Gurevich, Peter Wong, and I have been the primary xCAT maintainers for the 
past few years. This year, we have moved on to new roles unrelated to xCAT and 
can no longer continue to support the project. As a result, we plan to archive 
the project on December 1, 2023. xCAT 2.16.5, released on March 7, 2023, is our 
final planned release.

We would consider transitioning responsibility for the project to a new group 
of maintainers if members of the xCAT community can develop a viable proposal 
for future maintenance.

Thank you all for your support of the project over the past 20+ years.


I first came across xCAT in the mid-2000s, when IBM brought Egan Ford 
to Melbourne to talk to folks from VPAC (where I was) and Monash Uni 
about it, in light of a cluster that Monash had bought and we were 
going to help run.


The first system I brought up with it from scratch was at the start of 
2010, when I'd moved to VLSCI and was bringing up our first machine, an 
SGI Altix XE cluster. I was very pleasantly surprised to find that it 
didn't really care that it wasn't IBM hardware. :-)  We used it on the 
rest of our systems from then on, both for the HPC systems (IBM 
iDataPlexes, and deploying all the LPARs needed to run and use our 
BlueGene/Q) and for the infrastructure side (GPFS NSD servers and TSM 
servers for backup and HSM use).


I do seem to remember it took a bit of persuading to get the statelite 
configs to work with the iDataplex nodes that had Knights Corner Xeon 
Phi cards in them, but because it was open source and written in Perl we 
got it to work.


It'll be interesting to see if others do step up to keep it going; I 
know on the ACM SIGHPC SysPro Slack there have been some noises from 
folks who seem interested.


All the best,
Chris
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Your thoughts on the latest RHEL drama?

2023-06-27 Thread Chris Samuel

On 26/6/23 11:38, Joe Landman wrote:

This was likely aimed at the other folks like Oracle who are making 
money off of rebuilds and not so much at Alma/Rocky.  Those are 
collateral damage.


From memory (insert sirens, klaxons and other warning sounds here) 
Oracle was the target of Red Hat's obfuscating of their kernel sources, 
to make it harder for them to do their kernel variants.


It would be horribly ironic if this move pushed more people towards 
using OL, given Oracle seem to give that away for free. :-/


I've not had to use RHEL since leaving Australia for the US but my 
experience with their support was pretty poor up to that point. They had 
a nasty habit of breaking Mellanox drivers and not being able to fix 
them for extended periods of time (we had one in RHEL5 that was still 
unresolved when we went to RHEL6, and one in RHEL6 that would crash our 
PPC64 BG/Q management node in RHEL 6.2 through 6.4 before it got fixed - 
we were stuck on a RHEL 6.1 kernel until they finally fixed it). 
Hopefully things have improved since then.


All the best,
Chris
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Understanding environments and libraries caching on a beowulf cluster

2022-06-28 Thread Chris Samuel

On 28/6/22 11:44 am, leo camilo wrote:

My time indeed has a cost, hence I will favour a "cheap and dirty" 
solution to get the ball rolling and try something fancy later.


One thing I'd add is the use of some sort of cluster management system 
can be very handy to let you manage things as a whole. I've never used 
Qlustar that Tony mentioned but it does look interesting from a quick 
scan of the website.


I'm also a big fan of booting nodes from a standard image as a ramdisk 
and then mounting the filesystems you need containing apps and user 
files from some sort of shared storage.


The benefit of using standard images is that it's very easy to keep 
everything in step: you don't get gradual configuration drift as 
changes are made to some nodes and not others (perhaps one was down for 
some hardware work at one point and so a change couldn't be applied, etc).
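
As a concrete sketch of the shared-storage side: assuming an NFS server 
named "mgmt" (a made-up name here) exporting an application tree to the 
diskless nodes, the pieces might look something like:

# On the management node: export /apps read-only to the cluster network
mgmt# cat /etc/exports
/apps 10.0.0.0/24(ro,no_subtree_check)
mgmt# exportfs -ra

# On each compute node, once it has booted its ramdisk image:
node01# mount -t nfs -o ro mgmt:/apps /apps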


Best of luck!
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Administrivia: disabling of monthly password reminders

2022-04-11 Thread Chris Samuel

Hi all,

We've had some issues with a provider mistakenly marking Mailman 
password reminders as spam and (from the limited info I can glean) also 
causing us to be marked as having a poor reputation for a while (though 
the last case appeared to clear up relatively quickly).


Because of this I've taken the liberty of disabling the monthly password 
reminders for the list itself. You should still be able to request one 
via the web interface should you need it.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] List archives

2021-08-17 Thread Chris Samuel
Hi John,

On Monday, 16 August 2021 12:57:20 AM PDT John Hearns wrote:

> The Beowulf list archives seem to end in July 2021.
> I was looking for Doug Eadline's post on limiting AMD power and the results
> on performance.

I just went through the archives for July and compared them with what I have 
in my inpile and as far as I can tell there's nothing missing. There was a 
thread from June with the subject "AMD and AVX512", perhaps that's what you're 
thinking of?

https://www.beowulf.org/pipermail/beowulf/2021-June/thread.html

Your email from today & my earlier reply are in the archives for August.

All the best!
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Administrivia: fixed up issue with some people being unable to email beowulf.org

2020-12-10 Thread Chris Samuel
Hi all,

I had two separate people contact me today via mutual friends about problems 
either contacting the list owner address here when trying to subscribe or 
sending to the list.

I went digging and found that I'd missed creating a directory for the greylist 
software during the transition from the old system to the new VM which meant 
that some folks were getting temporary failures back blocking their email 
until it eventually bounced (as the greylist software was unable to create its 
database).
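
For those unfamiliar with greylisting: the MTA answers a first-time 
sender with a temporary (4xx) SMTP failure, expecting a legitimate 
server to queue the message and retry. An illustrative exchange (not 
our exact configuration) looks like:

RCPT TO:<someone@beowulf.org>
450 4.2.0 Recipient address rejected: Greylisted, try again later

With its database directory missing, the software was returning that 
temporary failure every time rather than just on first contact, so 
retries never succeeded and the mail eventually bounced.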

Most subscribers were not affected, as I'd modified the code to check 
whether the address was subscribed to the list already and bypass the 
greylist if so, but there appear to be some edge cases it wasn't catching.

I believe I've fixed this now, apologies to those affected!

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] RIP CentOS 8

2020-12-08 Thread Chris Samuel

Hi folks,

It looks like the CentOS project has announced the end of CentOS 8 as a 
version that tracks RHEL: it will end at the close of 2021, to be 
replaced by CentOS Stream, which runs just ahead of RHEL 8. CentOS 7 is 
unaffected (though RHEL7 only has 3 more years of life left).


https://blog.centos.org/2020/12/future-is-centos-stream/

> The future of the CentOS Project is CentOS Stream, and over the
> next year we’ll be shifting focus from CentOS Linux, the rebuild
> of Red Hat Enterprise Linux (RHEL), to CentOS Stream, which
> tracks just ahead of a current RHEL release. CentOS Linux 8, as
> a rebuild of RHEL 8, will end at the end of 2021. CentOS Stream
> continues after that date, serving as the upstream (development)
> branch of Red Hat Enterprise Linux.
>
> Meanwhile, we understand many of you are deeply invested in
> CentOS Linux 7, and we’ll continue to produce that version through
> the remainder of the RHEL 7 life cycle.

I always thought that Fedora was meant to be that upstream for RHEL, but 
perhaps the arrangement now will be Fedora -> CentOS -> RHEL.


I wonder where this leaves the Lustre project, which currently only 
supports RHEL7/CentOS7 on the server side, and, more interestingly, 
people who build Lustre appliances on top of CentOS.


Then there's the question of projects like OpenHPC, which has only just 
announced support for CentOS 8 (and OpenSUSE 15). They could choose to 
track CentOS Stream instead, probably without too much effort.


I do wonder if this opens the door for the return of something like 
Scientific Linux.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] pdsh

2020-11-29 Thread Chris Samuel
Hi Jim,

On Sunday, 29 November 2020 2:31:18 PM PST Lux, Jim (US 7140) via Beowulf 
wrote:

> Today,
> https://code.google.com/archive/p/pdsh/
> is where to go.

I think code.google.com is a read-only archive now, it stopped in 2016 (see 
https://killedbygoogle.com/ for more info).  It looks like pdsh is now on 
Github here:

https://github.com/chaos/pdsh
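
For anyone who's not come across it, pdsh runs a command across many 
nodes in parallel; a quick example (node names made up):

$ pdsh -R ssh -w node[01-04] uname -r

and the companion dshbak will fold identical output together, which is 
handy for spotting the odd node out:

$ pdsh -w node[01-04] uname -r | dshbak -c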

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] RoCE vs. InfiniBand

2020-11-26 Thread Chris Samuel
On Thursday, 26 November 2020 3:14:05 AM PST Jörg Saßmannshausen wrote:

> Now, traditionally I would say that we are going for InfiniBand. However,
> for reasons I don't want to go into right now, our existing file storage
> (Lustre) will be in a different location. Thus, we decided to go for RoCE
> for the file storage and InfiniBand for the HPC applications.

I think John hinted at this, but is there a reason for not going for IB for 
the cluster and then using Lnet routers to connect out to the Lustre storage 
via ethernet (with RoCE) ?

https://wiki.lustre.org/LNet_Router_Config_Guide

We use Lnet routers on our Cray system to bridge between the Aries 
interconnect inside the XC to the IB fabric our Lustre storage sits on.
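
For the curious, the LNet side of that is set via the lustre kernel 
module options; a minimal sketch (interface names and addresses are 
invented) might look something like:

# On the router node, with a foot on each fabric:
options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"

# On an IB-side client, reach the tcp0 network via the router:
options lnet networks="o2ib0(ib0)" routes="tcp0 10.10.0.1@o2ib0"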

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Administrivia: Beowulf list moved to new server

2020-11-22 Thread Chris Samuel

Hi all,

Today I've moved the Beowulf VM across to its new home at Rimuhosting 
(an NZ based hosting company, using their Dallas DC). I'm hoping that it 
should be transparent to everyone (though it'll hopefully perform a 
little better as it's got 3 times the memory).


A number of you have asked about contributing to costs, it's very kind 
of you all but the monthly cost is less than I'd pay a week for coffee 
were I not working from home, so please don't worry. If that ever gets 
to be a problem then I think I'll have more to worry about than the 
list. :-)


Please do let me know if you come across problems! A message directly to 
me would be best rather than spamming the list.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Administrivia: update on the beowulf list

2020-11-21 Thread Chris Samuel

On 21/11/20 12:27 pm, Chris Samuel wrote:

As part of that I'll be doing some upgrades on the current VM over the 
weekend, to bring it up to the same Debian version as on the new VM to 
ease the transition. There may be some disruption and downtime to the 
list & associated webserver; I'll try and keep that to a minimum.


This work is done and we're now on Debian Buster, the current release.

I'm hoping to transition the VM over to the new hosting service tomorrow 
and have reduced the TTL on our DNS records so we can cut over quickly.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Administrivia: update on the beowulf list

2020-11-21 Thread Chris Samuel

Hi all,

A quick update on where we are with the list: I've had very useful 
discussions with my contact at Penguin about the DNS issue, and it 
seems that it's not resolvable, so we've agreed that I'll be moving the 
list to a VM I will provide at my current hosting provider, where I 
will have DNS control so we don't have this happen again.


As part of that I'll be doing some upgrades on the current VM over the 
weekend, to bring it up to the same Debian version as on the new VM to 
ease the transition. There may be some disruption and downtime to the 
list & associated webserver; I'll try and keep that to a minimum.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Administrivia: chasing down DNS problems for the beowulf list

2020-11-08 Thread Chris Samuel

Hi Jonathan,

On 11/8/20 9:47 pm, Jonathan Aquilina via Beowulf wrote:

I am wondering if there is a way we can have a backup dns provider in 
the sense if there are issues like this dns resolution can be done 
through another provider?


The problem is in the authoritative server for the reverse lookups zone, 
and so the existing backup (secondary) DNS server just has the same 
incorrect info.


I'll update once it's fixed (Penguin are the only ones who can).

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Administrivia: chasing down DNS problems for the beowulf list

2020-11-08 Thread Chris Samuel

Hi Darren,

On 11/8/20 2:09 pm, Darren Wise wrote:

No worries at all, have you dug with dig and compared to the server 
records. Hate sticking my nose in but could be why they will fix 
tomorrow while awaiting for said wherever hosting provider is to have a 
record refresh.


I'm not sure, it might just be down to the availability of the folks at 
Penguin who admin their DNS. As this hosting is their gift to the list 
(thank you!) I don't think we can expect 24x7 support. :-)


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Administrivia: chasing down DNS problems for the beowulf list

2020-11-08 Thread Chris Samuel

Hi John,

On 11/8/20 3:40 pm, Jonathan Engwall wrote:


I see nothing wrong with the website.


Nothing to do with the website, it's a DNS issue. The PTR record for the 
IP address is missing from the in-addr.arpa domain.
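
You can see it for yourself with dig:

$ dig +short -x 12.53.5.80

comes back empty, where the matching PTR (pointing back at 
beowulf.org) should be returned.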


Penguin will hopefully fix this tomorrow.

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Administrivia: chasing down DNS problems for the beowulf list

2020-11-08 Thread Chris Samuel

On 11/8/20 1:06 pm, Chris Samuel wrote:

I've had no response back from them sadly, and we've started shedding a 
fair number of subscribers from the list because of this issue.  I've 
sent a query on to their postmaster to see if they can help establish 
contact.


Had a response already (my contact has been crazy busy, which I can 
sympathise with a lot!), issue will get looked at tomorrow.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Administrivia: chasing down DNS problems for the beowulf list

2020-11-08 Thread Chris Samuel

On 10/18/20 10:07 am, Chris Samuel wrote:


Just a quick heads up that some folks will be having issues receiving email
from the list as beowulf.org seems to have lost its reverse DNS entry and many
subscribers' email systems won't accept email from sites without that.

I've just emailed the person I had contact with last at Penguin to see if this
can be resolved (ahem), in the meantime I've disabled Mailman's automatic
processing of bounce messages so we don't lose people because of this issue.


I've had no response back from them sadly, and we've started shedding a 
fair number of subscribers from the list because of this issue.  I've 
sent a query on to their postmaster to see if they can help establish 
contact.


I'm really sorry about this! :-(

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Administrivia: chasing down DNS problems for the beowulf list

2020-10-18 Thread Chris Samuel
Hi all,

Just a quick heads up that some folks will be having issues receiving email 
from the list as beowulf.org seems to have lost its reverse DNS entry and many 
subscribers' email systems won't accept email from sites without that.

I've just emailed the person I had contact with last at Penguin to see if this 
can be resolved (ahem), in the meantime I've disabled Mailman's automatic 
processing of bounce messages so we don't lose people because of this issue.

chris@quad:~$ host beowulf.org
beowulf.org has address 12.53.5.80
beowulf.org mail is handled by 10 beowulf.org.

chris@quad:~$ host 12.53.5.80
Host 80.5.53.12.in-addr.arpa. not found: 3(NXDOMAIN)

Looking at my home email inpile I can see this started happening a little 
while ago, but work and taxes have occupied all my time so I've only just 
noticed. :-(

Apologies for this!

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] experience with HPC running on OpenStack

2020-06-30 Thread Chris Samuel

On 29/6/20 5:09 pm, Jörg Saßmannshausen wrote:


we are currently planning a new cluster and this time around the idea was to
use OpenStack for the HPC part of the cluster as well.

I was wondering if somebody has some first hand experiences on the list here.


At $JOB-2 I helped a group set up a cluster on OpenStack (they were 
resource constrained, they had access to OpenStack nodes and that was 
it).  In my experience it was just another added layer of complexity for 
no added benefit and resulted in a number of outages due to failures in 
the OpenStack layers underneath.


Given that Slurm which was being used there already had mature cgroups 
support there really was no advantage to them to having a layer of 
virtualisation on top of the hardware, especially as (if I'm remembering 
properly) in the early days the virtualisation layer didn't properly 
understand the Intel CPUs we had and so didn't reflect the correct 
capabilities to the VM.


All that said, these days it's likely improved, and I know then people 
were thinking about OpenStack "Ironic" which was a way for it to manage 
bare metal nodes.


But I do know the folks in question eventually managed to go to purely a 
bare metal solution and seemed a lot happier for it.


As for IB, I suspect that depends on the capabilities of your 
virtualisation layer, but I do believe that is quite possible. This 
cluster didn't have IB (when they started getting bare metal nodes they 
went RoCE instead).


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Neocortex unreal supercomputer

2020-06-13 Thread Chris Samuel

On 13/6/20 10:11 pm, Jonathan Engwall wrote:


There is the strange part. How to utilize such a vast cpu?
Storage should be the back end, unless the use is an api. In this case a 
gargantuan cpu sits in back, or so it seems.


My guess is that this sits connected to the server, they load an 
algorithm on to it and they shovel data at it over the vast number of 
network cards and eventually it comes back with an answer. Hopefully 
their acceptance test will say "42".


--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Neocortex unreal supercomputer

2020-06-13 Thread Chris Samuel

On 13/6/20 7:58 pm, Fischer, Jeremy wrote:

It’s my understanding that NeoCortex is going to have a petabyte or two 
of NVME disk sitting in front of it with some HPE hardware and then 
it’ll utilize the queues and lustre file system on Bridges2 as its front 
end.


There's more information here:

https://www.psc.edu/3206-nsf-funds-neocortex-a-groundbreaking-ai-supercomputer-at-psc-2

# Neocortex will use the HPE Superdome Flex, an extremely powerful,
# user-friendly front-end high-performance computing (HPC) solution
# for the Cerebras CS-1 servers. This will enable flexible pre- and
# post-processing of data flowing in and out of the attached WSEs,
# preventing bottlenecks and taking full advantage of the WSE
# capability. HPE Superdome Flex will be robustly provisioned with
# 24 terabytes of memory, 205 terabytes of high-performance flash
# storage, 32 powerful Intel Xeon CPUs, and 24 network interface
# cards for 1.2 terabits per second of data bandwidth to each
# Cerebras CS-1.

The way it reads, both of these CS-1s will sit behind that single Flex.

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] NFS over IPoIB

2020-06-12 Thread Chris Samuel
On Friday, 12 June 2020 2:36:26 PM PDT John McCulloch wrote:

> It is my understanding that setting MTU to 9000 is recommended but that
> seems to be applicable for 10GbE.

That depends how you're running your IB fabric.  In datagram mode (which I 
think is the default these days) you're limited to an MTU of 2044 bytes, but 
in connected mode (which used to be the default) you could get (from memory) a 
64KB MTU.

Back at VLSCI we ran with connected mode and 64KB MTUs with GPFS running on 
IPoIB (it was before you could run GPFS multi-homed on different IB fabrics 
with RDMA on both so we just ran it over TCP/IP instead).
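
If you do want connected mode, on Linux it's a per-interface setting; 
something like the following (assuming the interface is called ib0):

# switch ib0 from datagram to connected mode, then raise the MTU:
$ echo connected | sudo tee /sys/class/net/ib0/mode
$ sudo ip link set ib0 mtu 65520

with 65520 bytes being the maximum IPoIB connected-mode MTU.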

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Sad news - RIP Rich Brueckner

2020-05-25 Thread Chris Samuel
Hi all,

I've learned via @HPC_Guru on Twitter tonight that Rich Brueckner (the guy in
the red hat), who ran InsideHPC and InsideBigData, passed away in Portland,
Oregon on Wednesday (20th May).

https://obits.oregonlive.com/obituaries/oregon/obituary.aspx?n=richard-a-brueckner=196230633

I don't think I ever had the pleasure of meeting Rich (though I certainly
saw him around SC a lot, busily interviewing people for InsideHPC), but
I know many people on the list will likely have done so.

His obituary says "Rich's family asks that donations be made to
Multnomah County Animal Services.".   Two years ago Rich made a
fundraising film for them as well.

http://www.oncetherewasagiant.com/

The Multnomah County Animal Services website is here:

https://multcopets.org/

You can make a donation via that site should you wish.

All the best
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Reframe (was Re: [External] Re: Intel Cluster Checker)

2020-04-30 Thread Chris Samuel

On 4/30/20 12:14 pm, John Hearns wrote:

Thanks Chris.  I worked in one place which was setting up Reframe. It 
looked to be complicated to get running.

Has this changed?


To be honest I am not sure; another team at NERSC set it up, so I just 
check out our local git repo and run it with:


reframe.py -c checkout

and it automagically figures out which system it's on and runs the 
appropriate checkout tests.


It used to be more complicated to start but they spent time configuring 
that to avoid the need to specify the system name, etc.
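
For comparison, a fully explicit invocation spells all of that out on 
the command line; something like this (config path, check directory and 
system name are all invented):

$ ./reframe.py -C config/settings.py -c checks/ --system mysite -r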


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] Re: Intel Cluster Checker

2020-04-30 Thread Chris Samuel

On 4/30/20 6:54 am, John Hearns wrote:


That is a four letter abbreviation...


Ah you mean an ETLA (Extended TLA).

I've not used ICC but we do use Reframe (from CSCS) at work for testing 
both between maintenances on our test system for changes we're making 
and also after the maintenance as a checkout before opening the system 
back up to users. It's proved very useful.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Illegal instruction (signal 4)

2020-03-24 Thread Chris Samuel

On 24/3/20 7:55 pm, Jonathan Engwall wrote:

Building it was not a problem, it installs a binary in /usr/local/bin, 
mpich makes a handshake... then I see Illegal instruction (signal 4).


That usually means the application is trying to execute an instruction 
that's not supported on your CPU.  I don't know if the BSD's overload 
that in any way, but I'd be surprised if they did.
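
On Linux an easy way to see this in action is to build for an ISA 
extension the CPU lacks (illustrative only; whether the compiler 
actually emits the instruction depends on the code):

$ cc -O2 -mavx512f app.c -o app && ./app
Illegal instruction (core dumped)

# and to check what the CPU really supports:
$ grep -m1 flags /proc/cpuinfo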


I've not touched the *BSD's since the 90's, so I don't think there's 
much useful advice I could offer other than to try their mailing lists 
(unless someone here has better ideas).


Which BSD are you using?

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Have machine, will compute: ESXi or bare metal?

2020-02-09 Thread Chris Samuel

On 9/2/20 10:36 pm, Benson Muite wrote:


Take a look at the bootable cluster CD here:
http://www.littlefe.net/


From what I can see BCCD hasn't been updated for just over 5 years, and 
the last email on their developer list was Feb 2018, so it's likely a 
little out of date now.


http://bccd.net/downloads

http://bccd.net/pipermail/bccd-developers/

On the other hand their TRAC does list some ticket updates a few months 
ago, so perhaps there are things going on but Skylar needs more hands?


https://cluster.earlham.edu/trac/bccd-ng/report/1?sort=created=0=1

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: Interactive vs batch, and schedulers

2020-01-16 Thread Chris Samuel

On 16/1/20 9:35 pm, Lux, Jim (US 337K) via Beowulf wrote:

And I suppose there’s no equivalent of “timeslicing” where the cores run 
job A for 99% of the time and job B, C, D, E, F, for 1% of the time.


Slurm has a gang scheduling mode which sounds a little like what you're 
asking for (though it looks like each job will get an equal slice of 
time defined by the "SchedulerTimeSlice" parameter).


https://slurm.schedmd.com/gang_scheduling.html
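
The slurm.conf side of it is fairly small; a sketch based on that page 
(node names and values are illustrative):

# gang-schedule up to two jobs per node, with 30 second timeslices:
PreemptMode=GANG
SchedulerTimeSlice=30
PartitionName=interactive Nodes=node[01-16] OverSubscribe=FORCE:2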

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-16 Thread Chris Samuel

On 16/1/20 3:24 pm, Lux, Jim (US 337K) via Beowulf wrote:

What I’m interested in is the idea of jobs that, if spread across many 
nodes (dozens) can complete in seconds (<1 minute) providing essentially 
“interactive” access, in the context of large jobs taking days to 
complete.   It’s not clear to me that the current schedulers can 
actually do this – rather, they allocate M of N nodes to a particular 
job pulled out of a series of queues, and that job “owns” the nodes 
until it completes.  Smaller jobs get run on (M-1) of the N nodes, and 
presumably complete faster, so it works down through the queue quicker, 
but ultimately, if you have a job that would take, say, 10 seconds on 
1000 nodes, it’s going to take 20 minutes on 10 nodes.


But doesn't that depend a lot on what the user asks for, or am I 
misunderstanding?


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: Is Crowd Computing the Next Big Thing?

2019-11-30 Thread Chris Samuel

On 30/11/19 6:27 pm, Douglas Eadline wrote:


The most interesting thing I learned was how well
some laptops functioned for a "user's needs" while technically
in a state of "brokenness". There is a larger lesson there.


This is why I'm a big big fan of compute nodes booting from a set image 
each time. We did it at VLSCI with xCAT and its "statelite" target (so 
we could keep GPFS metadata & other state on an NFS mount from the mgmt 
node for easy booting) with our SGI and IBM hardware, and it worked 
really nicely.
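
For reference, statelite in xCAT is driven by tables that list which 
paths in the image should be tmpfs-backed or persisted on that NFS 
mount; from (fading) memory the litefile table looked roughly like the 
below, so treat this as a sketch rather than gospel:

# tabdump litefile
#image,file,options,comments,disable
"ALL","/etc/ssh/","persistent",,
"ALL","/var/log/","persistent",,
"ALL","/tmp/","tmpfs",,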


At least then everything should be identically broken. ;-)
(and you only need to fix something in one place)

Similar approach here at NERSC with Cray ansible (convergent evolution). 
We keep our recipes/definitions/etc in git and reuse them across systems 
(as much as possible) with config information abstracted out to define 
personalities for image builds and for boot.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Is Crowd Computing the Next Big Thing?

2019-11-28 Thread Chris Samuel

On 28/11/19 5:08 am, Dernat Rémy wrote:

this works only when the phone is connected via WiFi, meaning that it 
doesn’t chew up data-plan data ever.


This assumes you live in a part of the world where you can have an 
unmetered Internet connection.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: Is Crowd Computing the Next Big Thing?

2019-11-27 Thread Chris Samuel

On 27/11/19 9:50 am, Lux, Jim (US 337K) via Beowulf wrote:


Wasn't there a minor scandal a year or so ago about websites mining bitcoin in 
the background using user resources? And some phone apps doing the same?


Javascript cryptominers are a thing, and Firefox tries to block them 
automatically.


https://blog.mozilla.org/firefox/block-cryptominers-with-firefox/

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] MS-DOS DOSBOX

2019-11-07 Thread Chris Samuel
Hi John,

On Tuesday, 5 November 2019 5:22:14 PM PST Jonathan Engwall wrote:

> Yesterday I raised DosBox cpu emulation to nearly 600 megahertz with frame
> skipping at 10, and found DosBox still useable. Can this cpu emulation be
> verified somehow?

I'm curious & have to know - what are you using DosBox for on your cluster?

It's not unheard of, over a decade ago when I was at VPAC in Melbourne we 
installed Wine on our clusters so that a group could run the Windows command 
line code "LatentGold" on our x86 Linux clusters.  Apparently worked a treat!

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Rsync - checksums

2019-09-30 Thread Chris Samuel

On 30/9/19 5:55 pm, Stu Midgley wrote:

That's pretty awesome, are you going to make it available?  or push it 
upstream?


If possible it'd be good to try and get it upstream, probably worth 
posting on the rsync list to get advice.


https://lists.samba.org/mailman/listinfo/rsync

All the best!
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Cluster stack based on Ansible. Looking for feedback.

2019-09-13 Thread Chris Samuel
On Thursday, 12 September 2019 2:00:00 PM PDT Oxedions wrote:

> Thank you for reading this very long and boring mail.

That wasn't boring at all, thanks so much for taking the time for putting it 
together!   It sounds like a fun project, I do hope you get some interest.

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Centos 7 news

2019-08-30 Thread Chris Samuel
On Friday, 30 August 2019 10:27:08 PM PDT Jonathan Engwall wrote:

> 1300+ packages marked for update, if this affects you.

It looks like CentOS 7.7 has just come out.

https://lists.centos.org/pipermail/centos/2019-August/173310.html

Heaps of announcements of the new packages here:

https://lists.centos.org/pipermail/centos-cr-announce/2019-August/thread.html

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] GPUs Nvidia C2050 w/OpenMP 4.5 in cluster

2019-08-12 Thread Chris Samuel
On Monday, 12 August 2019 9:44:43 AM PDT Tony Travis wrote:

> Well, I'm still using my nVidia C2050/75's under Ubuntu-MATE 18.04 LTS:

Ubuntu also has a gcc-offload-nvptx package to (apparently) make this work.

For instance on my Kubuntu 19.04 desktop here at home:

chris@quad:~$ apt show gcc-offload-nvptx
Package: gcc-offload-nvptx
Version: 4:8.3.0-1ubuntu3
Priority: optional
Section: universe/devel
Description: GCC offloading compiler to NVPTX
 This package contains libgomp plugin for offloading to NVidia
 PTX. The plugin needs libcuda.so.1 shared library that has to be
 installed separately.
 .
 This is a dependency package providing the default GNU Objective-C compiler.
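
Presumably (a guess on my part) one would then compile an OpenMP target 
region with something like:

# test.c contains e.g. "#pragma omp target teams distribute parallel for"
$ gcc -O2 -fopenmp -foffload=nvptx-none test.c -o test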

I can't test it as I don't have an nvidia GPU. :-)

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Lustre on google cloud

2019-07-27 Thread Chris Samuel
On Saturday, 27 July 2019 10:07:14 PM PDT Jonathan Aquilina wrote:

> What would be the reason for getting such large data sets back on premise?
> Why not leave them in the cloud for example in an S3 bucket on amazon or
> google data store.

Provider independent backup?

-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Lustre on google cloud

2019-07-27 Thread Chris Samuel
On Friday, 26 July 2019 4:46:56 AM PDT John Hearns via Beowulf wrote:

> Terabyte scale data movement into or out of the cloud is not scary in 2019.
> You can move data into and out of the cloud at basically the line rate of
> your internet connection as long as you take a little care in selecting and
> tuning your firewalls and inline security devices.  Pushing  1TB/day etc. 
> into the cloud these days is no big deal and that level of volume is now
> normal for a ton of different markets and industries.

Whilst this is true as Chris points out this does not mean that there won't be 
data transport costs imposed by the cloud provider (usually for egress).

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] flatpack

2019-07-22 Thread Chris Samuel

On 22/7/19 10:40 pm, Jonathan Aquilina wrote:


So in a nut shell this is taking dockerization/ containerization and
making it more for the every day Linux user instead of the HPC user?


I don't think this goes as far as containers with isolation, as I think 
that's not what they're trying to do. But it does seem they're thinking 
along those lines.



It would be interesting to have a distro built around such a setup.


I think this is targeting cross-distro applications.  With all the 
duplication of libraries, etc, a distro using it would be quite bulky.


Also you may have a similar security issue as containers have, whereby 
when a vulnerability is found and patched in an application or library 
you end up with lots of people out there still running the vulnerable 
version.


This is why distros tend to discourage "vendoring" of libraries as that 
tends to fossilise vulnerabilities into an application whereas if people 
use the version provided in the distro the maintainers only need to fix 
it in that one package and everyone who links against it benefits.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Lustre on google cloud

2019-07-22 Thread Chris Samuel

On 22/7/19 10:31 pm, Jonathan Aquilina wrote:


I am sure though that with the GUI side of things through the console 
it makes things a lot easier to set up and manage, no?


You would hope so!  Although I've got to say with my limited experience 
of Lustre when you're running it you pretty quickly end up poking 
through the entrails of the Linux kernel trying to figure out what's 
going on when it's not behaving right. :-)


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] flatpack

2019-07-22 Thread Chris Samuel

On 22/7/19 10:26 pm, Jonathan Aquilina wrote:

Hi Guys, I think I might be a bit tardy to the party here, but the way 
you describe flatpack is equivalent to the portable apps on windows is 
my understanding correct?


It seems that way, with an element of sandboxing to try and protect the 
user who is using these packages.  The Debian/Ubuntu package describes 
it thus:


 Flatpak installs, manages and runs sandboxed desktop application bundles.
 Application bundles run partially isolated from the wider system, using
 containerization techniques such as namespaces to prevent direct access
 to system resources. Resources from outside the sandbox can be accessed
 via "portal" services, which are responsible for access control; for
 example, the Documents portal displays an "Open" dialog outside the
 sandbox, then allows the application to access only the selected file.
 .
 Each application uses a specified "runtime", or set of libraries, which is
 available as /usr inside its sandbox. This can be used to run application
 bundles with multiple, potentially incompatible sets of dependencies within
 the same desktop environment.
 .
 This package contains the services and executables needed to install and
 launch sandboxed applications, and the portal services needed to provide
 limited access to resources outside the sandbox.

There's also more about it here:

http://docs.flatpak.org/en/latest/basic-concepts.html
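
The user-facing workflow is pretty simple; for example, using the 
Flathub remote and GNOME's calculator as an arbitrary test app:

$ flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
$ flatpak install flathub org.gnome.Calculator
$ flatpak run org.gnome.Calculator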

The downside (from the HPC point of view) is that these binaries will 
need to be compiled for a relatively low common denominator of 
architecture (or with a compiler that can do optimisations selected at 
runtime depending on the architecture).


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Lustre on google cloud

2019-07-22 Thread Chris Samuel

On 22/7/19 10:12 pm, Jonathan Aquilina wrote:


I am aware of that as I follow their youtube channel.


Fair enough, others may not. :-)


I think my main query is compared to managing a cluster in house is this the 
way forward be it AWS or google cloud?


I think the answer there is likely "it depends".  The reasons may not 
all be technical either, you may be an organisation from outside the US 
that cannot allow your data to reside offshore, or be held by a US 
company subject to US law even if data is not held in the US.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] IBM alert for GPFS crashes on RHEL 7.6

2019-06-12 Thread Chris Samuel

Hi folks,

A heads up for folks running GPFS on RHEL7.6 (and I guess derivatives) 
from a colleague on the Australian HPC Slack:


https://www-01.ibm.com/support/docview.wss?uid=ibm10887213=s033=OCSTXKQY=E_sp=s033-_-OCSTXKQY-_-E

"IBM has identified an issue in IBM Spectrum Scale (GPFS) version that 
support RHEL7.6 (4.2.3.13 or later and 5.0.2.2 or later), in which a 
RHEL7.6 node running kernel versions 3.10.0-957.19.1 or higher, 
including 3.10.0-957.21.2, may encounter a kernel crash while running an 
IO operations."


Basically it looks like they've fallen foul of this hardening fix:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9548906b

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] A careful exploit?

2019-06-11 Thread Chris Samuel

On 11/6/19 8:18 pm, Robert G. Brown wrote:


* Are these real hosts, each with their own network interface (wired or
wireless), or are these virtual hosts?


In addendum to RGB's excellent advice and questions I would add to this 
question the network engineers maxim of "start at layer 1 and work up".


In other words, first check your physical connectivity and then head up 
the layers.


Best of luck!
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Cray job in Canberra

2019-06-06 Thread Chris Samuel

Hi all,

A friend of mine at Cray in Australia (who I knew before they moved 
there) let me know that they're looking to recruit someone to work in 
Canberra.  You've got to be an Australian though and willing to get a 
security clearance.


https://www.cray.com/company/careers/job-details?Req_Code=19-0119

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Performance impacts of Zombieload mitigation

2019-05-18 Thread Chris Samuel

Hi folks,

There are initial benchmark results on the Zombieload mitigations up on 
Phoronix, unsurprisingly it looks like context switches & IPC take the 
brunt of the impact.


https://www.phoronix.com/scan.php?page=news_item=MDS-Zombieload-Initial-Impact

They've expanded their coverage here now (not had time to read yet).

https://www.phoronix.com/scan.php?page=article=mds-zombieload-mit=1

They don't seem to benchmark any CPU intensive code so it'll be 
interesting to see how this impacts MPI & multithreaded HPC codes.


All the best!
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] HPE to acquire Cray

2019-05-18 Thread Chris Samuel

On 17/5/19 12:37 pm, Jonathan Aquilina wrote:


That is my biggest fear for centos to be fair with the IBM RH acquisition.


I think IBM at least seems to get open source, especially around Linux. 
Now if it was Oracle who had bought them...


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] HPE to acquire Cray

2019-05-18 Thread Chris Samuel

On 17/5/19 8:31 am, Kilian Cavalotti wrote:


Such an acquisition surely can't be done without an impact on such
monumental projects, and I'm wondering what route HPE will follow
there.


I would guess there would be contractual obligations around these that 
HPE would inherit.  I'd also note that Blue Waters is a counter-example 
of a case where the initial vendor (IBM in this case) didn't get bought 
but it still didn't guarantee delivery.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Frontier Announcement

2019-05-12 Thread Chris Samuel
On Thursday, 9 May 2019 7:51:26 AM PDT Tony Brian Albers wrote:

> Please stop, both of you.

Sorry for not seeing this before, was away at the Cray User Group and so not 
keeping up with email (I know, what else is new).

I'm disturbed by this thread, distressed at the threats that have been 
mentioned and concerned by the way the thread developed.

I understand from close family experience how receiving threats of violence 
can set the person up for unexpected behaviours later on due to inoffensive 
triggers that can cause hurt, offence and anxiety amongst others and how 
distressing that can be to those on the receiving end of them. 

That said, I expect this to be the end of this branch of this thread.  No 
responses please, either to the list or privately, this is the end of the 
matter.

Thank you,
Chris (list administrator)
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-02 Thread Chris Samuel

On 2/5/19 10:50 am, Faraz Hussain wrote:

Thanks John. I believe we purchased the enclosure from HPe with only 
hardware support. I am not aware of any support contract with Mellanox. 
We are running RHEL 7.5 ( I may have accidentally said it was Cent OS, 
but that was a typo )..


Red Hat do have documentation on setting up IB too.  This might be a 
good starting point:


https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-infiniband_and_rdma_related_software_packages

You should also be able to call on Red Hat for support with this as well.

Best of luck!
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] GPFS question

2019-04-29 Thread Chris Samuel
On Monday, 29 April 2019 3:47:10 PM PDT Jörg Saßmannshausen wrote:

> thanks for the feedback. I guess it also depends how much meta-data you have
> and whether or not you have zillions of small or larger files.
> At least I got an idea how long it might take.

This thread might also be useful; it is a number of years old but it 
does have some advice on placement of the filesystem manager before the 
scan, and also on their experience scanning a ~1PB filesystem.

https://www.ibm.com/developerworks/community/forums/html/topic?id=----14834266

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Beowulf in the news

2019-04-24 Thread Chris Samuel

On 17/4/19 4:14 pm, Lux, Jim (337K) via Beowulf wrote:

In other news, I note that the Event Horizon Telescope (EHT) used the 
well known (to beowulf list members)  “station wagon full of disk 
drives” approach to high bandwidth, high latency data comm.  This 
technique does have significant history in the VLBI field (station wagon 
full of digital or analog tapes).  The effective data rate from the 
telescope in Hawaii to MIT was 112 Gbps (700TB of data in 50,400 seconds).


Same trick the pulsar astronomers at Swinburne used to do with Apple 
Xserve RAID boxes, taking empty drives from Melbourne to Parkes and 
returning with drives full of data. These days they have connectivity 
out to the dish, thankfully!


This also had an application in HEP in Australia when it was cheaper to 
fly someone to Japan to the KEK collider to recover data on tape than it 
was to ship it over the 'net back to Australian researchers.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Large amounts of data to store and process

2019-03-16 Thread Chris Samuel
On Friday, 15 March 2019 6:19:14 PM PDT Lawrence Stewart wrote:

> * Some punters argue that MPI memory use scales badly with huge numbers of
> ranks, so a hybrid approach is best, with OpenMP on node and MPI between
> nodes.  I am not convinced. You get the complexities of both.

I think the thing there is "it depends" - for instance on BlueGene/Q where you 
had 16 cores and 16 GB RAM you could run 16 ranks of an MPI application per 
node but only have 1GB RAM per rank, or a single rank per node with 16GB RAM 
(or some power of 2 in between).   So for some large molecular dynamics 
simulations (like NAMD) going hybrid could be the difference between failing 
due to not enough memory (usually on rank 0) and being able to run to 
completion.
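
To make that concrete, on a 16-core, 16 GB node the two extremes would 
be launched something like this (Open MPI style options, purely for 
illustration):

# pure MPI: 16 ranks on the node, so ~1 GB of RAM per rank
$ mpirun -np 16 --map-by core ./app

# hybrid: one rank with 16 OpenMP threads sharing the full 16 GB
$ OMP_NUM_THREADS=16 mpirun -np 1 --map-by node ./app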

Now that's not necessarily the case any more (especially as BlueGene has gone 
the way of the dodo) but it was pretty important where I used to be!

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] List returned to service (was Re: Administrivia: Beowulf down this weekend for OS upgrade)

2019-03-09 Thread Chris Samuel
On Saturday, 9 March 2019 7:29:10 PM PST Chris Samuel wrote:

> The list is working, and the archives are being updated, but there's a
> niggling issue that stops access to the archives via the web that I've not
> been able to solve yet.

Final, final email for the night.   This is fixed (Apache 2.4 doesn't honour 
the old syntax, but doesn't complain about it either).   Archives are visible 
again.

-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] List returned to service (was Re: Administrivia: Beowulf down this weekend for OS upgrade)

2019-03-09 Thread Chris Samuel
On Saturday, 9 March 2019 7:29:10 PM PST Chris Samuel wrote:

> However, I don't want to have that stopping access so I've made it
> accessible again!

Final email for the night: the website now uses Let's Encrypt certificates, so 
we finally have proper HTTPS.   It also redirects browsers to HTTPS.

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] List returned to service (was Re: Administrivia: Beowulf down this weekend for OS upgrade)

2019-03-09 Thread Chris Samuel
On Saturday, 9 March 2019 5:27:08 PM PST Chris Samuel wrote:

> Upgrade going well, please forgive this test message to check that the list
> is still working and archives are correctly updated.

The list is working, and the archives are being updated, but there's a 
niggling issue that stops access to the archives via the web that I've not 
been able to solve yet.

However, I don't want to have that stopping access so I've made it accessible 
again!  Please do let me know if anything else is broken.

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Administrivia: Beowulf down this weekend for OS upgrade

2019-03-09 Thread Chris Samuel
On Saturday, 9 March 2019 3:36:28 PM PST Chris Samuel wrote:

> Just a heads up that I'll be taking beowulf.org down shortly to do a much
> needed OS upgrade.

Upgrade going well, please forgive this test message to check that the list is 
still working and archives are correctly updated.

-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] Administrivia: Beowulf down this weekend for OS upgrade

2019-03-09 Thread Chris Samuel
Hi all,

Just a heads up that I'll be taking beowulf.org down shortly to do a much 
needed OS upgrade.   A side benefit for you folks of me being on call for 
NERSC this weekend and not being able to stray too far from home. :-)

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Introduction and question

2019-02-28 Thread Chris Samuel
On Thursday, 28 February 2019 12:41:57 AM PST Bill Broadley wrote:

> * avoid installing/fixing things with vi/apt-get/dpkg/yum/dnf, use ansible
>   whenever possible.  Eventually you'll have to reinstall and it's painful
>   to manually apply months of changes.

Another approach is to build a RAM disk image that gets booted on each node, 
and then you only make changes to that image.   That way you know your nodes 
are in lockstep. 

At ${JOB-2} we used xCAT for that with its "statelite" method (so we could 
have some persistent state for things like GPFS config info on an NFS share), 
at ${JOB-1} we had an image on Lustre that was updated via some scripts from a 
master image that was kept in git, and where I am now we use Ansible to build 
boot images for our various systems.

All the best!
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Password mining

2019-02-01 Thread Chris Samuel
On Saturday, 2 February 2019 11:42:26 AM AEDT Robert G. Brown wrote:

> There is an ancient Unix/Linux application called "crack" (it's still in
> at least Fedora, if not all the rest).  At this point it is usually used
> by sysadmins to run on their password file to detect terrible passwords
> when users pick easily crackable ones.

Well that's why Alec wrote it when he was at Aberystwyth, to try and find 
users with weak passwords. :-)

> One part of the (rather
> intelligent -- written by generations of mostly-white hat wizards)
> program checks for common passwords, unchanged passwords (like
> changeme), and then runs the entire dictionary(s) with all reasonable
> permutations of things like S -> 5, E -> 3, L -> 1.

Yeah, Crack has a rule based system to express all the types of munging you 
would want to try, as well as the ability to add dictionaries and split the 
run up over multiple machines.
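
(A toy illustration of that kind of rule-driven munging in Python - Crack's 
actual rule language is far richer than this, and the substitution table here 
is just for show:)

from itertools import product

# A few of the classic substitutions (S -> 5, E -> 3, L -> 1, ...).
SUBS = {'s': 's5$', 'e': 'e3', 'l': 'l1', 'o': 'o0', 'a': 'a@'}

def munge(word):
    """Yield every combination of substitutions for one dictionary word."""
    for combo in product(*(SUBS.get(c, c) for c in word.lower())):
        yield ''.join(combo)

print(sorted(munge("sole")))   # 24 candidate passwords from one word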

ObHPC: the "John the Ripper" password cracker includes GPU support; at
${JOB-3} one of our HPC sysadmins was running it to check our users' 
passwords. We found that the OpenCL version was (then) faster than the 
straight CUDA version.

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Thoughts on EasyBuild?

2019-01-17 Thread Chris Samuel

On 17/1/19 7:32 am, Faraz Hussain wrote:

Some folks I work with are using EasyBuild but I am not sure what to 
make of it.


I used EasyBuild for a number of years at VLSCI and at Swinburne, and I 
can say that whilst the "easy" can be a bit of a misnomer, it is very 
powerful and does simplify the job of managing HPC software.


You want OpenFOAM?  Pick the version, tell it to build it in robot mode 
and it'll go off and build it from the compiler all the way to the 
finished install.


What I really like is that it codifies a lot of the knowledge about 
building these applications so even if you don't use it yourself the 
easyconfigs can help for software you may be struggling to build.  The 
community around it is also strong and helpful.


They also use checksums to confirm that the files they download weren't 
corrupted, which has the useful side-effect of catching projects that 
spot a bug in a release and then re-release it without updating the 
version number. :-)


One complication I did find is that when you want to apply custom 
configuration - like ensuring that all your OpenMPI versions are built 
with Slurm integration enabled, for example - it means modifying all 
those easyconfigs in advance. I don't know if that's changed recently.


One other nit is that it can lead to an explosion of versions of GCC, 
various MPIs, etc., because the easyconfigs encode all of those.  What we 
did was maintain custom versions, picking one GCC release per series (v6, 
v7, etc.) as well as one version each of Python 2, Python 3, Perl, etc. 
for those modules.


What I also liked is that it is very flexible - for instance I could 
create an easyconfig for a bundle of Python modules for a group and they 
would just load that one bundle to get everything they wanted.  It's not 
embedded into the existing Python install, so you can do this over and 
over again and not accidentally upgrade one group's version of a module 
simply because another group wants a module that depends on a later 
version.


They also have a sense of humour: Kenneth Hoste, the lead developer, has a 
great talk called "How to make package managers cry" which goes into 
detail about how to make your software hard to install, with examples.


https://www.youtube.com/watch?v=NSemlYagjIU

Hope that's useful!

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] USB flash drive bootable distro to check cluster health.

2019-01-11 Thread Chris Samuel

On 11/1/19 4:59 am, Richard Chang wrote:

Anyone knows of any pre-existing distribution that will do the job ? Or 
know how to do it with Centos or Ubuntu ?


I've used Advanced Clustering's "Breakin" bootable image for this in the 
past - it's open source and freely downloadable.


http://www.advancedclustering.com/products/software/breakin/

It looks like their code is up on their own git server too:

http://git.advancedclustering.com/cgi-bin/gitweb.cgi

It looks like the downloads haven't been updated for several years, 
whilst the git repos are more recent.


I've also just noticed a tool called "stressant" but that's not been 
touched for a year:


https://gitlab.com/anarcat/stressant

No idea what that's like!

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] ipoib routing

2018-12-12 Thread Chris Samuel

On 11/12/18 10:58 pm, John Hearns via Beowulf wrote:


Chris, we are talking about exactly the same hardware here.
If you opened one up there was a SATA DOM which contained the OS and the 
configuration script.


Interesting, I had been pretty sure the ones we had were just spinning 
rust, but it was a long time ago now!


--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] ipoib routing

2018-12-11 Thread Chris Samuel

On 11/12/18 2:40 pm, John Hearns via Beowulf wrote:

Michael, yes.  Panasas engineered IPOIB to Ethernet routers for their 
storage platform. Remember that until the latest generation of their kit 
they ran on BSD, which had no Infiniband capability. Panasas IB routers 
booted from an onboard SATA DOM, which was quite a neat solution.


Sounds like they've changed a bit since we had them at VLSCI (circa 
2010); ours were just two SuperMicro 2-in-1U boxes, each with a QDR IB 
card and 10gigE, and with a relatively vanilla CentOS 5 install and a 
script to configure them.


Mind you, they didn't need to be complicated, they just ran and ran. We 
lost one at one point due to a hardware problem and the solution was to 
replace the whole two-node unit. From memory we just needed to take the 
routes out of the Panasas config and the cluster they were routing for 
(and put them back when the replacement arrived).


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] No HTTPS for the mailman interface

2018-12-03 Thread Chris Samuel

On 2/12/18 10:43 pm, jaquil...@eagleeyet.net wrote:

I know Chris is away, but dont you guys feel like there should be an SSL 
certificate on the mailman interface as right now it is sending all 
credentials over http.


Don't worry, I've been wanting to do this for ages. From memory (no
access at the moment) port 443 is blocked by some firewall config I
don't have access to.

Now that I'll be in the Bay Area, making contact with someone at Penguin
who can help me with this should hopefully be easier... (crosses fingers)

Thanks for the reminder!

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] Administrivia: list admin traveling for a while

2018-12-01 Thread Chris Samuel

Hi folks,

This won't be news to those who stalk^Wfollow me on Twitter, but today 
is my last day in Australia; I fly this evening to the US, where I'll be 
starting work at NERSC at LBL on December 10th.  I'm taking a couple of 
days beforehand to visit my partner in Philadelphia, so I'll have only 
random email access for some time (though it's been pretty random already 
since they packed all my computers into a container and I've been madly 
tidying and cleaning).


Any emails that need moderation and any new subscription requests may 
take longer than usual to process, sorry about that.


I really want to catch up on that HPC workflows thread soon!

All the best,
Chris (back to cleaning)
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] If you can help ...

2018-11-20 Thread Chris Samuel
Hi Doug,

On Tuesday, 20 November 2018 9:08:49 AM AEDT Douglas Eadline wrote:

> Thanks again to all those who helped. It was a great success.

Good to hear that the community helped, sorry it couldn't cover the whole 
extra cost. :-(

Look forward to seeing the film!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] PMIX and Julia?

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 7:53:30 AM AEDT Prentice Bisbal via Beowulf 
wrote:

> Conceptually, I don't think there's
> anything preventing it from being used for Julia. In reality, it depends
> on how general the API is.

Yeah, if you're launched via Slurm (for instance) you could have access to 
PMIx (various versions), PMI2, PMI1 (if you're earlier than 18.08) or none of 
the above..   For other resource managers the choices may be different.
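
(As an aside: if you want to see which of those plugins a particular Slurm 
build offers, "srun --mpi=list" should print them.)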

Much easier to layer yourself on top of MPI and let it handle that for you! 
:-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] PMIX and Julia?

2018-11-18 Thread Chris Samuel
On Monday, 19 November 2018 6:35:56 AM AEDT John Hearns via Beowulf wrote:

> What I am really asking is will pmix be totally necessary when running on
> near-exascale systems, or am I missing something? My thoughts are should
> the Julia world be looking at mpix adaptations? If someone with a clue
> about pmix could enlighten me I would be grateful.

My (limited) understanding of this is that PMI* is an MPI wire-up protocol, in 
other words a mechanism for the MPI ranks to discover each other, set up 
communications and also talk to the resource scheduler (if present).

There's a handy little description of PMI (v2 in this case) here:

https://wiki.mpich.org/mpich/index.php/PMI_v2_API

The PMIx website (along with standards documents etc) here:

https://pmix.org/

My instinct is that it might be better for Julia to sit on MPI and let it 
handle this for it, rather than have to know about PMI2/PMIx itself..
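
(To make that concrete: with something like mpi4py, the wire-up is entirely 
MPI's problem - whichever of PMI1/PMI2/PMIx the launcher spoke underneath is 
invisible to the code. A minimal sketch, assuming mpi4py is installed:)

# rank_hello.py - launch with e.g. "mpirun -n 4 python rank_hello.py"
# (or via srun under Slurm); MPI handles the rank discovery and wire-up.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()} "
      f"on {MPI.Get_processor_name()}")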

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] If you can help ...

2018-11-09 Thread Chris Samuel
On Saturday, 10 November 2018 2:21:42 PM AEDT Adam DeConinck wrote:

> I won’t be able to make it this year, but just kicked in $50. Good luck, and
> I’m sad to miss it!

Likewise, donated and shared on LinkedIn and Twitter.  Fingers crossed Doug!

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] SC18

2018-11-08 Thread Chris Samuel
On Thursday, 8 November 2018 4:16:46 AM AEDT Ryan J. Negri wrote:

> I'd love to meet anyone from the Beowulf list if you're also in town.

Sadly I can't be there this year, but please do plug the list to people who 
you think might benefit! :-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-11-04 Thread Chris Samuel
On Monday, 5 November 2018 2:14:50 AM AEDT Gerald Henriksen wrote:

> The biggest threat to RHEL isn't lost sales to CentOS but losing
> customers and mindshare to Ubuntu (which certainly appears to have
> been an issue the last number of years based on the number of software
> projects that support Ubuntu but not Red Hat).

I don't think that's surprising, and I don't think that's going to change no 
matter what happens with Red Hat and IBM.  From what I've seen in my time 
people tend to develop on their desktops and those tend to run Ubuntu (either 
natively or in a VM), not CentOS/RHEL.

This is why tools like EasyBuild, Spack and containers are important; we need 
to be able to cater for these wide-ranging dependencies.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] More about those underwater data centers

2018-11-04 Thread Chris Samuel
On Monday, 5 November 2018 3:13:29 AM AEDT John Hearns via Beowulf wrote:

> Have we faced up to the environmental impact of this? 

The places I've worked have always tried to reuse/resell/recycle systems.  Our 
Alpha cluster was snapped up by folks in the US, our first Intel cluster went 
to another university, our Power5 cluster went somewhere I can't remember. At 
${JOB-1} we had Intel clusters redirected to other parts of the university 
(including one to the LHC ATLAS folks there). BlueGene - well not so much as 
far as I could tell. :-(

> It is often said that CPUs can be upgraded - I have only once seen an
> upgrade in place in my career.

Only been offered (and done) this once, about a decade ago at ${JOB-2}, where 
we upgraded a system from dual-core to quad-core Opteron (Barcelona).

That could have gone better...  First of all it was delayed because of 
the TLB errata (we eventually got affected chips and ran with the kernel patch 
before getting the rev'd silicon), and then we started to see random lock-ups.

Turned out (after a lot of chasing) that whilst the mainboard was meant to be 
OK the layout in the box meant the RAM next to the CPUs would sometimes 
overheat and take down the box.   They added some heatsinks to those DIMMs and 
the problem went away!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] "NNSA’s first really big heterogeneous supercomputer"

2018-11-01 Thread Chris Samuel
On Thursday, 1 November 2018 4:58:30 AM AEDT Prentice Bisbal via Beowulf wrote:

> I read a publication from LANL on it's architecture years ago, and I believe
> to had to program for all 3 different processors to take advantage of it's
> architecture. (If soneone knows for sure, please correct me if I'm wrong.)  
> I'd say that's even more heterogeneous than what we are seeing today
> (CPU + GPU).

I think you're bang on the money; here's a presentation from LANL
about Roadrunner which covers programming it on slide 25.

https://www.lanl.gov/conferences/salishan/salishan2007/Roadrunner-Salishan-Ken%20Koch.pdf

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-31 Thread Chris Samuel
On Tuesday, 30 October 2018 2:58:18 AM AEDT Joe Landman wrote:

> Python 2.x is dead, 3.x should be used/shipped everywhere.

[Looks at folks running Python2 apps that rely on no-longer-maintained 
Python2-only modules, then looks at others running 32-bit IRAF binaries. Goes 
and cries in corner...]

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-30 Thread Chris Samuel
On Tuesday, 30 October 2018 5:04:39 AM AEDT John Hearns via Beowulf wrote:

> I just realised...  I will now need an account on the IBM Support Site, a
> SiteID AND an Entitlement to file bugs on any Redhat packages.

I suspect that won't be the case; from what IBM are saying, they're basically 
going to let Red Hat carry on doing things the way they have been.

A couple of points now I've had some time to think further on this:

1) IBM has always required you to run either Red Hat or SLES for hardware 
support on xSeries hardware. Having better links into one of those means it 
becomes easier to track down issues when Red Hat stuff up a kernel feature in 
a point release (like breaking Mellanox IB for several releases in RHEL6 for 
BG/Q Power systems - it would panic your service node when you booted 4 racks 
at once, and we had to run the RHEL 6.2 kernel until it was fixed in 6.5).

I would be a bit nervous if I were SuSE, given that potential for more 
tie-in.  On Power I'd be worried if I were Canonical, as they had gone in 
hard with partnerships with IBM for Power around 2015.

2) IBM does have techies (as others have mentioned); from my local perspective 
they hired most/all of the OzLabs folks in Canberra in 2001 (who were stranded 
after LinuxCare folded) to join the Linux Technology Centre there, and were 
doing a lot of PPC kernel & firmware hacking.

They brought up Linux on Power5 before AIX (for the first time - AIX needed 
firmware support whilst they could get Linux to boot on the bare metal). Some 
of them you may have heard of :-) (Andrew Tridgell, Rusty Russell, Paul 
Mackerras, Chris Yeoh).  I had the privilege of working with some IBM folks 
seconded to VLSCI and they were a very smart bunch (Mark Nelson moved from the 
LTC down to Melbourne & did a bunch of work on Slurm for us).   There's a heap 
more information here:

https://ozlabs.org/about.html

3) xSeries support has always been a pain point for folks dealing with IBM, 
but the pSeries (POWER) support has been a lot better in general.  As long as 
your IBM account manager doesn't muck up your support schedule. ;-)

Red Hat's support has not been that great in my experience, and there are 
signs of a lack of testing in their release cycle (they released a point 
release of RHEL6 where rsync's parsing of remote source/destinations was 
broken, so you couldn't rsync to/from a remote source, plus of course the 
recent release of a kernel where RDMA was completely broken due to a 
single-character typo).

4) Sierra (and presumably other IBM CORAL systems) runs RHEL7.  See point 1.

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-29 Thread Chris Samuel
On Monday, 29 October 2018 6:42:48 PM AEDT Tony Brian Albers wrote:

> I wonder where that places us in the not too distant future..

Yeah, it's certainly a case of interesting times.  At least they do say:

https://investors.redhat.com/news-and-events/press-releases/2018/10-28-2018-184027500

# Upon closing of the acquisition, Red Hat will join IBM's Hybrid Cloud team
# as a distinct unit, preserving the independence and neutrality of Red Hat's
# open source development heritage and commitment, current product
# portfolio and go-to-market strategy, and unique development culture.
# Red Hat will continue to be led by Jim Whitehurst and Red Hat's current
# management team. Jim Whitehurst also will join IBM's senior management
# team and report to Ginni Rometty. IBM intends to maintain Red Hat's
# headquarters, facilities, brands and practices.

So (at least at first) it seems they intend to stay pretty hands off,
which I think is a good thing.   Fingers crossed for the future..

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] It is time ...

2018-10-24 Thread Chris Samuel
On Tuesday, 23 October 2018 9:48:23 PM AEDT Jeffrey Layton wrote:

> Is there a "consumption game" around the word blockchain?

Power consumption?

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Contents of Compute Nodes Images vs. Login Node Images

2018-10-24 Thread Chris Samuel
On Wednesday, 24 October 2018 5:47:19 AM AEDT Prentice Bisbal via Beowulf 
wrote:

> For the Blue Gene/Q, they did start supporting dynamically linked
> executables, but I don't know what changed to the OS to allow that.

The CNK (from memory) just passed all I/O over to the I/O nodes anyway, so if 
your code dlopen()d a library it was just reading it from the image on the I/O 
nodes.  BG/Q of course had the extra core for the kernel threads to avoid 
getting in the way of the application on the 16 cores for compute.

This was one reason there was a preference for static linking on BG/P and BG/
Q, as it put less load on the I/O nodes when starting large jobs.  But of 
course dynamic linking is non-negotiable for some applications!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Contents of Compute Nodes Images vs. Login Node Images

2018-10-24 Thread Chris Samuel
On Wednesday, 24 October 2018 3:15:51 AM AEDT Ryan Novosielski wrote:

> I realize this may not apply to all cluster setups, but I’m curious what
> other sites do with regard to software (specifically distribution packages,
> not a shared software tree that might be remote mounted) for their login
> nodes vs. their compute nodes.

At VLSCI we had separate xCAT package lists for both, but basically the login 
node list was a superset of the compute node list.  These built RAMdisk images, 
so keeping them lean (on top of what xCAT automatically strips out for you) was 
important.

Here at Swinburne we run the same image on both, but that's a root filesystem 
chroot on Lustre, so size doesn't impact memory usage (the node boots a 
patched oneSIS RAMdisk that brings up OPA and mounts Lustre, then pivots over 
onto the image there for the rest of the boot).  The kernel has a patched 
overlayfs2 module that does clever things for that part of the tree to avoid 
constantly stat()ing Lustre for things it has already cached (IIRC, that's a 
colleague's code).

We install things into the master copy of the chroot (tracked with git), then 
have a script that turns the cache mode off across the cluster, rsyncs things 
into the actual chroot area, does a drop_caches, and then turns the cache mode 
on again.
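
(The shape of that script, as a rough sketch only - "pdsh" is assumed for the 
fan-out, "image-cache" is a stand-in for whatever site-specific command toggles 
the overlay cache mode, and the paths are invented:)

import subprocess

NODES  = "node[001-100]"             # hypothetical node range
MASTER = "/srv/chroot-master/"       # hypothetical git-tracked master tree
CHROOT = "/lustre/images/current/"   # hypothetical live chroot on Lustre

def on_all_nodes(*cmd):
    """Run a command on every node via pdsh (assumed available)."""
    subprocess.run(["pdsh", "-w", NODES, *cmd], check=True)

on_all_nodes("image-cache", "off")   # site-specific cache toggle (stand-in)
subprocess.run(["rsync", "-a", "--delete", MASTER, CHROOT], check=True)
on_all_nodes("sh", "-c", "echo 3 > /proc/sys/vm/drop_caches")
on_all_nodes("image-cache", "on")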

Hope that helps!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Hacked MBs It was only a matter of time

2018-10-20 Thread Chris Samuel
On Thursday, 4 October 2018 11:47:17 PM AEDT Douglas Eadline wrote:

> https://www.bloomberg.com/news/features/2018-10-04/the-big-hack-how-china-used-a-tiny-chip-to-infiltrate-america-s-top-companies

So two weeks on it looks like this wasn't real, and I've read somewhere
(though I can't find the reference now) that this isn't the first time for the
person who wrote that article.   A lot of people wrote about how this sort
of attack doesn't really make sense; there are far easier ways to do this
sort of thing (nobbled BMC firmware probably being one of the easiest),
and without the problem of possibly thousands of SM boxes trying to
ping back to a CnC server and setting off alarms in a host of companies.

This sums it up nicely..

https://twitter.com/SwiftOnSecurity/status/1053102057245286401

Two weeks since Bloomberg claimed Supermicro servers were backdoored by Chinese 
spying chips.
No Evidence Whatsoever shows these claims real.
All companies angrily deny it to Congress.
Senior US intelligence including Rob Joyce refute it.
It’s time.
It’s over.
This is not true.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] If I were specifying a new custer...

2018-10-16 Thread Chris Samuel
On Saturday, 13 October 2018 12:38:15 AM AEDT Gerald Henriksen wrote:

> If ARM, or Power, want to move from their current positions in the
> market they really need to provide affordable developer machines,

Not sure if this comes in at a price point that makes sense for this, but 
there is now an ATX Power9 mainboard available.

https://raptorcs.com/TALOSIILITE/

They claim:

https://twitter.com/RaptorCompSys/status/1020371675316215809

# TalosIILite in stock and ready to ship! #POWER9 mainboard + CPU + RAM + HSF
# for under $2,000 USD, what's not to like? Supports all of our Sforza CPU
# options, from 4 core to the high end 22 core CPUs.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] If I were specifying a new custer...

2018-10-11 Thread Chris Samuel

On 12/10/18 08:50, Scott Atchley wrote:


Perhaps Power9 or Naples with 8 memory channels? Also, Cavium ThunderX2.


I'm not sure if Power or ARM (yet) qualify for the general HPC workload 
that Doug mentions; sadly a lot of the commercial codes are only 
available for x86-64 these days. MATLAB dropped PowerPC support back in 
2007, for instance.


All the best,
Chris (still in the UK)
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] Administrivia: list admin travelling

2018-10-02 Thread Chris Samuel
Hi folks,

Just a heads up that I've got to return to the UK for family matters and will 
be back in Melbourne late on the 14th October (and giving a short talk at a 
workshop at eResearch on the 16th, eep!).

I should still have access to email whilst travelling, but likely with a 
higher latency than usual.

The list has been quiet recently; please remember it is your list and everyone 
is welcome to initiate and participate in discussions no matter your level of 
experience.

If you know people who might benefit from being on the list please let them 
know about it, HPC is a fairly small community in the wider IT world and so 
often we can only make connections with our peers through things like this.

If you are going to be at events like SC (sorry, don't think I'll be there 
this year) please do promote the list to those who may not know about it if it 
has been of benefit to you.

Finally if you, or people you know, have had problems getting or sending 
emails to this list please do let me know by emailing me directly.   The 
advent of sometimes misguided or over-enthusiastic anti-spam filters is often 
a problem these days [looks pointedly at current work email system].

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Tiered RAM

2018-09-26 Thread Chris Samuel
On Wednesday, 26 September 2018 1:38:15 PM AEST John Hearns via Beowulf wrote:

> I had a look at the Intel pages on Optane memory. It is definitely
> being positioned as a fast file cache, ie for block oriented devices.

Interestingly there was a (non-block device) filesystem presented this year
called NOVA targeting these sorts of NVMM devices.  LWN has a nice little
article on it (which in turn links to earlier articles on it, and the original
paper).

https://lwn.net/Articles/754505/

It's still going through rapid development and (learning from the btrfs
experience) the kernel folks aren't going to let it in without a working fsck
from the sound of things.

There's also some NVDIMM documentation in the kernel tree which is pretty
heavy going; I've just tried to skim it and I think I know less now than when
I started. ;-)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/nvdimm/nvdimm.txt

There's also the more readable documentation for the "block translation
table", which seems to be intended to provide a way to give some atomicity
to storage transactions on NVDIMMs - something not otherwise present given
the nature of the hardware:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/nvdimm/btt.txt

There is also the persistent memory wiki:

https://nvdimm.wiki.kernel.org/

> This worked by tiering RAM memory - ie a driver in the Linux
> kernel would move little used pages to the slower but higher capacity
> device.
> I though the same thing would apply to Optane, but it seems not.

Well the simplest way to get what you describe there might be to use
the Optane as a swap partition. :-)

Red Hat have some docs about using NVDIMMs.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-persistent-memory-nvdimms

Not sure I helped much there! :-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] SIMD exception kernel panic on Skylake-EP triggered by OpenFOAM?

2018-09-12 Thread Chris Samuel
On Monday, 10 September 2018 2:23:18 PM AEST Jonathan Engwall wrote:

> If it is helpful there are a few similar bugs, generally considered
> unreproducible. One thread calls it bogus xcomp_bv...the kernel clobbers
> itself writing zeroes when that is not the state. And spectre came up. One
> suggestion is to disable IBRS; according to other sources IBRS is dangerous
> to disable and should protect against Spectre. Maybe the OpenFOAM is to
> blame.

Yeah, I suspect what we're seeing is different to that; it looks like 
something manages to generate a SIMD exception whilst the kernel is dealing 
with an APIC timer interrupt.   A colleague has backported this patch that I 
found to our CentOS kernel, in case it helps.

https://lore.kernel.org/patchwork/patch/953364/

For now we've constrained this user's workload onto a handful of nodes as they 
are trying to get some project work done.

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-09-10 Thread Chris Samuel
On Tuesday, 11 September 2018 10:41:24 AM AEST Kilian Cavalotti wrote:

> Last I heard, the fix will be in 862.14.1 to be released on the 25th

Ah interesting, I wonder if that fix is already in the 3.10.0-933 kernel 
that's meant to be in the RHEL 7.6 beta?

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-09-10 Thread Chris Samuel
On Tuesday, 11 September 2018 9:17:21 AM AEST Ryan Novosielski wrote:

> So we’ve learned what, here, that RedHat doesn’t test the RDMA stack at all?

It certainly does seem to be the case.  Unlike other issues I've hit in the 
past - bugs introduced in the IB stack in 6.x -> 6.y transitions, where 
spotting the bug needed more hardware than you could reasonably expect them 
to have - this is a pretty fundamental failure.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-09-10 Thread Chris Samuel
On Tuesday, 11 September 2018 1:25:55 AM AEST Peter St. John wrote:

> I had wanted to say that such a bug would be caught by compiling with some
> reasonable warning level; but I think I was wrong.

Interesting - looks like it depends on your GCC version; 7.3.0 catches it with 
-Wall here:

chris@quad:/tmp$ gcc -Wall test.c -o test
test.c: In function ‘main’:
test.c:6:2: warning: this ‘if’ clause does not guard... 
[-Wmisleading-indentation]
  if ( test );
  ^~
test.c:7:3: note: ...this statement, but the latter is misleadingly indented as 
if it were guarded by the ‘if’
   printf ( "hello\n" );
   ^~

> So I guess I have to forgive the software engineer who fat-fingered that
> semicolon. Of course I've done worse.

Oh yes, same here too!   There but for... and all that. :-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-09-10 Thread Chris Samuel
On Friday, 17 August 2018 2:47:37 PM AEST Chris Samuel wrote:

> Just a heads up that the 3.10.0-862.11.6.el7.x86_64 kernel from RHEL/CentOS
> that was released to address the most recent Intel CPU problem "L1TF" seems
> to break RDMA (found by a colleague here at Swinburne).

So this CentOS bug has a one-line fix for this problem!

https://bugs.centos.org/view.php?id=15193

It's a corker - basically it looks like someone typo'd a ; into an if 
statement; the fix is:

-   if (!rdma_is_port_valid_nospec(device, &ah_attr->port_num));
+   if (!rdma_is_port_valid_nospec(device, &ah_attr->port_num))
return -EINVAL;

So it always returns -EINVAL when checking the port, as the stray semicolon 
turns the if into a no-op. :-(

Patch attached...

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
From 6353587a7efa488a4064f3661cf64bd4d74eaa73 Mon Sep 17 00:00:00 2001
From: Pablo Greco 
Date: Mon, 20 Aug 2018 06:39:55 -0300
Subject: [PATCH] OMG

---
 drivers/infiniband/core/verbs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index debe718..c080eb2 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1232,7 +1232,7 @@ int ib_resolve_eth_dmac(struct ib_device *device,
 	int   ret = 0;
 	struct ib_global_route *grh;
 
-	if (!rdma_is_port_valid_nospec(device, &ah_attr->port_num));
+	if (!rdma_is_port_valid_nospec(device, &ah_attr->port_num))
 		return -EINVAL;
 
 	if (ah_attr->type != RDMA_AH_ATTR_TYPE_ROCE)
-- 
1.8.3.1

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] About Torque maillist

2018-08-23 Thread Chris Samuel
On Wednesday, 22 August 2018 12:56:21 AM AEST Dmitri Chubarov wrote:

> It looks like the list at Cluster Resources still exists but has not been
> particularly active for quite some time now.

That's pretty sad to see really, and from the archives it looks like they had 
an outage earlier in the year for the lists as well.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-21 Thread Chris Samuel
On Tuesday, 21 August 2018 3:27:59 AM AEST Lux, Jim (337K) wrote:

> I'd find it hard to believe that Intel's CPU designers sat around
> implementing deliberate flaws ( the Bosch engine controller for VW model).

Not to mention that Spectre variants affected AMD, ARM & IBM (at least).

This publicly available, NSA-funded research ("The Intel 80x86 processor 
architecture: pitfalls for secure systems") from 1995 has an interesting section:

https://ieeexplore.ieee.org/document/398934/
https://pdfs.semanticscholar.org/2209/42809262c17b6631c0f6536c91aaf7756857.pdf

Section 3.10 - Cache and TLB timing channels

which warns (in generalities) about the use of MSRs and the use of instruction 
timing as side channels.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-19 Thread Chris Samuel
On Monday, 20 August 2018 6:32:26 AM AEST Jonathan Engwall wrote:

> I am not shocked that my previous message may have been removed.

To clarify: nothing has been removed to my knowledge.  Your email is in the 
list archives.

http://beowulf.org/pipermail/beowulf/2018-August/035219.html

All the best,
Chris (just woken up)
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-18 Thread Chris Samuel
On Sunday, 19 August 2018 5:19:07 AM AEST Jeff Johnson wrote:

> With the spate of security flaws over the past year and the impacts their
> fixes have on performance and functionality it might be worthwhile to just
> run airgapped.

For me, none of the HPC systems I've been involved with here in Australia would 
have had that option.  Virtually all have external users and/or rely on 
external data for some of the work they are used for (and the sysadmins don't 
usually have control over the projects & people who get to use them).

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-18 Thread Chris Samuel
On Saturday, 18 August 2018 11:55:22 PM AEST Jörg Saßmannshausen wrote:

> So I don't really understand about "Cannot make this public, as the patch
> that caused it was due to embargo'd security fix." issue.

I don't think any of us do, unless there's another fix there that is for an 
undisclosed CVE (which seems unlikely).

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-18 Thread Chris Samuel

On 18/8/18 8:47 pm, Jörg Saßmannshausen wrote:


Hi Chris,


Hiya,


this is bad news if InfiniBand is affected here as well, as
that is what we need to use for parallel calculations. They make use
of RDMA, and if that has a problem... well, you get the idea I
guess.


Oh yes, this is why I wanted to bring it to everyones attention, this
isn't just about Lustre, it's much more widespread.


Has anybody contacted the vendors like Mellanox or Intel regarding
this?


As Kilian wrote in the Lustre bug quoting his RHEL bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1618452

— Comment #3 from Don Dutile  —
Already reported and being actively fixed.

Cannot make this public, as the patch that caused it was due to 
embargo'd

security fix.

This issue has highest priority for resolution.
Revert to 3.10.0-862.11.5.el7 in the mean time.

This bug has been marked as a duplicate of bug 1616346

--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-17 Thread Chris Samuel
On Saturday, 18 August 2018 12:54:03 AM AEST Kilian Cavalotti wrote:

> That's true: RH mentioned an "embargo'd security fix" but didn't refer
> to L1TF explicitly (which I think is not under embargo anymore).

Agreed, though I'm not sure any of the listed fixes are embargoed now.

> As the reporter of the issue on the Whamcloud JIRA, I also have to
> apologize for initially pointing fingers at Lustre, it didn't cross my
> mind that this kind of whole RDMA stack breakage would have slipped
> past Red Hat's QA.

Oh I didn't read that as pointing any fingers at Lustre at all, just that the 
kernel update broke Lustre for you (and for us!).

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-16 Thread Chris Samuel
On Friday, 17 August 2018 2:47:37 PM AEST Chris Samuel wrote:

> Just a heads up that the 3.10.0-862.11.6.el7.x86_64 kernel from RHEL/CentOS
> that was released to address the most recent Intel CPU problem "L1TF" seems
> to break RDMA (found by a colleague here at Swinburne).

There are 6 CVEs addressed in that update from the look of it, so it might not 
be the L1TF fix itself that has triggered it.

https://access.redhat.com/errata/RHSA-2018:2384

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

