Re: [Beowulf] cursed (and perhaps blessed) Intel microcode

2018-05-09 Thread Christopher Samuel

Hi Mark,

On 30/03/18 16:28, Chris Samuel wrote:


> I'll try and nudge a person I know there on that...


They did some prodding, and finally new firmware emerged at the end of
last month.

/tmp/microcode-20180425$ iucode_tool -L intel-ucode-with-caveats/06-4f-01
microcode bundle 1: intel-ucode-with-caveats/06-4f-01
  01/001: sig 0x000406f1, pf mask 0xef, 2018-03-21, rev 0xb2c, size 27648


Note the *with-caveats* part.

The releasenote file says:

---8< snip snip 8<---

-- intel-ucode-with-caveats/ --
This directory holds microcode that might need special handling.
BDX-ML microcode is provided in directory, because it need special commits in
the Linux kernel, otherwise, updating it might result in unexpected system
behavior.

OS vendors must ensure that the late loader patches (provided in
linux-kernel-patches\) are included in the distribution before packaging the
BDX-ML microcode for late-loading.

---8< snip snip 8<---

Here be dragons...
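
For anyone packaging this: once a kernel with those late-loader patches is
in place, the mechanics of a late load are just the stock Linux ones. A
minimal sketch (Python purely as illustration; the bundle path is the one
shown above, /lib/firmware/intel-ucode/ and the sysfs reload file are the
standard kernel interfaces, and it needs root):

import shutil
from pathlib import Path

# Stage the Broadwell-EP/EX blob where the kernel's microcode driver
# expects to find it.
src = Path("/tmp/microcode-20180425/intel-ucode-with-caveats/06-4f-01")
dst = Path("/lib/firmware/intel-ucode/06-4f-01")
shutil.copy2(src, dst)

# Trigger a late reload on the running kernel, the equivalent of
# "echo 1 > /sys/devices/system/cpu/microcode/reload".
Path("/sys/devices/system/cpu/microcode/reload").write_text("1\n")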

Good luck!
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] Alternatives To MPI Workshop

2018-05-09 Thread John Hearns via Beowulf
As a fan of the Julia language, I just saw this announcement on the Julia
forum.
Sounds mighty interesting!

https://discourse.julialang.org/t/cfp-parallel-applications-workshop-alternatives-to-mpi-supercomputing-2018/10762

http://sourceryinstitute.github.io/PAW/

Higher-level parallel programming models offer rich sets of abstractions
that feel natural in the intended applications. Such languages and tools
include parallel programming languages (Fortran, UPC, Julia), systems for
large-scale data processing and
analytics (Spark, Tensorflow, Dask), and frameworks and libraries that
extend existing languages (Charm++, Unified Parallel C++ (UPC++), Coarray
C++, HPX, Legion, Global Arrays). While there are tremendous differences
between these approaches, all strive to support better programmer
abstractions for concerns such as data parallelism, task parallelism,
dynamic load balancing, and data placement across the memory hierarchy.
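
To make the "higher-level abstractions" point concrete, here is a toy sketch
using Dask (one of the systems named above); simulate() is just a
hypothetical stand-in workload, and the default local scheduler stands in
for a real cluster:

from dask import delayed

@delayed
def simulate(i):
    # Hypothetical stand-in for a real per-task computation.
    return i * i

# Build a graph of independent tasks plus a reduction; there is no explicit
# message passing, and no ranks or communicators as there would be with MPI.
partials = [simulate(i) for i in range(16)]
total = delayed(sum)(partials)

print(total.compute())  # run the graph on the default local scheduler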
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Bright Cluster Manager

2018-05-09 Thread John Hearns via Beowulf
> All of a sudden simple “send the same command to all nodes” just doesn’t
> work. And that’s what will inevitably be the case as we scale up in the
> HPC world – there will always be dead or malfunctioning nodes.

Jim, this is true. And 'we' should be looking to the webscale generation
for the answers. They thought about computing at scale from the beginning.

Regarding hardware failures, I heard a shaggy dog story that
Microsoft/Amazon/Google order servers ready-racked in shipping containers.
When a certain proportion of the servers in a container are dead, they
simply shut the whole container down and move on.
Can anyone confirm or deny this story?

Which brings me to another one of my hobby horses - the environmental costs
of HPC. When pitching HPC clusters you often put in an option for a mid-life
upgrade. I think upping the RAM is quite common, but upgrading processors
and interconnect is much less so.

So the kit is hopefully worked hard for five years, until the cost of power
and cooling is outweighed by the performance of a new generation. But where
does the kit get recycled? Again, when pitching clusters you have to put in
guarantees about WEEE compliance (or the equivalent in the USA).


On 8 May 2018 at 18:34, Lux, Jim (337K)  wrote:

>
>
>
>
> *From: *Beowulf  on behalf of "
> beowulf@beowulf.org" 
> *Reply-To: *John Hearns 
> *Date: *Thursday, May 3, 2018 at 6:54 AM
> *To: *"beowulf@beowulf.org" 
> *Subject: *Re: [Beowulf] Bright Cluster Manager
>
>
>
> I agree with Doug. The way forward is a lightweight OS with containers for
> the applications.
>
> I think we need to learn from the new kids on the block - the webscale
> generation.
>
> They did not go out and look at how massive supercomputer clusters are put
> together.
>
> No, they went out and built scale-out applications on public clouds.
>
> We see 'applications designed to fail' and 'serverless'.
>
>
>
> Yes, I KNOW that scale-out applications like these are Web-type
> applications, and all the application examples you see are based on the
> load balancer/web server/database (or whatever style) paradigm.
>
>
>
> The art of this will be deploying the more tightly coupled applications
> which HPC has, which depend upon MPI communications over a reliable
> fabric, upon GPUs, etc.
>
> The other hat I will toss into the ring is separating parallel tasks which
> require computation on several servers and MPI communication between them
> versus 'embarrassingly parallel' operations which may run on many, many
> cores but do not particularly need communication between them.
>
> The best successes I have seen on clusters are where the heavy parallel
> applications get exclusive compute nodes.
>
> It is cleaner: you get all the memory and storage bandwidth, and it is easy
> to clean up. Hell, reboot the things after each job. You've got an
> exclusive node.
>
> I think many designs of HPC clusters still try to cater for all workloads:
> oh yes, we can run an MPI weather forecasting/ocean simulation, but at the
> same time we have this really fast IO system and we can run your Hadoop
> jobs.
>
> I wonder if we are going to see a fork in HPC, with the massively parallel
> applications being deployed, as Doug says, on specialised lightweight OSes
> which have dedicated high-speed, reliable fabrics, and with containers.
>
> You won't really be able to manage those systems like individual Linux
> servers. Will you be able to ssh in for instance?
>
> ssh assumes there is an ssh daemon running. Does a lightweight OS have
> ssh? Authentication Services? The kitchen sink?
>
>
>
> The less parallel applications would be run more and more on cloud-type
> installations, either on-premise clouds or public clouds.
>
> I confound myself here, as I can't say what the actual difference between
> those two types of machines is: you always need an interconnect fabric and
> storage, so why not have the same for both types of tasks?
>
> Maybe one further quip to stimulate some conversation. Silicon is cheap.
> No, really it is.
>
> Your friendly Intel salesman may wince when you say that. After all, those
> lovely Xeon CPUs cost north of 1000 dollars each.
>
> But again I throw in some talking points:
>
> power and cooling cost as much as, if not more than, your purchase cost
> over several years
>
> are we exploiting all the capabilities of those Xeon CPUs?
>
> And another aspect of this - I’ve been doing stuff with “loose clusters”
> of low-capability processors (Arduino, RPi, Beagle) doing distributed
> sensing kinds of tasks – leaving aside the Arduino (no OS) – the other two
> wind up with some flavor of Debian but often with lots of stuff you don’t
> need (e.g. Apache). Once you’ve fiddled with one node to get the
> configuration right, you want to replicate it across a bunch of nodes –
> right now that means sneakernet of SD cards - although in theory, one
> should be able to push an image out to