Re: [Beowulf] HPC workflows

2018-12-09 Thread Douglas O'Flaherty


> On Dec 9, 2018, at 7:26 AM, Gerald Henriksen  wrote:
> 
>> On Fri, 7 Dec 2018 16:19:30 +0100, you wrote:
>> 
>> Perhaps for another thread:
>> Actually I went to the AWS User Group in the UK on Wednesday. Very
>> impressive, and there are the new Lustre filesystems and MPI networking.
>> I guess the HPC World will see the same philosophy of building your setup
>> using the AWS toolkit as Uber etc. etc. do today.
>> Also a lot of noise is being made at the moment about the convergence of
>> HPC and Machine Learning workloads.
>> Are we going to see the Machine Learning folks adapting their workflows to
>> run on HPC on-premise bare metal clusters?
>> Or are we going to see them go off and use AWS (Azure, Google ?)
> 
> I suspect that ML will not go for on-premise for a number of reasons.
> 
> First, ignoring cost, companies like Google, Amazon and Microsoft are
> very good at ML because not only are they driving the research but
> they need it for their business.  So they have the in house expertise
> not only to implement cloud systems that are ideal for ML, but to
> implement custom hardware - see Google's Tensor Processing Unit.
> 
> Second, setting up a new cluster isn't going to be easy.  Finding
> physical space, making sure enough utilities can be supplied to
> support the hardware, staffing up, etc.  are not only going to be
> difficult but inherently takes time when instead you can simply sign
> up to a cloud provider and have the project running within 24 hours.
> Would HPC exist today as we know it if the ability to instantly turn
> on a cluster existed at the beginning?
> 
> Third, albeit this is very speculative, I suspect ML is heading
> towards using custom hardware.  It has had a very good run using
> GPUs, and a GPU will likely always be the entry point for desktop
> ML, but unless Nvidia is holding back due to a lack of competition
> it does appear the GPU is reaching an end to its development, much
> like CPUs have.  The latest hardware from Nvidia is getting
> lacklustre reviews, and the bolting on of additional things like
> raytracing is perhaps an indication that there are limits to how
> much further the GPU architecture can be pushed.  The question then
> is whether the ML market is big enough to support that custom
> hardware as an OEM product like a GPU, or whether it will remain
> restricted to places like Google who can afford to build it without
> the necessary overheads of a consumer product.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf

My data points are the opposite.

1. As it progresses from experiment to real use, most AI/ML/DL takes place 
near where the data is. Since for many that data is on-premises, the compute 
stays on-premises.  For cloud services, it stays in the cloud.

2. The investment isn’t huge and is incremental, so there isn’t a strong 
barrier to buying the kit. 
Models never get ‘finished’ and require regular retesting on historical and new 
data, so they can keep it busy. The GPUs are plenty good enough because most of 
the frameworks parallelize (scale-out) easily.   There is also a desire to test 
models on other similar data, but that data takes prep and a common data 
source. The cost of dedicated storage at this scale is not prohibitive, but moving 
from/to the cloud can be.  Most projects start very small to prove 
effectiveness. It isn’t a big tender to get started - unless you are doing 
Autonomous Driving... 

3. There will be specialized solutions for inference, but that isn’t the same 
as training. IMHO, the specialized silicon or designs will be driven by using 
the AI near the edge within the constraints of power, footprint, etc. Training 
will still be scale-out & centralized. GPUs will still work for a long time, 
just like CPUs did. 

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-09 Thread John Hearns via Beowulf
 > but for now just expecting to get something good without an effort is
probably premature.

Nothing good ever came easy.

Who said that? My Mum. And she was a very wise woman.

On Sun, 9 Dec 2018 at 21:36, INKozin via Beowulf 
wrote:

> While I agree with many points made so far I want to add that one aspect
> which used to separate a typical HPC setup from some IT infrastructure is
> complexity. And I don't mean technological complexity (because
> technologically HPC can be fairly complex) but the diversity and the
> interrelationships between various things. Typically HPC is relatively
> homogeneous and straightforward. But everything is changing including HPC
> so modularisation is a natural approach to make systems more manageable so
> containers, conda, kubernetes etc are solutions to fight complexity. Yes,
> these solutions can be fairly complex too but the impact is generally
> intentionally restricted. For example, a conda environment can be rather
> bloated but then flexibility for size is a reasonable trade-off.
> One of the points Werner Vogels, Amazon's CTO, kept coming back to over and
> over again in his keynote at the recent re:Invent is modular (cellular)
> architecture at different levels (lambdas, firecracker, containers, VMs and
> up) because working with redundant, replaceable modules makes services
> scalable and resilient.
> And I'm pretty sure the industry will continue on its path to embrace
> microVMs as it did containers before that.
> This modular approach may work quite well for on prem IT, cloud or HTC
> (High Throughput Computing) but may still be a challenge for HPC because
> you can argue that true HPC system must be tightly coupled (e.g. remember
> OS jitter?)
> As for ML and more specifically deep learning, it depends on what you do.
> If you are doing inferencing, i.e. a production setup, i.e. more like HTC,
> then everything works fine. But if you want to train a model on ImageNet or
> larger and do it very quickly (hours) then you will benefit from a tightly
> coupled setup (although there are tricks such as asynchronous parameter
> updates to alleviate latency)
> Two cases in point here: Kubeflow, whose scaling seems somewhat deficient,
> and Horovod library which made many people rather excited because it allows
> using Tensorflow and MPI.
> While Docker and Singularity can be used with MPI, you'd probably want to
> trim as much as you can if you want to push the scaling limit. But I think
> we've already discussed many times on this list the topic of "heroic" HPC
> vs "democratic" HPC (top vs tail).
>
> Just one last thing regarding using GPUs in the cloud. Last time I checked
> even the spot instances were so expensive you'd be much better off buying
> them, even if only for a month. Obviously only if you have a place to host them.
> And obviously in your DC you can use a decent network for faster training.
> As for ML services provided by AWS and others, my experience is rather
> limited. I helped one of our students with an ML service on AWS. Initially he
> was excited that he could just throw his data set at it and get something
> out. Alas, he quickly found out that he needed to do quite a bit more, so
> back to our HPC. Perhaps AutoML will be significantly improved in the
> coming years but for now just expecting to get something good without an
> effort is probably premature.
>
>
> On Sun, 9 Dec 2018 at 15:26, Gerald Henriksen  wrote:
>
>> On Fri, 7 Dec 2018 16:19:30 +0100, you wrote:
>>
>> >Perhaps for another thread:
>> >Actually I went to the AWS User Group in the UK on Wednesday. Very
>> >impressive, and there are the new Lustre filesystems and MPI networking.
>> >I guess the HPC World will see the same philosophy of building your setup
>> >using the AWS toolkit as Uber etc. etc. do today.
>> >Also a lot of noise is being made at the moment about the convergence of
>> >HPC and Machine Learning workloads.
>> >Are we going to see the Machine Learning folks adapting their workflows
>> to
>> >run on HPC on-premise bare metal clusters?
>> >Or are we going to see them go off and use AWS (Azure, Google ?)
>>
>> I suspect that ML will not go for on-premise for a number of reasons.
>>
>> First, ignoring cost, companies like Google, Amazon and Microsoft are
>> very good at ML because not only are they driving the research but
>> they need it for their business.  So they have the in house expertise
>> not only to implement cloud systems that are ideal for ML, but to
>> implement custom hardware - see Google's Tensor Processing Unit.
>>
>> Second, setting up a new cluster isn't going to be easy.  Finding
>> physical space, making sure enough utilities can be supplied to
>> support the hardware, staffing up, etc.  are not only going to be
>> difficult but inherently takes time when instead you can simply sign
>> up to a cloud provider and have the project running within 24 hours.
>> Would HPC exist today as we know it if the ability to instantly turn
>> on a cluster existed at the beginning?

Re: [Beowulf] HPC workflows

2018-12-09 Thread INKozin via Beowulf
While I agree with many points made so far I want to add that one aspect
which used to separate a typical HPC setup from some IT infrastructure is
complexity. And I don't mean technological complexity (because
technologically HPC can be fairly complex) but the diversity and the
interrelationships between various things. Typically HPC is relatively
homogeneous and straightforward. But everything is changing, including HPC,
so modularisation is a natural approach to making systems more manageable;
containers, conda, Kubernetes etc. are all solutions to fight complexity. Yes,
these solutions can be fairly complex too but the impact is generally
intentionally restricted. For example, a conda environment can be rather
bloated but then flexibility for size is a reasonable trade-off.
One of the points Werner Vogels, Amazon's CTO, kept coming back to over and
over again in his keynote at the recent re:Invent is modular (cellular)
architecture at different levels (lambdas, firecracker, containers, VMs and
up) because working with redundant, replaceable modules makes services
scalable and resilient.
And I'm pretty sure the industry will continue on its path to embrace
microVMs as it did containers before that.
This modular approach may work quite well for on-prem IT, cloud or HTC
(High Throughput Computing) but may still be a challenge for HPC, because
you can argue that a true HPC system must be tightly coupled (e.g. remember
OS jitter?).
As for ML and more specifically deep learning, it depends on what you do.
If you are doing inferencing, i.e. a production setup, i.e. more like HTC,
then everything works fine. But if you want to train a model on ImageNet or
larger and do it very quickly (hours) then you will benefit from a tightly
coupled setup (although there are tricks such as asynchronous parameter
updates to alleviate latency).
Two cases in point here: Kubeflow, whose scaling seems somewhat deficient,
and the Horovod library, which made many people rather excited because it
allows using TensorFlow with MPI.
While Docker and Singularity can be used with MPI, you'd probably want to
trim as much as you can if you want to push the scaling limit. But I think
we've already discussed many times on this list the topic of "heroic" HPC
vs "democratic" HPC (top vs tail).
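
For anyone who hasn't tried the Horovod-plus-MPI-in-a-container combination,
one common "hybrid" launch pattern is sketched below as a Slurm batch script.
It is only an illustration: the image name (tf.sif), the training script
(train.py), the module name and the resource counts are placeholders, and
the MPI on the host has to match the MPI built into the image for this to
work at all.

#!/bin/bash
#SBATCH --job-name=hvd-train
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4

# tf.sif and train.py are placeholders for a site-built image and script.
# Host-side MPI launches one rank per GPU; Horovod inside the container
# picks up its rank/size from MPI, and --nv exposes the node's GPUs.
module load openmpi

mpirun -np "$SLURM_NTASKS" \
    singularity exec --nv tf.sif \
    python train.py

The same script with the singularity line removed is the bare-metal version,
which is a reasonable way to measure what the container layer actually costs
before trying to push the scaling limit.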

Just one last thing regarding using GPUs in the cloud. Last time I checked,
even the spot instances were so expensive that you'd be much better off
buying them, even if only for a month. Obviously only if you have a place to
host them. And obviously in your own DC you can use a decent network for
faster training.
As for ML services provided by AWS and others, my experience is rather
limited. I helped one of our students with an ML service on AWS. Initially he
was excited that he could just throw his data set at it and get something
out. Alas, he quickly found out that he needed to do quite a bit more, so it was
back to our HPC. Perhaps AutoML will be significantly improved in the
coming years but for now just expecting to get something good without an
effort is probably premature.


On Sun, 9 Dec 2018 at 15:26, Gerald Henriksen  wrote:

> On Fri, 7 Dec 2018 16:19:30 +0100, you wrote:
>
> >Perhaps for another thread:
> >Actually I went to the AWS User Group in the UK on Wednesday. Very
> >impressive, and there are the new Lustre filesystems and MPI networking.
> >I guess the HPC World will see the same philosophy of building your setup
> >using the AWS toolkit as Uber etc. etc. do today.
> >Also a lot of noise is being made at the moment about the convergence of
> >HPC and Machine Learning workloads.
> >Are we going to see the Machine Learning folks adapting their workflows to
> >run on HPC on-premise bare metal clusters?
> >Or are we going to see them go off and use AWS (Azure, Google ?)
>
> I suspect that ML will not go for on-premise for a number of reasons.
>
> First, ignoring cost, companies like Google, Amazon and Microsoft are
> very good at ML because not only are they driving the research but
> they need it for their business.  So they have the in house expertise
> not only to implement cloud systems that are ideal for ML, but to
> implement custom hardware - see Google's Tensor Processing Unit.
>
> Second, setting up a new cluster isn't going to be easy.  Finding
> physical space, making sure enough utilities can be supplied to
> support the hardware, staffing up, etc.  are not only going to be
> difficult but inherently takes time when instead you can simply sign
> up to a cloud provider and have the project running within 24 hours.
> Would HPC exist today as we know it if the ability to instantly turn
> on a cluster existed at the beginning?
>
> Third, albeit this is very speculative, I suspect ML is heading
> towards using custom hardware.  It has had a very good run using
> GPUs, and a GPU will likely always be the entry point for desktop
> ML, but unless Nvidia is holding back due to a lack of competition
> it does appear the GPU is reaching an end to its development, much
> like CPUs have.  

Re: [Beowulf] HPC workflows

2018-12-09 Thread Gerald Henriksen
On Fri, 7 Dec 2018 16:19:30 +0100, you wrote:

>Perhaps for another thread:
>Actually I went to the AWS User Group in the UK on Wednesday. Very
>impressive, and there are the new Lustre filesystems and MPI networking.
>I guess the HPC World will see the same philosophy of building your setup
>using the AWS toolkit as Uber etc. etc. do today.
>Also a lot of noise is being made at the moment about the convergence of
>HPC and Machine Learning workloads.
>Are we going to see the Machine Learning folks adapting their workflows to
>run on HPC on-premise bare metal clusters?
>Or are we going to see them go off and use AWS (Azure, Google ?)

I suspect that ML will not go for on-premise for a number of reasons.

First, ignoring cost, companies like Google, Amazon and Microsoft are
very good at ML because not only are they driving the research but
they need it for their business.  So they have the in house expertise
not only to implement cloud systems that are ideal for ML, but to
implement custom hardware - see Google's Tensor Processing Unit.

Second, setting up a new cluster isn't going to be easy.  Finding
physical space, making sure enough utilities can be supplied to
support the hardware, staffing up, etc. are not only going to be
difficult but inherently take time, when instead you can simply sign
up to a cloud provider and have the project running within 24 hours.
Would HPC exist today as we know it if the ability to instantly turn
on a cluster existed at the beginning?

Third, albeit this is very speculative, I suspect ML is heading
towards using custom hardware.  It has had a very good run using
GPUs, and a GPU will likely always be the entry point for desktop
ML, but unless Nvidia is holding back due to a lack of competition
it does appear the GPU is reaching an end to its development, much
like CPUs have.  The latest hardware from Nvidia is getting
lacklustre reviews, and the bolting on of additional things like
raytracing is perhaps an indication that there are limits to how
much further the GPU architecture can be pushed.  The question then
is whether the ML market is big enough to support that custom
hardware as an OEM product like a GPU, or whether it will remain
restricted to places like Google who can afford to build it without
the necessary overheads of a consumer product.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-07 Thread Lux, Jim (337K) via Beowulf
Monolithic static binaries - better have a fat pipe to the server to load the 
container on your target.

On 12/7/18, 10:47 AM, "Beowulf on behalf of Jan Wender" 
 wrote:


> Am 07.12.2018 um 17:34 schrieb John Hanks :
> In my view containers are little more than incredibly complex static 
binaries

Thanks for this! I was wondering if I am the only one thinking it. 

- Jan
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-07 Thread Jan Wender

> Am 07.12.2018 um 17:34 schrieb John Hanks :
> In my view containers are little more than incredibly complex static binaries

Thanks for this! I was wondering if I am the only one thinking it. 

- Jan
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-07 Thread Lux, Jim (337K) via Beowulf


On 12/7/18, 8:46 AM, "Beowulf on behalf of Michael Di Domenico" 
 wrote:

On Fri, Dec 7, 2018 at 11:35 AM John Hanks  wrote:
>
>  But, putting it in a container wouldn't make my life any easier and 
would, in fact, just add yet another layer of something to keep up to date.

i think the theory behind this is that containers allow the sysadmins
to kick the can down the road and put the onus of updates on the
container developer.  but then you get into a circle of trust issue,
whereby now you have to trust the container developers are doing
something sane and in a timely manner.

a perfect example that we pitched up to our security team was (this
was a few years ago mind you); what happens when someone embeds openssl
libraries in the container?  who's responsible for updating them?
what happens when that container gets abandoned by the dev?  and those
containers are running with some sort of docker/root privilege
menagerie.  this was back when openssl had bugs coming up left and
right.  yeah, that conversation stopped dead in its tracks and we put
a moratorium on docker.

but i don't think the theory lines up with the practice, and that's
why dev's shouldn't be doing ops


This is a generic problem in areas other than HPC.  Over the past few years, a 
fair amount of the software I'm working with is targeted to spacecraft 
platforms - We had an interesting exercise over the past couple years.  I was 
porting a standard orbit propagation package (SGP4, see 
http://www.celestrak.com/ for the Pascal version from 2000), which is available 
in many different languages. I happened to be implementing the C version in 
RTEMS running on a SPARC V8 processor (the LEON2 and LEON3, as it happens).  
The software itself is quite compact, has no dependencies other than math.h, 
stdio.h, stdlib.h, and derives from an original Fortran version.  RTEMS is a 
real time operating system that exposes POSIX API, so it's easy to work with.  
What we did is create a wrapper for SGP that matches a standardized set of APIs 
for software radios (Space Telecommunications Radio System, STRS).

But here's the problem - There are really 4 different target hardware 
platforms, all theoretically the same, but not. In the space flight software 
business, one chooses a toolchain and development environment at the beginning 
of the project (Phase A - Formulation) and you stay with that for the life of 
the mission, unless there's a compelling reason to change.   In the course of 
the last 10 years, we've gone through 5 versions of RTEMS 
(4.8, 4.10, 4.11, 4.12, 5.0), 3 different source management tools (cvs, svn, git), an 
IDE that came and went (Eclipse), not to mention a variety of versions of the 
gcc toolchain.  Each mission has its own set of all of this. And, a bunch of 
homegrown make files and related build processes. And, of course, it's a 
hodgepodge of CentOS, Scientific Linux, Ubuntu, Debian, and RH, depending on 
what was the "most supported distro" at the time the mission picked it (which 
might depend on who the SysAdmin on the project was). 

10 years is *forever* in the software development world. I've not yet had the 
experience of a developer born after the first version of the flight software 
they're working on was created - but I know that other people at JPL have (when 
it takes 7 years to get to where you're going, and the mission lasts 10-15 
years after that...).  And this is perfectly reasonable - SGP4, for instance, 
basically implements the laws of physics as a numerical model - it worked fine 
in 2000, it works fine now, it's going to work just fine in 2030, with no 
significant changes. "The SGP4 and SDP4 models were published along with sample 
code in FORTRAN IV in 1988 with refinements over the original model to handle 
the larger number of objects in orbit since" (Wikipedia article on SGP)

So, "inheriting" the SGP4 propagator from one project into another is not just 
a matter of moving the source code for SGP. You have to compile it with all the 
other stuff, and there are myriad hidden dependencies - does this platform have 
hardware floating point or software emulated floating point, and if the latter, 
which of several flavors.  Where in the source tree (for that project) does it 
sit? What's the permissions strategy? Where do you add it in the build process?

And then contemplate propagating a bug fix over all those platforms.  You might 
make a decision to propagate a change to some, but not all platforms - Maybe 
the spacecraft you're contemplating is getting towards the end of its life, and 
you'll never use the function you developed 4 years ago again. Do you put that 
bug fix to address the incorrect gravitation parameter at Mars into the systems 
that are orbiting Earth?

Yes - folks have said "put it in containers" and in the last few years, folks 
have started spinning up VMs to manage this. Historically, we keep 

Re: [Beowulf] HPC workflows

2018-12-07 Thread John Hanks
On Fri, Dec 7, 2018 at 7:20 AM John Hearns via Beowulf 
wrote:

> Good points regarding packages shipped with distributions.
> One of my pet peeves (only one? Editor) is being on mailing lists for HPC
> software such as OpenMPI and Slurm and seeing many requests along the lines
> of
> "I installed PackageX on my cluster" and then finding from the replies
> that the version is a very out of date one delivered by the distribution's
> repositories.
>
> The other day I was interacting with someone who was using a CentOS 6.5
> cluster on the Julia discussion list. His cluster uses the original SGE
> version.
>

This is one of my long-term pet peeves; I call it the "IT Drone Stable
Software Release Delusion". It manifests itself as "version compatibility
matrices", and a side effect is things like RHEL 5 clusters that "can't be
updated" but have a massive software stack where pretty much everything in
the OS has been manually rebuilt by hand, up to and including the kernel,
to get the latest versions. These clusters become ideal places to run
containers because that's the only way to get a modern OS past the sysadmin
and onto the cluster.

The delusion is usually strongest in places where there are people who
justify their existence via "Change Management/Change Control" meetings,
but it can creep into any environment in subtle ways like "we never run .0
releases..." This delusion dovetails into an underlying fear of change,
sees the word "freeze" bandied about a lot and ultimately leads to
environments where significant amounts of pain are passed on to the users
while the IT drones (or IT drone apprentices) study the ancient scrolls of
compatible version matrices and try to get or maintain ITIL certification
(with bonus points achieved if they are wearing a six sigma black belt).

griznog
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-07 Thread Michael Di Domenico
On Fri, Dec 7, 2018 at 11:35 AM John Hanks  wrote:
>
>  But, putting it in a container wouldn't make my life any easier and would, 
> in fact, just add yet another layer of something to keep up to date.

i think the theory behind this is that containers allow the sysadmins
to kick the can down the road and put the onus of updates on the
container developer.  but then you get into a circle of trust issue,
whereby now you have to trust the container developers are doing
something sane and in a timely manner.

a perfect example that we pitched up to our security team was (this
was a few years ago mind you); what happens when someone embeds openssl
libraries in the container?  who's responsible for updating them?
what happens when that container gets abandoned by the dev?  and those
containers are running with some sort of docker/root privilege
menagerie.  this was back when openssl had bugs coming up left and
right.  yeah, that conversation stopped dead in its tracks and we put
a moratorium on docker.
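
for what it's worth, you can at least get a rough inventory of what is baked
into local images using nothing but the standard docker cli.  the loop below
is a sketch, not a real audit: it assumes each image has an openssl binary on
PATH, says nothing about statically linked or vendored copies, and images
that set an ENTRYPOINT would need --entrypoint overridden as well.

# print the openssl version shipped inside each local image
# (assumes an openssl binary exists in each image; see caveats above)
for img in $(docker images --format '{{.Repository}}:{{.Tag}}'); do
    ver=$(docker run --rm "$img" openssl version 2>/dev/null)
    echo "$img -> ${ver:-no openssl binary found}"
done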

but i don't think the theory lines up with the practice, and that's
why devs shouldn't be doing ops
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-07 Thread John Hanks
On Fri, Dec 7, 2018 at 7:04 AM Gerald Henriksen  wrote:

> On Wed, 5 Dec 2018 09:35:07 -0800, you wrote:
>
> Now obviously you could do what for example Java does with a jar file,
> and simply throw everything into a single rpm/deb and ignore the
> packaging guidelines, but then you are back to in essence creating a
> container and just hiding it behind a massive rpm/deb.
>
>
An rpm/deb packaged container? You better trademark that as soon as
possible, I had completely overlooked circular abstractions as an option :)

We have plenty of examples now about the limitations and dangers of
distributing software via models like nodejs/npm, python/pip, etc. I make
some effort to keep a build of R current with all libraries from CRAN and
Bioconductor reasonably up to date and it's an ongoing, rolling nightmare.
I'm aware that packaging it in rpm/deb would be extremely difficult and
there are plenty more examples of this. But, putting it in a container
wouldn't make my life any easier and would, in fact, just add yet another
layer of something to keep up to date. For some things a pragmatic approach
of treating it as a standalone entity loosely linked to the OS is
warranted. But at that point I'm faced with doing it with N amount of effort,
where N = "amount of effort to install software", or N + C, where C = "work
out containerizing and the extra step of pulling the container when running".
N + C > N for any controlled OS environment like a cluster or, one would
hope, production environments which handle my credit card and private data.
In my view it follows then that one would only choose to do the N + C effort
if one had extra warm bodies around that needed tasks.

I might also consider it from this perspective. I pull plugins into my vim
config without ever entertaining the thought that they should have been
packaged as rpms/debs. Maybe some software just doesn't deserve to be
packaged? Either because it already has a means of distribution in
userspace (plugins, pip, npm, cran,...) or because it's just crap software.
I accept that containers may be the best way to deal with crap, as
previously noted.


> Option 2 worked 20 years ago when we only cared about 2 or 3
> distributions of Linux and had a lot less open source / free software.
> But, unfortunately, it does not scale and so for that reason (and a
> few others) the effort to create Docker / npm, maven, etc. is the
> lesser of the options.
>
>
I think the more accurate phrase there is "lesser of the evils". Every time
I get into a discussion about this topic it feels as if the discussion is
about determining which rusty saw is the best choice to saw off my own leg
and when I suggest "maybe we shouldn't saw off our own legs?" I'm met with
a chorus of "but griznog, EVERYONE is sawing off their own leg with X, Y and
Z these days!! You simply must attend the next Leg Sawing Conference..."
Electron and nodejs is a great example. No one seems to step back and
consider that perhaps the entire concept is broken and maybe wasn't a good
idea. But unfortunately it's quite easy to blame the packaging tools and at
this point many careers and reputations are intertwined with nodejs
continuing to be popular, so the people in the best position to consider it
was a bad idea are unlikely to consider that.

In my view containers are little more than incredibly complex static
binaries and I've already sat through years and years of discussions about
the problems of static binaries and concluded they are great for a few
narrow cases but not the best solution in general. So, 'cat
every_argument_against_static_binaries_ever | sed s/static
binary/container/g'. That containers require some minimal amount of setuid
root or a daemon to wrap root access adds a significant bit of weight to
those criticisms.

I realize I'm the odd person out here and that containers are a part of the
fabric now, for better or worse. In the same way I'm forever stuck dealing
with the clusterfsck that Java is, I'll be stuck with containers. To that
end massive kudos to singularity for at least bringing some semblance of
sanity to the mix. FWIW, I think containers are better than Java, but I
will still grumble as I slowly begin dragging the rusty blade back and
forth across my thigh.

griznog
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-07 Thread John Hearns via Beowulf
Good points regarding packages shipped with distributions.
One of my pet peeves (only one? Editor) is being on mailing lists for HPC
software such as OpenMPI and Slurm and seeing many requests along the lines
of
"I installed PackageX on my cluster" and then finding from the replies that
the version is a very out of date one delivered by the distribution's
repositories.

The other day I was interacting with someone who was using a CentOS 6.5
cluster on the Julia discussion list. His cluster uses the original SGE
version.
I created a test CentOS 6.5 cluster using Vagrant and Ansible, and found to
my horror that Gridengine RPMs are available out of the box with CentOS 6.5
Now let me make something clear - a good few years ago I installed SGE on
customer clusters, and became somewhat of an expert in SGE and MPI
integration.
But in 2018? Would I advise installing the original Sun version of SGE?
No.  (I am not referring to Univa etc which is excellent)

There is definitely a place for packaging and delivery of up to date
software stacks for HPC.
If I might mention Bright Computing - that is what they do. They compile up
(for instance) Slurm and put it on their own repos.
So you can have a tested set of packages without continually rolling your
own.
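
The mechanics of that "curated repo" approach are worth spelling out, because
they are not complicated. A rough sketch for an EL7-style node, where the
repo URL and the package names are placeholders rather than anything Bright
actually ships:

# point the node at a repo carrying tested, site-approved builds
# (repo URL and package names below are illustrative placeholders;
# yum-config-manager comes from the yum-utils package)
yum install -y yum-utils
yum-config-manager --add-repo https://repo.example.org/site-hpc/site-hpc.repo

# then install the curated Slurm build instead of whatever the distro ships
yum install -y slurm slurm-slurmd

The point is less the two commands than the fact that somebody tested the
combination before it landed in the repo.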

I hate to say it, but I think the current generation of web developers, who
will incorporate some Javascript from an online repo to do a bitshift
(I am referring to the famous package which the developer took down and
which affected thousands of web sites),
are only too ready to install software without thinking from the Ubuntu
repos. That might work for web services stacks - but for HPC?

Perhaps for another thread:
Actually I went to the AWS User Group in the UK on Wednesday. Very
impressive, and there are the new Lustre filesystems and MPI networking.
I guess the HPC World will see the same philosophy of building your setup
using the AWS toolkit as Uber etc. etc. do today.
Also a lot of noise is being made at the moment about the convergence of
HPC and Machine Learning workloads.
Are we going to see the Machine Learning folks adapting their workflows to
run on HPC on-premise bare metal clusters?
Or are we going to see them go off and use AWS (Azure, Google ?)

On Fri, 7 Dec 2018 at 16:04, Gerald Henriksen  wrote:

> On Wed, 5 Dec 2018 09:35:07 -0800, you wrote:
>
> >Certainly the inability of distros to find the person-hours to package
> >everything plays a role as well, your cause and effect chain there is
> >pretty accurate. Where I begin to branch is at the idea of software that
> is
> >unable to be packaged in an rpm/deb.
>
> In some convenient timing, the following was posted by overtmind on
> Reddit discussing why Atom hasn't been packaged for Fedora(*):
>
> ---
> "This means, for every nodejs dependency Electron needs - and there
> are a metric #$%# ton - since you can't use npm as an
> installer/package manager - you need to also package all of those and
> make sure they're in fedora and up-to-date, and then you also need to
> package all of the non-nodejs dependencies that come along with
> Electron apps, such as electron itself, and THEN you need to extract
> and remove all of the vendor'd libraries and binaries that essentially
> make Electron work, and THEN you need to make sure that there's no
> side-car'd non-free or questionable software that is forbidden in
> fedora also, like ffmpeg. G'head and look at the Chromium SPEC, it's a
> living nightmare (Spot godbless your heart)"
> ---
>
> Now obviously you could do what for example Java does with a jar file,
> and simply throw everything into a single rpm/deb and ignore the
> packaging guidelines, but then you are back to in essence creating a
> container and just hiding it behind a massive rpm/deb.
>
> >The thing we can never measure and thus can only speculate about forever
> >is:  if all the person-hours poured into containers (and pypi/pip and cran
> >and cpan and maven and scons and ...) had been poured into rpm/deb
> >packaging would we just be simply apt/yum/dnf installing what we needed
> >today? (I'm ignoring other OS/packaging tools, but you get the idea.)
>
> I (theoretically) could write a new library in
> Python/Perl/Javascript/Go/etc. and with very minimal effort place
> that library in the repository for that language.
> My library is now available to everyone using that language regardless
> of what OS they are using.
>
> Alternately, I could spend many, many hours perhaps even days learning
> multiple different packaging systems, joining multiple different
> mailing lists / bugzillas / build systems, so that I can make my
> library easily available to people on Windows, macOS, Fedora, RHEL,
> openSUSE, Debian, Ubuntu, ...  - or alternately hope that someone will
> not only take the time to package my library for all those different
> platforms, but also commit the future time to keep it up to date.
>
> Option 2 worked 20 years ago when 

Re: [Beowulf] HPC workflows

2018-12-07 Thread Gerald Henriksen
On Wed, 5 Dec 2018 09:35:07 -0800, you wrote:

>Certainly the inability of distros to find the person-hours to package
>everything plays a role as well, your cause and effect chain there is
>pretty accurate. Where I begin to branch is at the idea of software that is
>unable to be packaged in an rpm/deb.

In some convenient timing, the following was posted by overtmind on
Reddit discussing why Atom hasn't been packaged for Fedora(*):

---
"This means, for every nodejs dependency Electron needs - and there
are a metric #$%# ton - since you can't use npm as an
installer/package manager - you need to also package all of those and
make sure they're in fedora and up-to-date, and then you also need to
package all of the non-nodejs dependencies that come along with
Electron apps, such as electron itself, and THEN you need to extract
and remove all of the vendor'd libraries and binaries that essentially
make Electron work, and THEN you need to make sure that there's no
side-car'd non-free or questionable software that is forbidden in
fedora also, like ffmpeg. G'head and look at the Chromium SPEC, it's a
living nightmare (Spot godbless your heart)"
---

Now obviously you could do what for example Java does with a jar file,
and simply throw everything into a single rpm/deb and ignore the
packaging guidelines, but then you are back to in essence creating a
container and just hiding it behind a massive rpm/deb.

>The thing we can never measure and thus can only speculate about forever
>is:  if all the person-hours poured into containers (and pypi/pip and cran
>and cpan and maven and scons and ...) had been poured into rpm/deb
>packaging would we just be simply apt/yum/dnf installing what we needed
>today? (I'm ignoring other OS/packaging tools, but you get the idea.)

I (theoretically) could write a new library in
Python/Perl/Javascript/Go/etc. and with very minimal effort place
that library in the repository for that language.
My library is now available to everyone using that language regardless
of what OS they are using.

Alternately, I could spend many, many hours perhaps even days learning
multiple different packaging systems, joining multiple different
mailing lists / bugzillas / build systems, so that I can make my
library easily available to people on Windows, macOS, Fedora, RHEL,
openSUSE, Debian, Ubuntu, ...  - or alternately hope that someone will
not only take the time to package my library for all those different
platforms, but also commit the future time to keep it up to date.

Option 2 worked 20 years ago when we only cared about 2 or 3
distributions of Linux and had a lot less open source / free software.
But, unfortunately, it does not scale and so for that reason (and a
few others) the effort to create Docker / npm, maven, etc. is the
lesser of the options.

* - https://www.reddit.com/r/Fedora/comments/a3q1a2/atom_editoride/
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-05 Thread John Hanks
I think you do a better job explaining the underpinnings of my frustration
with it all, but then arrive at a slightly different set of conclusions.
I'd be the last to say autotools isn't complex; in fact, pretty much all
build systems eventually reach an astounding level of complexity. But I'm
not sure copy/paste of an autotools recipe is any more egregious than a
copy/paste of cmake or any others. That seems to be the first step to
learning how anything works, so I accept that it'll happen a lot with
anything. Credit for a huge portion of my limited success is owed to
copy/paste from stack overflow of things I didn't understand at first, so I
can't really throw rocks at the practice. People tend to dig as deep as
they need to to get something to work and then wander off unless they are
getting paid to keep digging or just like a particular hole.

Certainly the inability of distros to find the person-hours to package
everything plays a role as well, your cause and effect chain there is
pretty accurate. Where I begin to branch is at the idea of software that is
unable to be packaged in an rpm/deb. This is where our collective computing
train goes off the rails. Reaching a point where something is too
complicated to package with a set of tools which have the ability to run
*any arbitrary set of commands* and concluding the solution is to invent a
new set of tools/abstractions to run *any arbitrary set of commands* is the
derailing step. Worse, with containers that set of arbitrary commands
generally starts out by running "apt-get install ...", a precarious
dependency for an abstraction layer whose primary claim to fame is getting
past the limitations of those same commands.

I'll rephrase my earlier complaint about the rise of these abstractions as
"I can't figure out how to write spec files so I'll create a new
build/distribution system" which is, basically, a variation on the classic
argument from ignorance. Maybe that has merit if these new tools turned out
to actually be simpler and easier, but to date that hasn't been the case.
The water just keeps getting muddier.

The thing we can never measure and thus can only speculate about forever
is:  if all the person-hours poured into containers (and pypi/pip and cran
and cpan and maven and scons and ...) had been poured into rpm/deb
packaging would we just be simply apt/yum/dnf installing what we needed
today? (I'm ignoring other OS/packaging tools, but you get the idea.) We
can't run that experiment, but I suspect that it isn't a limitation of
rpm/deb to be able to package so much as it is that there is no incentive
to package in rpm/deb, but there is an incentive to invent/monetize new
abstraction layers. It doesn't hurt that humans crave novelty, so new is
always more appealing than old and without a good grasp of the old it's
impossible to properly evaluate the new relative to it. How does the old
quote go, "those who cannot remember the past are condemned to repeat it."
I look forward to ever more complex methods to package containers once we
have containers that are too complex to deliver as containers. Our only
real hope is that eventually human language will run out of metaphors to
use when monetizing the next big abstraction.

griznog

On Tue, Dec 4, 2018 at 9:51 PM Gerald Henriksen  wrote:

> On Mon, 3 Dec 2018 10:12:10 -0800, you wrote:
>
> > And then I realized that I was seeing
> >software which was "easier to containerize" and that "easier to
> >containerize" really meant "written by people who can't figure out
> >'./configure; make; make install' and who build on a sand-like foundation
> >of fragile dependencies to the extent that it only runs on their Ubuntu
> >laptop so you have to put their Ubuntu laptop in a container."
>
> The problem is that essentially nobody knows how autotools works, so
> that those C/Fortran codes that use it have usually copy/pasted
> something until it seems to work.
>
> So 2 things happened.
>
> First, all the non-traditional languages created their own build
> systems, and more importantly their own package management systems.
> This developed because most development was happening on non-Linux
> systems, because Linux still struggles on laptops and laptops have
> taken over the non-server computer world.  It also happened because
> those developers using Linux, or at least aware of deploying on Linux,
> rebelled at the limitations of the Linux ecosystem (namely
> libraries/components that hadn't been natively packaged, or the normal
> conflict of the "wrong" version being packaged).
>
> A side effect of all these package management systems is that they are
> frequently hostile to the "Linux way", and create software that is
> essentially unable to be packaged into RPM or deb format.
>
> The other issue of course is that open source won, and the explosion
> of open source means the distributions no longer have the person-power
> not just to package everything, but for those packages to do much of
> the heavy 

Re: [Beowulf] HPC workflows

2018-12-04 Thread Tony Brian Albers
On Tue, 2018-12-04 at 11:20 -0500, Prentice Bisbal via Beowulf wrote:
> On 12/3/18 2:44 PM, Michael Di Domenico wrote:
> > On Mon, Dec 3, 2018 at 1:13 PM John Hanks 
> > wrote:
> > >   From the perspective of the software being containerized, I'm
> > > even more skeptical. In my world (bioinformatics) I install a lot
> > > of crappy software. We're talking stuff resulting from "I read
> > > the first three days of 'learn python in 21 days' and now I'm an
> > > expert, just run this after installing these 17 things from
> > > pypi...and trust the output" I'm good friends with crappy
> > > software, we hang out together a lot. To me it just doesn't feel
> > > like making crappy software more portable is the *right* thing to
> > > do. When I walk my dog, I follow him with a bag and
> > > "containerize" what drops out. It makes it easier to carry
> > > around, but doesn't change what it is. As of today I see the
> > > biggest benefit of containers as that they force a developer to
> > > actually document the install procedure somewhere in a way that
> > > actually has to work so we can see firsthand how ridiculous it is
> > > (*cough* tensorflow *cough*).
> > 
> > I vote this the single best explanation of containers I've heard
> > all year... :)
> > ___
> 
> I second that motion. All in favor say "aye".
> 
> --
> 
> Prentice
> 
> 
> 

Aye!


-- 
-- 
Tony Albers
Systems Architect
Systems Director, National Cultural Heritage Cluster
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-04 Thread Gerald Henriksen
On Mon, 3 Dec 2018 10:12:10 -0800, you wrote:

> And then I realized that I was seeing
>software which was "easier to containerize" and that "easier to
>containerize" really meant "written by people who can't figure out
>'./configure; make; make install' and who build on a sand-like foundation
>of fragile dependencies to the extent that it only runs on their Ubuntu
>laptop so you have to put their Ubuntu laptop in a container."

The problem is that essentially nobody knows how autotools works, so
that those C/Fortran codes that use it have usually copy/pasted
something until it seems to work.

So 2 things happened.

First, all the non-traditional languages created their own build
systems, and more importantly their own package management systems.
This developed because most development was happening on non-Linux
systems, because Linux still struggles on laptops and laptops have
taken over the non-server computer world.  It also happened because
those developers using Linux, or at least aware of deploying on Linux,
rebelled at the limitations of the Linux ecosystem (namely
libraries/components that hadn't been natively packaged, or the normal
conflict of the "wrong" version being packaged).

A side effect of all these package management systems is that they are
frequently hostile to the "Linux way", and create software that is
essentially unable to be packaged into RPM or deb format.

The other issue of course is that open source won, and the explosion
of open source means the distributions no longer have the person-power
not just to package everything, but for those packages to do much of
the heavy lifting in keeping the software up to date.

As for autotools, it too is now being abandoned, with the two leading
contenders being CMake and Meson, but this being C++ the chaos wouldn't
be complete without multiple competing package management solutions...

> Then I
>started asking myself "do I want to trust software of that quality?" And
>after that, "do I want to trust the tools written to support that type of
>poor-quality software?"

On the other hand can you really trust the software built in more
traditional ways? see OpenSSL / Heartbleed.

>From the perspective of the software being containerized, I'm even more
>skeptical. In my world (bioinformatics) I install a lot of crappy software.
>We're talking stuff resulting from "I read the first three days of 'learn
>python in 21 days' and now I'm an expert, just run this after installing
>these 17 things from pypi...and trust the output" I'm good friends with
>crappy software, we hang out together a lot. To me it just doesn't feel
>like making crappy software more portable is the *right* thing to do. When
>I walk my dog, I follow him with a bag and "containerize" what drops out.
>It makes it easier to carry around, but doesn't change what it is. As of
>today I see the biggest benefit of containers as that they force a
>developer to actually document the install procedure somewhere in a way
>that actually has to work so we can see firsthand how ridiculous it is
>(*cough* tensorflow *cough*).

All very true.  To paraphrase, containers are the best of a bunch of
bad options.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-04 Thread Prentice Bisbal via Beowulf


On 12/3/18 2:44 PM, Michael Di Domenico wrote:

On Mon, Dec 3, 2018 at 1:13 PM John Hanks  wrote:

  From the perspective of the software being containerized, I'm even more skeptical. In my world 
(bioinformatics) I install a lot of crappy software. We're talking stuff resulting from "I 
read the first three days of 'learn python in 21 days' and now I'm an expert, just run this after 
installing these 17 things from pypi...and trust the output" I'm good friends with crappy 
software, we hang out together a lot. To me it just doesn't feel like making crappy software more 
portable is the *right* thing to do. When I walk my dog, I follow him with a bag and 
"containerize" what drops out. It makes it easier to carry around, but doesn't change 
what it is. As of today I see the biggest benefit of containers as that they force a developer to 
actually document the install procedure somewhere in a way that actually has to work so we can see 
firsthand how ridiculous it is (*cough* tensorflow *cough*).

I vote this the single best explanation of containers I've heard all year... :)
___


I second that motion. All in favor say "aye".

--

Prentice


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-03 Thread Michael Di Domenico
On Mon, Dec 3, 2018 at 1:13 PM John Hanks  wrote:
>
> From the perspective of the software being containerized, I'm even more 
> skeptical. In my world (bioinformatics) I install a lot of crappy software. 
> We're talking stuff resulting from "I read the first three days of 'learn 
> python in 21 days' and now I'm an expert, just run this after installing 
> these 17 things from pypi...and trust the output" I'm good friends with 
> crappy software, we hang out together a lot. To me it just doesn't feel like 
> making crappy software more portable is the *right* thing to do. When I walk 
> my dog, I follow him with a bag and "containerize" what drops out. It makes 
> it easier to carry around, but doesn't change what it is. As of today I see 
> the biggest benefit of containers as that they force a developer to actually 
> document the install procedure somewhere in a way that actually has to work 
> so we can see firsthand how ridiculous it is (*cough* tensorflow *cough*).

I vote this the single best explanation of containers I've heard all year... :)
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-12-03 Thread John Hanks
On Fri, Nov 30, 2018 at 9:44 PM John Hearns via Beowulf 
wrote:

> John, your reply makes so many points which could start a whole series of
> debates.
>

I would not deny partaking of the occasional round of trolling.


>  > Best use of our time now may well be to 'rm -rf SLURM' and figure out
> how to install kubernetes.
> ...
>


> My own thoughts on HPC for a tightly coupled, on premise setup is that we
> need a lightweight OS on the nodes, which does the bare minimum. No general
> purpose utilities, no GUIS, nothing but network and storage. And container
> support.
> The cluster will have the normal login nodes of course but will present
> itself as a 'black box' to run containers.
> But - given my herd analogy above - will we see that? Or will we see
> private Openstack setups?
>

10 years ago, maybe even 5 I would have agreed with you wholeheartedly. I
was never impressed much by early LXC, but for my first year of exposure to
Docker hype I was thinking exactly what you are saying here. And then I
tried CoreOS and started missing having a real OS. And then I started
trying to do things with containers. And then I realized that I was seeing
software which was "easier to containerize" and that "easier to
containerize" really meant "written by people who can't figure out
'./configure; make; make install' and who build on a sand-like foundation
of fragile dependencies to the extent that it only runs on their Ubuntu
laptop so you have to put their Ubuntu laptop in a container." Then I
started asking myself "do I want to trust software of that quality?" And
after that, "do I want to trust the tools written to support that type of
poor-quality software?" And then I started to notice how much containers
actually *increased* the amount of time/complexity it took to manage
software. And then I started enjoying all the container engine bugs... At
that point, reality squished the hype for me because I had other stuff I
needed to get done and didn't have budget to hire a devops person to sit
around mulling these things over.

>From the perspective of the software being containerized, I'm even more
skeptical. In my world (bioinformatics) I install a lot of crappy software.
We're talking stuff resulting from "I read the first three days of 'learn
python in 21 days' and now I'm an expert, just run this after installing
these 17 things from pypi...and trust the output" I'm good friends with
crappy software, we hang out together a lot. To me it just doesn't feel
like making crappy software more portable is the *right* thing to do. When
I walk my dog, I follow him with a bag and "containerize" what drops out.
It makes it easier to carry around, but doesn't change what it is. As of
today I see the biggest benefit of containers as that they force a
developer to actually document the install procedure somewhere in a way
that actually has to work so we can see firsthand how ridiculous it is
(*cough* tensorflow *cough*).

I got sidetracked on a rant again. Your proposed solution works fine in an
IT-style computing world: it needs exactly the staff IT wants to grow these
days, and instead of just a self-directed sysadmin it has the potential to
need a project manager. I don't see it showing up on many lab/office
clusters anytime soon though because it's a model that embraces hype first
and in an environment not focused on publishing or press releases around
hype, it's a lot of extra work/cost/complexity for very little real
benefit.  While you (and many on this list) might be interested in
exploring the technical merits of the approach, its actual utility really
hits home for people who require that extra complexity and layered
abstraction to justify themselves. The understaffed/overworked among us
will just write a shell/job script and move along to the next raging fire
to put out.

griznog


> On Fri, 30 Nov 2018 at 23:04, John Hanks  wrote:
>
>>
>>
>> On Thu, Nov 29, 2018 at 4:46 AM Jon Forrest  wrote:
>>
>>
>>> I agree completely. There is and always be a need for what I call
>>> "pretty high performance computing", which is the highest performance
>>> computing you can achieve, given practical limits like funding, space,
>>> time, ... Sure there will always people who can figure out how to go
>>> faster, but PHPC is pretty good.
>>>
>>>
>> What a great term, PHPC. That probably describes the bulk of all "HPC"
>> oriented computing being done today, if you consider all cores in use down
>> to the lab/workbench level of clustering. Certainly for my userbase
>> (bioinformatics) the computational part of a project often is a small
>> subset of the total time spent on it and time to total solution is the most
>> important metric for them. It's rare for us to try to get that last 10% or
>> 20% of performance gain.
>>
>> This has been a great thread overall, but I think no one is
>> considering the elephant in the room. Technical arguments are not winning
>> out in any of these 

Re: [Beowulf] HPC workflows

2018-12-02 Thread Gerald Henriksen
On Sat, 1 Dec 2018 06:43:05 +0100, you wrote:

>My own thoughts on HPC for a tightly coupled, on premise setup is that we
>need a lightweight OS on the nodes, which does the bare minimum. No general
>purpose utilities, no GUIS, nothing but network and storage. And container
>support.

One of the latest attempts at this is Fedora CoreOS, the merger of
Fedora Atomic and CoreOS (which Red Hat bought).

https://coreos.fedoraproject.org/

>The cluster will have the normal login nodes of course but will present
>itself as a 'black box' to run containers.
>But - given my herd analogy above - will we see that? Or will we see
>private Openstack setups?

Maybe. Red Hat appears to be moving in that direction as well with a
Red Hat CoreOS offering with OpenShift, though how it all ends up is
yet to be seen, I suspect.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC Workflows

2018-12-02 Thread Tim Cutts
Ho ho.  Yes, there is rarely anything completely new.  Old ideas get dusted 
off, polished up, and packaged slightly differently.  At the end of the day, a 
Dockerfile is just a script to build your environment, but it has the advantage 
now of doing it in a reasonably standard way, rather than whatever random 
method any of us might have come up with independently in the past.  Adding 
central repositories and allowing you to base one Dockerfile on top of another 
are nice additions.  Neither of those ideas is new either, of course.
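
To make that concrete, a minimal Dockerfile of the kind being described might 
look like the sketch below (base image, package list and tool name are purely 
illustrative, and the RUN lines are just ordinary shell):

    # build on top of someone else's published environment
    FROM ubuntu:18.04
    # the same commands you would otherwise bury in a README or install script
    RUN apt-get update && apt-get install -y build-essential python3-pip
    COPY . /opt/mytool
    RUN pip3 install /opt/mytool
    ENTRYPOINT ["mytool"]

Swap the FROM line for an in-house base image and you have the "one Dockerfile 
on top of another" layering; push the result to a registry and you have the 
central repository part.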

I agree with your last sentence, we in scientific IT definitely need to stick 
very close to the scientists.  But I don’t necessarily agree with the whole of 
the last paragraph.  IT may appear to be there to justify more IT, but I think 
that’s a vicious circle.  Many organisations see IT (whether scientific or 
enterprise) as a cost centre rather than as a strategic tool to meet their 
goals.  They become forced into justifying their own existence, and of course 
the political game then becomes that they have to seek to expand in order to stay 
the same size, otherwise they will be cut.  Many years ago I listened to a talk 
by Joe Baguley from VMware, who said he could tell which was the case 
when talking to a customer with a single question, which was:  “Who does 
your CIO report to?”  CEO: good, CFO: bad.

I suspect there’s some truth in that.

Regards,

Tim

> On 1 Dec 2018, at 14:29, John Hanks  wrote:
> 
> For me personally I just assume it's my lack of vision that is the problem. I 
> was submitting VMs as jobs using SGE well over 10 years ago. Job scripts that 
> build the software stack if it's not found? 15 or more. Never occurred to me 
> to call it "cloud" or "containerized", it was just a few stupid scripts to 
> solve some specific problem we had. I look at containers and cloud now and 
> just don't get it. Early in my career I had a mentor who was from the IBM 
> mainframe world. I recall excitedly explaining what I was playing around with 
> with early Xen versions and he said "Yeah, we've been doing that for a long 
> time." Now it's my turn to say "Yeah, I was doing that years ago" and scratch 
> my head at what all the fuss is about. Such are the effects of the ravages of 
> time.
> 
> Keeping distance from IT is always a good idea. The first rule of interacting 
> with IT is: "IT is not here to solve your problem, IT is here to justify more 
> IT. If your problem is solved, then it is the result of random chance, do not 
> look for patterns." Best to sit as close to the scientists as possible.




-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC Workflows

2018-12-01 Thread John Hanks
For me personally I just assume it's my lack of vision that is the problem.
I was submitting VMs as jobs using SGE well over 10 years ago. Job scripts
that build the software stack if it's not found? 15 or more. Never occurred
to me to call it "cloud" or "containerized", it was just a few stupid
scripts to solve some specific problem we had. I look at containers and
cloud now and just don't get it. Early in my career I had a mentor who was
from the IBM mainframe world. I recall excitedly explaining what I was
playing around with with early Xen versions and he said "Yeah, we've been
doing that for a long time." Now it's my turn to say "Yeah, I was doing
that years ago" and scratch my head at what all the fuss is about. Such are
the effects of the ravages of time.
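
For anyone who hasn't seen that pattern, those "build the stack if it's not
found" job scripts amount to little more than a preamble like this (tool name
and paths invented for illustration):

    # look in $HOME/sw first; build and install there if the tool is missing
    export PATH="$HOME/sw/bin:$PATH"
    command -v mytool >/dev/null 2>&1 || {
        cd "$HOME/src/mytool" &&
        ./configure --prefix="$HOME/sw" && make && make install
    }

Nothing cloudy or containerized about it, which is rather the point.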

Keeping distance from IT is always a good idea. The first rule of
interacting with IT is: "IT is not here to solve your problem, IT is here
to justify more IT. If your problem is solved, then it is the result of
random chance, do not look for patterns." Best to sit as close to the
scientists as possible.

griznog

On Sat, Dec 1, 2018 at 12:53 AM  wrote:

> Yeah, I often think some people are using the letters HPC as in 'high
> profile computing' nowadays. The diluting effect I mentioned a few posts
> ago.
>
> Actually a LOT of HPC admin folks I know are scientists, scientifically active
> and tightly coupled to scientists in groups and they were doing DevOps
> even before it got that fancy name.
>
> Here at my uni I find it is *regular* IT that needs to adjust and is (indeed in a
> strange herdy way) adjusting to the pace and workings of scientists and
> science. They are using old concepts that have recently been relabeled as
> configuration management or stack deployments etc. (as if we used to
> install them hundreds of HPC nodes all by hand or something?) Things there
> get reintroduced by them commercial-here-is-my-bill consultants with a
> fancy gui klickemy-thingy, oh, and an icon. Let me tell you, now they are
> doing them stand-up comedy things there with their scrum mates every day
> and are talking about 'fragile' software and saying things like 'moving
> fast and breaking things' ?!
>
> Ha, all i can say is i am keeping my distance there cause there is a lot
> of branding and marketing buzzword speak involved and it is giving me a
> rash ! Back to running my Molecular Quantum Dynamics I am.
>
> m.
>
> --
> mark somers
> tel: +31715274437
> mail: m.som...@chem.leidenuniv.nl
> web:  http://theorchem.leidenuniv.nl/people/somers
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC Workflows

2018-12-01 Thread m . somers
Yeah, I often think some people are using the letters HPC as in 'high
profile computing' nowadays. The diluting effect I mentioned a few posts
ago.

Actually a LOT of HPC admin folks I know are scientists, scientifically active
and tightly coupled to scientists in groups and they were doing DevOps
even before it got that fancy name.

Here at my uni I find it is *regular* IT that needs to adjust and is (indeed in a
strange herdy way) adjusting to the pace and workings of scientists and
science. They are using old concepts that have recently been relabeled as
configuration management or stack deployments etc. (as if we used to
install them hundreds of HPC nodes all by hand or something?) Things there
get reintroduced by them commercial-here-is-my-bill consultants with a
fancy gui klickemy-thingy, oh, and an icon. Let me tell you, now they are
doing them stand-up comedy things there with their scrum mates every day
and are talking about 'fragile' software and saying things like 'moving
fast and breaking things' ?!

Ha, all i can say is i am keeping my distance there cause there is a lot
of branding and marketing buzzword speak involved and it is giving me a
rash ! Back to running my Molecular Quantum Dynamics I am.

m.

-- 
mark somers
tel: +31715274437
mail: m.som...@chem.leidenuniv.nl
web:  http://theorchem.leidenuniv.nl/people/somers


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-29 Thread Jon Forrest



On 11/27/2018 4:51 AM, Michael Di Domenico wrote:


this seems a bit too stringent of a statement for me.  i don't dismiss
or disagree with your premise, but i don't entirely agree that HPC
"must" change in order to compete. 


I agree completely. There is and always will be a need for what I call
"pretty high performance computing", which is the highest performance
computing you can achieve, given practical limits like funding, space,
time, ... Sure there will always be people who can figure out how to go
faster, but PHPC is pretty good.

Jon Forrest

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread Jonathan Engwall
You can probably fork from a central repo.

> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread Gerald Henriksen
On Wed, 28 Nov 2018 13:51:05 +0100, you wrote:

>Now I am all for connecting diverse and flexible workflows to true HPC systems
>and grids that feel different if not experienced
>with (otherwise what is the use of a computer if there are no users making use 
>of it?), but do not make the mistake of thinking
>everything is cloud or will be cloud soon that fast. 

The "cloud" is a massive business that is currently growing fast.

Will it take over everything or continue its growth forever?  Of course
not.

But dismissing it is equally a dangerous thing to do, particularly if
your job relies on something not being in the cloud.

>So, one could say bare metal clouds have arisen mostly because of this but they
>also do come with expenses. Somehow I find that a
>simple rule always seems to apply; if more people in a scheme need to be paid, 
>the scheme is probably more expensive than
>alternatives, if available. Or stated differently: If you can do things
>yourself, it is always a cheaper option than letting some
>others do things (under normal 'open market' rules and excluding the option of 
>slavery :)).

But this is one area where the cloud can often win - the scale of the
Azure/Google/AWS operations means that you get 24/7/365 coverage with
essentially the lowest possible labour overhead.

And the fact is that while much of society insists on making decisions
purely based on cost - see airfares for example - there are a lot of
cases where people are willing to pay a premium for a service/product
that "just works".

>One has to note that in academia one often is in the situation that grants are 
>obtained to buy hardware and that running costs
>(i.e. electricity and rack space) are matched by the university making the 
>case of spending the grant money on paying Amazon or
>Google to do your 'compute' not so sensible if you can do things yourself.

Currently.

If on premise HPC doesn't reflect the ease of use that can be found
elsewhere, combined with some lobbying by the existing or specialized
cloud providers, then those grants could become a lot more flexible.

And given that many/most/all universities are often short on space,
they may well welcome an opportunity to be able to repurpose an
existing cluster space...

>There is also another aspect when for example dealing with sensitive data you 
>are to be held responsible for. The Cloud model is
>not so friendly under those circumstances either. Again your data is put "on 
>someone else's computer". Thinking of GDPR and
>such.

I don't think this is so clear an advantage to on premise as some
think.

I think the fact that we are all on this mailing list in order to
learn and discuss issues puts us as an outlier - there are very few
people participating on this list, and even allowing for discussions
happening on other sites I (sadly) suspect you will find that the
majority of people running HPC aren't as informed as they should be.

Who do you trust more to keep your data safe - to keep systems
patched, to keep firewalls up to date, to properly configure
everything, etc.?  Is it your local HPC, where maybe they are
struggling to hire staff, or can't afford to offer a "good enough"
salary, or simply can't justify hiring a security specialist?  Or
perhaps you go with Google or Microsoft, who have entire departments
of staff dealing with these issues, who monitor their networks full
time looking for flaws?

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread Eliot Eshelman
Those interested in providing user-friendly HPC might want to take a 
look at Open OnDemand. I'm not affiliated with this project, but wanted 
to make sure it got a plug. I've heard good things so far.


http://openondemand.org/

Eliot



On 11/26/18 10:26, John Hearns via Beowulf wrote:

This may not be the best place to discuss this - please suggest a better
forum if you have one.
I have come across this question in a few locations. Being specific, I am a
fan of the Julia language. On the Julia forum a respected developer recently
asked what the options were for keeping code developed on a laptop in sync
with code being deployed on an HPC system.
There was some discussion of having Git style repositories which can be
synced to/from.
My suggestion was an ssh mount of the home directory on the HPC system,
which I have configured effectively in the past when using remote HPC
systems.

At a big company I worked with recently, the company provided home
directories on NFS Servers. But the /home/username directory on the HPC was
different - on higher performance storage. The 'company' home was mounted -
so you could copy between them. But we did have the inevitable incidents of
jobs being run from company NFS - and pulling code across the head node
interfaces etc.

Developers these days are used to carrying their Mac laptops around and
working at hotdesks, at home, at conferences. Me too - and I love it.
Though I have a lovely HP Spectre Ultrabook.
Again their workflow is to develop on the laptop and upload code to Github
type repositories. Then when running on a cloud service the software ids
downloaded from the Repo.
There are of course HPC services on the cloud, with gateways to access them.

This leads me to ask - should we be presenting HPC services as a 'cloud'
service, no matter that it is a non-virtualised on-premise setup?
In which case the way to deploy software would be via downloading from
Repos.
I guess this is actually more common nowadays.

I think out loud that many HPC codes depend crucially on a $HOME directory
being present on the compute nodes as the codes look for dot files etc. in
$HOME. I guess this can be dealt with by fake $HOMES which again sync back
to the Repo.

And yes I know containerisation may be the saviour here!

Sorry for a long post.


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread INKozin via Beowulf
On Wed, 28 Nov 2018 at 11:33, Bogdan Costescu  wrote:

> On Mon, Nov 26, 2018 at 4:27 PM John Hearns via Beowulf <
> beowulf@beowulf.org> wrote:
>
>> I have come across this question in a few locations. Being specific, I am
>> a fan of the Julia language. On the Julia forum a respected developer
>> recently asked what the options were for keeping code developed on a laptop
>> in sync with code being deployed on an HPC system.
>>
>


> I think out loud that many HPC codes depend crucially on a $HOME directory
>> being present on the compute nodes as the codes look for dot files etc. in
>> $HOME. I guess this can be dealt with by fake $HOMES which again sync back
>> to the Repo.
>>
>
> I don't follow you here... $HOME, dot files, repo, syncing back? And why
> "Repo" with capital letter, is it supposed to be a name or something
> special?
>

I think John is talking here about doing version control on whole HOME
directories but trying to be mindful of dot files such as .bashrc and
others which can be application or system specific. The first thing which
comes to mind is to use branches for different cluster systems. However
this also taps into backup (which is another important topic since HOME
dirs are not necessarily backed up). There could be a working solution
which makes use of recursive repos and git lfs support but pruning old
history could still be desirable. Git would minimize the amount of storage
because it's hash based. While this could make it possible to replicate
your environment "wherever you go", a/ you would drag a lot history around
and b/ a significantly different mindset is required to manage the whole
thing. A typical HPC user may know git clone but generally is not a git
adept. Developers are different and, who knows John, maybe someone will
pick up your idea.

Is gitfs any popular?
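
A minimal sketch of the bare-repo variant of that idea (file names, branch
names and the remote are only illustrative):

    # one-off setup: a bare repo tracking selected dot files in $HOME
    git init --bare "$HOME/.home.git"
    alias homegit='git --git-dir=$HOME/.home.git --work-tree=$HOME'
    homegit config status.showUntrackedFiles no   # only what is added explicitly
    homegit add .bashrc .ssh/config
    homegit commit -m "baseline environment"
    # one branch per cluster for the site-specific bits
    homegit checkout -b clusterA
    homegit commit -am "clusterA-specific dot files"

Pushing that to a shared remote gives you the "wherever you go" part; the
history pruning and the data problem quoted below remain, of course.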

In my HPC universe, people actually not only need code, but also data -
> usually LOTS of data. Replicating the code (for scripting languages) or the
> binaries (for compiled stuff) would be trivial, replicating the data would
> not. Also pulling the data in or pushing it out (f.e. to/from AWS) on the
> fly whenever the instance is brought up would be slow and costly. And by
> the way this is in no way a new idea - queueing systems have for a long
> time the concept of "pre" and "post" job stages, which could be used to
> pull in code and/or data to the node(s) on which the node would be running
> and clean up afterwards.
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread John Pellman
>
> If HPC doesn't make it easy for these users to transfer their workflow
> to the cluster, and the cloud providers do, then the users will move
> to using the cloud even if it costs them 10%, 20% more because at the
> end of the day it is about getting the job done and not about spending
> time to work to antiquated methods of putting jobs in a cluster.
>


> And of course if the users would rather spend their department budgets
> with Amazon, Azure, Google, or others then the next upgrade cycle
> there won't be any money for the in house cluster...


Interestingly enough, Cornell has been adopting a sort of compromise
between traditional HPC and cloud computing by maintaining an
AWS-compatible private cloud on-prem (Red Cloud).  I'd speculate
that this would have the advantage of preventing researchers from "going
rogue" and foregoing traditional HPC groups entirely by going directly to
AWS.

On Tue, Nov 27, 2018 at 7:42 PM Gerald Henriksen  wrote:

> On Tue, 27 Nov 2018 07:51:06 -0500, you wrote:
>
> >On Mon, Nov 26, 2018 at 9:50 PM Gerald Henriksen 
> wrote:
> >> On Mon, 26 Nov 2018 16:26:42 +0100, you wrote:
> >> If on premise HPC doesn't change to reflect the way the software is
> >> developed today then the users will in the future prefer cloud HPC.
> >>
> >> I guess it is a brave new world for on premise HPC as far as that the
> >> users now, and likely more in the future, will have alternatives thus
> >> forcing the on premise HPC to "compete" in order to survive.
> >
> >this seems a bit too stringent of a statement for me.  i don't dismiss
> >or disagree with your premise, but i don't entirely agree that HPC
> >"must" change in order to compete.  We've all heard this kind of stuff
> >in the past if x doesn't change y will take over the world!
>
> HPC, like most things, exists to get something done.
>
> If HPC doesn't change to reflect the changes in society and the way
> the software is developed (*) then the users will look for more modern
> ways to replace traditional HPC.  As noted the software is no longer
> developed on workstations that are connected to the lab/company
> network but rather on laptops that stay with the user wherever they
> go.
>
> This in turn is at least in part what has driven the rise of
> distributed version control, git in particular.
>
> If HPC doesn't make it easy for these users to transfer their workflow
> to the cluster, and the cloud providers do, then the users will move
> to using the cloud even if it costs them 10%, 20% more because at the
> end of the day it is about getting the job done and not about spending
> time to work to antiquated methods of putting jobs in a cluster.
>
> And of course if the users would rather spend their department budgets
> with Amazon, Azure, Google, or others then the next upgrade cycle
> there won't be any money for the in house cluster...
>
>
> * - note the HPC isn't unique in this regard.  The Linux distributions
> are facing their own version of this, where much of the software is no
> longer packagable in the traditional sense as it instead relies on
> language specific packaging systems and languages that don't lend
> themselves to the older rpm/deb style system.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread mark somers
As a follow up note on workflows,

we also have used 'sshfs like constructs' to help non technical users to 
compute things on local clusters, the actual CERN grid
infrastructure and on (national) super computers. We built some middleware 
suitable for that many moons ago:

http://lgi.tc.lic.leidenuniv.nl/LGI/

Works great for python coded workflows on workstations so coming back to the 
'sshfs trick':

We have some organic chemists here doing many many many Gaussian calculations 
and only knowing Windows. They do this by creating
input files using the gui of Gaussian on their workstations and save them in a 
special directory that is synced using
SyncBackPro to a CentOS server. On that server a python script runs via cron 
every 5 min to push these input files for Gaussian
into our LGI setup. Compute resources hooked up in our LGI that can do Gaussian 
pick up those jobs, run them using slurm /
torque / glite or whatever is suitable on that compute resource and eventually 
upload results into the LGI repository again. The
cron python job on the CentOS server notices finished jobs in the LGI queue and 
downloads the results into a special output
directory and removes the job from the LGI queue. Now the Windows workstation
with SyncBackPro again retrieves the outputs to the
Windows share they all use. This has been running 24x7 for several years now
without a glitch using super computers, the actual
grid and local clusters without these organic chemists having to worry about 
unix or details like that.
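
Stripped to its essence, the server-side piece is just a polling script along
these lines (purely illustrative -- the real setup hands the jobs to LGI rather
than calling a scheduler directly, and 'g16' stands in for however Gaussian is
invoked on the resource):

    #!/bin/bash
    # run from cron every 5 minutes: submit anything new found in the inbox
    IN=/srv/gaussian/inbox
    OUT=/srv/gaussian/outbox
    for f in "$IN"/*.gjf; do
        [ -e "$f" ] || continue
        name=$(basename "$f" .gjf)
        sbatch --job-name="$name" --wrap="g16 < '$f' > '$OUT/$name.log'" \
            && mv "$f" "$IN/submitted/"
    done

The chemists never see any of this; they only see files appearing in their share.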

So I can concur, a seemingly simple 'sshfs trick' should not be underestimated 
:).

We also have many unix literate users here using the python api to build 
workflows via LGI or the simple cli interface of LGI to
submit jobs from their workstations. 

m.

-- 
mark somers
tel: +31715274437
mail: m.som...@chem.leidenuniv.nl
web:  http://theorchem.leidenuniv.nl/people/somers
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread John Hearns via Beowulf
Mark, again I do not have time to do your answer justice today.
However, as you are in NL, can you send me some oliebollen please? I am a
terrible addict.

On Wed, 28 Nov 2018 at 13:52, mark somers 
wrote:

> Well, please be careful in naming things:
>
> http://cloudscaling.com/blog/cloud-computing/grid-cloud-hpc-whats-the-diff/
>
> (note; The guy only heard about MPI and does not consider SMP based codes
> using i.e. OpenMP, but he did understand there are
> different things being talked about).
>
> Now I am all for connecting diverse and flexible workflows to true HPC
> systems and grids that feel different if not experienced
> with (otherwise what is the use of a computer if there are no users making
> use of it?), but do not make the mistake of thinking
> everything is cloud or will be cloud soon that fast.
>
> Bear with me for a second:
>
> There are some very fundamental problems when dealing with large scale
> parallel programs (OpenMP) on virtual machines (most of
> the cloud). Google for papers talking about co-scheduling. All VM
> specialists I know and talked with, state generally that using
> more than 4 cores in a VM is not smart and one should switch to bare metal
> then. Don't believe it? Google for it or just try it
> yourself by doing a parallel scaling experiment and fitting Amdahls law
> through your measurements.
>
> So, one could say bare metal clouds have arisen mostly because of this but
> they also do come with expenses. Somehow I find that a
> simple rule always seems to apply; if more people in a scheme need to be
> paid, the scheme is probably more expensive than
> alternatives, if available. Or stated differently: If you can do things
> yourself, it is always a cheaper option than letting some
> others do things (under normal 'open market' rules and excluding the
> option of slavery :)).
>
> Nice read for some background:
>
> http://staff.um.edu.mt/carl.debono/DT_CCE3013_1.pdf
>
> One has to note that in academia one often is in the situation that grants
> are obtained to buy hardware and that running costs
> (i.e. electricity and rack space) are matched by the university making the
> case of spending the grant money on paying Amazon or
> Google to do your 'compute' not so sensible if you can do things yourself.
> Also given the ease of deploying an HPC cluster
> nowadays with OpenHPC or something commercial like Qlustar or Bright, one
> will be hard pressed to justify long-term bare metal
> cloud usage in these settings.
>
> Those were some technical and economic considerations that play a role
> in things.
>
> There is also another aspect when for example dealing with sensitive data
> you are to be held responsible for. The Cloud model is
> not so friendly under those circumstances either. Again your data is put
> "on someone else's computer". Thinking of GDPR and
> such.
>
> So, back to the point, some 'user driven' workloads might end up on clouds
> or on bare-metal on-premise clouds (seems to be the
> latest fad right now) but clearly not everything. Especially if the
> workloads are not 'user driven' but technology (or
> economically or socially driven) i.e. there is no other way of doing it
> except using some type of (specialized) technology (or
> it is just not allowed). I therefore also am of the opinion that cloud
> computing is also not true (traditional) HPC and that the
> term HPC has been diluted over the years by commercial interest / marketing
> speak.
>
> BTW, on a side note / rant; The mathematics we are dealing with here are
> the constraints to be met in optimising things. The
> constraints actually determine the final optimal case (
> https://en.wikipedia.org/wiki/Lagrange_multiplier) and people tend to
> 'ignore' or not specify the constraints in their arguments about what is
> the best or optimal thing to do. So what I did here is
> I have given you some example of constraints (technical, economical and
> social) in the 'everything will be cloud' rhetoric to
> keep an eye on before drawing any conclusions about what the future might
> bring :).
>
> just my little opinion though...
>
> Disclaimer; I could be horribly wrong :).
>
> --
> mark somers
> tel: +31715274437
> mail: m.som...@chem.leidenuniv.nl
> web:  http://theorchem.leidenuniv.nl/people/somers
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread mark somers
Well, please be careful in naming things:

http://cloudscaling.com/blog/cloud-computing/grid-cloud-hpc-whats-the-diff/

(note; The guy only heard about MPI and does not consider SMP based codes using 
i.e. OpenMP, but he did understand there are
different things being talked about).

Now I am all for connecting diverse and flexible workflows to true HPC systems
and grids that feel different if not experienced
with (otherwise what is the use of a computer if there are no users making use 
of it?), but do not make the mistake of thinking
everything is cloud or will be cloud soon that fast. 

Bear with me for a second:

There are some very fundamental problems when dealing with large scale parallel 
programs (OpenMP) on virtual machines (most of
the cloud). Google for papers talking about co-scheduling. All VM specialists I 
know and talked with, state generally that using
more than 4 cores in a VM is not smart and one should switch to bare metal 
then. Don't believe it? Google for it or just try it
yourself by doing a parallel scaling experiment and fitting Amdahl's law through
your measurements.
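
(For reference, the curve to fit is Amdahl's law, S(N) = 1 / ((1 - p) + p/N),
with N the number of cores and p the parallel fraction; the point of the
exercise is to see how much worse the fitted p comes out on the VM than on
bare metal.)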

So, one could say bare metal clouds have arisen mostly because of this but they
also do come with expenses. Somehow I find that a
simple rule always seems to apply; if more people in a scheme need to be paid, 
the scheme is probably more expensive than
alternatives, if available. Or stated differently: If you can do things
yourself, it is always a cheaper option than letting some
others do things (under normal 'open market' rules and excluding the option of 
slavery :)).

Nice read for some background:

http://staff.um.edu.mt/carl.debono/DT_CCE3013_1.pdf

One has to note that in academia one often is in the situation that grants are 
obtained to buy hardware and that running costs
(i.e. electricity and rack space) are matched by the university making the case 
of spending the grant money on paying Amazon or
Google to do your 'compute' not so sensible if you can do things yourself. Also
given the ease of deploying an HPC cluster
nowadays with OpenHPC or something commercial like Qlustar or Bright, one will
be hard pressed to justify long-term bare metal
cloud usage in these settings.

Those were some technical and economic considerations that play a role in
things. 

There is also another aspect when for example dealing with sensitive data you 
are to be held responsible for. The Cloud model is
not so friendly under those circumstances either. Again your data is put "on 
someone else's computer". Thinking of GDPR and
such.

So, back to the point, some 'user driven' workloads might end up on clouds or 
on bare-metal on-premise clouds (seems to be the
latest fad right now) but clearly not everything. Especially if the workloads 
are not 'user driven' but technology (or
economically or socially driven) i.e. there is no other way of doing it except 
using some type of (specialized) technology (or
it is just not allowed). I therefore also am of the opinion that cloud computing is
also not true (traditional) HPC and that the
term HPC has been diluted over the years by commercial interest / marketing
speak.

BTW, on a side note / rant; The mathematics we are dealing with here are the 
constraints to be met in optimising things. The
constraints actually determine the final optimal case 
(https://en.wikipedia.org/wiki/Lagrange_multiplier) and people tend to
'ignore' or not specify the constraints in their arguments about what is the 
best or optimal thing to do. So what I did here is
I have given you some example of constraints (technical, economical and social) 
in the 'everything will be cloud' rhetoric to
keep an eye on before drawing any conclusions about what the future might bring 
:). 

just my little opinion though...

Disclaimer; I could be horribly wrong :).

-- 
mark somers
tel: +31715274437
mail: m.som...@chem.leidenuniv.nl
web:  http://theorchem.leidenuniv.nl/people/somers
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread John Hearns via Beowulf
Bogdan, Igor. Thank you very much for your thoughtful answers. I do not
have much time today to do your replies the justice of a proper answer.
Regarding the ssh filesystem, the scenario was that I was working for a
well known company.
We were running CFD simulations on remote academic HPC setups. There was
more than one site!
The corporate firewall allowed us an outgoing ssh connection. I found it a
lot easier to configure an sshfs mount so that engineers could transfer
programs and scripts between their local system and the remote system,
rather than using a graphical or a command line ssh client.
The actual large data files were transferred by yours truly, via a USB disk
drive.
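
For anyone who has not used it, the mount itself is a one-liner (host name and
paths here are made up):

    mkdir -p ~/hpc_home
    sshfs user@login.hpc.example.org:/home/user ~/hpc_home -o reconnect,ServerAliveInterval=15
    # ... edit locally, the files appear on the cluster immediately ...
    fusermount -u ~/hpc_home    # unmount when done (umount ~/hpc_home on a Mac)

Once that gets through the corporate firewall, everything else is ordinary file
copying.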

I did not know about gitfs (my bad). That sounds interesting.







On Wed, 28 Nov 2018 at 13:09, INKozin  wrote:

>
>
> On Wed, 28 Nov 2018 at 11:33, Bogdan Costescu  wrote:
>
>> On Mon, Nov 26, 2018 at 4:27 PM John Hearns via Beowulf <
>> beowulf@beowulf.org> wrote:
>>
>>> I have come across this question in a few locations. Being specific, I
>>> am a fan of the Julia language. On the Julia forum a respected developer
>>> recently asked what the options were for keeping code developed on a laptop
>>> in sync with code being deployed on an HPC system.
>>>
>>
>
>
>> I think out loud that many HPC codes depend crucially on a $HOME
>>> directory being present on the compute nodes as the codes look for dot
>>> files etc. in $HOME. I guess this can be dealt with by fake $HOMES which
>>> again sync back to the Repo.
>>>
>>
>> I don't follow you here... $HOME, dot files, repo, syncing back? And why
>> "Repo" with capital letter, is it supposed to be a name or something
>> special?
>>
>
> I think John is talking here about doing version control on whole HOME
> directories but trying to be mindful of dot files such as .bashrc and
> others which can be application or system specific. The first thing which
> comes to mind is to use branches for different cluster systems. However
> this also taps into backup (which is another important topic since HOME
> dirs are not necessarily backed up). There could be a working solution
> which makes use of recursive repos and git lfs support but pruning old
> history could still be desirable. Git would minimize the amount of storage
> because it's hash based. While this could make it possible to replicate
> your environment "wherever you go", a/ you would drag a lot of history around
> and b/ a significantly different mindset is required to manage the whole
> thing. A typical HPC user may know git clone but generally is not a git
> adept. Developers are different and, who knows John, maybe someone will
> pick up your idea.
>
> Is gitfs any popular?
>
> In my HPC universe, people actually not only need code, but also data -
>> usually LOTS of data. Replicating the code (for scripting languages) or the
>> binaries (for compiled stuff) would be trivial, replicating the data would
>> not. Also pulling the data in or pushing it out (f.e. to/from AWS) on the
>> fly whenever the instance is brought up would be slow and costly. And by
>> the way this is in no way a new idea - queueing systems have for a long
>> time the concept of "pre" and "post" job stages, which could be used to
>> pull in code and/or data to the node(s) on which the node would be running
>> and clean up afterwards.
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread Bogdan Costescu
On Mon, Nov 26, 2018 at 4:27 PM John Hearns via Beowulf 
wrote:

> I have come across this question in a few locations. Being specific, I am
> a fan of the Julia language. On the Julia forum a respected developer
> recently asked what the options were for keeping code developed on a laptop
> in sync with code being deployed on an HPC system.
>

In keeping with the rest of the buzzwords, where does CI/CD fit between
"code developed" and "code being deployed"? Once you have a mechanism for
this, can't this be used for the final deployment? Or even CD could
automatically take care of that final deployment?


> There was some discussion of having Git style repositories which can be
> synced to/from.
>

Yes, that would work fine. Why would git not be compatible with an HPC
setup? And why restrict yourself to git and not talk about distributed
version control systems in general?


> My suggestion was an ssh mount of the home directory on the HPC system,
> which I have configured effectively in the past when using remote HPC
> systems.
>

I don't quite parse the first part of the phrase - care to
reformulate/elaborate?


> Again their workflow is to develop on the laptop and upload code to Github
> type repositories. Then when running on a cloud service the software is
> downloaded from the Repo.
>

The way I read it, this is very much restricted to code that can be run
immediately after download, i.e. using a scripting language. That might fit
your HPC universe, but the parallel one I live in still mostly runs code
built and maybe even optimized on the HPC system it runs on. This includes
software delivered in binary form from ISVs, open source code (f.e.
GROMACS), or code developed in-house - they all have in common using an
internode (f.e. MPI) or intranode (OpenMP, CUDA) communication and/or
control library directly, not through a deep stack.


> There are of course HPC services on the cloud, with gateways to access
> them.
>
> This leads me to ask - should we be presenting HPC services as a 'cloud'
> service, no matter that it is a non-virtualised on-premise setup?
>

What's in a name? It's called cloud computing today, but it was called grid
computing 10-15 years ago...

For many years, before the cloud-craze began, scientists might have had
access to some HPC resources in their own institution, in other
institutions in the same city, country, continent or even across
continents. How is this different from having access to an on-premise
install of f.e. OpenStack or a cloud computing offer somewhere else also
using OpenStack? The only advantage in some cases is that the on-premise
stuff might be better integrated with the "home" setup (i.e. common file
systems, common user management, or - why not? - better documentation :)),
which improves the user experience, but the functionality is very similar
or the same.

To come back to your initial topic - a git repo can just as well be sync-ed
to a login node of a cluster (wherever that is located) or to a VM in the
AWS cloud (wherever that is located).


> I think out loud that many HPC codes depend crucially on a $HOME directory
> being present on the compute nodes as the codes look for dot files etc. in
> $HOME. I guess this can be dealt with by fake $HOMES which again sync back
> to the Repo.
>

I don't follow you here... $HOME, dot files, repo, syncing back? And why
"Repo" with capital letter, is it supposed to be a name or something
special?

In my HPC universe, people actually not only need code, but also data -
usually LOTS of data. Replicating the code (for scripting languages) or the
binaries (for compiled stuff) would be trivial, replicating the data would
not. Also pulling the data in or pushing it out (f.e. to/from AWS) on the
fly whenever the instance is brought up would be slow and costly. And by
the way this is in no way a new idea - queueing systems have for a long
time the concept of "pre" and "post" job stages, which could be used to
pull in code and/or data to the node(s) on which the node would be running
and clean up afterwards.
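
A bare-bones job script doing exactly that might look like the following (paths
and the solver name are invented; some sites would put the copy steps into
scheduler prolog/epilog hooks instead):

    #!/bin/bash
    #SBATCH --job-name=stage-demo
    #SBATCH --ntasks=1
    SCRATCH=/scratch/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRATCH"
    cp -r "$HOME/project/input" "$SCRATCH/"            # "pre": stage data in
    cd "$SCRATCH"
    srun "$HOME/project/bin/my_solver" input/ > result.out
    cp result.out "$HOME/project/results/"             # "post": stage results out
    rm -rf "$SCRATCH"                                   # clean up afterwards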

Cheers,
Bogdan
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread John Hearns via Beowulf
Julia packaging   https://docs.julialang.org/en/v1/stdlib/Pkg/index.html

On Wed, 28 Nov 2018 at 01:42, Gerald Henriksen  wrote:

> On Tue, 27 Nov 2018 07:51:06 -0500, you wrote:
>
> >On Mon, Nov 26, 2018 at 9:50 PM Gerald Henriksen 
> wrote:
> >> On Mon, 26 Nov 2018 16:26:42 +0100, you wrote:
> >> If on premise HPC doesn't change to reflect the way the software is
> >> developed today then the users will in the future prefer cloud HPC.
> >>
> >> I guess it is a brave new world for on premise HPC as far as that the
> >> users now, and likely more in the future, will have alternatives thus
> >> forcing the on premise HPC to "compete" in order to survive.
> >
> >this seems a bit too stringent of a statement for me.  i don't dismiss
> >or disagree with your premise, but i don't entirely agree that HPC
> >"must" change in order to compete.  We've all heard this kind of stuff
> >in the past if x doesn't change y will take over the world!
>
> HPC, like most things, exists to get something done.
>
> If HPC doesn't change to reflect the changes in society and the way
> the software is developed (*) then the users will look for more modern
> ways to replace traditional HPC.  As noted the software is no longer
> developed on workstations that are connected to the lab/company
> network but rather on laptops that stay with the user wherever they
> go.
>
> This in turn is at least in part what has driven the rise of
> distributed version control, git in particular.
>
> If HPC doesn't make it easy for these users to transfer their workflow
> to the cluster, and the cloud providers do, then the users will move
> to using the cloud even if it costs them 10%, 20% more because at the
> end of the day it is about getting the job done and not about spending
> time to work to antiquated methods of putting jobs in a cluster.
>
> And of course if the users would rather spend their department budgets
> with Amazon, Azure, Google, or others then the next upgrade cycle
> there won't be any money for the in house cluster...
>
>
> * - note the HPC isn't unique in this regard.  The Linux distributions
> are facing their own version of this, where much of the software is no
> longer packagable in the traditional sense as it instead relies on
> language specific packaging systems and languages that don't lend
> themselves to the older rpm/deb style system.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-28 Thread John Hearns via Beowulf
> * - note the HPC isn't unique in this regard.  The Linux distributions
> are facing their own version of this, where much of the software is no
> longer packagable in the traditional sense as it instead relies on
> language specific packaging systems and languages that don't lend
> themselves to the older rpm/deb style system.

Gerald, very well said. The term in the UK At the moment would be
'friction'. On-premise HPC has to be frictionless as cloud HPC.

 Note that I referred to Julia, which has a packaging system.
The Julia community has given a lot of thought to the packaging system for
1.0 and it has concepts such as environments for different projects.
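So on the cluster, reproducing a project environment is roughly one command once
the repository is cloned (the project path is illustrative):

    cd ~/projects/MyJuliaProject     # contains Project.toml and Manifest.toml
    julia --project=. -e 'using Pkg; Pkg.instantiate()'

which installs the recorded package versions into a per-project environment
rather than one big shared pile.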

I hate to single out Python, but have experience of users using Anaconda
which means a huge variation in what everyone has.
And more importantly for HPC systems the packages are placed in the user's
home directory (by default).
On the system I am thinking about there was very limited space on /home and
it was an NFS mount. Meaning any parallel program startup would
pull lots of data from NFS.
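
One partial mitigation (directory names invented for the example) is to point
conda's package cache and environments at roomier, faster storage than the NFS
home:

    conda config --add pkgs_dirs /scratch/$USER/conda/pkgs
    conda config --add envs_dirs /scratch/$USER/conda/envs

though that only relocates the many-small-files problem, it does not remove it.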

On Wed, 28 Nov 2018 at 01:42, Gerald Henriksen  wrote:

> On Tue, 27 Nov 2018 07:51:06 -0500, you wrote:
>
> >On Mon, Nov 26, 2018 at 9:50 PM Gerald Henriksen 
> wrote:
> >> On Mon, 26 Nov 2018 16:26:42 +0100, you wrote:
> >> If on premise HPC doesn't change to reflect the way the software is
> >> developed today then the users will in the future prefer cloud HPC.
> >>
> >> I guess it is a brave new world for on premise HPC as far as that the
> >> users now, and likely more in the future, will have alternatives thus
> >> forcing the on premise HPC to "compete" in order to survive.
> >
> >this seems a bit too stringent of a statement for me.  i don't dismiss
> >or disagree with your premise, but i don't entirely agree that HPC
> >"must" change in order to compete.  We've all heard this kind of stuff
> >in the past if x doesn't change y will take over the world!
>
> HPC, like most things, exists to get something done.
>
> If HPC doesn't change to reflect the changes in society and the way
> the software is developed (*) then the users will look for more modern
> ways to replace traditional HPC.  As noted the software is no longer
> developed on workstations that are connected to the lab/company
> network but rather on laptops that stay with the user wherever they
> go.
>
> This in turn is at least in part what has driven the rise of
> distributed version control, git in particular.
>
> If HPC doesn't make it easy for these users to transfer their workflow
> to the cluster, and the cloud providers do, then the users will move
> to using the cloud even if it costs them 10%, 20% more because at the
> end of the day it is about getting the job done and not about spending
> time to work to antiquated methods of putting jobs in a cluster.
>
> And of course if the users would rather spend their department budgets
> with Amazon, Azure, Google, or others then the next upgrade cycle
> there won't be any money for the in house cluster...
>
>
> * - note the HPC isn't unique in this regard.  The Linux distributions
> are facing their own version of this, where much of the software is no
> longer packagable in the traditional sense as it instead relies on
> language specific packaging systems and languages that don't lend
> themselves to the older rpm/deb style system.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-27 Thread Gerald Henriksen
On Tue, 27 Nov 2018 07:51:06 -0500, you wrote:

>On Mon, Nov 26, 2018 at 9:50 PM Gerald Henriksen  wrote:
>> On Mon, 26 Nov 2018 16:26:42 +0100, you wrote:
>> If on premise HPC doesn't change to reflect the way the software is
>> developed today then the users will in the future prefer cloud HPC.
>>
>> I guess it is a brave new world for on premise HPC as far as that the
>> users now, and likely more in the future, will have alternatives thus
>> forcing the on premise HPC to "compete" in order to survive.
>
>this seems a bit too stringent of a statement for me.  i don't dismiss
>or disagree with your premise, but i don't entirely agree that HPC
>"must" change in order to compete.  We've all heard this kind of stuff
>in the past if x doesn't change y will take over the world!

HPC, like most things, exists to get something done.

If HPC doesn't change to reflect the changes in society and the way
the software is developed (*) then the users will look for more modern
ways to replace traditional HPC.  As noted the software is no longer
developed on workstations that are connected to the lab/company
network but rather on laptops that stay with the user wherever they
go.

This in turn is at least in part what has driven the rise of
distributed version control, git in particular.

If HPC doesn't make it easy for these users to transfer their workflow
to the cluster, and the cloud providers do, then the users will move
to using the cloud even if it costs them 10%, 20% more because at the
end of the day it is about getting the job done and not about spending
time to work to antiquated methods of putting jobs in a cluster.

And of course if the users would rather spend their department budgets
with Amazon, Azure, Google, or others then the next upgrade cycle
there won't be any money for the in house cluster...


* - note the HPC isn't unique in this regard.  The Linux distributions
are facing their own version of this, where much of the software is no
longer packagable in the traditional sense as it instead relies on
language specific packaging systems and languages that don't lend
themselves to the older rpm/deb style system.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-27 Thread Michael Di Domenico
On Mon, Nov 26, 2018 at 9:50 PM Gerald Henriksen  wrote:
> On Mon, 26 Nov 2018 16:26:42 +0100, you wrote:
> If on premise HPC doesn't change to reflect the way the software is
> developed today then the users will in the future prefer cloud HPC.
>
> I guess it is a brave new world for on premise HPC as far as that the
> users now, and likely more in the future, will have alternatives thus
> forcing the on premise HPC to "compete" in order to survive.

this seems a bit too stringent of a statement for me.  i don't dismiss
or disagree with your premise, but i don't entirely agree that HPC
"must" change in order to compete.  We've all heard this kind of stuff
in the past if x doesn't change y will take over the world!  I'm sure
we could come up with a heck of a list.  there is, and i believe will
always be, a large percentage of the "HPC" population that doesn't get
counted on the Top500 list and will not or can not use the cloud.

i also believe these are two separate issues.  in my opinion, how code
is developed shouldn't really have anything to do with how an HPC
resource is run.  having said that however, i suspect in a few years
there's going to be an "HPC Code" revolution.  The generic code base
is getting too complicated (i.e. look at the mess openmpi has become)

---
"It's a machine, Schroeder. It doesn't get pissed off. It doesn't get
happy, it doesn't get sad, it doesn't laugh at your jokes. It just
runs programs." (Newton Crosby, 1986)
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HPC workflows

2018-11-26 Thread Gerald Henriksen
On Mon, 26 Nov 2018 16:26:42 +0100, you wrote:

>This leads me to ask - should we be presenting HPC services as a 'cloud'
>service, no matter that it is a non-virtualised on-premise setup?
>In which case the way to deploy software would be via downloading from
>Repos.
>I guess this is actually more common nowadays.

Simple answer yes.

If on premise HPC doesn't change to reflect the way the software is
developed today then the users will in the future prefer cloud HPC.

I guess it is a brave new world for on premise HPC as far as that the
users now, and likely more in the future, will have alternatives thus
forcing the on premise HPC to "compete" in order to survive.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf