Re: [Beowulf] immersion

2024-04-08 Thread John Hearns
HPC Sysadmins will have to gain other skills
https://youtu.be/Jf8Sheh4MD4?si=La0KfEF6OGPRKA2-

On Sun, 7 Apr 2024 at 23:07, Scott Atchley 
wrote:

> On Sun, Mar 24, 2024 at 2:38 PM Michael DiDomenico 
> wrote:
>
>> i'm curious if others think DLC might hit a power limit sooner or later,
>> like air cooling already has, given chips keep climbing in watts.
>>
>
> What I am worried about is power per blade/node.  The Cray EX design used
> in Frontier has a limit per blade. Frontier and El Cap have two nodes per
> blade. Aurora, which uses more power, only has one node per blade. I
> imagine that ORv3 racks will have similar issues.
>
> We can remove 6 kW per blade today and I am confident that we can remove
> some more. That said, we could reach a point where a blade might just be a
> single processor.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] position adverts?

2024-02-23 Thread John Hearns
There is a Jobs channel on hpc.social
Just saying

On Fri, Feb 23, 2024, 2:09 PM Michael DiDomenico 
wrote:

> Maybe we should come up with some kind of standard/wording/what-have-you to
> post such.  I have some open positions as well.  might liven the list up a
> little too... :)
>
> On Thu, Feb 22, 2024 at 7:45 PM Douglas Eadline 
> wrote:
>
>>
>> > I've always thought employment opps were fine, but e-mails trying to
>> > sell a product were bad.
>>
>> Yea that has been the general rule. HPC is such an interesting community,
>> one day you are working for a vendor, then at some point you move to
>> a university or lab or vice versa, lather rinse repeat
>>
>> Seems to work
>>
>>
>> --
>> Doug
>>
>> >
>> > Prentice Bisbal
>> > Senior HPC Engineer
>> > Computational Sciences Department
>> > Princeton Plasma Physics Laboratory
>> > Princeton, NJ
>> > https://cs.pppl.gov
>> > https://www.pppl.gov
>> >
>> > On 2/22/24 2:35 PM, Joe Landman wrote:
>> >>
>> >> Hi fellow beowulfers
>> >>
>> >> I don't know if it's bad form to post job adverts here.  Day job
>> >> (@AMD) is looking for lots of HPC (and AI) folks, think
>> >> debugging/support/etc.  Happy to talk with anyone about this.
>> >>
>> >> Regards
>> >>
>> >> Joe
>> >>
>> >> --
>> >> Joe Landman
>> >> e:joe.land...@gmail.com
>> >> t: @hpcjoe
>> >> w:https://scalability.org
>> >> g:https://github.com/joelandman
>> >> l:https://www.linkedin.com/in/joelandman
>> >>
>> >> ___
>> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
>> Computing
>> >> To change your subscription (digest mode or unsubscribe) visit
>> >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>> > ___
>> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
>> Computing
>> > To change your subscription (digest mode or unsubscribe) visit
>> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>> >
>>
>>
>> --
>> Doug
>>
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] And wearing another hat ...

2023-11-13 Thread John Hearns
https://sealandgov.org/

Move to Sealand. It is a WW2 gun platform in the North Sea off the English coast.
I believe the servers are down in the legs.

On Mon, 13 Nov 2023, 17:07 Joshua Mora,  wrote:

> Some folks trying to legally bypass government restrictions.
>
> Is land on the Moon or Mars on sale for locating next gen datacenters ?
>
>
> https://www.techradar.com/pro/the-first-ai-nation-a-ship-with-1-nvidia-h100-gpus-worth-dollar500-million-could-become-the-first-ever-sovereign-territory-that-relies-entirely-on-artificial-intelligence-for-its-future
>
> -- Original Message --
> Received: 09:07 AM CST, 11/11/2023
> From: "Douglas Eadline" 
> To: "Joshua Mora"  Cc: beowulf@beowulf.org
> Subject: Re: [Beowulf] And wearing another hat ...
>
> >
> > I was talking to a writer about this story idea.
> >
> > More to come.
> >
> > --
> > Doug
> >
> >
> > > It would be good to track how governments try to "regulate"
> > > technologies/materials/processes that have an impact on HPC (AI at
> scale
> > > fits
> > > into HPC) for good and for bad.
> > > It could be for instance as convoluted as DC emissions cap aligning to
> a
> > > climate policy.
> > >
> > > Joshua
> > >
> > > -- Original Message --
> > > Received: 01:29 PM CDT, 10/31/2023
> > > From: "Douglas Eadline"
> > > To: beowulf@beowulf.org
> > > Subject: [Beowulf] And wearing another hat ...
> > >
> > >> All:
> > >>
> > >> Back in July, I stepped into the Managing Editor role at HPCwire.
> > >> I'm covering for a staff sabbatical, and I will be in place through
> > >> December, including attending SC23.
> > >>
> > >> A few things:
> > >>
> > >> 1. As ME, I am interested in what types of topics you would like to
> see
> > >> covered on HPCwire (even if you don't read it)
> > >>
> > >> 2. Also, if you have something you think is particularly interesting
> > >> at SC23 (yours or someone else's), let me know.
> > >>
> > >> As you can imagine, HPCwire sits (or stands or gets knocked down)
> > >> directly in the HPC information fire-hose. I'm interested in HPC
> > >> efforts, projects, and ideas that may not make it into the fire hose
> > >> stream or may get missed.
> > >>
> > >> I hope to see you at SC23 (I will be wearing a blazer
> > >> sometimes!). Not on Monday night at the Bash, though. I believe
> > >> there is some T-shirt or wardrobe planned.
> > >>
> > >>
> > >> --
> > >> Doug
> > >>
> > >> ___
> > >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
> Computing
> > >> To change your subscription (digest mode or unsubscribe) visit
> > > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> > >
> > >
> > > ___
> > > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
> Computing
> > > To change your subscription (digest mode or unsubscribe) visit
> > > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> > >
> >
> >
> > --
> > Doug
> >
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] ib neighbor

2023-09-20 Thread John Hearns
netloc is the tool you want to use.
Look in the latest hwloc documentation.

On Wed, 20 Sep 2023, 13:55 John Hearns,  wrote:

> I did manage to get the graphical netloc utility working once. Part of the
> hwloc/openmpi project.
>
> It produces a very pretty image of IB topology. I think if you zoom in you
> can get neighbours.
> A few years since I used it.
>
> On Tue, 19 Sep 2023, 19:03 Michael DiDomenico, 
> wrote:
>
>> does anyone know if there's a simple command to pull the neighbor of
>> an ib port?  for instance, this horrible shell command line
>>
>> # for x in `ibstat | awk -F \' '/^CA/{print $2}'`; do iblinkinfo -C
>> ${x} -n 1 -l | grep `hostname -s`; done
>> 0x08006900fbcc "SwitchX -  Mellanox Technologies"  41134   29[  ]
>> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x88e9a4404704
>> 6111[  ] "logs01 HCA-1" ( )
>> 0x88e9a4404704 "  logs01 HCA-1"6111[  ]
>> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x08006900fbcc
>> 41134   29[  ] "SwitchX -  Mellanox Technologies" ( )
>> 0x08006900fbdc "SwitchX -  Mellanox Technologies"  41219   29[  ]
>> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x88e9a4404705
>> 101051[  ] "logs01 HCA-2" ( )
>> 0x88e9a4404705 "  logs01 HCA-2"  101051[  ]
>> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x08006900fbdc
>> 41219   29[  ] "SwitchX -  Mellanox Technologies" ( )
>>
>> outputs what i need (though i only need the CA perspective), but it's
>> going to be an atrocious effort in text parsing.   would be nice if
>> there was a nice simple command, preferably that outputs json, but
>> that's likely wishful thinking
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] ib neighbor

2023-09-20 Thread John Hearns
Does ibnetdiscover not help you?
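
A minimal sketch of that route - the grep assumes the node descriptions on
the fabric include the short hostname:

# dump the whole fabric topology once, then pull out the entries
# that mention this host's HCA
ibnetdiscover > fabric.topo
grep -B1 -A2 "$(hostname -s)" fabric.topo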


On Tue, 19 Sep 2023, 19:03 Michael DiDomenico, 
wrote:

> does anyone know if there's a simple command to pull the neighbor of
> an ib port?  for instance, this horrible shell command line
>
> # for x in `ibstat | awk -F \' '/^CA/{print $2}'`; do iblinkinfo -C
> ${x} -n 1 -l | grep `hostname -s`; done
> 0x08006900fbcc "SwitchX -  Mellanox Technologies"  41134   29[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x88e9a4404704
> 6111[  ] "logs01 HCA-1" ( )
> 0x88e9a4404704 "  logs01 HCA-1"6111[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x08006900fbcc
> 41134   29[  ] "SwitchX -  Mellanox Technologies" ( )
> 0x08006900fbdc "SwitchX -  Mellanox Technologies"  41219   29[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x88e9a4404705
> 101051[  ] "logs01 HCA-2" ( )
> 0x88e9a4404705 "  logs01 HCA-2"  101051[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x08006900fbdc
> 41219   29[  ] "SwitchX -  Mellanox Technologies" ( )
>
> outputs what i need (though i only need the CA perspective), but it's
> going to be an atrocious effort in text parsing.   would be nice if
> there was a nice simple command, preferably that outputs json, but
> that's likely wishful thinking
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] ib neighbor

2023-09-20 Thread John Hearns
I did manage to get the graphical netloc utility working once. Part of the
hwloc/openmpi project.

It produces a very pretty image of IB topology. I think if you zoom in you
can get neighbours.
A few years since I used it.

On Tue, 19 Sep 2023, 19:03 Michael DiDomenico, 
wrote:

> does anyone know if there's a simple command to pull the neighbor of
> an ib port?  for instance, this horrible shell command line
>
> # for x in `ibstat | awk -F \' '/^CA/{print $2}'`; do iblinkinfo -C
> ${x} -n 1 -l | grep `hostname -s`; done
> 0x08006900fbcc "SwitchX -  Mellanox Technologies"  41134   29[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x88e9a4404704
> 6111[  ] "logs01 HCA-1" ( )
> 0x88e9a4404704 "  logs01 HCA-1"6111[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x08006900fbcc
> 41134   29[  ] "SwitchX -  Mellanox Technologies" ( )
> 0x08006900fbdc "SwitchX -  Mellanox Technologies"  41219   29[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x88e9a4404705
> 101051[  ] "logs01 HCA-2" ( )
> 0x88e9a4404705 "  logs01 HCA-2"  101051[  ]
> ==( 4X   14.0625 Gbps Active/  LinkUp)==>  0x08006900fbdc
> 41219   29[  ] "SwitchX -  Mellanox Technologies" ( )
>
> outputs what i need (though i only need the CA perspective), but it's
> going to be an atrocious effort in text parsing.   would be nice if
> there was a nice simple command, preferably that outputs json, but
> that's likely wishful thinking
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread John Hearns
I would look at BeeGFS here
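
On the striped-NVMe scratch plan in the quoted message below, a minimal ZFS
sketch - the device names are assumptions, and a plain stripe has no
redundancy:

# striped (RAID-0 style) pool across the two NVMe drives
zpool create scratch /dev/nvme0n1 /dev/nvme1n1
# larger records often help streaming HPC I/O served over NFS
zfs set recordsize=1M scratch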

On Thu, 10 Aug 2023, 20:19 leo camilo,  wrote:

> Hi everyone,
>
> I was hoping I would seek some sage advice from you guys.
>
> At my department we have built this small prototyping cluster with 5
> compute nodes, 1 name node and 1 file server.
>
> Up until now, the name node contained the scratch partition, which
> consisted of 2x4TB HDD, which form an 8 TB striped zfs pool. The pool is
> shared to all the nodes using nfs. The compute nodes and the name node are
> connected with both cat6 ethernet cable and infiniband. Each compute node
> has 40 cores.
>
> Recently I have attempted to launch computation from each node (40 tasks
> per node), so 1 computation per node.  And the performance was abysmal. I
> reckon I might have reached the limits of NFS.
>
> I then realised that this was due to very poor performance from NFS. I am
> not using stateless nodes, so each node has about 200 GB of SSD storage and
> running directly from there was a lot faster.
>
> So, to solve the issue,  I reckon I should replace NFS with something
> better. I have ordered 2x4TB NVMEs  for the new scratch and I was thinking
> of :
>
>
>- using the 2x4TB NVME in a striped ZFS pool and use a single node
>GlusterFS to replace NFS
>- using the 2x4TB NVME with GlusterFS in a distributed arrangement
>(still single node)
>
> Some people told me to use Lustre, but I reckon that might be overkill. And
> I would only use a single fileserver machine (1 node).
>
> Could you guys give me some sage advice here?
>
> Thanks in advance
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] interconnect wars... again...

2023-07-31 Thread John Hearns
Please keep the list updated on what you find.

On Mon, 31 Jul 2023 at 20:08, Andrew Falgout 
wrote:

> Not ignoring you guys, literally have been moving.  We had to downsize,
> I've got no power and even my main computer is still powered off.  There's
> nothing more eerie than a quiet computer room.  I'm on an old laptop that I
> threw Linux Mint 21 on to be here now.  Okay.. to introduce myself a bit
> more.
> I've been doing linux for a long time, but been in a silo for a long
> time.  I feel like I've left so many skills unused that I can't trust them
> anymore.  So I'm mentally just marking my cache as dirty and going to
> relearn as much as I can.  Great information so far.
>
> I have hardware and storage space to play with (Dell R930, 112 cores/600 GB
> of RAM).  The issue is that getting a graphics card in them for compute is
> really not proving to be ideal.  I have about 4 of these machines, and I'd
> like to
> play around with clustering.  Learning how to properly and securely plan
> and implement them.  I've played around with docker, and have used multiple
> docker servers with portainer.  Next, when I get an electrician to install
> power, the plan is to try to set up a Kubernetes cluster.
> When I can get something with some decent compute (not this laptop), I'd
> like to learn how to train a small llm model using the cluster if
> possible.  I know I can do a good bit slowly with the CPU.  If I can get a
> GPU in the mix, doing that to speed things up.
> Again.. I would like to apologize for being quiet for so long.  I'll try
> to toss an "ack" in there from my phone if nothing else.
>
>
> ./Andrew Falgout
> KG5GRX
>
>
> On Mon, Jul 31, 2023 at 6:10 AM John Hearns  wrote:
>
>> A quick ack would be nice.
>>
>> On Fri, 28 Jul 2023, 06:38 John Hearns,  wrote:
>>
>>> Andrew, the answer is very much yes. I guess you are looking at the
>>> interface of 'traditional' HPC which uses workload schedulers and
>>> Kubernetes style clusters which use containers.
>>> Firstly I would ask if you are coming from the point of view of someone
>>> who wants to build a cluster in your home or company using kit which you
>>> already have.
>>> Or are you a company which wants to set up an AI infrastructure?
>>>
>>> By the way, I think you are thinking of a CPU cluster and scaling out
>>> using Beowulf concepts.
>>> In that case you are looking at Horovod
>>> https://github.com/horovod/horovod
>>> One thing though - for AI applications it is common to deploy Beowulf
>>> clusters which have servers with GPUs as part of their specification.
>>>
>>>
>>> I think it will be clear to you soon that you will be overwhelmed with
>>> options and opinions.
>>> Firstly join the hpc.social community and introduce yourself on the
>>> Slack channel introductions
>>> I would start with the following resources:
>>>
>>> https://www.clustermonkey.net/
>>> https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
>>> https://catalog.ngc.nvidia.com/containers
>>> https://openhpc.community/
>>> https://ciq.com/
>>> https://qlustar.com/
>>>
>>> https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
>>> https://omnia-doc.readthedocs.io/en/latest/index.html
>>>
>>> Does anyone know if the Bright Easy8 licenses are available? I would say
>>> that building a test cluster with Easy8 would be the quickest way to get
>>> some hands on experience.
>>>
>>> You should of course consider cloud providers:
>>> https://aws.amazon.com/hpc/parallelcluster/
>>>
>>> https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
>>> https://cloud.google.com/solutions/hpc
>>> https://go.oracle.com/LP=134426
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 28 Jul 2023 at 01:10, Andrew Falgout 
>>> wrote:
>>>
>>>> So I'm interested to see if a Beowulf Cluster could be used for Machine
>>>> Learning, LLM training, and LLM inference.  Anyone know where a good entry
>>>> point is for learning Beowulf Clustering?
>>>>
>>>>
>>>> ./Andrew Falgout
>>>> KG5GRX
>>>>
>>>>
>>>> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <
>>>> mdidomeni...@gmail.com> wrote:
>>>>
>>>>> just a mailing list as far as i know.  it used to get a lot more
>>>>> traff

Re: [Beowulf] interconnect wars... again...

2023-07-31 Thread John Hearns
A quick ack would be nice.

On Fri, 28 Jul 2023, 06:38 John Hearns,  wrote:

> Andrew, the answer is very much yes. I guess you are looking at the
> interface of 'traditional' HPC which uses workload schedulers and
> Kubernetes style clusters which use containers.
> Firstly I would ask if you are coming from the point of view of someone
> who wants to build a cluster in your home or company using kit which you
> already have.
> Or are you a company which wants to set up an AI infrastructure?
>
> By the way, I think you are thinking of a CPU cluster and scaling out
> using Beowulf concepts.
> In that case you are looking at Horovod https://github.com/horovod/horovod
> One thing though - for AI applications it is common to deploy Beowulf
> clusters which have servers with GPUs as part of their specification.
>
>
> I think it will be clear to you soon that you will be overwhelmed with
> options and opinions.
> Firstly join the hpc.social community and introduce yourself on the Slack
> channel introductions
> I would start with the following resources:
>
> https://www.clustermonkey.net/
> https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
> https://catalog.ngc.nvidia.com/containers
> https://openhpc.community/
> https://ciq.com/
> https://qlustar.com/
>
> https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
> https://omnia-doc.readthedocs.io/en/latest/index.html
>
> Does anyone know if the Bright Easy8 licenses are available? I would say
> that building a test cluster with Easy8 would be the quickest way to get
> some hands on experience.
>
> You should of course consider cloud providers:
> https://aws.amazon.com/hpc/parallelcluster/
>
> https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
> https://cloud.google.com/solutions/hpc
> https://go.oracle.com/LP=134426
>
>
>
>
>
>
>
> On Fri, 28 Jul 2023 at 01:10, Andrew Falgout 
> wrote:
>
>> So I'm interested to see if a Beowulf Cluster could be used for Machine
>> Learning, LLM training, and LLM inference.  Anyone know where a good entry
>> point is for learning Beowulf Clustering?
>>
>>
>> ./Andrew Falgout
>> KG5GRX
>>
>>
>> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <
>> mdidomeni...@gmail.com> wrote:
>>
>>> just a mailing list as far as i know.  it used to get a lot more
>>> traffic, but seems to have simmered down quite a bit
>>>
>>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout 
>>> wrote:
>>> >
>>> > Just curious, do we have a discord channel, or just a mailing list?
>>> >
>>> >
>>> > ./Andrew Falgout
>>> > KG5GRX
>>> >
>>> >
>>> >
>>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <
>>> mdidomeni...@gmail.com> wrote:
>>> >>
>>> >> ugh, as someone who worked the front lines in the 00's i got front row
>>> >> seat to the interconnect mud slinging...  but frankly if they're going
>>> >> to come out of the gate with a product named "Ultra Ethernet", i smell
>>> >> a loser... :) (sarcasm...)
>>> >>
>>> >>
>>> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>>> >> ___
>>> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
>>> Computing
>>> >> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] interconnect wars... again...

2023-07-27 Thread John Hearns
Andrew, the answer is very much yes. I guess you are looking at the
interface of 'traditional' HPC which uses workload schedulers and
Kubernetes style clusters which use containers.
Firstly I would ask if you are coming from the point of view of someone who
wants to build a cluster in your home or company using kit which you
already have.
Or are you a company which wants to set up an AI infrastructure?

By the way, I think you are thinking of a CPU cluster and scaling out using
Beowulf concepts.
In that case you are looking at Horovod https://github.com/horovod/horovod
One thing though - for AI applications it is common to deploy Beowulf
clusters which have servers with GPUs as part of their specification.
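
To give a flavour of what a Horovod launch looks like, a minimal sketch -
the host names, slot counts, and training script are placeholders:

# data-parallel training across two 4-GPU nodes, 8 processes total;
# train.py is assumed to use the horovod API internally
horovodrun -np 8 -H node1:4,node2:4 python train.py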


I think it will be clear to you soon that you will be overwhelmed with
options and opinions.
Firstly join the hpc.social community and introduce yourself on the Slack
channel introductions
I would start with the following resources:

https://www.clustermonkey.net/
https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
https://catalog.ngc.nvidia.com/containers
https://openhpc.community/
https://ciq.com/
https://qlustar.com/
https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
https://omnia-doc.readthedocs.io/en/latest/index.html

Does anyone know if the Bright Easy8 licenses are available? I would say
that building a test cluster with Easy8 would be the quickest way to get
some hands on experience.

You should of course consider cloud providers:
https://aws.amazon.com/hpc/parallelcluster/
https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
https://cloud.google.com/solutions/hpc
https://go.oracle.com/LP=134426







On Fri, 28 Jul 2023 at 01:10, Andrew Falgout 
wrote:

> So I'm interested to see if a Beowulf Cluster could be used for Machine
> Learning, LLM training, and LLM inference.  Anyone know where a good entry
> point is for learning Beowulf Clustering?
>
>
> ./Andrew Falgout
> KG5GRX
>
>
> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico 
> wrote:
>
>> just a mailing list as far as i know.  it used to get a lot more
>> traffic, but seems to have simmered down quite a bit
>>
>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout 
>> wrote:
>> >
>> > Just curious, do we have a discord channel, or just a mailing list?
>> >
>> >
>> > ./Andrew Falgout
>> > KG5GRX
>> >
>> >
>> >
>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <
>> mdidomeni...@gmail.com> wrote:
>> >>
>> >> ugh, as someone who worked the front lines in the 00's i got front row
>> >> seat to the interconnect mud slinging...  but frankly if they're going
>> >> to come out of the gate with a product named "Ultra Ethernet", i smell
>> >> a loser... :) (sarcasm...)
>> >>
>> >>
>> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>> >> ___
>> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
>> Computing
>> >> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] interconnect wars... again...

2023-07-26 Thread John Hearns
All the cool kids are on hpc.social.

I am on the Slack there. I would encourage everyone to come over

On Wed, 26 Jul 2023, 14:39 Michael DiDomenico, 
wrote:

> just a mailing list as far as i know.  it used to get a lot more
> traffic, but seems to have simmered down quite a bit
>
> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout 
> wrote:
> >
> > Just curious, do we have a discord channel, or just a mailing list?
> >
> >
> > ./Andrew Falgout
> > KG5GRX
> >
> >
> >
> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <
> mdidomeni...@gmail.com> wrote:
> >>
> >> ugh, as someone who worked the front lines in the 00's i got front row
> >> seat to the interconnect mud slinging...  but frankly if they're going
> >> to come out of the gate with a product named "Ultra Ethernet", i smell
> >> a loser... :) (sarcasm...)
> >>
> >>
> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
> >> ___
> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
> Computing
> >> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Your thoughts on the latest RHEL drama?

2023-06-28 Thread John Hearns
Rugged individualist? I like that... Me puts on plaid shirt and goes to
wrestle with some bears...

> Maybe it is time for an HPC Linux distro, this is where
Good move. I would say a lightweight distro that does not do much and is
rebooted every time a job finishes.
Wonder what security types would think of that.
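
Slurm can already do a light version of this - a sketch using knobs that
exist in recent Slurm, with illustrative values:

# slurm.conf fragment: allow Slurm to reboot nodes
RebootProgram=/sbin/reboot
ResumeTimeout=300

# then cycle a node back to a clean image once its jobs finish
scontrol reboot ASAP nextstate=resume node001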

Sidelining the discussion a bit, I have been involved with projects where
security types insist that the entire stack, from firmware upwards, is kept
up to date.
This feeds into the Red Hat debate of course - if we go Debian how do you
satisfy corporate types?
I guess Ubuntu has a role here.







On Tue, 27 Jun 2023 at 22:14, Douglas Eadline  wrote:

>
>
> A while ago, as a consultant, I managed an HPC cluster.
> The client was paying for RH licenses every year. (actually
> more than they were paying me). I asked them once "how often to you
> call RH with issues?" Their reply, "We don't, we call you
> because you understand HPC stuff."
>
> And of course, from the HW vendor, "you can't use that
> IB driver version because it has not been qualified to
> run on the hardware ... we only support RHEL version ..."
>
> In my experience, RH has never brought much value HPC.
> I'm not blaming or shamming, it is just not their thing.
> I'm just not sure you will get much help calling support
> with an opensm issue or a new IB driver.
>
> And as we all know, HPC has always been the home of
> the "rugged individualists" (or "stubborn assholes"
> not sure which) that do their own thing
> to make things work. It is the nature of the HPC game.
>
> Maybe it is time for an HPC Linux distro, this is where
> Scientific Linux seemed to be headed, but then was stopped because
> CentOS worked just as well. And Red Hat's latest move would have
> killed it in any case.
>
> This is a much longer discussion, but now I have to go
> step on garden rakes while figuring out how to get some
> small Java package to build with Gradle (don't ask).
>
> --
> Doug
>
>
>
>
>
> > We're all ears...
> >
> >
> > Bill
> >
> > On 6/26/23 3:00 PM, Douglas Eadline wrote:
> >>
> >> I'll have more to say later and to me the irony of this situation is
> >> Red Hat has become what they were created to prevent*.
> >>
> >>
> >> --
> >> Doug
> >>
> >> * per conversations with Bob Young back in the day
> > We're all ears...
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
>
>
> --
> Doug
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Your thoughts on the latest RHEL drama?

2023-06-26 Thread John Hearns
There is a good discussion on this topic over on the Slack channel at
hpc.social
I would urge anyone on this list to join up there - you will find a home.
hpcsocial.slack.com


On Mon, 26 Jun 2023 at 19:27, Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:

> Beowulfers,
>
> By now, most of you should have heard about Red Hat's latest move to
> eliminate any competition to RHEL. If not, here are some links:
>
> Red Hat's announcement:
> https://www.redhat.com/en/blog/furthering-evolution-centos-stream
>
> Alma Linux's response:
> https://almalinux.org/blog/impact-of-rhel-changes/
>
> Rocky Linux's response:
> https://rockylinux.org/news/2023-06-22-press-release/
>
> Software Freedom Conservancy's analysis of the situation:
> https://sfconservancy.org/blog/2023/jun/23/rhel-gpl-analysis/
>
> I'm writing to get your thoughts on this situation, as well as see what
> plans of action you are considering moving forward.
>
> Here are my thoughts:
>
> This is Red Hat biting the hands that feed them. Red Hat went from a small
> company operating out of a basement to a large global company thanks to
> open-source software. My first exposure to Linux was Red Hat Linux 4 in
> December 1996. I bought a physical, shrink-wrapped version with the
> commercial Metro-X X server to start learning Linux at home in my spare
> time shortly after graduation from college. I chose RHL because everything
> I read said RPM made it super easy to install and manage software (perfect
> for noobs like me), and the Metro-X X-server was far superior to any
> open-source X-server available at the time (which was just Xfree86,
> really). I felt good about giving RH my $40 for this not just because it
> would make it easier for me to learn Linux, but because it seemed like Red
> Hat were really the company that was going to take this underdog operating
> system and make it famous.
>
> They certainly achieved that goal, but along the way, I've seen them do a
> lot of anti-open-source things that I didn't like, leading me to change my
> image of them from champion of the underdog to the "Microsoft of Linux" to
> whatever my low opinion of them is now (Backstabber? Ingrate? Hypocrite?):
>
> 1. When they weren't making any money off a product they were giving away
> for free (Red Hat Linux, and "duh!"), they came out with an "Enterprise"
> version, that would still be GPL-compliant, but you'd have to pay for
> subscriptions to get access to their update mechanism. To get people to buy
> into this model, they started spreading fear, uncertainty, and doubt (FUD),
> about "non-enterprise" Linux distributions, saying that any Linux
> distribution other than Red Hat Enterprise Linux (RHEL) wasn't reliable for
> use in any kind of enterprise that needed reliability.
>
> 2. When spreading FUD didn't work, RH killed off RHL entirely. If you
> wanted a free version of Red Hat, your only option was Rawhide, which was
> their development version for the next generation of RHEL, which was too
> unstable and unpredictable for enterprise needs (of course).
>
> 3. After RH started contributing funding to GNOME development, the next
> major version of RHEL didn't install other desktops during the install. I
> remember RHEL saying this was a bug, but I've always suspected it was a
> deliberate act to reduce KDE market share and and give RH another area of
> the Linux ecosystem it could control. This, to me, was identical to
> Microsoft including IE with the OS to kill off Netscape. Now if you excuse,
> me, I need to go fashion a hat out of tin foil...
>
> 4. RH takes over control of CentOS, which at the time was the only
> competitor to RHEL. There used to be Scientific Linux (SL), which was
> maintained by the DOE at FermiLab, but FermiLab decided that the world
> didn't need both SL and CentOS, since they were essentially the same thing.
> Not long after, RH eliminates CentOS as a competitor by changing it to
> "CentOS Stream" so it's no longer a competitor to RHEL. CentOS Stream is
> now a development version of sorts for RHEL, but I thought that was exactly
> what Fedora was for.
>
> 5. When Alma and Rocky pop up to fill the void created by the killing of
> CentOS, RH does what it can to eliminate their access to RHEL source code
> so they can't be competitors to RHEL, which brings us to today.
>
> Somewhere around event #3 is when I started viewing RH as the MS of
> the Linux world for obvious reasons. It seems that RH is determined to make
> RHEL a monopoly of the "Enterprise Linux" market. Yes, I know there's
> Ubuntu and SLES, but Ubuntu is viewed as a desktop more than a server OS
> (IMO), and SLES hasn't really caught on, at least not in the US.
>
> I feel that every time the open-source community ratchets up efforts to
> preserve free alternatives to RHEL, RH ratchets up their efforts to
> eliminate any competition, so trying to stick with a free alternative to
> RHEL is ultimately going to be futile, so now is a good time to consider
> 

Re: [Beowulf] [External] Re: old sm/sgi bios

2023-03-24 Thread John Hearns
That Supermicro board sounds like one of the boards from an ICE cluster,
right?
I know Joe flagged up the BIOS - thinking out loud is it not possible to
copy the BIOS from another, working, board of the same model?

Regarding SGI workstations when I worked in post production at Framestore
we had lots of SGI big beasts - including O2000 and I think an O3000.
We got the new SGI x86 workstations - superbly engineered of course.
However they had a low voltage PCI slot setup which meant you could not use
normal off the shelf (read cheap) PCI cards
Clearly SGI made that decision from an engineering viewpoint - there must
be an advantage with low voltage.
Sadly those workstations really could not compete in price terms with
commodity models such as Compaq






On Thu, 23 Mar 2023 at 22:13, Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:

> Mentioning it to him was pointless. You know the serenity prayer? Well, I
> had the serenity to accept that his backwards attitude was something I
> could not change, so it was just a source of entertainment for me.
>
> I did share my experience with my boss, who was the best boss ever, and
> she was like "Yeah, I figured".  That director of CADD talked a lot of
> game, but he couldn't walk the talk and was eventually let go, but not
> until he drove me out of the company, and cost them a lot of money in
> useless, outdated computer hardware.
>
> I do have to say it did inspire me to name my Linux workstation
> "underdog", which was one of my favorite hostnames I've come up with
> ("bobblehead" being my all time favorite - honestly is there any better
> task for a system admin than coming up with good hostnames?)
>
> --
> Prentice
>
> On 3/23/23 4:48 PM, Darren Wise via Beowulf wrote:
>
> I nearly dropped my coffee lol - did you make a note of (or record) his
> face if you ever mentioned the comparisons to him? I'm just curious with
> wonder..
> On 23/03/2023 19:08, Prentice Bisbal via Beowulf wrote:
>
> Yeah, that whole situation was frustrating. From 2004-2007, I was working
> for a pharmaceutical startup supporting their Computer-Aided Drug Discovery
> (CADD) team. Had I been hired before the director of CADD, it would have
> been a 100% Linux shop. Instead, as soon as he was hired he started
> insisting, and circulated a memo, stating that Linux was still a toy for
> hobbyists and "not ready for primetime" (he used that exact quote). So we
> spent tens of thousands of dollars on two Octanes and the 8-way Origin 350.
> I got a Linux workstation as a proof-of-concept, and that HP Workstation
> running Linux that cost only a few thousand dollars ran circles around
> those SGI boxes, and when cost was factored in FLOPS/$ was like 10x better
> than the SGI hardware at that point. And all of that hardware was bought
> used ("remarketed") from SGI, so new hardware would have compared
> significantly worse in terms of value.
>
> Also, it turned out the director of CADD owned a nontrivial amount of SGI
> stocks, so not only was he an over-the-hill curmudgeon afraid of new
> technology, there was also a pretty clear conflict of interest for him to
> be pushing SGI, even though I'm sure our small purchase did nothing to
> improve SGI stock value.
> On 3/23/23 2:58 PM, Joe Landman wrote:
>
> They had laid off all the good people doing workstations by then, I think
> they outsourced design/production to ODMs by that time.  MIPS processors
> were long in the tooth in 1999, never mind 2007.
>
> Having been at SGI from 1995-2001, I can tell you the reason MIPS sucked
> wind at that point, was the good ship Itanic sunk Alien and Beast
> processors.  Those design teams left, and we didn't have much for post
> R10k, other than respins and shrinks of R10k.  Which were renamed R12k,
> R14k ...
>
> Beast would have been relevant (near EOL though) in 2007.
>
>
> On 3/23/23 14:53, Prentice Bisbal via Beowulf wrote:
>
> Between 2003 and 2007, I worked with a lot of O2s, Octanes, an 8-way
> Origin 350, and even a Tezro. I don't miss those days.
>
> I always felt like the design of their workstations was done by the same
> people who design Playskool toys rather than professional hardware.
>
> Prentice Bisbal
> Senior HPC Engineer
> Computational Sciences Department
> Princeton Plasma Physics Laboratory
> Princeton, NJ
> https://cs.pppl.gov
> https://www.pppl.gov
>
> On 3/23/23 1:08 PM, Ryan Novosielski via Beowulf wrote:
>
> Seriously. I have an Indy and an Octane2 laying around. That’s not even an
> SGI. :-P
>
> --
> #BlackLivesMatter
> 
> || \\UTGERS,|---*O*---
> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ  | Office of Advanced Research Computing - MSB
> A555B, Newark
>  `'
>
> On Mar 23, 2023, at 13:07, Michael DiDomenico 
>  wrote:
>
> ack, irix flashbacks... :) fortunately, this machine isn't 

Re: [Beowulf] HPCG benchmark, again

2022-03-20 Thread John Hearns
Jörg,   I would have a look at the Archer/UK-HPC benchmarks
https://github.com/hpc-uk/archer-benchmarks
They have Castep and CP2K in the applications benchmarks which will be
relevant to you.
Also thankyou for looking for advice here!

As someone who has worked for several cluster vendors, please can I make
these heartfelt pleas to anyone who is doing a procurement exercise:


   - choose relevant benchmarks - as Jorg is doing here. Do not load a lot
   of benchmarks into your RFP just because they sound cute
   - application benchmarks are great - but remember they take time to set
   up and run properly. Again choose a set which are relevant to your use and
   have some mercy on the poor overworked engineers
   - If the benchmark requires a license for commercial use be prepared to
   help the vendors get the relevant license (Gaussian, I believe)
   - be realistic about the scale you want the benchmarks run on. Vendors
   like my company can get time on Top500 class clusters to run benchmarks.
   Also you can book time on large clusters provided by CPU vendors. But be
   realistic - are you procuring a huge system where your users really will be
   scaling to those numbers of cores? If so yes go ahead and ask for it. But
   if you are procuring one rack of servers...
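
On the mechanics of HPCG itself: once the reference code is built it is a
short run - a minimal sketch, where the rank count is a placeholder and the
local problem size in hpcg.dat should be tuned so the working set spills out
of cache (official results also require a run of at least 1800 seconds):

mpirun -np 64 ./xhpcg
# the rating lands in a results file in the run directory
grep -A3 "Final Summary" HPCG-Benchmark*.txt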



On Fri, 18 Mar 2022 at 23:08, Jörg Saßmannshausen <
sassy-w...@sassy.formativ.net> wrote:

> Dear all,
>
> further to the emails back in 2020 around the HPCG benchmark test, as we are
> in
> the process of getting a new cluster I was wondering if somebody else in
> the
> meantime has used that test to benchmark the particular performance of the
> cluster.
> From what I can see, the latest HPCG version is 3.1 from August 2019. I
> also
> have noticed that their website has a link to download a version which
> includes the latest A100 GPUs from nVidia.
> https://www.hpcg-benchmark.org/software/view.html?id=280
>
> What I was wondering is: has anybody else apart from Prentice tried that
> test
> and is it somehow useful, or does it just give you another set of numbers?
>
> Our new cluster will not be at the same league as the supercomputers, but
> we
> would like to have at least some kind of handle so we can compare the
> various
> offers from vendors. My hunch is the benchmark will somehow (strongly?)
> depend
> on how it is tuned. As my former colleague used to say: I am looking for
> some
> war stories (not very apt to say these days!).
>
> Either way, I hope you are all well given the strange new world we are
> living
> in right now.
>
> All the best from a spring like dark London
>
> Jörg
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD Accelerated Data Center Keynote

2021-11-09 Thread John Hearns
All good Jim. However to be allowed to benchmark these systems you must
pronounce the CPU as "Milawn"
As I said elsewhere, they are getting pretty far north now. Is the plan to
cross the Alps?

On Tue, 9 Nov 2021 at 09:23, Jim Cownie  wrote:

> @Prentice:
> > Certainly looking forward to running some benchmarks on these systems
> myself when I can.
>
> Apparently you can get access to these machines immediately in Azure...
> https://twitter.com/hpcnotes/status/1457755481544577039
>
> 5:02 PM · Nov 8, 2021·Twitter Web App
>   Andrew Jones @hpcnotes
>   Like the sound of the high performance #AMD EPYC #Milan-X processors
> from today’s
>   @LisaSu keynote?
>
>   As far as I know, the only place you can actually get access to use
> Milan-X today is via Microsoft
>   #Azure #HPC HBv3
>
>   You can sign up at http://aka.ms/MilanXPreview
>
> -- Jim
> James Cownie 
> Mob: +44 780 637 7146
>
> > On 8 Nov 2021, at 20:45, Prentice Bisbal via Beowulf <
> beowulf@beowulf.org> wrote:
> >
> > Did anyone else catch the AMD Datacenter Premier Keynote this morning?
> AMD made some pretty impressive claims, like their MI200 GPUs are 4.5x
> faster than NVIDIA A100s, and one of these GPUs will be ~4x faster than an
> entire Summit node. Their Genoa (Zen 4) CPUs will have up to 96 cores,
> and their Bergamo "cloud native" CPUs will have up to 128 cores.
> >
> > It was live streamed on YouTube this morning, and is available for
> watching at the link below, which is how I watched it:
> >
> > https://youtu.be/ECHhuvuiNzs
> >
> > I think it's definitely worth a watch to see what's coming, and to see
> AMD's performance claims. Certainly looking forward to running some
> benchmarks on these systems myself when I can.
> >
> > --
> > Prentice
> >
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Infiniband Fabric test tool

2021-10-22 Thread John Hearns
I recently saw a presentation which referenced a framework to test out
Infiniband (or maybe MPI in general) fabrics.
This was a Github repository.
It ran a series of inter-node tests and analysed the results.
It seemed similar in operation to Linktest

https://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/LinkTest/_node.html

If anyone recognises what this package is from this scatter-brained
description, please put me out of my misery...

Thankyou
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Infiniband for MPI computations setup guide

2021-10-20 Thread John Hearns
As Paul says - start a subnet manager.  I guess you are using the distro
supplied IB stack?
Run the following commands:
sminfo
ibdiagnet

these will check out your subnet manager and your fabric
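
A minimal sketch of the sequence, assuming the distro ships opensm as a
systemd unit:

# start a subnet manager on one node - without one, ports sit in INIT
sudo systemctl enable --now opensm
# confirm an SM is running and which LID holds it
sminfo
# sweep the fabric for bad links, duplicate GUIDs, etc.
ibdiagnet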

On Wed, 20 Oct 2021 at 17:21, Paul Edmon via Beowulf 
wrote:

> Oh you will also need an IB subnet manager (opensm) running since you have
> an unmanaged switch.  You can start this on one of the compute nodes.   I
> would probably start up 2 so you have redundancy.
>
> -Paul Edmon-
> On 10/20/2021 6:08 AM, leo camilo wrote:
>
>  I have recently acquired a few ConnectX-3 cards and an unmanaged IB
> switch (IS5022) to upgrade my department's beowulf cluster.
>
> Thus far, I have been able to verify that the cards and switch work via
> the MFT and open-source tools in Ubuntu.
>
> Though, I was wondering if anyone knew of any guide or resources for
> setting up a cluster for MPI based computations in a linux/debian
> environment? Some guides about how to make it work with SLURM would also be
> appreciated.
>
> Thanks in advance for any suggestions, I am often a user of clusters, but
> it is my first time setting one up.
>
> Cheers,
>
> Leonardo
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] server lift

2021-10-20 Thread John Hearns
The engine hoist is just superb! The right tool for the job.
Thinking about this, old style factories had overhead cranes. At Glasgow
University we had a cyclotron, and I am told one of the professors took a
great joy in driving the crane.
The Tate Modern art gallery has a huge overhead crane, kept to remind of
the original purpose of the building (a power station).

Thinking about server rooms we have electrical and communications services
dropping down into  racks from the top.
So a travelling crane would foul against the services - unless it was
higher.
How about positioning steel I-beams running along all cold aisles though?
You could run an engine hoist along these with a cradle below it for the
server.
Sealed hot/cold aisles need not be a problem - often they have lift off
panels in the roof.

Probably easier to have a server lift but one can dream...






On Wed, 20 Oct 2021 at 09:49, Tina Friedrich 
wrote:

> +1 for the manual
>
> We had an electric scissor lift in the DC, and exactly that happened -
> battery died. More expensive to replace than buying a new lift.
>
> So we got a manual one (making sure it can go to full rack height which
> is very useful) - much less likely to go wrong.
>
> Tina
>
> On 19/10/2021 18:40, Robert Taylor wrote:
> > I like the serverlift, having used one in our colo. If I got one, I
> > would go with the crank one. Electric is nice, but when the battery is
> > dead, you're stuck. Maybe it would have run off of wall power, but our
> > colo space is all 208v so I couldn't be sure if I could plug it in
> > there. Don't want to let out the blue smoke :-)
> >
> >
> >
> > On Tue, Oct 19, 2021 at 12:47 PM Michael Di Domenico
> > mdidomeni...@gmail.com wrote:
> >
> > i would love to go oil bath, it's just not in the cards for this
> > data center
> >
> > On Tue, Oct 19, 2021 at 2:34 AM Stu Midgley  > > wrote:
> >
> > we use a gantry crane...
> >
> >  From our install in 2019 :)
> >
> > image.png
> >
> > On Tue, Oct 19, 2021 at 9:34 AM Lux, Jim (US 7140) via Beowulf
> > mailto:beowulf@beowulf.org>> wrote:
> >
> > Do you want one with a scissor lift type arrangement, or
> > with a "prongs" arrangement (more like a forklift)
> >
> > For instance, the scissor types have a nice flat surface
> > that remains level, so you can slide stuff off a table onto
> > the lift, or from the lift onto a table. But you don't have
> > access "under" the platform.
> >
> >
> > On 10/18/21, 7:33 AM, "Beowulf on behalf of Michael Di
> > Domenico"  >  on behalf of
> > mdidomeni...@gmail.com >
> wrote:
> >
> >  we're using an older genie lift as a server lift
> > currently.  which as
> >  you can guess isn't designed for servers.  the most
> > recent set of
> >  compute nodes we purchased are pretty much impossible
> > to lift by man
> >  (even if that was a good idea) and the genie lift is
> > getting awkward
> >  in our cramped data center given it's design and server
> > weight.
> >
> >  i can certainly google for one, but they all look great
> > in the
> >  glossies.  does anyone want to provide some real world
> > info?
> >  ___
> >  Beowulf mailing list, Beowulf@beowulf.org
> >  sponsored by Penguin Computing
> >  To change your subscription (digest mode or
> > unsubscribe) visit
> >
> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
> >
> >
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org
> >  sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe)
> > visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> > 
> >
> >
> >
> > --
> > Dr Stuart Midgley
> > sdm...@gmail.com 
> >
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org
> >  sponsored by Penguin Computing
> > To change your 

Re: [Beowulf] Data Destruction

2021-09-30 Thread John Hearns
I once had an RMA case for a failed tape with Spectralogic. To prove it was
destroyed and not re-used I asked the workshop guys to put it through a
bandsaw, then sent off the pictures.

On Wed, 29 Sept 2021 at 16:47, Ellis Wilson  wrote:

> On 9/29/21 11:41 AM, Jörg Saßmannshausen wrote:
> > If you still need more, don't store the data at all but print it out on
> paper
> > and destroy it by means of incineration. :D
>
> I have heard stories from past colleagues of one large US Lab putting
> their HDDs through wood chippers with magnets on the chipped side to
> kill the bits good and dead.  As a storage fanatic that always struck me
> as something I'd have loved to see.
>
> Best,
>
> ellis
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Rant on why HPC isn't as easy as I'd like it to be.

2021-09-21 Thread John Hearns
Some points well made here. I have seen in the past job scripts passed on
from graduate student to graduate student - the case I am thinking of was
an Abaqus script for 8 core systems, being run on a new 32 core system. Why
WOULD a graduate student question a script given to them - which works.
They should be getting on with their science. I guess this is where
Research Software Engineers come in.

Another point I would make is about modern processor architectures, for
instance AMD Rome/Milan. You can have different Numa Per Socket options,
which affect performance. We set the preferred IO path - which I have seen
myself to have an effect on latency of MPI messages. IF you are not
concerned about your hardware layout you would just go ahead and run,
missing  a lot of performance.

I am now going to be controversial and comment that over in Julia land the
pattern these days seems to be that people develop on their own laptops, or
maybe local GPU systems. There is a lot of microbenchmarking going on. But
there seems to be not a lot of thought given to CPU pinning or what happens
with hyperthreading. I guess topics like that are part of HPC 'Black Magic'
- though I would imagine the low latency crowd are hot on them.

I often introduce people to the excellent lstopo/hwloc utilities which show
the layout of a system. Most people are pleasantly surprised to find this.
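
For anyone who has not met them, a minimal sketch - the binaries ship with
hwloc, and ./my_app is a placeholder:

# text rendering of sockets, NUMA nodes, caches, cores, and the
# PCI devices hanging off each socket
lstopo-no-graphics
# pin a run to the cores of NUMA node 0 to see the effect of placement
hwloc-bind node:0 -- ./my_app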











On Mon, 20 Sept 2021 at 19:28, Lux, Jim (US 7140) via Beowulf <
beowulf@beowulf.org> wrote:

> The recent comments on compilers, caches, etc., are why HPC isn’t a bigger
> deal.  The infrastructure today is reminiscent of what I used in the 1970s
> on a big CDC or Burroughs or IBM machine, perhaps with a FPS box attached.
>
> I prepare a job, with some sort of job control structure, submit it to a
> batch queue, and get my results some time later.  Sure, I’m not dropping
> off a deck or tapes, and I’m not getting green-bar paper or a tape back,
> but really, it’s not much different – I drop a file and get files back
> either way.
>
>
>
> And just like back then, it’s up to me to figure out how best to arrange
> my code to run fastest (or me, wall clock time, but others it might be CPU
> time or cost or something else)
>
>
>
> It would be nice if the compiler (or run-time or infrastructure) figured
> out the whole “what’s the arrangement of cores/nodes/scratch storage for
> this application on this particular cluster”.
>
> I also acknowledge that this is a “hard” problem and one that doesn’t have
> the commercial value of, say, serving the optimum ads to me when I read the
> newspaper on line.
>
>
> Yeah, it’s not that hard to call library routines for matrix operations,
> and to put my trust in the library writers – I trust them more than I trust
> me to find the fastest linear equation solver, fft, etc. – but so far, the
> next level of abstraction up – “how many cores/nodes” is still left to me,
> and that means doing instrumentation, figuring out what the results mean,
> etc.
>
>
>
>
>
> *From: *Beowulf  on behalf of "
> beowulf@beowulf.org" 
> *Reply-To: *Jim Lux 
> *Date: *Monday, September 20, 2021 at 10:42 AM
> *To: *Lawrence Stewart , Jim Cownie <
> jcow...@gmail.com>
> *Cc: *Douglas Eadline , "beowulf@beowulf.org" <
> beowulf@beowulf.org>
> *Subject: *Re: [Beowulf] [EXTERNAL] Re: Deskside clusters
>
>
>
>
>
>
>
> *From: *Beowulf  on behalf of Lawrence
> Stewart 
> *Date: *Monday, September 20, 2021 at 9:17 AM
> *To: *Jim Cownie 
> *Cc: *Lawrence Stewart , Douglas Eadline <
> deadl...@eadline.org>, "beowulf@beowulf.org" 
> *Subject: *Re: [Beowulf] [EXTERNAL] Re: Deskside clusters
>
>
>
> Well said.  Expanding on this, caches work because of both temporal
> locality and
>
> spatial locality.  Spatial locality is addressed by having cache lines be
> substantially
>
> larger than a byte or word.  These days, 64 bytes is pretty common.  Some
> prefetch schemes,
>
> like the L1D version that fetches the VA ^ 64 clearly affect spatial
> locality.  Streaming
>
> prefetch has an expanded notion of “spatial” I suppose!
>
>
>
> What puzzles me is why compilers seem not to have evolved much notion of
> cache management. It
>
> seems like something a smart compiler could do.  Instead, it is left to
> Prof. Goto and the folks
>
> at ATLAS and BLIS to figure out how to rewrite algorithms for efficient
> cache behavior. To my
>
> limited knowledge, compilers don’t make much use of PREFETCH or any
> non-temporal loads and stores
>
> either. It seems to me that once the programmer helps with RESTRICT and so
> forth, then compilers could perfectly well dynamically move parts of arrays
> around to maximize cache use.
>
>
>
> -L
>
>
>
> I suspect that there’s enough variability among cache implementation and
> the wide variety of algorithms that might use it that writing a
> smart-enough compiler is “hard” and “expensive”.
>
>
>
> Leaving it to the library authors is probably the best “bang for the
> buck”.
>
>
>
>
>
>
> 
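
Postscript on the prefetch point above: for anyone who has not seen it, this
is what explicit software prefetch looks like in C. A toy sketch using the
GCC/Clang builtin - the look-ahead distance of 64 is a made-up number, and on
modern hardware the automatic prefetchers often win anyway:

    /* sum an array, prefetching ~64 elements ahead of the current load */
    #include <stddef.h>

    double sum(const double *restrict a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (i + 64 < n)
                __builtin_prefetch(&a[i + 64], 0, 0); /* 0 = read, 0 = no temporal locality */
            s += a[i];
        }
        return s;
    }

The final 0 is the "no temporal locality" hint (prefetchnta on x86) - exactly
the sort of non-temporal access Lawrence notes compilers rarely emit on their
own.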

Re: [Beowulf] [EXTERNAL] Re: Deskside clusters

2021-09-21 Thread John Hearns
Over on the Julia discussion list there are often topics on performance or
varying performance - these often turn out to be due to the BLAS libraries
in use, and how they are being used.
I believe that there is a project for a pure-Julia BLAS.

On Mon, 20 Sept 2021 at 18:41, Lux, Jim (US 7140) via Beowulf <
beowulf@beowulf.org> wrote:

>
>
>
>
> *From: *Beowulf  on behalf of Lawrence
> Stewart 
> *Date: *Monday, September 20, 2021 at 9:17 AM
> *To: *Jim Cownie 
> *Cc: *Lawrence Stewart , Douglas Eadline <
> deadl...@eadline.org>, "beowulf@beowulf.org" 
> *Subject: *Re: [Beowulf] [EXTERNAL] Re: Deskside clusters
>
>
>
> Well said.  Expanding on this, caches work because of both temporal
> locality and
>
> spatial locality.  Spatial locality is addressed by having cache lines be
> substantially
>
> larger than a byte or word.  These days, 64 bytes is pretty common.  Some
> prefetch schemes,
>
> like the L1D version that fetches the VA ^ 64 clearly affect spatial
> locality.  Streaming
>
> prefetch has an expanded notion of “spatial” I suppose!
>
>
>
> What puzzles me is why compilers seem not to have evolved much notion of
> cache management. It
>
> seems like something a smart compiler could do.  Instead, it is left to
> Prof. Goto and the folks
>
> at ATLAS and BLIS to figure out how to rewrite algorithms for efficient
> cache behavior. To my
>
> limited knowledge, compilers don’t make much use of PREFETCH or any
> non-temporal loads and stores
>
> either. It seems to me that once the programmer helps with RESTRICT and so
> forth, then compilers could perfectly well dynamically move parts of arrays
> around to maximize cache use.
>
>
>
> -L
>
>
>
> I suspect that there’s enough variability among cache implementation and
> the wide variety of algorithms that might use it that writing a
> smart-enough compiler is “hard” and “expensive”.
>
>
>
> Leaving it to the library authors is probably the best “bang for the
> buck”.
>
>
>
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: Deskside clusters

2021-09-21 Thread John Hearns
Yes, but which foot? You have enough space for two toes from each foot for
a taste, and you then need some logic to decide which one to use.

On Mon, 20 Sept 2021 at 21:59, Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:

> On 9/20/21 6:35 AM, Jim Cownie wrote:
>
> >> Eadline's Law : Cache is only good the second time.
> >
> > Hmm, that’s why they have all those clever pre-fetchers which try to
> > guess your memory access patterns and predict what's going to be
> > needed next.
> > (Your choice whether you read “clever” in a cynical voice or not :-))
> > *IF* that works, then the cache is useful the first time.
> > If not, then they can mess things up royally by evicting stuff that
> > you did want there.
> >
>
> I thought about prefetching, but deliberately left it out of my original
> response because I didn't want to open that can of worms... or put my
> foot in my mouth.
>
> Prentice
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [beowulf] nfs vs parallel filesystems

2021-09-20 Thread John Hearns
This talk by Keith Manthey is well worth listening to. Vendor neutral as I
recall, so don't worry about a sales message being pushed.
It is the HPC Storage 101 talk in this series:

https://www.dellhpc.org/eventsarchive.html

On Sat, 18 Sept 2021 at 18:21, Lohit Valleru via Beowulf <
beowulf@beowulf.org> wrote:

> Hello Everyone,
>
> I am trying to find answers to an age old question of NFS vs Parallel file
> systems. Specifically - Isilon OneFS vs parallel filesystems. Specifically
> looking for any technical articles or papers that can help me understand
> what exactly will not work on OneFS.
> I understand that at the end - it all depends on workloads.
> But at what level of metadata IO, or with which IO patterns, does NFS become
> the problem? Would just getting a beefy Isilon NFS HDD-based storage resolve
> most of the issues?
> I am trying to find sources that can say that no matter how beefy an NFS
> server can get with HDDs as the backend - it will not be as good as parallel
> filesystems for so and so workload.
> If possible - can anyone point me to experiences or technical papers that
> mention so and so do not work with NFS.
>
> Does it have to be that at the end - I will have to test my workloads
> across both NFS/OneFS and Parallel File systems and then see what would not
> work?
>
> I am concerned that any test case might not be valid, compared to real
> shared workloads where performance might lag once the storage reaches PBs
> in scale and millions of files.
>
> Thank you,
> Lohit
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: Deskside clusters

2021-09-19 Thread John Hearns
Eadline's Law: Cache is only good the second time.

On Fri, 17 Sep 2021, 21:25 Douglas Eadline,  wrote:

> --snip--
> >
> > Where I disagree with you is (3). Whether or not cache size is important
> > depends on the size of the job. If your iterating through data-parallel
> > loops over a large dataset that exceeds cache size, the opportunity to
> > reread cached data is probably limited or nonexistent. As we often say
> > here, "it depends". I'm sure someone with better low-level hardware
> > knowledge will pipe in and tell me why I'm wrong (Cunningham's Law).
> >
>
> Of course it all depends. However, as core counts go up, a
> fixed amount of cache must get shared. Since the high core counts
> are putting pressure on main memory BW, cache gets more
> important. This is why AMD is doing V-cache for new processors.
> Core counts have outstripped memory BW, their solution
> seems to be big caches. And, cache is only good the second time :-)
>
>
> -- big snip--
>
> --
> Doug
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [beowulf] nfs vs parallel filesystems

2021-09-19 Thread John Hearns
Lohit, good morning.  I work for Dell in the EMEA HPC team.  You make some
interesting observations.
Please ping me offline regarding Isilon.
Regarding NFS, we have a brand new Ready Architecture which uses PowerEdge
servers and ME series storage (*).
It gets some pretty decent performance and I would honestly say that these
days NFS is a perfectly good fit for small clusters -
the clusters which are used by departments or small/medium sized
engineering companies.

If you want to try out your particular workloads we have labs available.

You then go on to talk about petabytes of data - that is the field where
you have to look at scale out filesystems.

(*) I cannot find this on public webpages yet, sorry
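
If you do get to the testing stage, mdtest (shipped alongside IOR these days)
and fio are the usual tools for the metadata question - the flags below are
from memory, so check the man pages:

    mpirun -np 16 mdtest -n 1000 -i 3 -d /mnt/under-test    # create/stat/unlink rates
    fio --name=small --directory=/mnt/under-test --rw=randrw --bs=4k --size=1G --numjobs=8

Run the same thing against the NFS mount and the parallel filesystem and any
metadata gap shows up quickly.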

On Sat, 18 Sept 2021 at 18:21, Lohit Valleru via Beowulf <
beowulf@beowulf.org> wrote:

> Hello Everyone,
>
> I am trying to find answers to an age old question of NFS vs Parallel file
> systems. Specifically - Isilon OneFS vs parallel filesystems. Specifically
> looking for any technical articles or papers that can help me understand
> what exactly will not work on OneFS.
> I understand that at the end - it all depends on workloads.
> But at what level of metadata IO, or with which IO patterns, does NFS become
> the problem? Would just getting a beefy Isilon NFS HDD-based storage resolve
> most of the issues?
> I am trying to find sources that can say that no matter how beefy an NFS
> server can get with HDDs as the backend - it will not be as good as parallel
> filesystems for so and so workload.
> If possible - can anyone point me to experiences or technical papers that
> mention so and so do not work with NFS.
>
> Does it have to be that at the end - I will have to test my workloads
> across both NFS/OneFS and Parallel File systems and then see what would not
> work?
>
> I am concerned that any test case might not be valid, compared to real
> shared workloads where performance might lag once the storage reaches PBs
> in scale and millions of files.
>
> Thank you,
> Lohit
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: Deskside clusters

2021-08-25 Thread John Hearns
If anyone works with Dell kit I am happy to discuss thermal profiles and
power capping. But definitely off list.
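
For a generic flavour of what power capping looks like at the BMC level, the
standard DCMI commands are below - the 450 W figure is just an example, and
Dell's own racadm exposes finer control:

    ipmitool dcmi power reading              # current draw
    ipmitool dcmi power set_limit limit 450  # cap the node at 450 W
    ipmitool dcmi power activate             # enable the cap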


On Wed, 25 Aug 2021 at 07:16, Tony Brian Albers  wrote:

> I have a Precision 5820 in my office. It's only got one CPU(14 physical
> cores), but it's more quiet than my HP SFF desktop PC. So yeah, I think
> they can make something like that.
>
> /tony
>
> On 25/08/2021 05.35, Lux, Jim (US 7140) via Beowulf wrote:
> > Yeah, but is it quiet enough to put in your office and not drive your
> office mate out?
> >
> >
> > From: Jonathan Engwall 
> > Date: Tuesday, August 24, 2021 at 3:43 PM
> > To: Jim Lux 
> > Cc: Douglas Eadline , "beowulf@beowulf.org" <
> beowulf@beowulf.org>
> > Subject: Re: [Beowulf] [EXTERNAL] Re: Deskside clusters
> >
> > EMC offers dual socket 28 physical core processors. That's a lot of
> computer.
> >
> > On Tue, Aug 24, 2021, 1:33 PM Lux, Jim (US 7140) via Beowulf <
> beowulf@beowulf.org> wrote:
> > Yes, indeed.. I didn't call out Limulus, because it was mentioned
> earlier in the thread.
> >
> > And another reason why you might want your own.
> > Every so often, the notice from JPL's HPC goes out to the users -
> "Halo/Gattaca/clustername will not be available because it is reserved for
> Mars {Year}"  While Mars landings at JPL are a *big deal*, not everyone is
> working on them (in fact, by that time, most of the Martians are now
> working on something else), and you want to get your work done.  I suspect
> other institutional clusters have similar "the 800 pound (363 kg) gorilla
> has requested" scenarios.
> >
> >
> > 
> >
> >
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
>
>
> --
> Tony Albers - Systems Architect - Data Department, Royal Danish Library,
> Victor Albecks Vej 1, 8000 Aarhus C, Denmark
> Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] List archives

2021-08-18 Thread John Hearns
I plead an advanced case of not keeping up with technology.
I note this is for Ryzen - anyone care to comment on Rome/Milan?

On Wed, 18 Aug 2021 at 08:56, Jim Cownie  wrote:

> John may have been looking for Doug’s tweet and just confused the delivery
> medium...
>
> https://twitter.com/thedeadline/status/1424833944000909313
>
> On 17 Aug 2021, at 07:16, Chris Samuel  wrote:
>
> Hi John,
>
> On Monday, 16 August 2021 12:57:20 AM PDT John Hearns wrote:
>
> The Beowulf list archives seem to end in July 2021.
> I was looking for Doug Eadline's post on limiting AMD power and the results
> on performance.
>
>
> I just went through the archives for July and compared them with what I
> have
> in my inpile and as far as I can tell there's nothing missing. There was a
> thread from June with the subject "AMD and AVX512", perhaps that's what
> you're
> thinking of?
>
> https://www.beowulf.org/pipermail/beowulf/2021-June/thread.html
>
> Your email from today & my earlier reply are in the archives for August.
>
> All the best!
> Chris
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
> -- Jim
> James Cownie 
> Mob: +44 780 637 7146
>
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] List archives

2021-08-16 Thread John Hearns
The Beowulf list archives seem to end in July 2021.
I was looking for Doug Eadline's post on limiting AMD power and the results
on performance.

John H
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512 [EXT]

2021-06-19 Thread John Hearns
That is a very interesting point! I never thought of that.
Also mobile drives ARM development - yes I know the CPUs in Isambard and
Fugaku will not be seen in your mobile phone, but the ecosystem is propped
up by having a diverse market, and the power-saving priorities of
mobile will influence HPC ARM CPUs.



On Sun, 20 Jun 2021 at 02:04, Tim Cutts  wrote:

> I think that’s a major important point.  Even if the whole of the HPC
> market were clamouring for it (which they’re not, judging by this
> discussion) that’s still a very small proportion of the worldwide CPU
> market.  We have to remember that we in the HPC community are a niche
> market.  I recall at SC a couple of years ago someone from Intel pointing
> out that mobile devices and IoT were what was driving IT technology; the
> volume dwarfs everything else.  Hence the drive to NVRAM - not to make
> things faster for HPC (although that was the benefit being presented
> through that talk), but the fundamental driver was to increase phone
> battery life.
>
> Tim
>
> --
> Tim Cutts
> Head of Scientific Computing
> Wellcome Sanger Institute
>
>
> On 19 Jun 2021, at 16:49, Gerald Henriksen  wrote:
>
> I suspect that is marketing speak, which roughly translates to not
> that no one has asked for it, but rather requests haven't reached a
> threshold where the requests are viewed as significant enough.
>
>
> -- The Wellcome Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-19 Thread John Hearns
Regarding benchmarking real-world codes on AMD, every year Martyn Guest
presents a comprehensive set of benchmark studies to the UK Computing
Insights Conference.
I suggest a Sunday afternoon with the beverage of your choice is a good
time to settle down and take time to read these or watch the presentation.

2019
https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf


2020 Video session
https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000

Skylake / Cascade Lake / AMD Rome

The slides for 2020 do exist - as I remember all the slides from all talks
are grouped together, but I cannot find them.
Watch the video - it is an excellent presentation.


















On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen  wrote:

> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
>
> >The answer given, and I'm
> >not making this up, is that AMD listens to their users and gives the
> >users what they want, and right now they're not hearing any demand for
> >AVX512.
> >
> >Personally, I call BS on that one. I can't imagine anyone in the HPC
> >community saying "we'd like processors that offer only 1/2 the floating
> >point performance of Intel processors".
>
> I suspect that is marketing speak, which roughly translates to not
> that no one has asked for it, but rather requests haven't reached a
> threshold where the requests are viewed as significant enough.
>
> > Sure, AMD can offer more cores,
> >but with only AVX2, you'd need twice as many cores as Intel processors,
> >all other things being equal.
>
> But of course all other things aren't equal.
>
> AVX512 is a mess.
>
> Look at the Wikipedia page(*) and note that AVX512 means different
> things depending on the processor implementing it.
>
> So what does the poor software developer target?
>
> Or that it can for heat reasons cause CPU frequency reductions,
> meaning real world performance may not match theoretical - thus easier
> to just go with GPUs.
>
> The result is that most of the world is quite happily (at least for
> now) ignoring AVX512 and going with GPUs as necessary - particularly
> given the convenient libraries that Nvidia offers.
>
> > I compared a server with dual AMD EPYC 7H12 processors (128 cores) to a
> > server with quad Intel Xeon 8268 processors (96 cores).
>
> > From what I've heard, the AMD processors run much hotter than the Intel
> >processors, too, so I imagine a FLOPS/Watt comparison would be even less
> >favorable to AMD.
>
> Spec sheets would indicate AMD runs hotter, but then again you
> benchmarked twice as many Intel processors.
>
> So, per spec sheets for you processors above:
>
> AMD - 280W - 2 processors means system 560W
> Intel - 205W - 4 processors means system 820W
>
> (and then you also need to factor in purchase price).
>
> >An argument can be made that for calculations that lend themselves to
> >vectorization should be done on GPUs, instead of the main processors but
> >the last time I checked, GPU jobs are still memory is limited, and
> >moving data in and out of GPU memory can still take time, so I can see
> >situations where for large amounts of data using CPUs would be preferred
> >over GPUs.
>
> AMD's latest chips support PCI 4 while Intel is still stuck on PCI 3,
> which may or may not mean a difference.
>
> But what despite all of the above and the other replies, it is AMD who
> has been winning the HPC contracts of late, not Intel.
>
> * - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] head node abuse

2021-03-26 Thread John Hearns
https://bofhcam.org/co-larters/lart-reference/index.html

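More seriously: on stock RHEL7 (systemd 219) the workaround people use, since
the per-user slice templates you mention only arrived in v239, is a pam_exec
hook that applies runtime properties to each user's slice at login. A minimal
sketch - the script path and the limits are made up, and the pam_exec line
should sit after pam_systemd so the slice already exists:

    # appended to /etc/pam.d/sshd
    session optional pam_exec.so /usr/local/sbin/limit-login.sh

    # /usr/local/sbin/limit-login.sh
    #!/bin/bash
    [ -n "$PAM_USER" ] || exit 0
    uid=$(id -u "$PAM_USER") || exit 0
    [ "$uid" -ge 1000 ] || exit 0            # leave root and system users alone
    exec systemctl set-property --runtime "user-${uid}.slice" \
         CPUQuota=100% MemoryLimit=8G        # ~one core's worth of CPU, 8 GB RAM

CPUQuota and MemoryLimit are the property names systemd 219 understands, and
--runtime keeps the settings from persisting across reboots.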

On Fri, 26 Mar 2021 at 13:57, Michael Di Domenico 
wrote:

> does anyone have a recipe for limiting the damage people can do on
> login nodes on rhel7.  i want to limit the allocatable cpu/mem per
> user to some low value.  that way if someone kicks off a program but
> forgets to 'srun' it first, they get bound to a single core and don't
> bump anyone else.
>
> i've been poking around the net, but i can't find a solution, i don't
> understand what's being recommended, and/or i'm implementing the
> suggestions wrong.  i haven't been able to get them working.  the most
> succinct answer i found is that per user cgroup controls have been
> implemented in systemd v239/240, but since rhel7 is still on v219
> that's not going to help.  i also found some wonkiness that runs a
> program after a user logs in and hacks at the cgroup files directly,
> but i couldn't get that to work.
>
> supposedly you can override the user-{UID}.slice unit file and jam in
> the cgroup restrictions, but I have hundreds of users clearly that's
> not maintainable
>
> i'm sure others have already been down this road.  any suggestions?
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Project Heron at the Sanger Institute [EXT]

2021-02-04 Thread John Hearns
Referring to lambda functions, I think I flagged up that AWS now supports
containers up to 10GB in size for the lambda payload
https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/

which makes a Julia language lambda possible
https://www.youtube.com/watch?v=6DvpneWRb_w


On Thu, 4 Feb 2021 at 11:49, Tim Cutts  wrote:

>
>
> On 4 Feb 2021, at 10:40, Jonathan Aquilina 
> wrote:
>
> Maybe SETI@home wasn't the right project to mention; I just remembered
> there is another project, not in genomics, on that distributed platform,
> called Folding@home.
>
>
> Right, protein dynamics simulations like that are at the other end of the
> data/compute ratio spectrum.  Very suitable for distributed computing in
> that sort of way.
>
> So with genomics you cannot break it down into smaller chunks where the
> data can be crunched then returned to sender and then processed once the
> data is back or as its being received?
>
>
> It depends on what you’re doing.  If you already know the reference genome
> then, yes you can.  We already do this to some extent; the reads from the
> sequencing run are de-multiplexed first, and then the reads for each sample
> are processed as a separate embarrassingly parallel job.  This is basically
> doing a jigsaw puzzle when you know the picture.
>
> The read alignment to reference (if you already have a standard reference
> genome) easily decomposable as much as you like, right down to a single
> read in the extreme case, but the compute for a single read is tiny (this
> is basically fuzzy grep going on here),  and you’d be swamped in scheduling
> overhead.  For maximum throughput we don’t bother distributing it further,
> but use multithreading on a single node.
>
> There have been some interesting distributed mapping attempts, for example
> decomposing the problem into read groups small enough to fit in the time
> limit of an AWS lambda function.  You get fabulous turnaround time on the
> analysis if you do that, but you use about four times as much actual
> compute time as the single node, multi-thread approach we currently use.
> (reference to the lambda work:
> https://www.biorxiv.org/content/10.1101/576199v1.full.pdf). As usual, it
> all depends on what you’re optimising for, cost, throughput, or turnaround
> time?
>
> For some of our projects (Darwin Tree of Life being the prime example),
> you don’t know what the reference genome looks like.  The problem is still
> fuzzy grep, but now you’re comparing the reads against each other and
> looking for overlaps, rather than comparing them all independently against
> the reference.  You’re doing the jigsaw puzzle without knowing the
> picture.  That’s a bit harder to distribute, and most approaches currently
> cop out and do it all in single large memory machines.  One way to make
> this easier is to make the reads longer (i.e. make the puzzle pieces larger
> and fewer of them) which is what sequencing technologies like Oxford
> Nanopore and PacBio Sequel try to do.  But their throughput is not as high
> as the short read Illumina approach.
>
> Some people have taken distributed approaches though (JGI’s MetaHipMer for
> example:  https://www.nature.com/articles/s41598-020-67416-5).  That’s
> tackling an even nastier problem; simultaneously sequencing many genomes at
> the same time, for example gut flora from a stool sample, and not only
> doing *de novo* assembly as in the last example, but trying to do so when
> you don’t know how many different genomes you have in the sample.  So now
> you have multiple jigsaw puzzles mixed up in the same box, and you don’t
> know any of the pictures.  And of course you have multiple strains, so some
> of those puzzles have the same picture but 1% of the pieces are different,
> and you need to work out which is which.
>
> Fun fun fun!
>
> Tim
>
>
> -- The Wellcome Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Project Heron at the Sanger Institute [EXT]

2021-02-04 Thread John Hearns
In the seminar, the graph of sequencing effort for Sanger / rest of UK /
worldwide is very impressive.


On Thu, 4 Feb 2021 at 10:21, Tim Cutts  wrote:

>
>
> > On 3 Feb 2021, at 18:23, Jörg Saßmannshausen <
> sassy-w...@sassy.formativ.net> wrote:
> >
> > Hi John,
> >
> > interesting stuff and good reading.
> >
> > For the IT interests on here: these sequencing machine are chucking out
> large
> > amount of data per day. The project I am involved in can chew out 400 GB
> or so
> > on raw data per day. That is a small machine. That then needs to be
> processed
> > before you actually can analyze it. So there is quite some data movement
> etc
> > involved here.
>
>
> If anyone wants any details, just ask me, since the IT supporting all that
> sequencing is my team’s baby.
>
> Actually, the sequencing capacity for this volume of COVID samples is not
> great.  The virus genome is so small (only 30,000 bases, compared to a
> human’s 3 billion base pairs) that you can massively multiplex the samples
> in a single sequencing run.
>
> Currently, we multiplex 384 samples per Novaseq sequencing lane.  There
> are four lanes per flowcell, and two flowcells per sequencer.  The
> sequencing run takes about 24 hours, so each instrument can sequence about
> 3,000 samples per day.
>
> We have about 20 of these sequencers, so our total capacity is very high;
> in fact we only use three sequencers for COVID at the moment, because
> sample and library preparation is actually the bottleneck.  Getting those
> 384 samples ready for the sequencer.  We are planning to increase it
> though, both by increasing multiplexing and by using more sequencers.
>
> Sequencing itself is a bit less than a day, and the computational analysis
> to de-multiplex and reconstruct the genomes is less than a day running on
> our production-oriented OpenStack cluster (we keep critical projects like
> Heron on a physically separate cluster from normal faculty research); we
> can easily keep up with the sequencers.  We then upload our results to the
> folks at CLIMB, and that’s where the comparative genomics tends to take
> place.
>
> There’s a lot of effort at the moment going into speeding up the
> end-to-end process; for this sequencing to be as useful as possible for
> close-to-real-time outbreak and mutation analysis, the turnaround time
> needs to be as short as possible.  It turns out you can see statistically
> significant new mutation signatures very early on before infection rates
> really start to rise (this was visible in Kent data for B.1.1.7), so the
> sooner we can see this sort of thing the better we will get at taking
> appropriate measures.
>
> For more details on the actual analysis, we released a public seminar a
> couple of weeks ago:
>
> https://stream.venue-av.com/e/sanger_seminars/Barrett
>
> Tim
>
>
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Project Heron at the Sanger Institute

2021-02-03 Thread John Hearns
https://edition.cnn.com/2021/02/03/europe/tracing-uk-variant-origins-gbr-intl/index.html

Dressed in white lab coats and surgical masks, staff here scurry from
machine to machine -- robots and giant computers that are so heavy, they're
placed on solid steel plates to support their weight.
Heavy metal!
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] The xeon phi

2020-12-31 Thread John Hearns
Stupid question from me - does OneAPI handle Xeon Phi?

(a) I should read the manual
(b) it is a discontinued product - why would they put any effort into it?



On Thu, 31 Dec 2020 at 05:52, Jonathan Engwall <
engwalljonathanther...@gmail.com> wrote:

> Hello Beowulf,
> Both the Xeon Phi and Tesla Grid cost so little on eBay right now, the
> precious metal inside may be worth more.
> If you want one for yourself, now is the time. People do scrap these
> I had to buy one! The Xeon Phi looks so neat!
> Jonathan Engwall
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] RIP CentOS 8

2020-12-12 Thread John Hearns
Great vision Doug.
May I also promote EESSI  https://www.eessi-hpc.org/
(the European part may magically be transformed into something else soon)
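
Doug's container flow below is easy to try today. A minimal Singularity
recipe, purely as a sketch - the base image and package name are
placeholders:

    Bootstrap: docker
    From: centos:7

    %post
        yum -y install epel-release
        yum -y install my-hpc-app      # placeholder package

    %runscript
        exec my-hpc-app "$@"

Then:

    singularity build app.sif app.def
    singularity sign app.sif           # provenance, as Doug mentions
    singularity exec app.sif my-hpc-app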

On Fri, 11 Dec 2020 at 18:57, Douglas Eadline  wrote:

>
>
> Some thoughts an this issue and future HPC
>
> First, in general it is a poor move by CentOS, a community
> based distribution that has just killed their community.
> Nice work.
>
> Second, and most importantly, CentOS will not matter to HPC.
> (and maybe other sectors as well) Distributions will become
> second class citizens to containers.  All that is needed is a
> base OS to run the container (think Singularity)
>
> Years ago in the early days of Warewulf, Greg Kurtzer
> (Warewulf/Singularity) talked about the idea of bundling the
> essential/minimal OS and libraries with applications in a custom
> Warewulf VNFS image. The scheduler would then boot the application
> image -- everything works. Indeed, in my Limulus systems all
> Warewulf VNFS images and kernel bootstraps are in RPM files.
> Users can load a new VNFS using Yum (and some basic Warewulf
> provision commands)
>
> Now jump ahead to containers and HPCng (https://hpcng.org/)
>
> An open source project will release a container that "contains"
> everything it needs to run (along with the container recipe).
> Using Singularity you can also sign the container to assure
> provenance of the code. The scheduler runs containers. Simple.
>
> Software Vendors will gladly do the same. Trying to support
> multiple distribution goes away. Applications show up in
> tested containers. The scheduler runs containers. Things just work,
> with fewer support issues for the vendor. Simple.
>
> The need to maintain library version trees and Modules
> goes away. Of course, if you are a developer writing your own application,
> you need specific libraries, but not system-wide. Build the
> application in your working directory, include any specific libraries
> you need in the local source tree and fold it all into a container.
>
> Joe Landman also comments on this topic in his blog (does not seem
> to be showing up for me today, however)
>
>
> https://scalability.org/2020/12/the-future-of-linux-distributions-in-the-age-of-docker-and-k8s/
>
> Bottom line, it is all good, we are moving on.
>
> --
> Doug
>
>
>
>
>
> > Hi folks,
> >
> > It looks like the CentOS project has announced the end of CentOS 8 as a
> > version that tracked RHEL for the end of 2021, it will be replaced by
> > the CentOS stream which will run ahead of RHEL8. CentOS 7 is unaffected
> > (though RHEL7 only has 3 more years of life left).
> >
> > https://blog.centos.org/2020/12/future-is-centos-stream/
> >
> >  > The future of the CentOS Project is CentOS Stream, and over the
> >  > next year we’ll be shifting focus from CentOS Linux, the rebuild
> >  > of Red Hat Enterprise Linux (RHEL), to CentOS Stream, which
> >  > tracks just ahead of a current RHEL release. CentOS Linux 8, as
> >  > a rebuild of RHEL 8, will end at the end of 2021. CentOS Stream
> >  > continues after that date, serving as the upstream (development)
> >  > branch of Red Hat Enterprise Linux.
> >  >
> >  > Meanwhile, we understand many of you are deeply invested in
> >  > CentOS Linux 7, and we’ll continue to produce that version through
> >  > the remainder of the RHEL 7 life cycle.
> >
> > I always thought that Fedora was meant to be that upstream for RHEL, but
> > perhaps the arrangement now will be Fedora -> CentOS -> RHEL.
> >
> > I wonder where this leaves the Lustre project, currently they only
> > support RHEL7/CentOS7 as the server, and more interestingly, people who
> > build Lustre appliances on top of CentOS.
> >
> > Then there's the question of projects like OpenHPC who've only just
> > announced support for CentOS8 (and OpenSuSE15). They could choose to
> > track CentOS Stream instead, probably without too much effort.
> >
> > I do wonder if this opens the door for the return of something like
> > Scientific Linux.
> >
> > All the best,
> > Chris
> > --
> > Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
>
>
> --
> Doug
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] RIP CentOS 8

2020-12-09 Thread John Hearns
A quick reminder that there are specific Red Hat SKUs for cluster head nodes
and a cheaper one for cluster nodes.
The announcement regarding CentOS Stream said that there would be a new
offer.

On Wed, 9 Dec 2020 at 11:59, Peter Kjellström  wrote:

> On Tue, 8 Dec 2020 18:13:46 +
> Ryan Novosielski  wrote:
>
> > I don’t think that’s all that hard to answer: because it doesn’t. Who
> > on this list is magically going to buy hundreds of RedHat licenses
> > because of this?
>
> Assuming Redhat can provide sites with a reasonable way to license a
> cluster, some form of HPC-system license, then I think there'll be
> quite a few willing to go this way...
>
> A non-zero cost for the OS would not have been a big issue for any of
> our clusters in principle.
>
> /Peter K (point-of-view: national level site in Europe)
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] RIP CentOS 8

2020-12-09 Thread John Hearns
Jorg, a big seismic processing company I worked with did indeed use Debian.
The answer though is that industrial customers use commercial software
packages which are licensed and they want support from the software vendors.
If you check the OSes which are supported then you find Red Hat and SuSE.
There is increasing support with Ubuntu in the AI space of course.

I had one customer recently who has one principal engineering design
package they need.
This means a Red Hat 7 cluster - the software vendor I am sure is developing
for Red Hat 8 but is not yet ready to certify it.






On Tue, 8 Dec 2020 at 21:50, Jörg Saßmannshausen <
sassy-w...@sassy.formativ.net> wrote:

> Dear all,
>
> what I never understood is: why are people not using Debian?
>
> I have done some cluster installations (up to 100 or so nodes) with Debian,
> more or less out of the box, and I did not have any issue with it. I admit,
> I might have missed out on something I don't know about, the famous
> unknown-unknowns, but by and large the clusters were running rock solid
> with no unusual problems.
> I did not use Lustre or GPFS etc. on it, I only played around a bit with
> BeeGFS and some GlusterFS on a small scale.
>
> Just wondering, as people mentioned Ubuntu.
>
> All the best from a dark London
>
> Jörg
>
> Am Dienstag, 8. Dezember 2020, 21:12:02 GMT schrieb Christopher Samuel:
> > On 12/8/20 1:06 pm, Prentice Bisbal via Beowulf wrote:
> > > I wouldn't be surprised if this causes Scientific Linux to come back
> > > into existence.
> >
> > It sounds like Greg K is already talking about CentOS-NG (via the ACM
> > SIGHPC syspro Slack):
> >
> >
> https://www.linkedin.com/posts/gmkurtzer_centos-project-shifts-focus-to-cent
> > os-stream-activity-6742165208107761664-Ng4C
> >
> > All the best,
> > Chris
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Lambda and Alexa [EXT]

2020-12-03 Thread John Hearns
Reviving this topic slightly, these were flagged up on the Julia forum

https://github.com/aws/aws-lambda-runtime-interface-emulator

The Lambda Runtime Interface Emulator is a proxy for Lambda’s Runtime and
Extensions APIs, which allows customers to locally test their Lambda
function packaged as a container image.

https://github.com/aws/aws-lambda-python-runtime-interface-client
The Lambda Runtime Interface Client is a lightweight interface that allows
your runtime to receive requests from and send requests to the Lambda
service.
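
With the emulator baked into your image, local testing is then just a docker
run plus a curl to the RIE's fixed invocation endpoint (the image name is a
placeholder):

    docker run -p 9000:8080 my-function:latest
    curl -XPOST http://localhost:9000/2015-03-31/functions/function/invocations -d '{}'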







On Wed, 25 Nov 2020 at 16:59, Tim Cutts  wrote:

> I think the 8 second limit is probably arbitrary.  Lambda’s normal limit
> is 5 minutes.  I presume Amazon did some UX work, and basically asked
> “what’s the maximum length of time your average user is willing to wait for
> an answer before they consider it a bad experience”, and came up with 8
> seconds.  You’re not allowed to change that value, so they obviously take
> it seriously!
>
> While testing the skill I developed, I certainly found that the turnaround
> time when I had to perform a full remote data fetch was about 5 seconds.
> That’s long enough after asking Alexa the question that I start to think
> “is it going to reply? is it working?” and that’s not a good experience, so
> my approach to that has been:
>
> (a) cache the data fetched; the data is stored in session attributes, and
> persisted to S3.  That cached copy provides a response which is within a
> second or two, a much nicer experience.
>
> (b) when fetching fresh data, there’s a progressive response API which you
> can call asynchronously, while the slower task takes place.  Now, that 5
> second wait doesn’t feel so bad, because you’re listening to “Please wait
> while I ask for the latest data” while the real work goes on in the
> background.  Silence in a conversation feels really uncomfortable really
> quickly, as we all know.
>
> Sorry, this is nothing to do with HPC or Beowulf, although kind of
> interesting from a UX perspective on voice-controlled systems.
>
> Tim
>
>
>
>
>
> On 25 Nov 2020, at 15:33, Lux, Jim (US 7140) 
> wrote:
>
> Interesting..
>
> Where does the 8 second limit come from? (Rodeos and bull/bronc riding,
> where you only have to stay on for 8 seconds?) I’ve seen this 8 second
> thing in a bunch of places lately, and I wonder.. why not 7, or 10 or
> whatever?  I find it hard to believe that someone has a 3 bit counter in
> seconds (or worse, it’s a 33 bit counter counting nanoseconds or some such,
> and the limit is actually 8.589 seconds)
>
>
>
> -- The Wellcome Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Automatically replication of directories among nodes

2020-11-27 Thread John Hearns
James, that is cool!
A thought I have had - for HA setups DRBD can be used for the shared files
which the nodes need to keep updated.
Has anyone tried Syncthing for this purpose?
I suppose there is only one way to find out!

On Fri, 27 Nov 2020 at 01:06, James Braid  wrote:

> On Wed, 25 Nov 2020, 09:28 Lux, Jim (US 7140) via Beowulf, <
> beowulf@beowulf.org> wrote:
>
>> What I have is 3 rPi computers, A,B, and C, and what I’d like to do is
>> keep the desktop and some data directories on all of them synchronized.  So
>> if on node A, I add something to A:~/Desktop, it (in short order) winds up
>> in the ~/Desktop directory on the other 2 machines.
>>
>
> Syncthing works great for these kinds of applications:
> https://syncthing.net/
>
> Install on all 3 nodes, add a shared directory on all 3 and it will keep
> everything synced. Lightweight single binary written in go and runs on
> almost every platform.
>
> I've replaced a number of messy rsync setups with syncthing and it also
> enables some more complex and interesting topologies (for example I have an
> offline host with limited resources pushing data to a nearby host with
> internet connectivity and from there to multiple other hosts).
>
> James
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] RoCE vs. InfiniBand

2020-11-26 Thread John Hearns
Jorg, I think I might know where the Lustre storage is!
It is possible to install storage routers, so you could route between
Ethernet and InfiniBand.
It is also worth saying that Mellanox have Metro InfiniBand switches -
though I do not think they go as far as the west of London!

Seriously though, you ask about RoCE. I will stick my neck out and say
yes, if you are planning an OpenStack cluster
with the intention of having mixed AI and 'traditional' HPC workloads I
would go for a RoCE-style setup.
In fact I am in a discussion about a new project for a customer with
similar aims in an hour's time.

I could get some benchmarking time if you want to do a direct comparison of
Gromacs on IB / RoCE
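
One nice side effect of the RoCE route is that the verbs tooling is
identical, so checking what you actually have is the same command either way:

    ibv_devinfo | grep -E 'hca_id|link_layer'   # link_layer: InfiniBand vs Ethernet (RoCE)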









On Thu, 26 Nov 2020 at 11:14, Jörg Saßmannshausen <
sassy-w...@sassy.formativ.net> wrote:

> Dear all,
>
> as the DNS problems have been solved (many thanks for doing this!), I was
> wondering if people on the list have some experiences with this question:
>
> We are currently in the process to purchase a new cluster and we want to
> use
> OpenStack for the whole management of the cluster. Part of the cluster
> will
> run HPC applications like GROMACS for example, other parts typical
> OpenStack
> applications like VMs. We are also implementing a Data Safe Haven for the
> more
> sensitive data we are aiming to process. Of course, we want to have a
> decent
> size GPU partition as well!
>
> Now, traditionally I would say that we are going for InfiniBand. However,
> for
> reasons I don't want to go into right now, our existing file storage
> (Lustre)
> will be in a different location. Thus, we decided to go for RoCE for the
> file
> storage and InfiniBand for the HPC applications.
>
> The point I am struggling with is to understand whether this is really the
> best solution or whether, given that we are not building a 100k node
> cluster, we could use RoCE for the few nodes which are doing parallel
> (read: MPI) jobs too.
> I have a nagging feeling that I am missing something if we are moving to
> pure RoCE and ditching the InfiniBand. We have a mixed workload, from ML/AI
> to MPI applications like GROMACS to pipelines like those used in the
> bioinformatics corner. We are not planning to partition the GPUs; the
> current design model is to have only 2 GPUs in a chassis.
> So, is there something I am missing or is the stomach feeling I have
> really a
> lust for some sushi? :-)
>
> Thanks for your sentiments here, much welcome!
>
> All the best from a dull London
>
> Jörg
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Lambda and Alexa [EXT]

2020-11-25 Thread John Hearns
Aha. I did not know about the 8 second limit. I use Alexa with a Philips
smart lighting hub to control house lights. Sometimes nothing happens...
I assumed this was Alexa not understanding a Scottish accent. I forgive
Alexa now - she might have been having trouble talking to the Hue.

On Wed, 25 Nov 2020 at 10:21, Tim Cutts  wrote:

> Indeed, my main personal experience with Lambda so far has been in writing
> an Alexa skill in my spare time.  It’s been quite fun, and very instructive
> in the benefits and pitfalls of lambda.
>
> My main takehomes so far:
>
> 1.  I love the fact that there’s basically no code at all other than that
> required to deliver the actual skill. Just handler functions for the
> incoming requests (Intents, as Amazon call them)
>
> 2.  Debugging is awkward.  There is no interactive debugging, as far as I
> can tell.   Log inspection is about all you have, and some errors are
> obtuse (for example, some valid Node.js constructs produce syntax errors on
> Lambda, and it’s very hard to track down when it happens - unit tests all
> pass locally but then you get a syntax error in the LogWatch logs, with a
> useless stack trace that doesn’t tell you where the syntax error is).
> Debugging and unit testing on your laptop is hard to do; many Alexa APIs
> rely on real hardware functions and the simulators don’t handle them.
>
> 3.  Persistence of data is fairly straightforward using S3 buckets or
> DynamoDB, and I haven’t noticed latency issues with those (of course the
> interactions are on a human timescale, so latency isn’t really much of an
> issue)
>
> 4.  Interaction with external services can be problematic; Alexa lambda
> functions must return within 8 seconds, which can be fun if your skill
> needs to fetch data from some other source (in my case a rather sluggish
> data service in Azure run by my local council), and there’s no clean way to
> handle the event if you hit the 8 second limit, the function just gets
> terminated and Alexa returns a rather meaningless error to the user.
>
> Tim
>
> On 25 Nov 2020, at 09:45, John Hearns  wrote:
>
> BTW, I am sure everyone knows this but if you have a home assistant such
> as Alexa, every time you ask Alexa it is a lambda which is spun up
>
>
> -- The Wellcome Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Clustering vs Hadoop/spark [EXT]

2020-11-25 Thread John Hearns
Or to put it simply:  "Alexa - sequence my genome"

On Wed, 25 Nov 2020 at 09:45, John Hearns  wrote:

> Tim, that is really smart. Over on the Julia discourse forum I have
> blue-skyed about using Lambdas to run Julia functions (it is an inherently
> functional language) (*)
> Blue-skying further, for exascale compute needs can we think of 'Science
> as a Service'?
> As in your example the scientist thinks about the analysis and how it is
> performed. Then sends it off to be executed. Large chunks are run using
> Lambda functions.
> Crucially, if a Lambda (or whatever) fails the algorithm should be able to
> continue. People building web scale applications think like this today
> anyway.
> Do you REALLY think you are connected to Amazon's single web server when
> you make a purchase? But it looks that way.
> Also if you are about to purchase something and your Wifi goes down - as a
> customer you would be very angry if you were billed for this item.
>
> (*) It is possible to insert your own 'payload' in a Lambda. There are
> standard ones like Python obviously.
> However at the time I looked there was a small size limit on the payload.
>
> Re-reading my own response
> https://discourse.julialang.org/t/lambda-or-cloud-functions-eventually-possible/39128/5
> you CAN have a larger payload, but this has to be in an S3 bucket
> https://docs.aws.amazon.com/lambda/latest/dg/nodejs-package.html
>
> BTW, I am sure everyone knows this but if you have a home assistant such
> as Alexa, every time you ask Alexa it is a lambda which is spun up
>
>
>
>
>
>
>
> On Wed, 25 Nov 2020 at 09:27, Tim Cutts  wrote:
>
>>
>>
>> On 24 Nov 2020, at 18:31, Alex Chekholko via Beowulf 
>> wrote:
>>
>> If you can run your task on just one computer, you should always do that
>> rather than having to build a cluster of some kind and all the associated
>> headaches.
>>
>>
>> If you take on the cloud message, that of course isn’t necessarily the
>> case.  If you use very high level cloud services like lambda, you don’t
>> have to build that infrastructure.  It’s very unlikely to be anywhere near
>> as efficient, of course, but throughput efficiency is not what your average
>> scientist cares about.  What they care about is getting their answer
>> quickly (and to a lesser extent, cheaply)
>>
>> I saw a recent example where someone took a fairly simple sequencing read
>> alignment process, which normally runs on a single 16-core node in about 6
>> hours, and split the input files small enough that the alignment code
>> execution time and memory use would fit with AWS Lambda’s envelope.  The
>> result executed in a couple of minutes, elapsed, but used about four times
>> as many core-hours as the optimised single node version.  Of course, this
>> is an embarrassingly parallel problem, so this is a relatively easy
>> analysis to move to this sort of design.
>>
>> From the scientist’s point of view, which is better?  Getting their
>> answer in 5 minutes or 6 hours?  Especially if they’ve also reduced their
>> development time as well because they don’t have to worry so much about
>> infrastructure and optimisation.
>>
>> The total value is hard to work out, many of these considerations are
>> hard to put a dollar value on.  When I saw that article, I did ask the
>> author how much the analysis actually cost, and she didn’t have a number.
>> But I don’t think we can dogmatically say that we should always run a task
>> on a single machine if we can.
>>
>> Tim
>> -- The Wellcome Sanger Institute is operated by Genome Research Limited,
>> a charity registered in England with number 1021457 and a company
>> registered in England with number 2742969, whose registered office is 215
>> Euston Road, London, NW1 2BE.
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Clustering vs Hadoop/spark [EXT]

2020-11-25 Thread John Hearns
Tim, that is really smart. Over on the Julia discourse forum I have
blue-skied about using Lambdas to run Julia functions (it is an inherently
functional language) (*)
Blue-skying further, for exascale compute needs can we think of 'Science as
a Service'?
As in your example the scientist thinks about the analysis and how it is
performed. Then sends it off to be executed. Large chunks are run using
Lambda functions.
Crucially, if a Lambda (or whatever) fails the algorithm should be able to
continue. People building web scale applications think like this today
anyway.
Do you REALLY think you are connected to Amazon's single web server when
you make a purchase? But it looks that way.
Also if you are about to purchase something and your Wifi goes down - as a
customer you would be very angry if you were billed for this item.

(*) It is possible to insert your own 'payload' in a Lambda. There are
standard ones like Python obviously.
However at the time I looked there was a small size limit on the payload.

Re-reading my own response
https://discourse.julialang.org/t/lambda-or-cloud-functions-eventually-possible/39128/5
you CAN have a larger payload, but this has to be in an S3 bucket
https://docs.aws.amazon.com/lambda/latest/dg/nodejs-package.html

BTW, I am sure everyone knows this, but if you have a home assistant such as
Alexa, every time you ask Alexa it is a lambda which is spun up
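
A minimal sketch of that fan-out pattern (the same one in Tim's alignment
example quoted below), using boto3 to fire one Lambda per pre-split chunk.
The bucket, key layout and function name are hypothetical:

    import json
    import boto3

    lam = boto3.client("lambda")

    # Hypothetical: 'align-chunk' is a pre-deployed Lambda that aligns one
    # shard of reads; the chunk keys come from an earlier splitting step.
    for i in range(200):
        lam.invoke(
            FunctionName="align-chunk",
            InvocationType="Event",   # asynchronous fire-and-forget
            Payload=json.dumps(
                {"bucket": "my-genome-data",
                 "key": f"reads/chunk-{i:04d}.fastq"}),
        )
    # Per-chunk results land back in S3; a final reduce step merges them.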







On Wed, 25 Nov 2020 at 09:27, Tim Cutts  wrote:

>
>
> On 24 Nov 2020, at 18:31, Alex Chekholko via Beowulf 
> wrote:
>
> If you can run your task on just one computer, you should always do that
> rather than having to build a cluster of some kind and all the associated
> headaches.
>
>
> If you take on the cloud message, that of course isn’t necessarily the
> case.  If you use very high level cloud services like lambda, you don’t
> have to build that infrastructure.  It’s very unlikely to be anywhere near
> as efficient, of course, but throughput efficiency is not what your average
> scientist cares about.  What they care about is getting their answer
> quickly (and to a lesser extent, cheaply)
>
> I saw a recent example where someone took a fairly simple sequencing read
> alignment process, which normally runs on a single 16-core node in about 6
> hours, and split the input files small enough that the alignment code
> execution time and memory use would fit with AWS Lambda’s envelope.  The
> result executed in a couple of minutes, elapsed, but used about four times
> as many core-hours as the optimised single node version.  Of course, this
> is an embarrassingly parallel problem, so this is a relatively easy
> analysis to move to this sort of design.
>
> From the scientist’s point of view, which is better?  Getting their answer
> in 5 minutes or 6 hours?  Especially if they’ve also reduced their
> development time as well because they don’t have to worry so much about
> infrastructure and optimisation.
>
> The total value is hard to work out, many of these considerations are hard
> to put a dollar value on.  When I saw that article, I did ask the author
> how much the analysis actually cost, and she didn’t have a number.  But I
> don’t think we can dogmatically say that we should always run a task on a
> single machine if we can.
>
> Tim
> -- The Wellcome Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Best case performance of HPL on EPYC 7742 processor ...

2020-10-26 Thread John Hearns
This article might be interesting here:

https://www.dell.com/support/article/en-uk/sln319015/amd-rome-is-it-for-real-architecture-and-initial-hpc-performance?lang=en

And Hello Joshua. Long time no see.
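
On the HPL.dat sizing question in the quoted thread below, a
back-of-envelope sketch. The 80% memory fraction and NB=232 are common
rules of thumb for Rome/BLIS builds rather than official AMD numbers, and
the node is assumed to be a dual-socket 256 GB machine:

    import math

    mem_bytes = 256 * 2**30          # assumed 256 GB node
    fraction = 0.80                  # headroom for the OS and MPI buffers
    nb = 232                         # typical blocking factor for EPYC/BLIS

    # HPL stores one 8-byte double per element of an N x N matrix.
    n = int(math.sqrt(mem_bytes * fraction / 8))
    n -= n % nb                      # round down to a multiple of NB
    print("N  =", n)                 # ~165k at 80% of 256 GB

    # One MPI rank per CCX: 16 CCXs/socket x 2 sockets = 32 ranks,
    # so a near-square process grid of P x Q = 4 x 8.
    print("PQ = 4 x 8")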

On Sun, 25 Oct 2020 at 23:11, Joshua Mora  wrote:

> Reach out to AMD;
> they have specific instructions (including BIOS/OS settings) and even
> binaries on how to get the best performance.
> Don't go by trial and error, as it is very time consuming.
> BLIS also has multiple parameters as it has nested loops, so you could also
> have to try multiple configurations to get the optimal performance.
> Just reach out to them.
>
> Joshua
>
> -- Original Message --
> Received: 04:30 PM CDT, 08/14/2020
> From: Richard Walsh 
> To: Beowulf List 
> Subject: [Beowulf] Best case performance of HPL on EPYC 7742 processor ...
>
> > All,
> >
> > What have people achieved on this SKU on a single-node using the stock
> > HPL 2.3 source... ??
> >
> > I have seen a variety of performance claims even as high as 90% of its
> > nominal
> > per node peak of 4.608 TFLOPs.  I can now get above 80% of peak, but not
> > higher.
> > I have heard that to get higher values special BIOS settings are
> required,
> > including
> > the turning off SMT which allows the chip to turbo higher.  Remember this
> > is not the
> > 7542 processor with 32 cores per chip and the same bandwidth per socket
> as
> > the
> > 7742 which can turbo to over 100% of nominal peak for HPL.
> >
> > If people have gotten higher single node numbers ... what is your recipe
> > ... ??
> >
> > I am particularly interested in BIOS settings, and maybe surprise
> settings
> > in the HPL.dat file.  Do higher performing runs require using close to
> the
> > maximum memory on the node ... ??  As this is single-node, I would not
> > expect choice of MPI to make a difference
> >
> > To get to 80% with SMT on in the BIOS, I am building with an older Intel
> > compiler and MKL that still recognizes the MKL_DEBUG_CPU_TYPE=5.
> > Running so that the number of MPI ranks run on the node matches the
> > number of CCXs seems to give the best numbers.
> >
> > Following the tuning instructions from AMD for using BLIS and GCC for
> > the build does not get me there.
> >
> > Thanks,
> >
> > Richard Walsh
> >
>
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] ***UNCHECKED*** Re: Re: [EXTERNAL] Re: Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-20 Thread John Hearns
> Most compilers had extensions from the IV/66 (or 77) – quoted strings,
for instance, instead of Hollerith constants, and free form input.  Some
allowed array index origins other than 1

I can now date exactly when the rot set in.
Hollerith constants are good enough for anyone. It's a gosh darned
computer, not your nearest and dearest whispering in your ear. It still
thinks it is talking to a thundering line printer and getting its input
from a real Teletype.

Indexing from zero - who ever heard of zero of a thing. Damn quiche eaters.



On Mon, 19 Oct 2020 at 22:27, Lux, Jim (US 7140) via Beowulf <
beowulf@beowulf.org> wrote:

> Yes, the evil-ution of languages proceeded at a much more stately pace in
> “arpanet” days.
>
>
>
> Typically, you’d have a bunch of vendor specific versions, and since PCs
> per-se didn’t exist, you bought the compiler for the machine you had.  And
> then, maybe you paid attention to the notes in the back of the manual about
> deviations from the Fortran IV, 66, or 77.  Most compilers had extensions
> from the IV/66 (or 77) – quoted strings, for instance, instead of Hollerith
> constants, and free form input.  Some allowed array index origins other
> than 1 (handy for FFTs where you wanted to go from -N/2 to N/2).  Most also
> had some provision for direct access to files, as opposed to sequential,
> but it was very, very OS dependent.
>
>
>
> Probably by the 80s and early 90s, with widespread use of personal
> computers, and the POSIX standard, you started to see more “machine
> independent, standards compliant” Fortran. And, you saw the idea of buying
> your compiler from someone different than the computer maker, i.e.
> companies like Absoft and Portland Group (now part of nvidia), partly
> because the microcomputer manufacturers had no interest in developing
> compilers for cheap processors, and sometimes to accommodate a specialized
> need.  Hence products like Fortran for 8080 under CP/M from Digital
> Research.  ( I ran Cromemco Fortran IV in 48k of RAM on my mighty Cromemco
> Z80 at 4MHz, which I believe was a variant of Fortran-80 from DR)
>
>
>
> But even then, it was a pretty slow evolution – the Fortran compilers I
> was running in the 80s on microcomputers under MS-DOS wasn’t materially
> different from the Fortran I was running in 1978 on a Z80, which wasn’t
> significantly different from the Fortran I ran on mainframes (IBM 360, CDC
> 6xxx, etc.) and minis (IBM 1130, PDP-11 in the 60s and 70s. What would
> change is things like the libraries available to do “non-standard” stuff
> (like random disk access).
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From: *Beowulf  on behalf of "
> beowulf@beowulf.org" 
> *Reply-To: *Prentice Bisbal 
> *Date: *Monday, October 19, 2020 at 12:21 PM
> *To: *"Renfro, Michael" , "beowulf@beowulf.org" <
> beowulf@beowulf.org>
> *Subject: *Re: [Beowulf] ***UNCHECKED*** Re: Re: [EXTERNAL] Re: Re:
> Spark, Julia, OpenMPI etc. - all in one place
>
>
>
That's exactly what I suspected. I guess 13 years is like an eternity in
the modern "Speed of the Internet" world we live in, but it may not have
seemed such a slow evolution in the pre-Internet days.
>
> Prentice
>
> On 10/19/20 2:53 PM, Renfro, Michael wrote:
>
> Minor point of pedagogy from my place in the "learned FORTRAN 77 in 1990"
> crowd: your instructor's options would have been:
>
>
>
>- standard FORTRAN 77
>- vendor-specific dialect of FORTRAN (VAX or otherwise)
>- maybe a pre-release of FORTRAN 90? Wasn't released and standardized
>until 1991-92.
>
>
>
> Never mind the availability of texts for same.
>
>
>
> *From: *Beowulf 
> 
> *Date: *Monday, October 19, 2020 at 12:06 PM
> *To: *beowulf@beowulf.org  
> *Subject: *Re: [Beowulf] ***UNCHECKED*** Re: Re: [EXTERNAL] Re: Re:
> Spark, Julia, OpenMPI etc. - all in one place
>
>
> On 10/19/20 10:28 AM, Douglas Eadline wrote:
> > --snip--
> >
> >> Unfortunately the presumption seems to be that the old is deficient
> >> because it is old, and "my generation” didn't invent it (which is
> >> clearly perverse; I see no rush to replace English, French, … which
> >> are
> >> all older than any of our programming languages, and which adapt, as do
> >> our programming languages).
> >>
> > I think this has a lot to do with the Fortran situation. In these
> "modern"
> > times, software seems to have gone from "releases" to a "sliding
> > constant release" cycle and anything not released in the past few
> > months is "old."
> >
> > How many people here will wait 2-6 months before installing
> > a "new version" of some package in production to make sure there
> > are no major issues. And of course keep older version options
> > with software modules. Perhaps because I've been at this a while,
> > I have a let it "mellow a bit" approach to shiny new software.
> >
> > I find it odd that Fortran gets placed in the "old software box"
> > because it works while new languages with their constant feature
> > churn and versions break dependency trees all over the place,
> > and somehow that is a good thing. Now get off my lawn.

Re: [Beowulf] [EXTERNAL] Re: ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-20 Thread John Hearns
Atul, welcome to the Beowulf list. My apology if you have posted here in
the past.
I invite you to make a post regarding quantum computing. This is a
fascinating field in its own right.

Please start a thread - I will then make my comparison with guitars.
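
As a taster, the eavesdropping-detection idea Atul mentions in the quoted
message below can be illustrated with a toy BB84 simulation, with classical
random numbers standing in for quantum measurement. With no eavesdropper
the sifted key is error-free; with one, roughly a quarter of the sifted
bits disagree:

    import random

    def bb84(n=2000, eve=False):
        a_bits  = [random.randint(0, 1) for _ in range(n)]
        a_bases = [random.randint(0, 1) for _ in range(n)]
        photons = list(zip(a_bits, a_bases))

        if eve:  # Eve measures in a random basis and re-sends the result
            photons = []
            for bit, basis in zip(a_bits, a_bases):
                eb = random.randint(0, 1)
                photons.append(
                    (bit if eb == basis else random.randint(0, 1), eb))

        b_bases = [random.randint(0, 1) for _ in range(n)]
        b_bits = [bit if basis == bb else random.randint(0, 1)
                  for (bit, basis), bb in zip(photons, b_bases)]

        # Sift: keep positions where Alice's and Bob's bases agree, then
        # compare part of the sifted key to detect any disturbance.
        kept = [i for i in range(n) if a_bases[i] == b_bases[i]]
        errs = sum(a_bits[i] != b_bits[i] for i in kept)
        return errs / len(kept)

    print("error rate, no Eve  :", bb84())          # ~0.0
    print("error rate, with Eve:", bb84(eve=True))  # ~0.25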

On Tue, 20 Oct 2020 at 04:02, atul kumthekar 
wrote:

> Regarding quantum, one area is security: quantum communication for
> exchanging a public key. If anyone tries to eavesdrop, the state collapses
> and the receiver can know.
>
> Cheers
>
> On Thu, 15 Oct, 2020, 9:26 pm Lux, Jim (US 7140), <
> james.p@jpl.nasa.gov> wrote:
>
>> I don’t know that Quantum computing is something that, say, Bank of
>> America, or the IRS, would be able to effectively leverage.  What they do
>> isn’t computationally complex – the vast majority of the workload is “pick
>> up data from place A, look at it, do something or not, put it back in place
>> A, and potentially generate some new data in place B”, but it’s exceedingly
>> requirements intensive. Consider something simple like handling a monthly
>> mortgage deposit – at the core, it’s “bump current balance with amount of
>> check”, but wait, what if they’re in foreclosure? Or have a forbearance? Or
>> the mortgage is held by an active duty servicemember? And the exception
>> handling is even more complex – what country, state and/or county is the
>> property in, the owner, the lien holder?
>>
>>
>>
>> There are thousands upon thousands of business rules which have to be
>> continuously maintained and audited for regulatory compliance.  The
>> limiting resource isn’t computational horsepower, it’s organizing the
>> thousands of processes, which is primarily a people problem, not a computer
>> problem.  Their existing work and data flows are already parallelized in
>> some sense, and if they need to do it faster, they just add processors or
>> storage as needed.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From: *atul kumthekar 
>> *Date: *Thursday, October 15, 2020 at 8:08 AM
>> *To: *Jim Lux 
>> *Cc: *Oddo Da , John Hearns , "
>> beowulf@beowulf.org" 
>> *Subject: *Re: [Beowulf] [EXTERNAL] Re: ***UNCHECKED*** Re: Spark,
>> Julia, OpenMPI etc. - all in one place
>>
>>
>>
>> with Quantum Computing on the horizon, there may be major change in the
>> direction.
>>
>>
>>
>> On Thu, Oct 15, 2020 at 7:46 PM Lux, Jim (US 7140) via Beowulf <
>> beowulf@beowulf.org> wrote:
>>
>> Not all offensive..
>>
>> It’s always useful to take a step back and say “well, rather than
>> incremental change X, what about wholesale change Y”.
>>
>>
>>
>> One interesting phenomenon, too, is that once a large, complex system has
>> been around a while, it becomes the embodiment of the requirements that
>> produced it, yet those requirements are not found anywhere (at least not in
>> a coherent single source). So the risk of new implementation is enormous,
>> since the probability of the new system not properly implementing a
>> requirement is large.  If your system is, say, processing airline
>> reservations or income tax returns, the cost of a problem is enormous.   It
>> doesn’t take many multi-million dollar “oopsies” to make the cost of half a
>> dozen skilled software developers to tinker at the edges negligible.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From: *Beowulf  on behalf of Oddo Da <
>> oddodao...@gmail.com>
>> *Date: *Thursday, October 15, 2020 at 4:10 AM
>> *To: *John Hearns 
>> *Cc: *"beowulf@beowulf.org" 
>> *Subject: *Re: [Beowulf] [EXTERNAL] Re: ***UNCHECKED*** Re: Spark,
>> Julia, OpenMPI etc. - all in one place
>>
>>
>>
>> On Thu, Oct 15, 2020 at 1:11 AM John Hearns  wrote:
>>
>> This has been a great discussion. Please keep it going.
>>
>>
>>
>> I am all out of ammo ;). In all seriousness, it is not easy to ask these
>> questions because it kind of can be interpreted as offensive - in a
>> nutshell, people may perceive what I am asking as "what have y'all been
>> doing for 20 years? Nothing?".
>>
>>
>>
>> To the points on technical debt, may I also add re-validation?
>>
>> Let's say you have a weather model which your institute has been running
>> for 20 years.
>>
>> If you decide to start again from fresh with code in a new language you
>> are going to have to re-run known models
>>
>> and debate whether or not they fit within error bounds of the old model.

Re: [Beowulf] ***UNCHECKED*** Re: Re: [EXTERNAL] Re: Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-19 Thread John Hearns
> I have a let it "mellow a bit" approach to shiny new software.
Software as malt whisky... I like it.
 Which reminds me to ask re LECBIG plans?

On Mon, 19 Oct 2020 at 15:28, Douglas Eadline  wrote:

> --snip--
>
> > Unfortunately the presumption seems to be that the old is deficient
> > because it is old, and "my generation” didn't invent it (which is
> > clearly perverse; I see no rush to replace English, French, … which are
> > all older than any of our programming languages, and which adapt, as do
> > our programming languages).
> >
>
> I think this has a lot to do with the Fortran situation. In these "modern"
> times, software seems to have gone from "releases" to a "sliding
> constant release" cycle and anything not released in the past few
> months is "old."
>
> How many people here will wait 2-6 months before installing
> a "new version" of some package in production to make sure there
> are no major issues. And of course keep older version options
> with software modules. Perhaps because I've been at this a while,
> I have a let it "mellow a bit" approach to shiny new software.
>
> I find it odd that Fortran gets placed in the "old software box"
> because it works while new languages with their constant feature
> churn and versions break dependency trees all over the place,
> and somehow that is a good thing. Now get off my lawn.
>
> --
> Doug
>
>
>
>
>
>
> --
> Doug
>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] ***UNCHECKED*** Re: [EXTERNAL] Re: Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-19 Thread John Hearns
Och Jim, it's weel kent that yir a canny loon.
Gie Fortran and OpenMP tae the bairns.

On Mon, 19 Oct 2020 at 10:51, Jim Cownie  wrote:

> Modern Fortran workshops exist - but they need to be promoted more widely.
>
>
> Part of the issue may be in the use of the word “modern”, which is always
> relative, (see “The Modern Movement” in architecture, which seems generally
> to be agreed to have ended in 1960 :-)); similarly, it's not surprising if
> people are confused when they Google for a book on “Modern Fortran”, and
> the top hit is a book published in 2011[1] (admittedly if you scroll down
> you find that there’s a later edition including Fortran 2018), the second
> hit one published in 2012 [2]. (Of course, YMMV).
>
> So, perhaps we should now recognise that these standards will outlive us,
> and instead of trying to emphasise modernity, stick to absolute names
> (“Fortran 2018 Explained”…)
>
> The same problem likely applies to OpenMP (though at least Intel’s "OpenMP
> Offload Basics" online course [3] is not called “Modern OpenMP” :-)).
> And the free tutorial we’ll have at the UK & Europe OpenMP Developers’
> conference is "OpenMP for Computational Scientists: From serial Fortran
> to thousand-way parallelism on GPUs using OpenMP” [4]
>
> [1]
> https://www.amazon.co.uk/Explained-Numerical-Mathematics-Scientific-Computation/dp/0199601429
> [2]
> https://www.amazon.co.uk/Modern-Fortran-Practice-Arjen-Markus/dp/1107603471
>
> [3]
> https://software.intel.com/content/www/us/en/develop/tools/oneapi/training/openmp-offload.html
>
> [4] https://ukopenmpusers.co.uk/
>
> Which leads to my next point - dare I say it the IT industry exists
> through churn. There is always a promotion of the new,
> which means that the old must somehow be deficient.
>
> Unfortunately the presumption seems to be that the old is deficient
> because it is old, and "my generation” didn't invent it (which is clearly
> perverse; I see no rush to replace English, French, … which are all older
> than any of our programming languages, and which adapt, as do our
> programming languages).
>
> On 19 Oct 2020, at 09:48, John Hearns  wrote:
>
> Jim you make good points here. I guess my replies are:
>
> Modern Fortran workshops exist - but they need to be promoted more widely.
> Which leads to my next point - dare I say it the IT industry exists
> through churn. There is always a promotion of the new,
> which means that the old must somehow be deficient.
> I question - are 'the young' taking up Fortran programming?
> However let's look at what drove the upturn in AI - it was being able to
> run models on a GPU in your dorm room, or hire a GPU instance on the cloud.
> But also shrink wrapped Tensorflow.
> Should we be saying to kids - hey kid, you can forecast the weather /
> design a new car with your own PC.
> Maybe a container with some relevant software and models?
>
> And now everyone will point me towards such projects
>
>
>
>
> On Mon, 19 Oct 2020 at 09:28, Jim Cownie  wrote:
>
>> One more point, which may already have been made, but in case not…
>> You are asking (my paraphrase…)
>> * “Why hasn't MPI been replaced with something higher level?”
>> * “Why hasn't Fortran been replaced with something higher level?”
>>
>> In that context, it seems worth pointing out that
>> * Fortran is much higher level than it used to be (e.g. operation on
>> whole arrays without needing loops was certainly not in FORTRAN IV or
>> Fortran 77)
>> * Since Fortran 2008, it has had support for the co-array features which
>> mean that you can write distributed memory codes without (explicitly) using
>> MPI, and with a syntax that looks like array indexing, rather than message
>> passing.
>>
>> There’s a general educational issue here, which is that it is much easier
>> for people to recognise that they need education to understand something if
>> that thing is something they only just heard about, whereas even if it has
>> many new features, if it’s something whose name they already know (and
>> which they did a course in 15 years ago) then they think they already know
>> all about it.
>> Fortran clearly suffers from this, but so do C++, OpenMP, …
>>
>> -- Jim
>> James Cownie 
>> Mob: +44 780 637 7146
>>
>> > On 15 Oct 2020, at 12:07, Oddo Da  wrote:
>> >
>> > On Thu, Oct 15, 2020 at 1:11 AM John Hearns  wrote:
>> > This has been a great discussion. Please keep it going.
>> >
>> > I am all out of ammo ;). In all seriousness, it is not easy to ask
>> > these questions because it kind of can be interpreted as offensive - in a
>> > nutshell, people may perceive what I am asking as "what have y'all been
>> > doing for 20 years? Nothing?".

[Beowulf] ***UNCHECKED*** Re: [EXTERNAL] Re: Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-19 Thread John Hearns
Jim you make good points here. I guess my replies are:

Modern Fortran workshops exist - but they need to be promoted more widely.
Which leads to my next point - dare I say it the IT industry exists through
churn. There is always a promotion of the new,
which means that the old must somehow be deficient.
I question - are 'the young' taking up Fortran programming?
However let's look at what drove the upturn in AI - it was being able to
run models on a GPU in your dorm room, or hire a GPU instance on the cloud.
But also shrink wrapped Tensorflow.
Should we be saying to kids - hey kid, you can forecast the weather /
design a new car with your own PC.
Maybe a container with some relevant software and models?

And now everyone will point me towards such projects




On Mon, 19 Oct 2020 at 09:28, Jim Cownie  wrote:

> One more point, which may already have been made, but in case not…
> You are asking (my paraphrase…)
> * “Why hasn't MPI been replaced with something higher level?”
> * “Why hasn't Fortran been replaced with something higher level?”
>
> In that context, it seems worth pointing out that
> * Fortran is much higher level than it used to be (e.g. operation on whole
> arrays without needing loops was certainly not in FORTRAN IV or Fortran 77)
> * Since Fortran 2008, it has had support for the co-array features which
> mean that you can write distributed memory codes without (explicitly) using
> MPI, and with a syntax that looks like array indexing, rather than message
> passing.
>
> There’s a general educational issue here, which is that it is much easier
> for people to recognise that they need education to understand something if
> that thing is something they only just heard about, whereas even if it has
> many new features, if it’s something whose name they already know (and
> which they did a course in 15 years ago) then they think they already know
> all about it.
> Fortran clearly suffers from this, but so do C++, OpenMP, …
>
> -- Jim
> James Cownie 
> Mob: +44 780 637 7146
>
> > On 15 Oct 2020, at 12:07, Oddo Da  wrote:
> >
> > On Thu, Oct 15, 2020 at 1:11 AM John Hearns  wrote:
> > This has been a great discussion. Please keep it going.
> >
> > I am all out of ammo ;). In all seriousness, it is not easy to ask these
> questions because it kind of can be interpreted as offensive - in a
> nutshell, people may perceive what I am asking as "what have y'all been
> doing for 20 years? Nothing?".
> >
> > To the points on technical debt, may I also add re-validation?
> > Let's say you have a weather model which your institute has been running
> for 20 years.
> > If you decide to start again from fresh with code in a new language you
> are going to have to re-run known models
> > and debate whether or not they fit within error bounds of the old model.
> > That takes effort - which may of course be justified if you make gains
> in speed, flexibility or being able to use new hardware like GPUs.
> >
> > I understand all this but, of course, not everything has to do what has
> been done. Hopefully, there are plenty of people entering the field or
> coming back to it, without any technical debt.
> >
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
>
>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Julia on POWER9?

2020-10-16 Thread John Hearns
Hello Prentice. I think you need to come over to the Julia Discourse

https://discourse.julialang.org/t/knet-on-powerpc64le-platform/48149

On Thu, 15 Oct 2020 at 22:09, Joe Landman  wrote:

> Cool (shiny!)
> On 10/15/20 5:02 PM, Prentice Bisbal via Beowulf wrote:
>
> So while you've all been discussing Julia, etc., I've been trying to build
> and get it running on POWER9 for a cluster of AC922 nodes (same as Summit,
> but with 4 GPUs per node). After doing a combination of Google searching
> and soul-searching, I was able to get a functional version of Julia to
> build for POWER9. However, I'm not 100% sure my build is fully functional,
> as when I did 'make testall' some of the tests failed.
>
> Is there anyone on this list using or supporting the latest version of
> Julia, 1.5.2, on POWER9? If so, I'd like to compare notes. I imagine
> someone from OLCF is on this list.
>
> Based on my Internet searching, as of August 2019 Julia was being used on
> Summit on thousands of cores, but I've also seen posts from the Julia devs
> saying they can't support the POWER architecture anymore because they no
> longer have access to POWER hardware. Most of this information comes from
> the Julia GitHub or Julia Discourse conversations.
>
> --
> Joe Landman
> e: joe.land...@gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-14 Thread John Hearns
This has been a great discussion. Please keep it going.

To the points on technical debt, may I also add re-validation?
Let's say you have a weather model which your institute has been running
for 20 years.
If you decide to start again from fresh with code in a new language you are
going to have to re-run known models
and debate whether or not they fit within error bounds of the old model.
That takes effort - which may of course be justified if you make gains in
speed, flexibility or being able to use new hardware like GPUs.
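
A minimal sketch of what that re-validation check might look like:
comparing a reference run's output against the rewrite's, field by field,
within agreed tolerances. The .npz layout and the tolerances are
assumptions for illustration:

    import numpy as np

    def validate(reference, candidate, rtol=1e-3, atol=1e-6):
        """Report fields where the new code drifts outside tolerance."""
        ref, new = np.load(reference), np.load(candidate)
        failures = []
        for name in ref.files:
            if not np.allclose(ref[name], new[name], rtol=rtol, atol=atol):
                worst = float(np.max(np.abs(ref[name] - new[name])))
                failures.append((name, worst))
        return failures

    # e.g. validate("model_ref_case42.npz", "model_new_case42.npz")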










On Thu, 15 Oct 2020 at 05:10, Lux, Jim (US 7140) via Beowulf <
beowulf@beowulf.org> wrote:

>
>
> Well, maybe a Beowulf cluster of yugos…
>
>
>
> *From: *Beowulf  on behalf of Oddo Da <
> oddodao...@gmail.com>
> *Date: *Wednesday, October 14, 2020 at 4:15 PM
> *To: *Michael Di Domenico , "beowulf@beowulf.org"
> 
> *Subject: *[EXTERNAL] Re: [Beowulf] ***UNCHECKED*** Re: Spark, Julia,
> OpenMPI etc. - all in one place
>
>
>
> Michael, thank you, you have given me quite a lot to think about.
>
>
>
> On Wed, Oct 14, 2020 at 2:28 PM Michael Di Domenico <
> mdidomeni...@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 2:07 PM Oddo Da  wrote:
> 
>
>
> i think the stuck state you're interpreting is a misrepresentation
> that HPC is full of stodgy greybeards who only want to run MPI code
> written in 1970's fortran.  i don't think that's the case anymore.
> HPC has branched out and includes a lot of ancillary paths, but it
> still holds onto its heritage, which is something I appreciate.  HPC
> has never been about flash, it's about solving the world's hardest
> problems.  You don't always need a porsche, sometimes a yugo works
> just as well
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] experience with HPC running on OpenStack

2020-06-30 Thread John Hearns
Jörg, I would back up what Matt Wallis says. What benefits would OpenStack
bring you?
Do you need to set up a flexible infrastructure where clusters can be
created on demand for specific projects?

Regarding InfiniBand, the concept is SR-IOV. This article is worth reading:
https://docs.openstack.org/neutron/pike/admin/config-sriov.html
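
From the tenant side, once the operator has whitelisted the virtual
functions and enabled the sriovnicswitch mechanism driver as in that guide,
consuming SR-IOV comes down to a port with vnic_type 'direct'. A rough
openstacksdk sketch, with placeholder UUIDs and cloud name:

    import openstack

    conn = openstack.connect(cloud="mycloud")   # entry in clouds.yaml

    # 'direct' asks Neutron for an SR-IOV virtual function
    # rather than a virtio port.
    port = conn.network.create_port(
        network_id="IB_NETWORK_UUID",
        binding_vnic_type="direct",
    )

    server = conn.compute.create_server(
        name="hpc-node-0",
        image_id="IMAGE_UUID",
        flavor_id="FLAVOR_UUID",
        networks=[{"port": port.id}],
    )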

I would take a step back and look at your storage technology and which is
the best one to be going forward with.
Also look at the proceedings of the last STFC Computing Insights, where
Martyn Guest presented a lot of benchmarking results on AMD Rome.
Page 103 onwards in this report:
http://purl.org/net/epubs/manifestation/46387165/DL-CONF-2020-001.pdf




On Tue, 30 Jun 2020 at 12:21, Jörg Saßmannshausen <
sassy-w...@sassy.formativ.net> wrote:

> Dear all,
>
> we are currently planning a new cluster and this time around the idea was
> to
> use OpenStack for the HPC part of the cluster as well.
>
> I was wondering if somebody has some first hand experiences on the list
> here.
> One of the things we currently are not so sure about is InfiniBand (or
> another low latency network connection but not ethernet): Can you run HPC
> jobs
> on OpenStack which require more than the number of cores within a box? I
> am
> thinking of programs like CP2K, GROMACS, NWChem (if that sounds familiar
> to
> you) which utilise these kind of networks very well.
>
> I came across things like Magic Castle from Computing Canada but as far as I
> understand it, they are not using it for production (yet).
>
> Is anybody on here familiar with this?
>
> All the best from London
>
> Jörg
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] experience with HPC running on OpenStack

2020-06-30 Thread John Hearns
The video is here. From 04:00 onwards

https://fosdem.org/2020/schedule/event/magic_castle/

"OK your cluster will be available in about 20 minutes"

On Tue, 30 Jun 2020 at 14:27, INKozin  wrote:

> And that's how you deploy an HPC cluster!
>
> On Tue, 30 Jun 2020 at 14:21, John Hearns  wrote:
>
>> I saw Magic Castle being demonstrated live at FOSDEM this year.
>> It is more a Terraform/Ansible setup for configuring clusters on demand.
>>
>> The person demonstrating it called a Google Home assistant with a voice
>> command and asked it to build and deploy a cluster - which it did!
>>
>> On Tue, 30 Jun 2020 at 12:21, Jörg Saßmannshausen <
>> sassy-w...@sassy.formativ.net> wrote:
>>
>>> Dear all,
>>>
>>> we are currently planning a new cluster and this time around the idea
>>> was to
>>> use OpenStack for the HPC part of the cluster as well.
>>>
>>> I was wondering if somebody has some first hand experiences on the list
>>> here.
>>> One of the things we currently are not so sure about is InfiniBand
>>> (or
>>> another low latency network connection but not ethernet): Can you run
>>> HPC jobs
>>> on OpenStack which require more than the number of cores within a box? I
>>> am
>>> thinking of programs like CP2K, GROMACS, NWChem (if that sounds familiar
>>> to
>>> you) which utilise these kind of networks very well.
>>>
>>> I came across things like Magic Castle from Computing Canada but as far as
>>> I
>>> understand it, they are not using it for production (yet).
>>>
>>> Is anybody on here familiar with this?
>>>
>>> All the best from London
>>>
>>> Jörg
>>>
>>>
>>>
>>> ___
>>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] experience with HPC running on OpenStack

2020-06-30 Thread John Hearns
I saw Magic Castle being demonstrated live at FOSDEM this year.
It is more a Terraform/Ansible setup for configuring clusters on demand.

The person demonstrating it called a Google Home assistant with a voice
command and asked it to build and deploy a cluster - which it did!

On Tue, 30 Jun 2020 at 12:21, Jörg Saßmannshausen <
sassy-w...@sassy.formativ.net> wrote:

> Dear all,
>
> we are currently planning a new cluster and this time around the idea was
> to
> use OpenStack for the HPC part of the cluster as well.
>
> I was wondering if somebody has some first hand experiences on the list
> here.
> One of the things we currently are not so sure about is InfiniBand (or
> another low latency network connection but not ethernet): Can you run HPC
> jobs
> on OpenStack which require more than the number of cores within a box? I
> am
> thinking of programs like CP2K, GROMACS, NWChem (if that sounds familiar
> to
> you) which utilise these kind of networks very well.
>
> I came across things like Magic Castle from Computing Canada but as far as I
> understand it, they are not using it for production (yet).
>
> Is anybody on here familiar with this?
>
> All the best from London
>
> Jörg
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Neocortex unreal supercomputer

2020-06-12 Thread John Hearns
Will it dream of electric sheep when they turn out the lights and let it
sleep?

https://www.lanl.gov/discover/news-release-archive/2020/June/0608-artificial-brains.php


On Fri, 12 Jun 2020 at 01:16, Jonathan Engwall <
engwalljonathanther...@gmail.com> wrote:

> This machine is planned, or possibly being built, in Pittsburgh. It sounds
> impossible, with a CPU approximately 8 inches on each side, if square, having
> thousands of cores and needing hundreds of 100 gigabit cards to its slave
> machines.
>
> https://www.hpcwire.com/2020/06/09/neocortex-will-be-first-of-its-kind-80-core-ai-supercomputer/
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] Re: Intel Cluster Checker

2020-04-30 Thread John Hearns
Thanks Chris. I worked in one place which was setting up ReFrame. It
looked to be complicated to get running.
Has this changed?
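
For what it's worth, the per-test side of ReFrame is quite small; the
effort is mostly in the site configuration file. A minimal compile-and-run
test in the style of the ReFrame tutorials (recent 3.x-era syntax, with
hello.c assumed to sit next to the test):

    import reframe as rfm
    import reframe.utility.sanity as sn


    @rfm.simple_test
    class HelloTest(rfm.RegressionTest):
        valid_systems = ['*']
        valid_prog_environs = ['*']
        sourcepath = 'hello.c'   # built with the active toolchain

        @sanity_function
        def assert_hello(self):
            # The run passes only if the expected output is found.
            return sn.assert_found(r'Hello, World!', self.stdout)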

On Thu, 30 Apr 2020 at 20:09, Chris Samuel  wrote:

> On 4/30/20 6:54 am, John Hearns wrote:
>
> > That is a four letter abbreviation...
>
> Ah you mean an ETLA (Extended TLA).
>
> I've not used ICC but we do use Reframe (from CSCS) at work for testing
> both between maintenances on our test system for changes we're making
> and also after the maintenance as a checkout before opening the system
> back up to users. It's proved very useful.
>
> All the best,
> Chris
> --
> Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] Re: Intel Cluster Checker

2020-04-30 Thread John Hearns
That is a four letter abbreviation...  Intel clearly needed to expand the
namespace.

On Thu, 30 Apr 2020 at 14:35, Prentice Bisbal  wrote:

> Intel abbreviates the cluster checker as "clck"
> On 4/30/20 5:13 AM, Jim Cownie wrote:
>
> Beware of a TLA collision here. ICC is normally the Intel C Compiler, or
> C/C++ compiler suite (since you invoke the C compiler as “icc”). :-)
>
> On 30 Apr 2020, at 08:37, John Hearns  wrote:
>
> Thanks Prentice. I was discussing this only two days ago...
> I used the older version of ICC when working at XMA in the UK.
> When the version was changed I found it a lot more difficult to implement.
>
> I looked two days ago and the project seems to be revived, and
> incorporated into oneAPI
> Is anyone using the latest versions?
>
> In answer to your question ICC does not take a huge amount of time.
> I would say overnight perhaps, I can't really remember.
>
>
> On Wed, 29 Apr 2020 at 21:07, Prentice Bisbal via Beowulf <
> beowulf@beowulf.org> wrote:
>
>> Beowulfers,
>>
>> Have any of you used the Intel Cluster Checker? I've been tasked with
>> using it, and I think I have it running, but the documentation isn't
>> very good. I was wondering how long a typical run on some cluster nodes
>> should take.
>>
>> Prentice
>>
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
> -- Jim
> James Cownie 
> Mob: +44 780 637 7146
>
>
>
>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Intel Cluster Checker

2020-04-30 Thread John Hearns
Thanks Prentice. I was discussing this only two days ago...
I used the older version of ICC when working at XMA in the UK.
When the version was changed I found it a lot more difficult to implement.

I looked two days ago and the project seems to be revived, and incorporated
into oneAPI
Is anyone using the latest versions?

In answer to your question ICC does not take a huge amount of time.
I would say overnight perhaps, I can't really remember.


On Wed, 29 Apr 2020 at 21:07, Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:

> Beowulfers,
>
> Have any of you used the Intel Cluster Checker? I've been tasked with
> using it, and I think I have it running, but the documentation isn't
> very good. I was wondering how long a typical run on some cluster nodes
> should take.
>
> Prentice
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] HPC for community college?

2020-02-21 Thread John Hearns via Beowulf
Thinking about the applications to be run at a community college, the
concept of a local weather forecast has been running around in my head
lately.
The concept would be to install and run WRF, perhaps overnight, and produce
a weather forecast in the morning.
I suppose this hinges on WRF having a sufficiently small scale for local
forecasting and on being able to download
input data every day.

Your thoughts please?
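
A hedged sketch of the overnight driver such a setup might use: pull a
day's GFS boundary conditions and run the WPS/WRF chain. The NOMADS URL
layout and the executable paths are assumptions that would need checking
against the current GFS feed and the local install:

    import datetime
    import subprocess
    import urllib.request

    # Assumed NOMADS layout for 0.25-degree GFS output -- verify before use.
    today = datetime.date.today().strftime("%Y%m%d")
    base = ("https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/"
            f"gfs.{today}/00/atmos")

    # Boundary conditions every 3 hours out to 24 hours.
    for fhr in range(0, 25, 3):
        fname = f"gfs.t00z.pgrb2.0p25.f{fhr:03d}"
        urllib.request.urlretrieve(f"{base}/{fname}", fname)

    # geogrid.exe is assumed to have been run once for the fixed domain;
    # the per-day chain is ungrib -> metgrid -> real -> wrf.
    for step in (["./ungrib.exe"], ["./metgrid.exe"], ["./real.exe"],
                 ["mpirun", "-np", "16", "./wrf.exe"]):
        subprocess.run(step, check=True)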






On Sat, 22 Feb 2020 at 03:43, Douglas Eadline  wrote:

>
> That is the idea behind the Limulus systems -- a personal (or group) small
> turn-key cluster that can deliver local HPC performance.
> Users can learn HPC software, administration, and run production
> codes on performance hardware.
>
> I have been calling these "No Data Center Needed"
> computing systems (or as is now the trend "Edge" computing).
> These systems have a different power/noise/heat envelope
> than a small pile of data center servers (i.e. you can use
> them next to your desk, in a lab or classroom, at home etc.)
>
> Performance is optimized to fit in an ambient power/noise/heat
> envelope. Basement Supercomputing recently started shipping
> updated systems with uATX blades and 65W Ryzen processors
> (with ECC), more details are on the data sheet (web page not
> updated to new systems just yet)
>
>
> https://www.basement-supercomputing.com/download/limulus-data-sheets/Limulus_ALL.pdf
>
> Full disclosure, I work with Basement Supercomputing.
>
> --
> Doug
>
> >
> > Is there a role for a modest HPC cluster at the community college?
> >
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
>
>
> --
> Doug
>
>
>
> --
> Doug
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Build Recommendations - Private Cluster

2019-08-21 Thread John Hearns via Beowulf
A Transputer cluster? Squee! I know John Taylor (formerly Meiko/Quadrics)
very well.
Perhaps send me a picture off-list please?


On Wed, 21 Aug 2019 at 06:55, Richard Edwards  wrote:

> Hi John
>
> No doom and gloom.
>
> It's in a purpose-built workshop/computer room that I have; 42U rack,
> cross-draft cooling which is sufficient, and 32 amp power into the PDUs. The
> equipment is housed in the 42U Rack along with a variety of other machines
> such as Sun Enterprise 4000 and a 30 CPU Transputer Cluster. None of it
> runs 24/7 and not all of it is on at the same time, mainly because of the
> cost of power :-/
>
> Yeah the Tesla 1070s scream like a banshee…..
>
> I am planning on running it as a power-on, on-demand setup, which I already
> do through some HP iLo and APC PDU Scripts that I have for these machines.
> Until recently I have been running some of them as a vSphere cluster and
> others as standalone CUDA machines.
>
> So that’s one vote for OpenHPC.
>
> Cheers
>
> Richard
>
> On 21 Aug 2019, at 3:45 pm, John Hearns via Beowulf 
> wrote:
>
> Add up the power consumption for each of those servers. If you plan on
> installing this in a domestic house or indeed in a normal office
> environment you probably won't have enough amperage in the circuit you
> intend to power it from.
> Sorry to be all doom and gloom.
> Also this setup will make a great deal of noise. If in a domestic setting
> put it in the garage.
> In an office setting the obvious place is a comms room but be careful
> about the ventilation.
> Office comms rooms often have a single wall mounted air conditioning unit.
> Make SURE to run a temperature shutdown script.
> This air con unit WILL fail over a weekend.
>
> Regarding the software stack I would look at OpenHPC. But that's just me.
>
>
>
>
>
> On Wed, 21 Aug 2019 at 06:09, Dmitri Chubarov 
> wrote:
>
>> Hi,
>> this is a very old hardware and you would have to stay with a very
>> outdated software stack as 1070 cards are not supported by the recent
>> versions of NVIDIA Drivers and old versions of NVIDIA drivers do not play
>> well with modern kernels and modern system libraries.Unless you are doing
>> this for digital preservation, consider dropping 1070s out of the equation.
>>
>> Dmitri
>>
>>
>> On Wed, 21 Aug 2019 at 06:46, Richard Edwards  wrote:
>>
>>> Hi Folks
>>>
>>> So about to build a new personal GPU enabled cluster and am looking for
>>> peoples thoughts on distribution and management tools.
>>>
>>> Hardware that I have available for the build
>>> - HP Proliant DL380/360 - mix of G5/G6
>>> - HP Proliant SL6500 with 8 GPU
>>> - HP Proliant DL580 - G7 + 2x K20x GPU
>>> -3x Nvidia Tesla 1070 (4 GPU per unit)
>>>
>>> Appreciate people insights/thoughts
>>>
>>> Regards
>>>
>>> Richard
>>> ___
>>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Build Recommendations - Private Cluster

2019-08-20 Thread John Hearns via Beowulf
Add up the power consumption for each of those servers. If you plan on
installing this in a domestic house or indeed in a normal office
environment you probably won't have enough amperage in the circuit you
intend to power it from.
Sorry to be all doom and gloom.
Also this setup will make a great deal of noise. If in a domestic setting
put it in the garage.
In an office setting the obvious place is a comms room but be careful about
the ventilation.
Office comms rooms often have a single wall mounted air conditioning unit.
Make SURE to run a temperature shutdown script.
This air con unit WILL fail over a weekend.
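
On that temperature shutdown script, a minimal sketch of the watchdog
meant here: poll the kernel's thermal zones and halt the box past a
threshold. The 40 C ambient limit is an assumption; run it from cron or a
systemd timer:

    import subprocess
    from pathlib import Path

    THRESHOLD_C = 40.0   # assumed safe ambient limit for the room

    def zone_temps():
        # The kernel reports thermal zones in millidegrees Celsius.
        for zone in Path("/sys/class/thermal").glob("thermal_zone*/temp"):
            yield int(zone.read_text()) / 1000.0

    if any(t > THRESHOLD_C for t in zone_temps()):
        subprocess.run(["wall", "Over-temperature, shutting down"],
                       check=False)
        subprocess.run(["shutdown", "-h", "+1"], check=False)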

Regarding the software stack I would look at OpenHPC. But that's just me.





On Wed, 21 Aug 2019 at 06:09, Dmitri Chubarov 
wrote:

> Hi,
> this is a very old hardware and you would have to stay with a very
> outdated software stack as 1070 cards are not supported by the recent
> versions of NVIDIA Drivers and old versions of NVIDIA drivers do not play
> well with modern kernels and modern system libraries.Unless you are doing
> this for digital preservation, consider dropping 1070s out of the equation.
>
> Dmitri
>
>
> On Wed, 21 Aug 2019 at 06:46, Richard Edwards  wrote:
>
>> Hi Folks
>>
>> So about to build a new personal GPU enabled cluster and am looking for
>> peoples thoughts on distribution and management tools.
>>
>> Hardware that I have available for the build
>> - HP Proliant DL380/360 - mix of G5/G6
>> - HP Proliant SL6500 with 8 GPU
>> - HP Proliant DL580 - G7 + 2x K20x GPU
>> -3x Nvidia Tesla 1070 (4 GPU per unit)
>>
>> Appreciate people insights/thoughts
>>
>> Regards
>>
>> Richard
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Cray Shasta Software

2019-08-17 Thread John Hearns via Beowulf
https://www.scientific-computing.com/news/cray-announces-shasta-software

Joe Landman, would you care to tell us more?
The integration of Kubernetes and batch system sounds interesting.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Lustre on google cloud

2019-07-31 Thread John Hearns via Beowulf
The RadioFreeHPC crew are listening to this thread I think! A very relevant
podcast

https://insidehpc.com/2019/07/podcast-is-cloud-too-expensive-for-hpc/

Re Capital One, here is an article from the Register. I think this is going
off topic.
https://www.theregister.co.uk/2019/07/30/capital_one_hacked/




On Thu, 1 Aug 2019 at 01:45, Gerald Henriksen  wrote:

> On Wed, 31 Jul 2019 04:10:12 +, you wrote:
>
> >They now have Lustre through FSx or whatever AWS has called it. I am
> >not sure you guys have heard about the Capital One data breach but at
> >times I'm still rather wary of the cloud.
>
> Not sure what the Capital One data breach has to do with the cloud, it
> was (yet again?) misconfigured software that allowed the theft.
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Lustre on google cloud

2019-07-26 Thread John Hearns via Beowulf
2) Terabyte scale data movement into or out of the cloud is not scary in
2019. You can move data into and out of the cloud at basically the line
rate of your internet connection as long as you take a little care in
selecting and tuning your firewalls and inline security devices.  Pushing
1TB/day etc.  into the cloud these days is no big deal and that level of
volume is now normal for a ton of different markets and industries.
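
A quick back-of-envelope check of that line-rate claim (the 80% efficiency
factor is an assumed allowance for protocol overhead):

    def transfer_hours(terabytes, gbps, efficiency=0.8):
        """Hours to move `terabytes` of data over a `gbps` link."""
        bits = terabytes * 8e12
        return bits / (gbps * 1e9 * efficiency) / 3600

    for link in (1, 10, 100):
        print(f"1 TB over {link:>3} Gbps: {transfer_hours(1, link):6.2f} h")
    # ~2.8 h at 1 Gbps, ~17 min at 10 Gbps -- 1 TB/day really is routine.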

Amazon will of course also send you a semi trailer full of hard drives to
import your data...  The web page says "Contact Sales for pricing"

On Fri, 26 Jul 2019 at 12:26, Chris Dagdigian  wrote:

>
> Coming back late to this thread as yesterday was a travel/transit day ...
> some additional thoughts
>
> 1) I also avoid the word "cloud bursting" these days because it's been
> tarred by marketing smog and does not mean much. The blunt truth is that
> from a technical perspective having a hybrid premise/cloud HPC is very
> simple. The hard part is data -- either moving volumes back and forth or
> trying to maintain a consistent shared file system at WAN-scale networking
> distances.
>
> The only successful life science hybrid HPC environments I've really seen
> repeatedly are the ones that are chemistry or modeling focused because
> generally the chemistry folks have very small volumes of data to move but
> very large CPU requirements and occasional GPU needs. Since the data
> movement requirements are small for chemistry it's pretty easy to make them
> happy on-prem, on the cloud or on a hybrid design
>
> Not to say full on cloud bursting HPC systems don't exist at all of course
> but they are rare. I was talking with a pharma yesterday that uses HTCondor
> to span on-premise HPC with on demand AWS nodes. I just don't see that as
> often as I see distinct HPCs.
>
> My observed experience in this realm is that for life science we don't do
> a lot of WAN-spanning grids because we get killed by the gravitational pull
> of our data. We build HPC where the data resides and we keep them
> relatively simple in scope and we attempt to limit WAN scale data movement.
> For most this means that having onsite HPC and cloud HPC and we simply
> direct the workload to whichever HPC resource is closest to the data.
>
> So for Jörg -- based on what you have said I'd take a look at your
> userbase, your application mix and how your filesystem is organized. You
> may be able to set things up so that you can "burst" to the cloud for just
> a special subset of your apps, user groups or data sets. That could be your
> chemists or maybe you have a group of people who regularly compute heavily
> against a data set or set of references that rarely change -- in that case
> you may be able to replicate that part of your GPFS over to a cloud and
> send just that workload remotely, thus freeing up capacity on your local
> HPC for other work.
>
>
>
>
> 2) Terabyte scale data movement into or out of the cloud is not scary in
> 2019. You can move data into and out of the cloud at basically the line
> rate of your internet connection as long as you take a little care in
> selecting and tuning your firewalls and inline security devices.  Pushing
> 1TB/day etc.  into the cloud these days is no big deal and that level of
> volume is now normal for a ton of different markets and industries.   It's
> basically a cost and budget exercises these days and not a particularly
> hard IT or technology problem.
>
> There are two killer problems with cloud storage even though it gets
> cheaper all the time
>
> 2a) Cloud egress fees.  You get charged real money for data traffic
> leaving your cloud. In many environments these fees are so tiny as to be
> unnoticeable noise in the monthly bill. But if you are regularly moving
> terabyte or petabyte scale data into and out of a cloud provider then you
> will notice the egress fees on your bill and they will be large enough that
> you have to plan for them and optimize for cost
>
> 2b) The monthly recurring cost for cloud storage can be hard to bear at
> petascale unless you have solidly communicated all of the benefits /
> capabilities and can compare them honestly to a full transparent list of
> real world costs to do the same thing onsite.  The monthly s3 storage bill
> once you have a few petabytes in AWS is high enough that you start to catch
> yourself doing math every once in a while along the lines of "I could
> build a Lustre filesystem w/ 2x capacity for just 2-months worth of our
> cloud storage opex budget!"
>
>
>
>
>
>
> INKozin via Beowulf 
> July 26, 2019 at 4:23 AM
> I'm very much in favour of personal or team clusters as Chris has also
> mentioned. Then the contract between the user and the cloud is explicit.
> The data can be uploaded/ pre staged to S3 in advance (at no cost other
> than time) or copied directly as part of the cluster creation process. It
> makes no sense to replicate in the cloud your in-house infrastructure.
> However having a solid storage base 

Re: [Beowulf] flatpack

2019-07-23 Thread John Hearns via Beowulf
Having just spouted on about snaps/flatpak I saw on the roadmap for AWS
Firecracker that snap support is to be included.
Sorry that I am conflating snap and flatpak.

On Tue, 23 Jul 2019 at 07:06, John Hearns  wrote:

> Having used Snaps on Ubuntu - which seems to be their preferred method of
> distributing some applications,
> I have a slightly different take on the containerisation angle and would
> de-emphasise that.
>
> My take is that snaps/flatpak attack the "my distro ships with gcc version
> 4.1 but I need gcc version 8.0" problem.
> By that I mean that you replace the distro-shipped gcc version at your
> peril - as far as I am concerned tinkering
> with the tested/approved gcc and glibc will end you in a world of hurt.
> (old war story - changing bash to an upgraded version left a big SuSE
> system unbootable for me).
>
> So with snaps/flatpak you should be able to give your users and developers
> up to date applications without fooling with
> the core system utilities. And this is a Good Thing (TM)
>
>
>
>
>
>
>
> On Tue, 23 Jul 2019 at 06:47, Chris Samuel  wrote:
>
>> On 22/7/19 10:40 pm, Jonathan Aquilina wrote:
>>
>> > So in a nut shell this is taking dockerization/ containerization and
>> > making it more for the every day Linux user instead of the HPC user?
>>
>> I don't think this goes as far as containers with isolation, as I think
>> that's not what they're trying to do. But it does seem they're thinking
>> along those lines.
>>
>> > It would be interesting to have a distro built around such a setup.
>>
>> I think this is targeting cross-distro applications.  With all the
>> duplication of libraries, etc, a distro using it would be quite bulky.
>>
>> Also you may have a similar security issue as containers have, whereby when
>> a vulnerability is found and patched in an application or library you end
>> up with lots of people out there still running the vulnerable version.
>>
>> This is why distros tend to discourage "vendoring" of libraries as that
>> tends to fossilise vulnerabilities into an application whereas if people
>> use the version provided in the distro the maintainers only need to fix
>> it in that one package and everyone who links against it benefits.
>>
>> All the best,
>> Chris
>> --
>>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] flatpack

2019-07-23 Thread John Hearns via Beowulf
Having used Snaps on Ubuntu - which seems to be their preferred method of
distributing some applications,
I have a slightly different take on the containerisation angle and would
de-emphasise that.

My take is that snaps/flatpak attack the "my distro ships with gcc version
4.1 but I need gcc version 8.0" problem.
By that I mean that you replace the distro-shipped gcc version at your
peril - as far as I am concerned tinkering
with the tested/approved gcc and glibc will end you in a world of hurt.
(old war story - changing bash to an upgraded version left a big SuSE
system unbootable for me).

So with snaps/flatpak you should be able to give your users and developers
up to date applications without fooling with
the core system utilities. And this is a Good Thing (TM)
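
As a concrete example - handing users an up to date application from
Flathub without touching anything the distro owns (GIMP here is only a
stand-in for whatever your users actually ask for):

    flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
    flatpak install flathub org.gimp.GIMP
    flatpak run org.gimp.GIMP

Everything lands under /var/lib/flatpak (or ~/.local/share/flatpak for a
per-user install), so the system gcc and glibc are never touched.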







On Tue, 23 Jul 2019 at 06:47, Chris Samuel  wrote:

> On 22/7/19 10:40 pm, Jonathan Aquilina wrote:
>
> > So in a nut shell this is taking dockerization/ containerization and
> > making it more for the every day Linux user instead of the HPC user?
>
> I don't think this goes as far as containers with isolation, as I think
> that's not what they're trying to do. But it does seem they're thinking
> along those lines.
>
> > It would be interesting to have a distro built around such a setup.
>
> I think this is targeting cross-distro applications.  With all the
> duplication of libraries, etc, a distro using it would be quite bulky.
>
> Also you may have a similar security issue as containers have, whereby when
> a vulnerability is found and patched in an application or library you end
> up with lots of people out there still running the vulnerable version.
>
> This is why distros tend to discourage "vendoring" of libraries as that
> tends to fossilise vulnerabilities into an application whereas if people
> use the version provided in the distro the maintainers only need to fix
> it in that one package and everyone who links against it benefits.
>
> All the best,
> Chris
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Differentiable Programming with Julia

2019-07-18 Thread John Hearns via Beowulf
Forgiveness is sought for my ongoing Julia fandom.
We have seen a lot of articles recently on industry websites about how
machine learning workloads are being brought onto traditional HPC
platforms.

This paper on how Julia is bringing them together is I think significant
https://arxiv.org/pdf/1907.07587.pdf

(apology - I cannot cut and paste the abstract)

ps. Doug Eadline - if you would like a blog post about this paper I could
try. But my head will hurt trying to understand it.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] help for metadata-intensive jobs (imagenet)

2019-06-29 Thread John Hearns via Beowulf
Igor, if there are any papers published on what you are doing with these
images I would be very interested.
I went to the new London HPC and AI Meetup on Thursday, one talk was by
Odin Vision which was excellent.
Recommend the new Meetup to anyone in the area. Next meeting 21st August.

And a plug to Verne Global - they provided free Icelandic beer.
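
On Mark's squashfs idea further down the thread - that does work nicely in
practice. A minimal sketch, assuming the training set sits in ./imagenet:

    # pack millions of tiny files into one image (lz4 keeps decompression cheap)
    mksquashfs imagenet/ imagenet.sqsh -comp lz4 -noappend
    # loop-mount it read-only on each compute node
    mkdir -p /mnt/imagenet
    mount -t squashfs -o loop,ro imagenet.sqsh /mnt/imagenet

The parallel filesystem then serves one large file rather than millions of
metadata lookups.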

On Sat, 29 Jun 2019 at 05:43, INKozin via Beowulf 
wrote:

> Converting the files to TF records or similar would be one obvious
> approach if you are concerned about metadata. But then I do understand why
> some people would not want that (size, augmentation process). I assume you
> are doing the training in a distributed fashion using MPI via Horovod
> or similar and it might be tempting to do file partitioning across the
> nodes. However doing so introduces a bias into minibatches (and custom
> preprocessing). If you partition carefully by mapping classes to nodes it
> may work but I also understand why some wouldn't be totally happy with
> that. I've trained keras/TF/horovod models on imagenet using up to 6 nodes
> each with four p100/v100 and it worked reasonably well. As the training
> still took a few days copying to local NVMe disks was a good option.
> Hth
>
> On Fri, 28 Jun 2019, 18:47 Mark Hahn,  wrote:
>
>> Hi all,
>> I wonder if anyone has comments on ways to avoid metadata bottlenecks
>> for certain kinds of small-io-intensive jobs.  For instance, ML on
>> imagenet,
>> which seems to be a massive collection of trivial-sized files.
>>
>> A good answer is "beef up your MD server, since it helps everyone".
>> That's a bit naive, though (no money-trees here.)
>>
>> How about things like putting the dataset into squashfs or some other
>> image that can be loop-mounted on demand?  sqlite?  perhaps even a format
>> that can simply be mmaped as a whole?
>>
>> personally, I tend to dislike the approach of having a job stage tons of
>> stuff onto node storage (when it exists) simply because that guarantees a
>> waste of cpu/gpu/memory resources for however long the stagein takes...
>>
>> thanks, mark hahn.
>> --
>> operator may differ from spokesperson.  h...@mcmaster.ca
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Rsync - checksums

2019-06-17 Thread John Hearns via Beowulf
Probably best asking this question over on the GPFS mailing list.

A bit of Googling reminded me of   https://www.arcastream.com/ They are
active in the UK Academic community,
not sure about your neck of the woods.
Give them a shout though and ask for Steve Mackie.
http://arcastream.com/what-we-do/
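
One thing worth checking before switching hashes: rsync's default quick
check compares only size and mtime, so unchanged files are never read;
-c/--checksum forces the expensive full-file hash on both ends, and
-W/--whole-file skips the delta-transfer block checksums for the files
that do get copied, which is usually a win on a fast local network. Newer
rsync releases (3.2, if I remember right) added a --checksum-choice option
so you can pick a faster non-cryptographic hash such as xxhash - do verify
your build supports it before relying on it. A sketch:

    # quick pass: compare by size+mtime only, no content hashing
    rsync -a -W --itemize-changes /gpfs/fs0/ /backup/fs0/
    # thorough pass: full content compare with a fast hash (newer rsync only)
    rsync -a -c --checksum-choice=xxh64 /gpfs/fs0/ /backup/fs0/

And on the GPFS side, a policy run with mmapplypolicy can emit the list of
recently changed files to feed to rsync, much as robinhood does for Lustre
- but that really is one for the GPFS list.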

On Mon, 17 Jun 2019 at 15:39, Michael Di Domenico 
wrote:

> rsync on 10PB sounds painful.  i haven't used GPFS in a very long
> time, so i might have a gap in knowledge.  but i would be surprised if
> GPFS doesn't have a changelog, where you can watch the files that
> changed through the day and only copy the ones that did?  much like
> what robinhood does for lustre.
>
> On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser  wrote:
> >
> > We have moved to a rsync disk backup system, from TSM tape, in order to
> > have a DR for our 10 PB GPFS filesystem.  We looked at a lot of options
> > but here we are.
> >
> > md5 checksums take a lot of compute time with huge files and even with
> > millions of smaller ones.  The bulk of the time for running rsync is
> > spent in computing the source and destination checksums and we'd like to
> > alleviate that pain of a cryptographic algorithm.
> >
> > Googling around, I found no mention of using a technique like this to
> > improve rsync performance.  I did find reference to a few hashing
> > algorithms though which could certainly work here (xxhash, murmurhash,
> > sbox, cityhash64).
> >
> > Rsync has certainly been around for a few years!  We are going to pursue
> > changing the current checksum algorithm and using something much faster.
> >   If anyone has done this already and would like to share their
> > experiences that would be wonderful. Ideally this could be some optional
> > plugin for rsync where users could choose which checksummer to use.
> >
> > Bill
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] A careful exploit?

2019-06-13 Thread John Hearns via Beowulf
Regarding serial ports - if you have IPMI then of course you have a virtual
serial port.

I learned something new about serial ports and IPMI Serial Over LAN
recently.
First of all you have to use the kernel command line options console=tty0
console=ttyS1,115200
This is well known.

In the bad old days you had to edit /etc/inittab and arrange to spawn a
getty process on /dev/ttyS1.
One gotcha - /dev/ttyS1 usually corresponds to serial port 0 in the BIOS (I
may have that wrong, but there is a numbering mismatch).

These days there is a serial-getty service, managed under systemd, which
will automatically detect and configure the serial terminal.
You still need the kernel console= option of course:
systemctl enable serial-getty@ttyS1.service

The youth of today etc. Not having to solder up RS232 plugs and find the
baud rate by listening to a modem...

Blog here on serial consoles in systemd
http://0pointer.de/blog/projects/serial-console.html
I am genuinely impressed at how good this is - it worked first time on a
Debian system.
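
For reference, the whole recipe end to end as I understand it - kernel
arguments, the getty, then attaching from your desk over the LAN. The BMC
address and credentials below are invented, and grubby is RHEL-family (on
Debian edit GRUB_CMDLINE_LINUX in /etc/default/grub instead):

    # set the consoles on every installed kernel
    grubby --update-kernel=ALL --args="console=tty0 console=ttyS1,115200"
    # login prompt on the serial line
    systemctl enable --now serial-getty@ttyS1.service
    # then from another machine, via the BMC
    ipmitool -I lanplus -H 10.1.2.3 -U admin -P secret sol activate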


On Fri, 14 Jun 2019 at 02:39, Robert G. Brown  wrote:

> On Thu, 13 Jun 2019, Jonathan Engwall wrote:
>
> > I did not see Robert Brown's reply until Jörg Saßmannshausen quoted the
> > entire thing. It's all gone now. I got rid of the netgear router too. My
> > policies are Drop and my rsa keys are specific. It works perfectly when
> > nobody slips in.
> > "Hang a console on it" - this sounds fascinating. Consoles give me trouble.
> > I will read up on it.
>
> In the old days -- and I'm a relic of the old days;-) -- one did
> EVERYTHING from a console.  Even now I do most of my systems work from a
> tty interface, basically an xterm.
>
> It's a lot harder to manage now, since serial ports are dead... if the
> particular problem you're trying to debug is the network...
>
> rgb
>
> > Thank you.
> > Jonathan Engwall.
> >
> >
>
> Robert G. Brown    http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525 email:r...@phy.duke.edu
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Frontier Announcement

2019-05-09 Thread John Hearns via Beowulf
Gerald that is an excellent history.
One small thing though: "Of course the ML came along"
What came first - the chicken or the egg? Perhaps the Nvidia ecosystem made
the ML revolution possible.
You could run ML models on a cheap workstation or a laptop with an Nvidia
GPU.
Indeed I am sitting next to my Nvidia Jetson Nano - 90 dollars for a GPU
which can do deep learning.
Prior to CUDA etc. you could of course do machine learning, but it was
being done in universities.
I stand to be corrected.





On Thu, 9 May 2019 at 17:40, Gerald Henriksen  wrote:

> On Wed, 8 May 2019 14:13:51 -0400, you wrote:
>
> >On Wed, May 8, 2019 at 1:47 PM Jörg Saßmannshausen <
> >sassy-w...@sassy.formativ.net> wrote:
> >>
> >Once upon a time portability, interoperabiilty, standardization, were
> >considered good software and hardware attributes.
> >Whatever happened to them?
>
> I suspect in a lot of cases they were more ideals and goals than
> actual things.
>
> Just look at the struggles the various BSDs have in getting a lot of
> software running given the inherent Linuxisms that seem to happen.
>
> In the case of what is relevant to this discussion, CUDA, Nvidia saw
> an opportunity (and perhaps also reacted to the threat of not having
> their own CPU to counter the integrated GPU market) and invested
> heavily into making their GPUs more than simply a 3D graphics device.
>
> As Nvidia built up the libraries and other software to make life
> easier for programmers to get the most out of Nvidia hardware AMD and
> Intel ignored the threat until it was too late, and partial attempts
> at open standards struggled.
>
> And programmers, given struggling with OpenCL or other options vs
> going with CUDA with its tools and libraries, went for what gave them
> the best performance and easiest implementation (aka a win/win).
>
> Of course then ML came along and suddenly AMD and Intel couldn't
> ignore the market anymore, but they are both struggling from a distant
> 2nd place to try and replicate the CUDA ecosystem...
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Frontier Announcement

2019-05-09 Thread John Hearns via Beowulf
Seriously? Wha.. what? Someone needs to get help.
And it wasn't me. I am a member of the People's Front of Julia.

(contrived Python reference intentional)

On Wed, 8 May 2019 at 22:57, Jeffrey Layton  wrote:

> I wrote some OpenACC articles for HPC Admin Magazine. A number of
> pro-OpenMP people attacked me on twitter (you know, OpenACC sucks, OpenMP
> is great). I received a private email threatening to kill me and my family
> if I didn't stop writing about OpenACC. Given your pro-OpenMP, anti-OpenACC
> stance, using the same tone as the threatening email, I wondered if that
> email came from you.
>
>
>
> On Wed, May 8, 2019, 17:48 Richard Walsh  wrote:
>
>>
>> Huh ... ??  Weird, scary ...
>>
>> Just MHO. Dropping off this thread now ...
>>
>> rbw
>>
>> Sent from my iPhone
>>
>> On May 8, 2019, at 4:29 PM, Jeffrey Layton  wrote:
>>
>> I was just pointing out that gcc has Open ACC capability on AMD GPUs.
>>
>> I didn't realize you part of the OpenMP Nazis. Were you the one that
>> threatened me and my family because I wrote about OpenACC?
>>
>>
>>
>> On Wed, May 8, 2019, 15:48 Richard Walsh  wrote:
>>
>>>
>>> Jeffry/All,
>>>
>>> Yes ... but given the choice of using OpenACC or OpenMP (if you are not
>>> going to write CUDA-HIP code for that extra 10% of performance) which
>>> captures most (all?) of the features of OpenACC, is a standard likely to
>>> outlive OpenACC, and should run on any vendor’s  accelerators, including
>>> whatever Intel comes up with ... why would you write in OpenACC ... ??
>>>
>>> GNU supports OpenMP too ... in my view, PVM is to MPI as OpenACC is to
>>> OpenMP 4.5-5.0 ...
>>>
>>> Cheers!
>>>
>>> rbw
>>>
>>> Sent from my iPhone
>>>
>>> On May 8, 2019, at 2:36 PM, Jeffrey Layton  wrote:
>>>
>>> Don't forget that gcc supports both NV and AMD GPUs with OpenACC. That's
>>> one of the lead compilers listed on the Frontier specs.
>>>
>>> Jeff
>>>
>>>
>>> On Wed, May 8, 2019 at 3:29 PM Richard Walsh  wrote:
>>>

 All,

 Cray has deprecated support for OpenACC in light of the OpenMP 4.5
 and 5.0 standards, and their target and data directives. NVIDIA’s PGI
 Compiler group will keep OpenACC going for a while, but on AMD devices ...
 maybe not.  That Cray will support only OpenMP on Frontier seems to be a
 logical certainty.

 So if you or yours want to run at speed on Frontier you should bone up
 on ROCm, HIP and OpenMP 4.5-5.0 ...

 :-)

 Cheers!

 rbw

 Sent from my iPhone

 > On May 8, 2019, at 12:47 PM, Jörg Saßmannshausen <
 sassy-w...@sassy.formativ.net> wrote:
 >
 > Dear all,
 >
 > I think the answer to the question lies here:
 >
 > https://en.wikipedia.org/wiki/OpenACC
 >
 > As I follow these things rather loosely, my understanding was that
 OpenACC
 > should run on both nVidia and other GPUs. So maybe that is the reason
 why it
 > is a 'pure' AMD cluster where both GPUs and CPUs are from the same
 supplier?
 > IF all of that is working out and if it is really true that you can
 compile
 > and run OpenACC code on both types of GPUs, it would a be big win for
 AMD.
 >
 > Time will tell!
 >
 > All the best from my TARDIS!
 >
 > Jörg
 >
 > Am Dienstag, 7. Mai 2019, 16:59:48 BST schrieben Sie:
 >>>  I think it is interesting that they are using AMD for
 >>>
 >>> both the CPUs and GPUs
 >>
 >> I agree. That means a LOT of codes will have to be ported from CUDA
 to
 >> whatever AMD uses. I know AMD announced their HIP interface to
 convert
 >> CUDA code into something that will run on AMD processors, but I don't
 >> know how well that works in theory. Frankly, I haven't heard anything
 >> about it since it was announced at SC a few years ago.
 >>
 >> I would not be surprised if AMD pursued this bid quite agressively,
 >> possibly at a significant loss, for the opportunity to prove their
 GPUs
 >> can compete with NVIDIA and demonstrate that codes can be
 successfully
 >> converted from CUDA to something AMD GPUs can use to demonstrate GPU
 >> users don't need to be locked in to a single vendor. If so, this
 could
 >> be a costly gamble for the DOE and AMD, but if it pays off, I
 imagine it
 >> could change AMD's fortunes in HPC.
 >>
 >>  "Win on Sunday, sell on Monday" doesn't apply just to cars.
 >>
 >> Prentice
 >>
 >>> On 5/7/19 4:43 PM, Jörg Saßmannshausen wrote:
 >>> Hi Prentice,
 >>>
 >>> that looks interesting and I hope it means I will finally get the
 neutron
 >>> structure which was measured last year there! :-)
 >>>
 >>> On a more serious note: I think it is interesting that they are
 using AMD
 >>> for both the CPUs and GPUs. It sounds at least very fast of what
 they
 >>> want to build, lets hope their design will work as planned 

Re: [Beowulf] Frontier Announcement

2019-05-08 Thread John Hearns via Beowulf
I disagree. IT is a cyclical industry.
Back in the bad old days codes were written to run on IBM mainframes, which
used the EBCDIC character set.
There were Little Endian and Big Endian machines.
VAX machines had a rich set of file IO patterns. I really don't think you
could read data written on an IBM by using a VAX machine.
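
The byte order point is easy to demonstrate even today - write four bytes
and read them back as a single 32-bit word; on a little-endian x86 box
they come out reversed:

    $ printf '\x01\x02\x03\x04' | od -An -tx4
     04030201
    $ lscpu | grep 'Byte Order'
    Byte Order:          Little Endian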




On Wed, 8 May 2019 at 19:43, Michael Di Domenico 
wrote:

> On Wed, May 8, 2019 at 2:14 PM Gus Correa  wrote:
> >
> > Once upon a time portability, interoperabiilty, standardization, were
> considered good software and hardware attributes.
> > Whatever happened to them?
>
> millennials?
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-02 Thread John Hearns via Beowulf
Warming to my subject now. I really don't want to be specific about any
vendor, or cluster management package.
As I say I have had experience ranging from national contracts, currently
at a company with tens of thousands of cpus worldwide,
down to installing half rack HPC clusters for customers, and informally
supporting half rack sized clusters where the users did not have formal
support.

When systems are bought the shiny bit is the hardware - much is made of the
latest generation CPUs, GPUs etc.
Buyers try to get as much hardware as they can for the price - usually
ending up as CPU core count or HPL performance.
They will swallow support contracts as they don't want to have a big failure
and have their management (Academic or industrial)
asking what the heck just happened and why the heck you are running without
support.
The hardware support is provided by the vendors, and their regional
distributors.
So from the point of view of a systems vendor hardware support is the
responsibility of the distributor or hardware vendor.

What DOES get squeezed is the HPC software stack support and the
applications level support.
After all - how hard can it be?
The sales guys told me that Intel now has 256 core processors with built-in
AI which will run any software faster
than you can type 'run'.
The new guy with the beard has a laptop which uses this Ubuntu operating
system - and it's all free.
Why do we need to pay $$$ for this cluster OS?



On Thu, 2 May 2019 at 17:18, John Hearns  wrote:

> Chris, I have to say this. I have worked for smaller companies, and have
> worked for cluster integrators.
> For big University sized and national labs the procurement exercise will
> end up with a well defined support arrangement.
>
> I have seen, in one company I worked at, an HPC system arrive which I was
> not responsible for.
> This system was purchased by the IT department, and was intended to run
> Finite Element software.
> The hardware came from a Tier 1 vendor, but it was integrated by a small
> systems integrator.
> Yes, they installed a software stack and demonstrated that it would run
> Abaqus.
> But beyond that there was no support for getting other applications
> running. And no training that I could see in diagnosing faults.
>
> I am not going to name names, but I suspect experiences like that are
> common.
> Companies want to procure kit for as little as possible. Tier 1 vendors
> and white box vendors want to make the sales.
> But no-one wants to pay for Bright Cluster Manager, for example.
> So the end user gets at best a freeware solution like Rocks, or at worst
> some Kickstarted setup which installs an OS,
> the CentOS supplied IB drivers and MPI, and Gridengine slapped on top of
> that.
>
> This leads to an unsatisfying experience on the part of the end users, and
> also for the engineers of the integrating company.
>
> Which leads me to say that we see the rise of HPC in the cloud services-
> AWS,  OnScale, Rescale, Verne Global etc. etc.
> And no wonder - you should be getting a much more polished and ready to go
> infrastructure, even though you can't physically touch it.
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Thu, 2 May 2019 at 17:08, Christopher Samuel  wrote:
>
>> On 5/2/19 8:40 AM, Faraz Hussain wrote:
>>
>> > So should I be paying Mellanox to help? Or is it a RedHat issue? Or is
>> > it our hardware vendor, HP who should be involved??
>>
>> I suspect that would be set out in the contract for the HP system.
>>
>> The clusters I've been involved in purchasing in the past have always
>> required support requests to go via the immediate vendor and they then
>> arrange to put you in contact with others where required.
>>
>> All the best,
>> Chris
>> --
>>Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-02 Thread John Hearns via Beowulf
Chris, I have to say this. I have worked for smaller companies, and have
worked for cluster integrators.
For big University sized and national labs the procurement exercise will
end up with a well defined support arrangement.

I have seen, in one company I worked at, an HPC system arrive which I was
not responsible for.
This system was purchased by the IT department, and was intended to run
Finite Element software.
The hardware came from a Tier 1 vendor, but it was integrated by a small
systems integrator.
Yes, they installed a software stack and demonstrated that it would run
Abaqus.
But beyond that there was no support for getting other applications
running. And no training that I could see in diagnosing faults.

I am not going to name names, but I suspect experiences like that are
common.
Companies want to procure kit for as little as possible. Tier 1 vendors and
white box vendors want to make the sales.
But no-one wants to pay for Bright Cluster Manager, for example.
So the end user gets at best a freeware solution like Rocks, or at worst
some Kickstarted setup which installs an OS,
the CentOS supplied IB drivers and MPI, and Gridengine slapped on top of
that.

This leads to an unsatisfying experience on the part of the end users, and
also for the engineers of the integrating company.

Which leads me to say that we see the rise of HPC in the cloud services-
AWS,  OnScale, Rescale, Verne Global etc. etc.
And no wonder - you should be getting a much more polished and ready to go
infrastructure, even though you can't physically touch it.













On Thu, 2 May 2019 at 17:08, Christopher Samuel  wrote:

> On 5/2/19 8:40 AM, Faraz Hussain wrote:
>
> > So should I be paying Mellanox to help? Or is it a RedHat issue? Or is
> > it our hardware vendor, HP who should be involved??
>
> I suspect that would be set out in the contract for the HP system.
>
> The clusters I've been involved in purchasing in the past have always
> required support requests to go via the immediate vendor and they then
> arrange to put you in contact with others where required.
>
> All the best,
> Chris
> --
>Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-02 Thread John Hearns via Beowulf
Please tell us the history of the overall system.
Was it bought as hardware only from a supplier? Or was it delivered as an
already set up system with operating system, applications, Infiniband
drivers etc?

I would also look at Qlustar
https://www.qlustar.com/book/qlustar/summary

and Bright https://www.brightcomputing.com/

Bright will certainly give you excellent support.





On Thu, 2 May 2019 at 17:02, John Hearns  wrote:

> You ask some damned good questions there.
> I will try to answer them from the point of view of someone who has worked
> as an HPC systems integrator and supported HPC systems,
> both for systems integrators and within companies.
>
> We will start with HP. Did you buy those systems direct from HP as
> servers, or did you buy a configured HPC system,
> complete with Infiniband networking and with a software stack?
> If you bought bare metal servers then you are out of luck regarding
> support, other than hardware failures.
> HP now incorporate SGI, and their support is fantastic. Great people work
> for HP and SGI. But they aren't responsible for your install.
>
> If however you bought an integrated HPC system this will normally be
> integrated by a smaller company, usually in your country.
> Is this the case here?  Then yes the integrator should be providing
> support.
> HOWEVER you have elected to remove their installed OS and upgrade by
> yourself. If I was the integrator I would give advice,
> but refuse to support the upgrade unless it was recommended by us, and you
> have a continuing support contract.
>
> You are using CentOS. The CentOS team are great guys - I know the founder
> quite well, and know people who work for RedHat.
> You have chosen CentOS - Community Supported Operating System. Join the
> CentOS HPC SIG perhaps and ask for help.
> But you don't get support from RedHat - as you are not using Redhat
> Enterprise Linux.
>
> Now we come to Mellanox. Mellanox support is fantastic. Formally, to open
> a support ticket with them you will need a support agreement
> on your switch. You HAVE got a support agreement - right?
> If not I have found that informal requests for support are often answered
> by Mellanox support.
>
> Failing all of those you could hire me!
> (I am being semi-serious here - I am a permanent employee at the moment,
> but I have worked as an HPC contractor in the past,
> and if I could justify it I would prefer to do HPC support on a contract
> basis).
>
>
> On Thu, 2 May 2019 at 16:45, Faraz Hussain  wrote:
>
>> Thanks. Before I go down the path of installing things willy-nilly, is
>> there some guide I should be following instead? I obviously have a
>> problem with my mellanox drivers combined with "user error"..
>>
>> So should I be paying Mellanox to help? Or is it a RedHat issue? Or is
>> it our hardware vendor, HP who should be involved??
>>
>> Looks like I need support on how to get support :-)
>>
>>
>> Quoting Christopher Samuel :
>>
>> >> root@lustwzb34:/root # systemctl status rdma
>> >> Unit rdma.service could not be found.
>> >
>> > You're missing this RPM then, which might explain a lot:
>> >
>> > $ rpm -qi rdma-core
>> > Name: rdma-core
>> > Version : 17.2
>> > Release : 3.el7
>> > Architecture: x86_64
>> > Install Date: Tue 04 Dec 2018 03:58:16 PM AEDT
>> > Group   : Unspecified
>> > Size: 107924
>> > License : GPLv2 or BSD
>> > Signature   : RSA/SHA256, Tue 13 Nov 2018 01:45:22 AM AEDT, Key ID
>> > 24c6a8a7f4a80eb5
>> > Source RPM  : rdma-core-17.2-3.el7.src.rpm
>> > Build Date  : Wed 31 Oct 2018 07:10:24 AM AEDT
>> > Build Host  : x86-01.bsys.centos.org
>> > Relocations : (not relocatable)
>> > Packager: CentOS BuildSystem <http://bugs.centos.org>
>> > Vendor  : CentOS
>> > URL : https://github.com/linux-rdma/rdma-core
>> > Summary : RDMA core userspace libraries and daemons
>> > Description :
>> > RDMA core userspace infrastructure and documentation, including
>> initscripts,
>> > kernel driver-specific modprobe override configs, IPoIB network scripts,
>> > dracut rules, and the rdma-ndd utility.
>> >
>> > --
>> >   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>> > ___
>> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
>> Com

Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-02 Thread John Hearns via Beowulf
You ask some damned good questions there.
I will try to answer them from the point of view of someone who has worked
as an HPC systems integrator and supported HPC systems,
both for systems integrators and within companies.

We will start with HP. Did you buy those systems direct from HP as servers,
or did you buy a configured HPC system,
complete with Infiniband networking and with a software stack?
If you bought bare metal servers then you are out of luck regarding
support, other than hardware failures.
HP now incorporate SGI, and their support is fantastic. Great people work
for HP and SGI. But they aren't responsible for your install.

If however you bought an integrated HPC system this will normally be
integrated by a smaller company, usually in your country.
Is this the case here?  Then yes the integrator should be providing support.
HOWEVER you have elected to remove their installed OS and upgrade by
yourself. If I was the integrator I would give advice,
but refuse to support the upgrade unless it was recommended by us, and you
have a continuing support contract.

You are using CentOS. The CentOS team are great guys - I know the founder
quite well, and know people who work for RedHat.
You have chosen CentOS - Community Supported Operating System. Join the
CentOS HPC SIG perhaps and ask for help.
But you don't get support from RedHat - as you are not using Redhat
Enterprise Linux.

Now we come to Mellanox. Mellanox support is fantastic. Formally, to open a
support ticket with them you will need a support agreement
on your switch. You HAVE got a support agreement - right?
If not I have found that informal requests for support are often answered
by Mellanox support.

Failing all of those you could hire me!
(I am being semi-serious here - I am a permanent employee at the moment,
but I have worked as an HPC contractor in the past,
and if I could justify it I would prefer to do HPC support on a contract
basis).


On Thu, 2 May 2019 at 16:45, Faraz Hussain  wrote:

> Thanks. Before I go down the path of installing things willy-nilly, is
> there some guide I should be following instead? I obviously have a
> problem with my mellanox drivers combined with "user error"..
>
> So should I be paying Mellanox to help? Or is it a RedHat issue? Or is
> it our hardware vendor, HP who should be involved??
>
> Looks like I need support on how to get support :-)
>
>
> Quoting Christopher Samuel :
>
> >> root@lustwzb34:/root # systemctl status rdma
> >> Unit rdma.service could not be found.
> >
> > You're missing this RPM then, which might explain a lot:
> >
> > $ rpm -qi rdma-core
> > Name: rdma-core
> > Version : 17.2
> > Release : 3.el7
> > Architecture: x86_64
> > Install Date: Tue 04 Dec 2018 03:58:16 PM AEDT
> > Group   : Unspecified
> > Size: 107924
> > License : GPLv2 or BSD
> > Signature   : RSA/SHA256, Tue 13 Nov 2018 01:45:22 AM AEDT, Key ID
> > 24c6a8a7f4a80eb5
> > Source RPM  : rdma-core-17.2-3.el7.src.rpm
> > Build Date  : Wed 31 Oct 2018 07:10:24 AM AEDT
> > Build Host  : x86-01.bsys.centos.org
> > Relocations : (not relocatable)
> > Packager: CentOS BuildSystem 
> > Vendor  : CentOS
> > URL : https://github.com/linux-rdma/rdma-core
> > Summary : RDMA core userspace libraries and daemons
> > Description :
> > RDMA core userspace infrastructure and documentation, including
> initscripts,
> > kernel driver-specific modprobe override configs, IPoIB network scripts,
> > dracut rules, and the rdma-ndd utility.
> >
> > --
> >   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-01 Thread John Hearns via Beowulf
On the RHEL 6.9 servers run   ibstatus   also,
and   sminfo.

On Wed, 1 May 2019 at 16:23, John Hearns  wrote:

>  link_layer:  Ethernet
>
> E….
>
> On Wed, 1 May 2019 at 16:18, Faraz Hussain  wrote:
>
>>
>> Quoting John Hearns :
>>
>> > What does   ibstatus   give you
>>
>> [hussaif1@lustwzb33 ~]$ ibstatus
>> Infiniband device 'mlx4_0' port 1 status:
>>  default gid: fe80::::4a0f:cfff:fef5:b650
>>  base lid:0x0
>>  sm lid:  0x0
>>  state:   4: ACTIVE
>>  phys state:  5: LinkUp
>>  rate:40 Gb/sec (4X QDR)
>>  link_layer:  Ethernet
>>
>> Infiniband device 'mlx4_0' port 2 status:
>>  default gid: fe80::::4a0f:cfff:fef5:b658
>>  base lid:0x0
>>  sm lid:  0x0
>>  state:   1: DOWN
>>  phys state:  3: Disabled
>>  rate:40 Gb/sec (4X QDR)
>>  link_layer:  Ethernet
>>
>> [hussaif1@lustwzb33 ~]$
>>
>>
>>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-01 Thread John Hearns via Beowulf
 link_layer:  Ethernet

E….
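
That link_layer line is the tell - the ConnectX-3 ports are configured as
Ethernet, not InfiniBand. If native IB was the intent, VPI cards can
usually be flipped with Mellanox's mlxconfig. A sketch only - the MST
device name is whatever mst status reports on your machine, and LINK_TYPE
1 means InfiniBand, 2 Ethernet:

    mst start
    mlxconfig -d /dev/mst/mt4103_pciconf0 set LINK_TYPE_P1=1 LINK_TYPE_P2=1
    # then reboot (or reload the mlx4 driver) and re-check ibstatus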

On Wed, 1 May 2019 at 16:18, Faraz Hussain  wrote:

>
> Quoting John Hearns :
>
> > What does   ibstatus   give you
>
> [hussaif1@lustwzb33 ~]$ ibstatus
> Infiniband device 'mlx4_0' port 1 status:
>  default gid: fe80::::4a0f:cfff:fef5:b650
>  base lid:0x0
>  sm lid:  0x0
>  state:   4: ACTIVE
>  phys state:  5: LinkUp
>  rate:40 Gb/sec (4X QDR)
>  link_layer:  Ethernet
>
> Infiniband device 'mlx4_0' port 2 status:
>  default gid: fe80::::4a0f:cfff:fef5:b658
>  base lid:0x0
>  sm lid:  0x0
>  state:   1: DOWN
>  phys state:  3: Disabled
>  rate:40 Gb/sec (4X QDR)
>  link_layer:  Ethernet
>
> [hussaif1@lustwzb33 ~]$
>
>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-01 Thread John Hearns via Beowulf
I think I was on the wrong track regarding the subnet manager, sorry.
What does   ibstatus   give you

On Wed, 1 May 2019 at 15:31, John Hearns  wrote:

> E..   you are not running a subnet manager?
> Do you have an Infiniband switch or are you connecting two servers
> back-to-back?
>
> Also - have you considered using OpenHPC rather than installing CentOS on
> two servers?
> When you expand, this manual installation is going to be painful.
>
> On Wed, 1 May 2019 at 15:05, Faraz Hussain  wrote:
>
>> > What hardware and what Infiniband switch you have
>> > Run   these commands:  ibdiagnet   smshow
>>
>> Unfortunately ibdiagnet seems to give some errors:
>>
>> [hussaif1@lustwzb34 ~]$ ibdiagnet
>> --
>> Load Plugins from:
>> /usr/share/ibdiagnet2.1.1/plugins/
>> (You can specify more paths to be looked in with
>> "IBDIAGNET_PLUGINS_PATH" env variable)
>>
>> Plugin Name   Result Comment
>> libibdiagnet_cable_diag_plugin-2.1.1  Succeeded  Plugin loaded
>> libibdiagnet_phy_diag_plugin-2.1.1Succeeded  Plugin loaded
>>
>> -
>> Discovery
>> -E- Failed to initialize
>>
>> -E- Fabric Discover failed, err=IBDiag initialize wasn't done
>> -E- Fabric Discover failed, MAD err=Failed to umad_open_port
>>
>> -
>> Summary
>> -I- Stage Warnings   Errors Comment
>> -I- Discovery   NA
>> -I- Lids Check  NA
>> -I- Links Check NA
>> -I- Subnet Manager  NA
>> -I- Port Counters   NA
>> -I- Nodes Information   NA
>> -I- Speed / Width checksNA
>> -I- Partition Keys  NA
>> -I- Alias GUIDs NA
>> -I- Temperature Sensing NA
>>
>> -I- You can find detailed errors/warnings in:
>> /var/tmp/ibdiagnet2/ibdiagnet2.log
>>
>> -E- A fatal error occurred, exiting...
>>
>>
>> I do not have smshow command , but I see there is an sminfo. It also
>> give this error:
>>
>> [hussaif1@lustwzb34 ~]$ smshow
>> bash: smshow: command not found...
>> [hussaif1@lustwzb34 ~]$ sm
>> smartctl smbcacls smbcquotas   smbspool smbtree
>> sm-notifysmpdump  smtp-sink
>> smartd   smbclientsmbget   smbtar   sminfo
>> smparquery   smpquery smtp-source
>> [hussaif1@lustwzb34 ~]$ sminfo
>> ibwarn: [10407] mad_rpc_open_port: can't open UMAD port ((null):0)
>> sminfo: iberror: failed: Failed to open '(null)' port '0'
>>
>>
>>
>> > You originally had the OpenMPI which was provided by CentOS  ??
>>
>> Correct.
>>
>> > You compiled the OpenMPI from source??
>>
>> Yes, I then compiled it from source and it seems to work ( at least
>> give reasonable numbers when running latency and bandwith tests )..
>>
>> > How are you bringing the new OpenMPI version into your PATH? Are you
>> > using modules or an mpi switcher utility?
>>
>> Just as follows:
>>
>> export PATH=/Apps/users/hussaif1/openmpi-4.0.0/bin:$PATH
>>
>> Thanks!
>>
>> >
>> > On Wed, 1 May 2019 at 09:39, Benson Muite 
>> > wrote:
>> >
>> >> Hi Faraz,
>> >>
>> >> Have you tried any other MPI distributions (eg. MPICH, MVAPICH)?
>> >>
>> >> Regards,
>> >>
>> >> Benson
>> >> On 4/30/19 11:20 PM, Gus Correa wrote:
>> >>
>> >> It may be using IPoIB (TCP/IP over IB), not verbs/rdma.
>> >> You can force it to use openib (verbs, rdma) with (vader is for in-node
>> >> shared memory):
>> >>
>> >> mpirun --mca btl openib,self,vader ...
>> >>
>> >>
>> >> These flags may also help tell which btl (byte transport layer) is
>> >> being used:
>> >>
>> >>  --mca btl_base_verbose 30
>> >>
>> See these
>> FAQ: https://www.open-mpi.org/faq/?category=openfabrics#ib-btl
>> https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3
>> >>
>> >> Better really ask more details in the Open MPI list. They are the pros!
>> >>
>> >> My two 

Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-01 Thread John Hearns via Beowulf
E..   you are not running a subnet manager?
Do you have an Infiniband switch or are you connecting two servers
back-to-back?

Also - have you considered using OpenHPC rather than installing CentOS on
two servers?
When you expand, this manual installation is going to be painful.

On Wed, 1 May 2019 at 15:05, Faraz Hussain  wrote:

> > What hardware and what Infiniband switch you have
> > Run   these commands:  ibdiagnet   smshow
>
> Unfortunately ibdiagnet seems to give some errors:
>
> [hussaif1@lustwzb34 ~]$ ibdiagnet
> --
> Load Plugins from:
> /usr/share/ibdiagnet2.1.1/plugins/
> (You can specify more paths to be looked in with
> "IBDIAGNET_PLUGINS_PATH" env variable)
>
> Plugin Name   Result Comment
> libibdiagnet_cable_diag_plugin-2.1.1  Succeeded  Plugin loaded
> libibdiagnet_phy_diag_plugin-2.1.1Succeeded  Plugin loaded
>
> -
> Discovery
> -E- Failed to initialize
>
> -E- Fabric Discover failed, err=IBDiag initialize wasn't done
> -E- Fabric Discover failed, MAD err=Failed to umad_open_port
>
> -
> Summary
> -I- Stage Warnings   Errors Comment
> -I- Discovery   NA
> -I- Lids Check  NA
> -I- Links Check NA
> -I- Subnet Manager  NA
> -I- Port Counters   NA
> -I- Nodes Information   NA
> -I- Speed / Width checksNA
> -I- Partition Keys  NA
> -I- Alias GUIDs NA
> -I- Temperature Sensing NA
>
> -I- You can find detailed errors/warnings in:
> /var/tmp/ibdiagnet2/ibdiagnet2.log
>
> -E- A fatal error occurred, exiting...
>
>
> I do not have smshow command , but I see there is an sminfo. It also
> give this error:
>
> [hussaif1@lustwzb34 ~]$ smshow
> bash: smshow: command not found...
> [hussaif1@lustwzb34 ~]$ sm
> smartctl smbcacls smbcquotas   smbspool smbtree
> sm-notifysmpdump  smtp-sink
> smartd   smbclientsmbget   smbtar   sminfo
> smparquery   smpquery smtp-source
> [hussaif1@lustwzb34 ~]$ sminfo
> ibwarn: [10407] mad_rpc_open_port: can't open UMAD port ((null):0)
> sminfo: iberror: failed: Failed to open '(null)' port '0'
>
>
>
> > You originally had the OpenMPI which was provided by CentOS  ??
>
> Correct.
>
> > You compiled the OpenMPI from source??
>
> Yes, I then compiled it from source and it seems to work ( at least
> give reasonable numbers when running latency and bandwith tests )..
>
> > How are you bringing the new OpenMPI version into your PATH? Are you
> > using modules or an mpi switcher utility?
>
> Just as follows:
>
> export PATH=/Apps/users/hussaif1/openmpi-4.0.0/bin:$PATH
>
> Thanks!
>
> >
> > On Wed, 1 May 2019 at 09:39, Benson Muite 
> > wrote:
> >
> >> Hi Faraz,
> >>
> >> Have you tried any other MPI distributions (eg. MPICH, MVAPICH)?
> >>
> >> Regards,
> >>
> >> Benson
> >> On 4/30/19 11:20 PM, Gus Correa wrote:
> >>
> >> It may be using IPoIB (TCP/IP over IB), not verbs/rdma.
> >> You can force it to use openib (verbs, rdma) with (vader is for in-node
> >> shared memory):
> >>
> >> mpirun --mca btl openib,self,vader ...
> >>
> >>
> >> These flags may also help tell which btl (byte transport layer) is
> >> being used:
> >>
> >>  --mca btl_base_verbose 30
> >>
> >> See these
> >> FAQ: https://www.open-mpi.org/faq/?category=openfabrics#ib-btl
> >> https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3
> >>
> >> Better really ask more details in the Open MPI list. They are the pros!
> >>
> >> My two cents,
> >> Gus Correa
> >>
> >>
> >>
> >> On Tue, Apr 30, 2019 at 3:57 PM Faraz Hussain 
> wrote:
> >>
> >>> Thanks, after buidling openmpi 4 from source, it now works! However it
> >>> still gives this message below when I run openmpi with verbose setting:
> >>>
> >>> No OpenFabrics connection schemes reported that they were able to be
> >>> used on a specific port.  As such, the openib BTL (OpenFabrics
> >>> support) will be disabled for this port.
> >>>
> >>>Local host:   lustwzb34
> >>>Local device: mlx4_0
> >>>Local port:   1
> >>>CPCs attempted:   rdmacm, udcm
> >>>
> >>> However, the results from my latency and bandwith tests seem to be
> >>> what I would expect from infiniband. See:
> >>>
> >>> [hussaif1@lustwzb34 pt2pt]$  mpirun -v -np 2 -hostfile ./hostfile
> >>> ./osu_latency
> >>> # OSU MPI Latency Test v5.3.2
> >>> # Size  Latency (us)
> >>> 0   1.87
> >>> 1   1.88
> >>> 2   1.93
> >>> 4   1.92
> >>> 8   1.93
> >>> 16  1.95
> >>> 32  1.93

Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-05-01 Thread John Hearns via Beowulf
Hi Faraz. Could you make another summary for us?
What hardware and what Infiniband switch you have
Run   these commands:  ibdiagnet   smshow

You originally had the OpenMPI which was provided by CentOS  ??

You compiled the OpenMPI from source??
How are you bringing the new OpenMPI version into your PATH? Are you
using modules or an mpi switcher utility?
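
If it is a bare PATH export, environment modules make this much less error
prone - a sketch, assuming you write a modulefile for the new build (the
modulefiles path here is invented):

    module use /Apps/modulefiles
    module load openmpi/4.0.0
    which mpirun    # should report /Apps/users/hussaif1/openmpi-4.0.0/bin/mpirun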





On Wed, 1 May 2019 at 09:39, Benson Muite 
wrote:

> Hi Faraz,
>
> Have you tried any other MPI distributions (eg. MPICH, MVAPICH)?
>
> Regards,
>
> Benson
> On 4/30/19 11:20 PM, Gus Correa wrote:
>
> It may be using IPoIB (TCP/IP over IB), not verbs/rdma.
> You can force it to use openib (verbs, rdma) with (vader is for in-node
> shared memory):
>
> mpirun --mca btl openib,self,vader ...
>
>
> These flags may also help tell which btl (byte transport layer) is being used:
>
>  --mca btl_base_verbose 30
>
> See these
> FAQ: https://www.open-mpi.org/faq/?category=openfabrics#ib-btl
> https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3
>
> Better really ask more details in the Open MPI list. They are the pros!
>
> My two cents,
> Gus Correa
>
>
>
> On Tue, Apr 30, 2019 at 3:57 PM Faraz Hussain  wrote:
>
>> Thanks, after buidling openmpi 4 from source, it now works! However it
>> still gives this message below when I run openmpi with verbose setting:
>>
>> No OpenFabrics connection schemes reported that they were able to be
>> used on a specific port.  As such, the openib BTL (OpenFabrics
>> support) will be disabled for this port.
>>
>>Local host:   lustwzb34
>>Local device: mlx4_0
>>Local port:   1
>>CPCs attempted:   rdmacm, udcm
>>
>> However, the results from my latency and bandwith tests seem to be
>> what I would expect from infiniband. See:
>>
>> [hussaif1@lustwzb34 pt2pt]$  mpirun -v -np 2 -hostfile ./hostfile
>> ./osu_latency
>> # OSU MPI Latency Test v5.3.2
>> # Size  Latency (us)
>> 0   1.87
>> 1   1.88
>> 2   1.93
>> 4   1.92
>> 8   1.93
>> 16  1.95
>> 32  1.93
>> 64  2.08
>> 128 2.61
>> 256 2.72
>> 512 2.93
>> 10243.33
>> 20483.81
>> 40964.71
>> 81926.68
>> 16384   8.38
>> 32768  12.13
>> 65536  19.74
>> 131072 35.08
>> 262144 64.67
>> 524288122.11
>> 1048576   236.69
>> 2097152   465.97
>> 4194304   926.31
>>
>> [hussaif1@lustwzb34 pt2pt]$  mpirun -v -np 2 -hostfile ./hostfile
>> ./osu_bw
>> # OSU MPI Bandwidth Test v5.3.2
>> # Size  Bandwidth (MB/s)
>> 1   3.09
>> 2   6.35
>> 4  12.77
>> 8  26.01
>> 16 51.31
>> 32103.08
>> 64197.89
>> 128   362.00
>> 256   676.28
>> 512  1096.26
>> 1024 1819.25
>> 2048 2551.41
>> 4096 3886.63
>> 8192 3983.17
>> 163844362.30
>> 327684457.09
>> 655364502.41
>> 131072   4512.64
>> 262144   4531.48
>> 524288   4537.42
>> 1048576  4510.69
>> 2097152  4546.64
>> 4194304  4565.12
>>
>> When I run ibv_devinfo I get:
>>
>> [hussaif1@lustwzb34 pt2pt]$ ibv_devinfo
>> hca_id: mlx4_0
>>  transport:  InfiniBand (0)
>>  fw_ver: 2.36.5000
>>  node_guid:  480f:cfff:fff5:c6c0
>>  sys_image_guid: 480f:cfff:fff5:c6c3
>>  vendor_id:  0x02c9
>>  vendor_part_id: 4103
>>  hw_ver: 0x0
>>  board_id:   HP_1360110017
>>  phys_port_cnt:  2
>>  Device ports:
>>  port:   1
>>  state:  PORT_ACTIVE (4)
>>  max_mtu:4096 (5)
>>  active_mtu: 1024 (3)
>>  sm_lid: 0
>>  port_lid:   0
>>  port_lmc:   0x00
>>  link_layer: Ethernet
>>
>>  port:   2
>>  state:  PORT_DOWN (1)
>>  max_mtu:4096 (5)
>>  active_mtu: 1024 (3)
>>  sm_lid: 0
>>  port_lid:   0
>>   

Re: [Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

2019-04-30 Thread John Hearns via Beowulf
Hello Faraz.  Please start by running this command:   ompi_info
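
In particular check which byte transfer layers (btl) your build includes,
along the lines of:

    ompi_info | grep btl

You want to see openib - or, on newer builds, ucx - in that list; if it is
absent no amount of runtime flags will bring the fabric up.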

On Tue, 30 Apr 2019 at 15:15, Faraz Hussain  wrote:

> I installed RedHat 7.5 on two machines with the following Mellanox cards:
>
> 87:00.0 Network controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro
>
> I followed the steps outlined here to verify RDMA is working:
>
>
> https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel
>
> However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I
> get this error:
>
> --
>
> No OpenFabrics connection schemes reported that they were able to be
>
> used on a specific port. As such, the openib BTL (OpenFabrics
>
> support) will be disabled for this port.
>
>
>   Local host:  lustwzb34
>
>   Local device: mlx4_0
>
>   Local port:  1
>
>   CPCs attempted:rdmacm, udcm
>
> --
>
> Then it just hangs till I press control C.
>
> I understand this may be an issue with RedHat,  Open MPI or Mellanox.
> Any ideas to debug which place it could be?
>
> Thanks!
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] GPFS question

2019-04-30 Thread John Hearns via Beowulf
Hi Jorg. I will mail you offline.
IBM support for GPFS is excellent - so if they advise a check like that it
is needed.


On Tue, 30 Apr 2019 at 04:53, Chris Samuel  wrote:

> On Monday, 29 April 2019 3:47:10 PM PDT Jörg Saßmannshausen wrote:
>
> > thanks for the feedback. I guess it also depends how much meta-data you
> have
> > and whether or not you have zillions of small or larger files.
> > At least I got an idea how long it might take.
>
> This thread might also be useful, it is a number of years old but it does
> have some
> advice on placement of the filesystem manager before the scan and also on
> their
> experience scanning a ~1PB filesystem.
>
>
> https://www.ibm.com/developerworks/community/forums/html/topic?id=----14834266
>
> All the best,
> Chris
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [EXTERNAL] Re: Introduction and question

2019-03-22 Thread John Hearns via Beowulf
I matriculated (enrolled) at Glasgow University in 1981 (Scots lads and
lasses start Yoonie at a tender age!).
My Computer Science teacher was Jennifer Haselgrove.
https://en.wikipedia.org/wiki/Jenifer_Haselgrove

Wonderful lady, who of course did not have a degree in Comp Sci - as there
were none when she was an undergrad.
First lecture - algorithms and debugging.
How do you make a cup of tea?  Well - describe the steps to make a cup of
tea to your cat.
Best advice on development and debugging I have ever had. Get yourselves a
cat. Cats are wise and can debug most any program.












On Fri, 22 Mar 2019 at 17:51, Lux, Jim (337K) via Beowulf <
beowulf@beowulf.org> wrote:

> This is somewhat off topic for the list, but what you are describing is a
> phenomenon known as “signaling” – that is, the possession of the degree
> isn’t strictly required for the task at hand (an autodidact could
> potentially do it), but that possession is a signal of other
> characteristics which are deemed desirable.
>
> And yes, most well-known folks in the “computer science” business up to
> around 1980 (well known in 1970 or 1980, I mean) most likely did not have a
> degree in CS, because there weren’t very many CS programs.  It is true that
> most had degrees in Math, Physics, Engineering though.
>
>
> *From: *Beowulf  on behalf of "
> beowulf@beowulf.org" 
> *Reply-To: *Prentice Bisbal 
> *Date: *Thursday, March 21, 2019 at 1:32 PM
> *To: *"beowulf@beowulf.org" 
> *Subject: *[EXTERNAL] Re: [Beowulf] Introduction and question
>
>
>
> Thanks for sharing this. I was recently asked for my input in a job
> description for a new position. They wanted to make the education
> requirements a minimum of a BS in Math, Physics, Engineering, or CS. I
> recommended that they DO NOT list any education requirements for this
> position, because most of the skills they were looking for (git, make
> files, GNU autoconf, CMake, etc.), are not taught in any college curriculum
> I know of, so a formal education is no guarantee of those skills. And some
> of the best sys admins and programmers I ever met had no formal education
> in STEM, or at all.
>
> I was overruled.
>
> --
> Prentice
>
> On 3/21/19 5:08 AM, Benson Muite wrote:
>
> "Many employers look for people who studied humanities and learned IT by
> themselves, for their wider appreciation of human values."
>
> Mark Burgess
>
> https://www.usenix.org/sites/default/files/jesa_0201_issue.pdf
>
> On 2/23/19 4:30 PM, Will Dennis wrote:
>
> Hi folks,
>
>
>
> I thought I’d give a brief introduction, and see if this list is a good
> fit for my questions that I have about my HPC-“ish” infrastructure...
>
>
>
> I am a ~30yr sysadmin (“jack-of-all-trades” type), completely self-taught
> (B.A. is in English, that’s why I’m a sysadmin :-P) and have ended up
> working at an industrial research lab for a large multi-national IT company
> (http://www.nec-labs.com). In our lab we have many research groups (as
> detailed on the aforementioned website) and a few of them are now using
> “HPC” technologies like Slurm, and I’ve become the lead admin for these
> groups. Having no prior background in this realm, I’m learning as fast as I
> can go :)
>
>
>
> Our “clusters” are collections of 5-30 servers, all collections bought
> over years and therefore heterogeneous hardware, all with locally-installed
> OS (i.e. not trad head-node with PXE-booted diskless minions) which is as
> carefully controlled as I can make it via standard OS install via Cobbler
> templates, and then further configured via config management (we use
> Ansible.) Networking is basic 10GbE between nodes (we do have Infiniband
> availability on one cluster, but it’s fallen into disuse now since the
> project that required it has ended.) Storage is one or more traditional
> NFS servers (some use ZFS, some not.) We have within the past few years
> adopted Slurm WLM for a job-scheduling system on top of these collections,
> and now are up to three different Slurm clusters, with I believe a fourth
> on the way.
>
>
>
> My first question for this list is basically “do I belong here?” I feel
> there are a lot of HPC concepts it would be good for me to learn, so that I can
> improve the various research group’s computing environments, but not sure
> if this list is for much larger “true HPC” environments, or would be a good
> fit for a “HPC n00b” like me...
>
>
>
> Thanks for reading, and let me know your opinions :)
>
>
>
> Best,
>
> Will
>
>
>
> ___
>
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
>
> ___
>
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>
> To change your subscription (digest mode or unsubscribe) visit 
> 

[Beowulf] Quantum computing

2019-03-14 Thread John Hearns via Beowulf
I think this should have a new thread.
I have taken a bit of an interest in quantum computing recently.

There are no real qubit based quantum computers which are ready for work at
the moment. There ARE demonstrators available from IBM etc.

The most advanced machine which is available for work is the D-Wave machine
which is based on an annealing process. If you can map your problem onto an
Ising annealing model then you can compute it on this machine.
I am told that in complexity theory you can map any NP problem onto the
Ising model (I may be wrong)
https://www.dwavesys.com/home
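
To make "map your problem onto an Ising model" concrete, here is a minimal
sketch in Julia (my own illustration, not D-Wave code; J, h and s are
made-up example values, and this symmetric-J convention counts each spin
pair twice):

using LinearAlgebra

# Ising energy: E(s) = -s'Js - h's, with each spin s_i in {-1, +1};
# the annealer searches for the configuration s that minimizes E
ising_energy(s, J, h) = -(s' * J * s) - h' * s

s = [1, -1, 1]                   # a candidate spin configuration
J = [0.0  1.0  0.0;              # pairwise couplings (symmetric, zero diagonal)
     1.0  0.0 -1.0;
     0.0 -1.0  0.0]
h = [0.5, 0.0, -0.5]             # local fields
println(ising_energy(s, J, h))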

Rigetti also have a useable machine  https://www.rigetti.com/

All machines at the moment depend on being inside a helium dilution
refrigerator. Seemingly one of the obstacles to further progress is the
rate at which you can buy helium fridges.
(Hint - in the California gold rush the miners made pitifully little money.
The people selling spades made a lot).

There is nothing close to a card or a server which could be operated in a
traditional hot, noisy computer room.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Large amounts of data to store and process

2019-03-14 Thread John Hearns via Beowulf
Jonathan, there is absolutely no need for an apology.
Please let me explain. Julia is often referred to as a JIT compilation
language. This has bad connotations with most people - it means 'slow'.
Now let's be honest - if you have ever tried to do plots in Julia you will
end up growing old waiting for them to start up.

The context is that I was planning a short pitch promoting Julia at work. I
asked on the Discourse, and the term given was Ahead of Time compilation,
which is actually a better fit for what actually happens.
Call it what you like really!
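
To see what that means in practice, here is a minimal sketch (plain Julia,
my own example): native code is generated the first time a function is
called with a given set of argument types, and later calls reuse it.

f(x) = 2x + 1

@time f(1.0)   # first call: the time includes compiling f(::Float64)
@time f(2.0)   # second call: runs the already-compiled native code
@time f(1)     # a new argument type (Int) triggers a fresh specialization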

On Thu, 14 Mar 2019 at 16:33, Jonathan Aquilina 
wrote:

> I do apologize there but I think what is JIT is JuliaDB side of things.
> Julia has a lot of potential for sure will be interesting to see how it
> develops as the little I have already played with it im really liking it.
>
>
>
> *From: *Beowulf  on behalf of John Hearns
> via Beowulf 
> *Reply-To: *John Hearns 
> *Date: *Thursday, 14 March 2019 at 12:12
> *To: *Beowulf Mailing List 
> *Subject: *Re: [Beowulf] Large amounts of data to store and process
>
>
>
> Jonathan, a small correction if I may. Julia is not JIT - I asked on the
> Julia discourse. A much better description is Ahead of Time compilation.
>
> Not really important, but JIT triggers a certain response with most people.
>
>
>
>
>
> On Thu, 14 Mar 2019 at 07:31, Jonathan Aquilina 
> wrote:
>
> Hi All,
>
>
>
> What sets Julia apart is it is not a compiled language but a Just In Time
> (JIT) language. I am still getting into it but it seems to be geared to
> complex and large data sets. As mentioned previously I am still working
> with a colleague on this prototype. With Julia at least there is an IDE so
> to speak for it. It is based on the ATOM IDE with a package that is
> installed specifically for Julia.
>
>
>
> I will obviously keep the list updated in regards to Julia and my
> experiences with it but the little I have looked at the language it is easy
> to write code for. Its still in its infancy as the latest version I believe
> is 1.0.1
>
>
>
> Regards,
>
> Jonathan
>
>
>
> *From:* Beowulf  *On Behalf Of *Scott Atchley
> *Sent:* 14 March 2019 01:17
> *To:* Douglas Eadline 
> *Cc:* Beowulf Mailing List 
> *Subject:* Re: [Beowulf] Large amounts of data to store and process
>
>
>
> I agree with your take about slower progress on the hardware front and
> that software has to improve. DOE funds several vendors to do research to
> improve technologies that will hopefully benefit HPC, in particular, as
> well as the general market. I am reviewing a vendor's latest report on
> micro-architectural techniques to improve performance (e.g., lower latency,
> increase bandwidth). For this study, they use a combination of DOE
> mini-apps/proxies as well as commercial benchmarks. The techniques that
> this vendor investigated showed potential improvements for commercial
> benchmarks but much less, if any, for the DOE apps, which are highly
> optimized.
>
>
>
> I will state that I know nothing about Julia, but I assume it is a
> higher-level language than C/C++ (or Fortran for numerical codes). I am
> skeptical that a higher-level language (assuming Julia is) can help. I
> believe the vendor's techniques that I am reviewing benefited commercial
> benchmarks because they are less optimized than the DOE apps. Using a
> high-level language relies on the language's compiler/interpreter and
> runtime. The developer has no idea what is happening or does not have the
> ability to improve it if profiling shows that the issue is in the runtime.
> I believe that if you need more performance, you will have to work for it
> in a lower-level language and there is no more free lunch (i.e., hoping the
> latest hardware will do it for me).
>
>
>
> Hope I am wrong.
>
>
>
>
>
> On Wed, Mar 13, 2019 at 5:23 PM Douglas Eadline 
> wrote:
>
>
> I realize it is bad form to reply to one's own post and
> I forgot to mention something.
>
> Basically the HW performance parade is getting harder
> to celebrate. Clock frequencies have been slowly
> increasing while cores are multiplying rather quickly.
> Single core performance boosts are mostly coming
> from accelerators. Added to the fact that speculation
> technology when managed for security, slows things down.
>
> What this means is that the focus on software performance
> and optimization is going to increase, because we can't just
> buy new hardware and improve things anymore.
>
> I believe languages like Julia can help with this situation.
> For a while.
>
> --
> Doug
>
> >> Hi All,
> >> Basically I have sat down with my colleague and we have opted to go down

Re: [Beowulf] Large amounts of data to store and process

2019-03-14 Thread John Hearns via Beowulf
Jonathan, a small correction if I may. Julia is not JIT - I asked on the
Julia discourse. A much better description is Ahead of Time compilation.
Not really important, but JIT triggers a certain response with most people.


On Thu, 14 Mar 2019 at 07:31, Jonathan Aquilina 
wrote:

> Hi All,
>
>
>
> What sets Julia apart is it is not a compiled language but a Just In Time
> (JIT) language. I am still getting into it but it seems to be geared to
> complex and large data sets. As mentioned previously I am still working
> with a colleague on this prototype. With Julia at least there is an IDE so
> to speak for it. It is based on the ATOM IDE with a package that is
> installed specifically for Julia.
>
>
>
> I will obviously keep the list updated in regards to Julia and my
> experiences with it but the little I have looked at the language it is easy
> to write code for. Its still in its infancy as the latest version I believe
> is 1.0.1
>
>
>
> Regards,
>
> Jonathan
>
>
>
> *From:* Beowulf  *On Behalf Of *Scott Atchley
> *Sent:* 14 March 2019 01:17
> *To:* Douglas Eadline 
> *Cc:* Beowulf Mailing List 
> *Subject:* Re: [Beowulf] Large amounts of data to store and process
>
>
>
> I agree with your take about slower progress on the hardware front and
> that software has to improve. DOE funds several vendors to do research to
> improve technologies that will hopefully benefit HPC, in particular, as
> well as the general market. I am reviewing a vendor's latest report on
> micro-architectural techniques to improve performance (e.g., lower latency,
> increase bandwidth). For this study, they use a combination of DOE
> mini-apps/proxies as well as commercial benchmarks. The techniques that
> this vendor investigated showed potential improvements for commercial
> benchmarks but much less, if any, for the DOE apps, which are highly
> optimized.
>
>
>
> I will state that I know nothing about Julia, but I assume it is a
> higher-level language than C/C++ (or Fortran for numerical codes). I am
> skeptical that a higher-level language (assuming Julia is) can help. I
> believe the vendor's techniques that I am reviewing benefited commercial
> benchmarks because they are less optimized than the DOE apps. Using a
> high-level language relies on the language's compiler/interpreter and
> runtime. The developer has no idea what is happening or does not have the
> ability to improve it if profiling shows that the issue is in the runtime.
> I believe that if you need more performance, you will have to work for it
> in a lower-level language and there is no more free lunch (i.e., hoping the
> latest hardware will do it for me).
>
>
>
> Hope I am wrong.
>
>
>
>
>
> On Wed, Mar 13, 2019 at 5:23 PM Douglas Eadline 
> wrote:
>
>
> I realize it is bad form to reply to one's own post and
> I forgot to mention something.
>
> Basically the HW performance parade is getting harder
> to celebrate. Clock frequencies have been slowly
> increasing while cores are multiplying rather quickly.
> Single core performance boosts are mostly coming
> from accelerators. Added to the fact that speculation
> technology when managed for security, slows things down.
>
> What this means is that the focus on software performance
> and optimization is going to increase, because we can't just
> buy new hardware and improve things anymore.
>
> I believe languages like Julia can help with this situation.
> For a while.
>
> --
> Doug
>
> >> Hi All,
> >> Basically I have sat down with my colleague and we have opted to go down
> > the route of Julia with JuliaDB for this project. But here is an
> > interesting thought that I have been pondering if Julia is an up and
> > coming fast language to work with for large amounts of data how will
> > that
> >> affect HPC and the way it is currently used and HPC systems created?
> >
> >
> > First, IMO good choice.
> >
> > Second a short list of actual conversations.
> >
> > 1) "This code is written in Fortran." I have been met with
> > puzzling looks when I say the the word "Fortran." Then it
> > comes, "... ancient language, why not port to modern ..."
> > If you are asking that question young Padawan you have
> > much to learn, maybe try web pages"
> >
> > 2) I'll just use Python because it works on my Laptop.
> > Later, "It will just run faster on a cluster, right?"
> > and "My little Python program is now kind-of big and has
> > become slow, should I use TensorFlow?"
> >
> > 3) 
> > "Dammit Jim, I don't want to learn/write Fortran,C,C++ and MPI.
> > I'm a (fill in  domain specific scientific/technical position)"
> > 
> >
> > My reply,"I agree and wish there was a better answer to that question.
> > The computing industry has made great strides in HW with
> > multi-core, clusters etc. Software tools have always lagged
> > hardware. In the case of HPC it is a slow process and
> > in HPC the whole programming "thing" is not as "easy" as
> > it is in other sectors, warp drives and transporters
> > take a little extra effort.
> >
> > 4) Then I 

Re: [Beowulf] Large amounts of data to store and process

2019-03-10 Thread John Hearns via Beowulf
Also interesting to me is mixed precision arithmetic - which Julia makes
easy.
We are going to see more and more codes which will choose lower precision
to save energy, not just for running Deep Learning models on GPUs.

I share a code snippet not written by me. I think it is a brilliant idea.
Here a researcher in ocean modelling is able to change the types of numbers
his model uses. Run with lower precision and see what changes.
I guess this is easy in C/C++ also, but the concept is fantastic.

# NUMBER FORMAT OPTIONS
const Numtype = Float32
#const Numtype = Posit{16,2}
#const Numtype = Main.FiniteFloats.Finite16
#const Numtype = BigFloat
#setprecision(7)

Using Julia's type system you can change the type of the numbers.
Here the calculation is run as Numtype - and you can make Numtype 32-bit
floats or arbitrarily large floats.
I see Float64 is not listed - that should be there.
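
To show the payoff, here is a small sketch of my own (standard types only -
Posit and Finite16 come from external packages) running the same toy model
at several precisions:

# decay of dx/dt = -x by explicit Euler steps; precision is set by the type T
function decay(nsteps, dt::T) where {T}
    x = one(T)
    for _ in 1:nsteps
        x -= x * dt
    end
    return x
end

println(decay(100_000, 1.0e-5))          # reference run in Float64
println(decay(100_000, Float32(1e-5)))   # 32-bit: cheaper, slight drift
println(decay(100_000, Float16(1e-5)))   # 16-bit: the step rounds away entirely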

On Sun, 10 Mar 2019 at 10:57, John Hearns  wrote:

> Jonathan, damn good question.
> There is a lot of debate at the moment on how 'traditional' HPC can
> co-exist with 'big data' style HPC.
>
> Regarding Julia, I am a big fan of it and it brings a task-level paradigm
> to HPC work.
> To be honest though, traditional Fortran codes will be with us forever.
> No-one is going to refactor say a weather forecasting model in a national
> centre.
> Also Python has the mindshare at the moment. I have seen people in my
> company enthusiastically taking up Python.
> Not because of a measured choice after scanning dozens of learned papers
> and Reddit reviews etc.
> If that was the case then they might opt for Go or some niche language.
> No, the choice is made because their colleagues already use Python and
> pass on start up codes, and there is a huge Python community.
>
> Same with traditional HPC codes really - we all know that batch scripts
> are passed on through the generations like Holy Books,
> and most scientists don't have a clue what these scratches on clay tablets
> actually DO.
> Leading people to continue to run batch jobs which are hard wired for 12
> cores on a 20 core machine etc. etc.
>
> (*)  this is worthy of debate. In Formula 1 whenever we updated the
> version of our CFD code we re-ran a known simulation and made sure we still
> had correlation.
> It is inevitable that old versions of codes will stop being supported
>
>
>
>
>
>
>
> On Sun, 10 Mar 2019 at 09:29, Jonathan Aquilina 
> wrote:
>
>> Hi All,
>>
>> Basically I have sat down with my colleague and we have opted to go down
>> the route of Julia with JuliaDB for this project. But here is an
>> interesting thought that I have been pondering if Julia is an up and coming
>> fast language to work with for large amounts of data how will that affect
>> HPC and the way it is currently used and HPC systems created?
>>
>> Regards,
>> Jonathan
>>
>> -Original Message-
>> From: Beowulf  On Behalf Of Michael Di
>> Domenico
>> Sent: 04 March 2019 17:39
>> Cc: Beowulf Mailing List 
>> Subject: Re: [Beowulf] Large amounts of data to store and process
>>
>> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina 
>> wrote:
>> >
>> > As previously mentioned we don’t really need to have anything indexed
>> so I am thinking flat files are the way to go my only concern is the
>> performance of large flat files.
>>
>> potentially, there are many factors in the work flow that ultimately
>> influence the decision as others have pointed out.  my flat file example is
>> only one, where we just repeatable blow through the files.
>>
>> > Isnt that what HDFS is for to deal with large flat files.
>>
>> large is relative.  256GB file isn't "large" anymore.  i've pushed TB
>> files through hadoop and run the terabyte sort benchmark, and yes it can be
>> done in minutes (time-scale), but you need an astounding amount of hardware
>> to do it (the last benchmark paper i saw, it was something like
>> 1000 nodes).  you can accomplish the same feat using less and less
>> complicated hardware/software
>>
>> and if your dev's are willing to adapt to the hadoop ecosystem, you sunk
>> right off the dock.
>>
>> to get a more targeted answer from the numerous smart people on the list,
>> you'd need to open up the app and workflow to us.  there's just too many
>> variables ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Large amounts of data to store and process

2019-03-10 Thread John Hearns via Beowulf
Jonathan, damn good question.
There is a lot of debate at the moment on how 'traditional' HPC can
co-exist with 'big data' style HPC.

Regarding Julia, I am a big fan of it and it brings a task-level paradigm to
HPC work.
To be honest though, traditional Fortran codes will be with us forever.
No-one is going to refactor say a weather forecasting model in a national
centre.
Also Python has the mindshare at the moment. I have seen people in my company
enthusiastically taking up Python.
Not because of a measured choice after scanning dozens of learned papers
and Reddit reviews etc.
If that was the case then they might opt for Go or some niche language.
No, the choice is made because their colleagues already use Python and pass
on start up codes, and there is a huge Python community.

Same with traditional HPC codes really - we all know that batch scripts are
passed on through the generations like Holy Books,
and most scientists don't have a clue what these scratches on clay tablets
actually DO.
Leading people to continue to run batch jobs which are hard wired for 12
cores on a 20 core machine etc. etc.

(*)  this is worthy of debate. In Formula 1 whenever we updated the version
of our CFD code we re-ran a known simulation and made sure we still had
correlation.
It is inevitable that old versions of codes will stop being supported

On Sun, 10 Mar 2019 at 09:29, Jonathan Aquilina 
wrote:

> Hi All,
>
> Basically I have sat down with my colleague and we have opted to go down
> the route of Julia with JuliaDB for this project. But here is an
> interesting thought that I have been pondering if Julia is an up and coming
> fast language to work with for large amounts of data how will that affect
> HPC and the way it is currently used and HPC systems created?
>
> Regards,
> Jonathan
>
> -Original Message-
> From: Beowulf  On Behalf Of Michael Di
> Domenico
> Sent: 04 March 2019 17:39
> Cc: Beowulf Mailing List 
> Subject: Re: [Beowulf] Large amounts of data to store and process
>
> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina 
> wrote:
> >
> > As previously mentioned we don’t really need to have anything indexed so
> I am thinking flat files are the way to go my only concern is the
> performance of large flat files.
>
> potentially, there are many factors in the work flow that ultimately
> influence the decision as others have pointed out.  my flat file example is
> only one, where we just repeatable blow through the files.
>
> > Isnt that what HDFS is for to deal with large flat files.
>
> large is relative.  256GB file isn't "large" anymore.  i've pushed TB
> files through hadoop and run the terabyte sort benchmark, and yes it can be
> done in minutes (time-scale), but you need an astounding amount of hardware
> to do it (the last benchmark paper i saw, it was something like
> 1000 nodes).  you can accomplish the same feat using less and less
> complicated hardware/software
>
> and if your dev's are willing to adapt to the hadoop ecosystem, you sunk
> right off the dock.
>
> to get a more targeted answer from the numerous smart people on the list,
> you'd need to open up the app and workflow to us.  there's just too many
> variables ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Large amounts of data to store and process

2019-03-05 Thread John Hearns via Beowulf
Talking about missing values...   Joe Landman is sure to school me again
for this one (owwwccchhh)
https://docs.julialang.org/en/v1/manual/missing/index.html
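
The short version, as a sketch of my own (all Base Julia):

data = [1.5, missing, 3.0, missing, 4.5]

println(sum(skipmissing(data)))   # reductions can simply skip the missings
println(coalesce.(data, 0.0))     # or replace each missing with a default
println(2 .* data)                # arithmetic propagates missing, R-style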

Going back to the hardware, a 250 GByte data set is not too large to hold
in RAM.
This might be a good use case for Intel Optane persistent memory - I don't
know exactly how this works when used in memory mode as opposed to
block device mode.
The Diablo memory was supposed to migrate cold pages down to the lower,
slower memory.
Does Optane function similarly?

On Tue, 5 Mar 2019 at 01:02, Lux, Jim (337K) via Beowulf <
beowulf@beowulf.org> wrote:

> I'm munging through not very much satellite telemetry (a few GByte), using
> sqlite3..
> Here's some general observations:
> 1) if the data is recorded by multiple sensor systems, the clocks will
> *not* align - sure they may run NTP, but
> 2) Typically there's some sort of raw clock being recorded with the data
> (in ticks of some oscillator, typically) - that's what you can use to put
> data from a particular batch of sources into a time order.  And then you
> have the problem of reconciling the different clocks.
> 3) Watch out for leap seconds in time stamps - some systems have them
> (UTC), some do not (GPS, TAI) - a time of 23:59:60 may be legal.
> 4) you need to have a way to deal with "missing" data, whether it's time
> tags, or actual measurements - as well as "gaps in the record"
> 5) Be aware of the need to de-dupe data - same telemetry records from
> multiple sources.
>
>
> Jim Lux
> (818)354-2075 (office)
> (818)395-2714 (cell)
>
>
> -Original Message-
> From: Beowulf [mailto:beowulf-boun...@beowulf.org] On Behalf Of Jonathan
> Aquilina
> Sent: Monday, March 04, 2019 1:24 AM
> To: Fred Youhanaie ; beowulf@beowulf.org
> Subject: Re: [Beowulf] Large amounts of data to store and process
>
> Hi Fred,
>
> I and my colleague had done some research and found an extension for
> postgresql called timescaleDB, but then upon further research postgres on
> its own is good for such data as well. The thing is these are not going to
> be given to use as the data is coming in but in bulk at the end from the
> parent company.
>
> Have you used postgresql for such type's of data and how has it performed?
>
> Regards,
> Jonathan
>
> On 04/03/2019, 10:19, "Beowulf on behalf of Fred Youhanaie" <
> beowulf-boun...@beowulf.org on behalf of f...@anydata.co.uk> wrote:
>
> Hi Jonathan,
>
> It seems you're collecting metrics and time series data. Perhaps a
> time series database (TSDB) is an option for you. There are a few of these
> out there, but I don't have any personal recommendation.
>
> Cheers,
> Fred
>
> On 04/03/2019 07:04, Jonathan Aquilina wrote:
> > These would be numerical data such as integers or floating point
> numbers.
> >
> > -Original Message-
> > From: Tony Brian Albers 
> > Sent: 04 March 2019 08:04
> > To: beowulf@beowulf.org; Jonathan Aquilina 
> > Subject: Re: [Beowulf] Large amounts of data to store and process
> >
> > Hi Jonathan,
> >
> >  From my limited knowledge of the technologies, I would say that
> HBase with file pointers to the files placed on HDFS would suit you well.
> >
> > But if the files are log files, consider some tools that are suited
> for analyzing those like Kibana.
> >
> > /tony
> >
> >
> > On Mon, 2019-03-04 at 06:55 +, Jonathan Aquilina wrote:
> >> Hi Tony,
> >>
> >> Sadly I cant go into much detail due to me being under an NDA. At
> this
> >> point with the prototype we have around 250gb of sample data but
> again
> >> this data is dependent on the type of air craft. Larger aircraft and
> >> longer flights will generate a lot more data as they have  more
> >> sensors and will log more data than the sample data that I have. The
> >> sample data is 250gb for 35 aircraft of the same type.
> >>
> >> Regards,
> >> Jonathan
> >>
> >> -Original Message-
> >> From: Tony Brian Albers 
> >> Sent: 04 March 2019 07:48
> >> To: beowulf@beowulf.org; Jonathan Aquilina  >
> >> Subject: Re: [Beowulf] Large amounts of data to store and process
> >>
> >> On Mon, 2019-03-04 at 06:38 +, Jonathan Aquilina wrote:
> >>> Good Morning all,
> >>>
> >>> I am working on a project that I sadly cant go into much detail but
> >>> there will be quite large amounts of data that will be ingested by
> >>> this system and would need to be efficiently returned as output to
> >>> the end user in around 10 min or so. I am in discussions with
> >>> another partner involved in this project about the best way forward
> >>> on this.
> >>>
> >>> For me given the amount of data (and it is a huge amount of data)
> >>> that an RDBMS such as postgresql would be a major bottle neck.
> >>> Another thing that was considered flat files, and I think the best
> >>> 

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread John Hearns via Beowulf
Jonathan, I am going to stick my neck out here. I feel that HDFS was a
'thing of its time' - people are slavishly building clusters with local
SATA drives to follow that recipe.
Current parallel filesystems have adapters which make them behave like HDFS
http://docs.ceph.com/docs/master/cephfs/hadoop/
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_hadoopconnector.htm
http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre

Also you all know what is coming next.  Julia.  (Sorry all!)
https://wilmott.com/big-time-series-analysis-with-juliadb/ (I guess this
is specific to finance)
https://juliacomputing.github.io/JuliaDB.jl/latest/out_of_core/
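
Something like the sketch below is roughly what the JuliaDB docs advertise -
treat it as a hypothetical illustration rather than tested code; the file
name and column names are invented:

using JuliaDB

flights = loadtable("sensors.csv")                # ingest a CSV into a table
hot = filter(r -> r.engine_temp > 900, flights)   # row-wise predicate
select(hot, (:aircraft, :timestamp, :engine_temp))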


On Mon, 4 Mar 2019 at 11:16, Jonathan Aquilina 
wrote:

> I read though that Postgres can handle time series data no problem. I am
> just concerned if the clients would want to do complex big data analytics
> on the data. At this stage we are just prototyping but things are very up
> in the air at this point I am wondering though if sticking with HDFS and
> Hadoop is the best way to go for this in terms of performance and over all
> analytical capabilities.
>
> What I am trying to understand is how Hadoop, being written in Java, is so
> performant.
>
> Regards,
> Jonathan
>
> On 04/03/2019, 12:11, "Beowulf on behalf of Fred Youhanaie" <
> beowulf-boun...@beowulf.org on behalf of f...@anydata.co.uk> wrote:
>
> Hi Jonathan
>
> I have used PostgreSQL for collecting data, but there's nothing there
> that would be of use to you!
>
> A few years ago I set up a similar system (in a hurry) in a small
> company. The bulk data was compressed and it was made available to the
> applications via NFS (IPoIB). The applications were responsible
> for decompressing and pre/post-processing the data. Later, one of the
> developers created a PostgreSQL based system to hold all the data, he used
> C++ for all the data handling. That system was never
> used, even though all the historical data was loaded into the database!
>
> Your choice of components is going to depend on how your analytics
> software are going to access the data. If the data are being read and
> processed once, then loading into a database, then querying it
> once may not pay off.
>
> Cheers,
> Fred
>
> On 04/03/2019 09:24, Jonathan Aquilina wrote:
> > Hi Fred,
> >
> > I and my colleague had done some research and found an extension for
> postgresql called timescaleDB, but then upon further research postgres on
> its own is good for such data as well. The thing is these are not going to
> be given to use as the data is coming in but in bulk at the end from the
> parent company.
> >
> > Have you used postgresql for such type's of data and how has it
> performed?
> >
> > Regards,
> > Jonathan
> >
> > On 04/03/2019, 10:19, "Beowulf on behalf of Fred Youhanaie" <
> beowulf-boun...@beowulf.org on behalf of f...@anydata.co.uk> wrote:
> >
> >  Hi Jonathan,
> >
> >  It seems you're collecting metrics and time series data.
> Perhaps a time series database (TSDB) is an option for you. There are a few
> of these out there, but I don't have any personal recommendation.
> >
> >  Cheers,
> >  Fred
> >
> >  On 04/03/2019 07:04, Jonathan Aquilina wrote:
> >  > These would be numerical data such as integers or floating
> point numbers.
> >  >
> >  > -Original Message-
> >  > From: Tony Brian Albers 
> >  > Sent: 04 March 2019 08:04
> >  > To: beowulf@beowulf.org; Jonathan Aquilina <
> jaquil...@eagleeyet.net>
> >  > Subject: Re: [Beowulf] Large amounts of data to store and
> process
> >  >
> >  > Hi Jonathan,
> >  >
> >  >  From my limited knowledge of the technologies, I would say
> that HBase with file pointers to the files placed on HDFS would suit you
> well.
> >  >
> >  > But if the files are log files, consider some tools that are
> suited for analyzing those like Kibana.
> >  >
> >  > /tony
> >  >
> >  >
> >  > On Mon, 2019-03-04 at 06:55 +, Jonathan Aquilina wrote:
> >  >> Hi Tony,
> >  >>
> >  >> Sadly I cant go into much detail due to me being under an
> NDA. At this
> >  >> point with the prototype we have around 250gb of sample data
> but again
> >  >> this data is dependent on the type of air craft. Larger
> aircraft and
> >  >> longer flights will generate a lot more data as they have
> more
> >  >> sensors and will log more data than the sample data that I
> have. The
> >  >> sample data is 250gb for 35 aircraft of the same type.
> >  >>
> >  >> Regards,
> >  >> Jonathan
> >  >>
> >  >> -Original Message-
> >  >> From: Tony Brian Albers 
> >  >> 

Re: [Beowulf] 2 starting questions on how I should proceed for a correct first micro-cluster (2-nodes) building

2019-03-03 Thread John Hearns via Beowulf
ps. If you are interested in parallelism...  there is Julia.
https://docs.julialang.org/en/v1/manual/parallel-computing/index.html
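
As a taste of the task-level style, a minimal sketch of my own using the
standard library's Distributed module (heavy() is just a stand-in for real
work):

using Distributed
addprocs(4)                          # start 4 local worker processes

@everywhere heavy(n) = sum(sqrt(i) for i in 1:n)   # define work on every worker

results = pmap(heavy, [10^7, 10^7, 10^7, 10^7])    # farm the tasks out
println(results)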

I would also advise setting up just one server and installing the latest
CentOS.
You can start with some tutorials on MPI - which is the current standard
for parallelism.
You can then install OpenHPC on the same server, as OpenHPC is an 'overlay',
and then start building the cluster.




On Sun, 3 Mar 2019 at 09:24, John Hearns  wrote:

> I second OpenHPC. It is actively maintained and easy to set up.
>
> Regarding the hardware, have a look at Doug Eadline's Limulus clusters. I
> think they would be a good fit.
> Doug's site is excellent in general: https://www.clustermonkey.net/
>
> Also some people build Raspberry Pi clusters for learning.
>
>
> On Sun, 3 Mar 2019 at 01:16, Renfro, Michael  wrote:
>
>> Heterogeneous is possible, but the slower system will be a bottleneck if
>> you have calculations that require both systems to work in parallel and
>> synchronize with each other periodically. You might also find bottlenecks
>> with your network interconnect, even on homogeneous systems.
>>
>> I’ve never used ROCKS, and OSCAR doesn’t look to have been updated in a
>> few years (maybe it doesn’t need to be). OpenHPC is a similar product, more
>> recently updated. But except for the cluster I manage now, I always just
>> just went with a base operating system for the nodes and added HPC
>> libraries and services as required.
>>
>> > On Mar 2, 2019, at 7:34 AM, Marco Ippolito 
>> wrote:
>> >
>> > Hi all,
>> >
> > I'm developing an application which needs to use tools and other
>> applications that excel in a distributed environment:
>> > - HPX ( https://github.com/STEllAR-GROUP/hpx ) ,
>> > - Kafka ( http://kafka.apache.org/ )
>> > - a blockchain tool.
>> > This is why I'm eager to learn how to deploy a beowulf cluster.
>> >
>> > I've read some info here:
>> > - https://en.wikibooks.org/wiki/Building_a_Beowulf_Cluster
>> > - https://www.linux.com/blog/building-beowulf-cluster-just-13-steps
>> > -
>> https://www-users.cs.york.ac.uk/~mjf/pi_cluster/src/Building_a_simple_Beowulf_cluster.html
>> >
>> > And I have 2 starting questions in order to clarify how I should
>> proceed for a correct cluster building:
>> >
>> > 1) My starting point is a PC, I'm working with at the moment, with this
>> features:
>> >   - Corsair Simm Memoria RAM, DDR3, PC1600, 32GB, CL10 Ven k
>> >   - Intel Ci7 Box Processore CPU 1150 i7-4790K, 4.00 GHz
>> >   - Samsung MZ-76E500B Unità SSD Interna 860 EVO, 500 GB, 2.5" SATA
>> III, Nero/Grigio
>> >   - MB ASUS H97-PLUS
>> >- lettore DVD-RW
>> >
>> >   I'm using as OS Ubuntu 18.04.01 Server Edition.
>> >
>> > On one side I read that it should be better to put in the same cluster
>> the same type of HW : PCs of the same type,
> > but on the other side heterogeneous HW (servers or PCs) can also
>> be deployed.
> > So... which HW should I take into consideration for the second node, if
>> the features of the very first "node" are the ones above?
>> >
>> > 2) I read that some software (Rocks, OSCAR) would make the cluster
>> configuration easier and smoother. But I also read that
>> >  using the same OS,
>> > with the right same version, for all nodes, in my case Ubuntu 18.04.01
>> Server Edition, could be a safe starter.
>> > So... is it strictly necessary to use Rocks or OSCAR to correctly
>> configure the nodes network?
>> >
>> > Looking forward to your kind hints and suggestions.
>> > Marco
>> >
>> >
>> > ___
>> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
>> Computing
>> > To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] 2 starting questions on how I should proceed for a correct first micro-cluster (2-nodes) building

2019-03-03 Thread John Hearns via Beowulf
I second OpenHPC. It is actively maintained and easy to set up.

Regarding the hardware, have a look at Doug Eadline's Limulus clusters. I
think they would be a good fit.
Doug's site is excellent in general: https://www.clustermonkey.net/

Also some people build Raspberry Pi clusters for learning.


On Sun, 3 Mar 2019 at 01:16, Renfro, Michael  wrote:

> Heterogeneous is possible, but the slower system will be a bottleneck if
> you have calculations that require both systems to work in parallel and
> synchronize with each other periodically. You might also find bottlenecks
> with your network interconnect, even on homogeneous systems.
>
> I’ve never used ROCKS, and OSCAR doesn’t look to have been updated in a
> few years (maybe it doesn’t need to be). OpenHPC is a similar product, more
> recently updated. But except for the cluster I manage now, I always just
> just went with a base operating system for the nodes and added HPC
> libraries and services as required.
>
> > On Mar 2, 2019, at 7:34 AM, Marco Ippolito 
> wrote:
> >
> > Hi all,
> >
> > I'm developing an application which needs to use tools and other
> applications that excel in a distributed environment:
> > - HPX ( https://github.com/STEllAR-GROUP/hpx ) ,
> > - Kafka ( http://kafka.apache.org/ )
> > - a blockchain tool.
> > This is why I'm eager to learn how to deploy a beowulf cluster.
> >
> > I've read some info here:
> > - https://en.wikibooks.org/wiki/Building_a_Beowulf_Cluster
> > - https://www.linux.com/blog/building-beowulf-cluster-just-13-steps
> > -
> https://www-users.cs.york.ac.uk/~mjf/pi_cluster/src/Building_a_simple_Beowulf_cluster.html
> >
> > And I have 2 starting questions in order to clarify how I should proceed
> for a correct cluster building:
> >
> > 1) My starting point is a PC, I'm working with at the moment, with this
> features:
> >   - Corsair Simm Memoria RAM, DDR3, PC1600, 32GB, CL10 Ven k
> >   - Intel Ci7 Box Processore CPU 1150 i7-4790K, 4.00 GHz
> >   - Samsung MZ-76E500B Unità SSD Interna 860 EVO, 500 GB, 2.5" SATA III,
> Nero/Grigio
> >   - MB ASUS H97-PLUS
> >- lettore DVD-RW
> >
> >   I'm using as OS Ubuntu 18.04.01 Server Edition.
> >
> > On one side I read that it should be better to put in the same cluster
> the same type of HW : PCs of the same type,
> > but on the other side heterogeneous HW (servers or PCs) can also be
> deployed.
> > So... which HW should I take into consideration for the second node, if
> the features of the very first "node" are the ones above?
> >
> > 2) I read that some software (Rocks, OSCAR) would make the cluster
> configuration easier and smoother. But I also read that
> >  using the same OS,
> > with the right same version, for all nodes, in my case Ubuntu 18.04.01
> Server Edition, could be a safe starter.
> > So... is it strictly necessary to use Rocks or OSCAR to correctly
> configure the nodes network?
> >
> > Looking forward to your kind hints and suggestions.
> > Marco
> >
> >
> > ___
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Liquid cooling once again

2019-02-05 Thread John Hearns via Beowulf
Pah. This is nothing. This is what a systems engineer in a proper immersion
cooling data centre looks like:
https://i.ytimg.com/vi/2S2aEcVbO48/maxresdefault.jpg

You've got 60 seconds to change that hard drive, or you run out of oxygen.

On Tue, 5 Feb 2019 at 16:49, Stu Midgley  wrote:

> regular updates on our blog https://dug.com/dug-blog/
>
> On Tue, Feb 5, 2019 at 10:48 AM Stu Midgley  wrote:
>
>> Yeh, those machines are in our Perth office...  Houston is coming
>> together.  A few 3.5MW transformer delivered today :)
>>
>> On Mon, Feb 4, 2019 at 7:55 PM Lux, Jim (337K) via Beowulf <
>> beowulf@beowulf.org> wrote:
>>
>>> Ars Technica talking about it..
>>>
>>>
>>>
>>>
>>> https://arstechnica.com/gadgets/2019/02/cheaper-greener-cooler-how-liquid-cooling-came-to-dominate-the-data-center/
>>>
>>>
>>>
>>> pictures of DUG’s servers
>>>
>>>
>>>
>>> --
>>>
>>>
>>> ___
>>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>
>>
>> --
>> Dr Stuart Midgley
>> sdm...@gmail.com
>>
>
>
> --
> Dr Stuart Midgley
> sdm...@gmail.com
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] A Cooler Cloud: A Clever Conduit Cuts Data Centers’ Cooling Needs by 90 Percent

2019-01-28 Thread John Hearns via Beowulf
Prentice, the website refers to Open Compute racks.  "... technology has
been designed to fit into standard Open Compute racks".
So yep, 19 inch racks are not being targeted here. But OCP is pretty
widespread.
I would really like to find out if they can retrofit these to existing kit.
I suspect though that you need servers engineered to fit onto their
heatsinks.


On Mon, 28 Jan 2019 at 16:48, Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:

> So I was thinking about this over the weekend (because I apparently have
> nothing better to do with my time), and I definitely think this is a
> non-starter due to the massive change in server hardware layout to
> accommodate this thing. Yes, blades, and twin form factor servers already
> required that, and they're common form factors, but those form factors were
> just a matter of shrinking or changing the layout of the motherboard but
> still look like "traditional" layouts to the untrained eye, and they were
> still designed with typical front-to-back air cooling in mind. I feel like
> re-arranging the layout of components to accommodate this thing is just
> more of a change than the market will accept.
>
> Just my 2 cents.
>
> Prentice
>
> On 1/25/19 3:56 PM, Prentice Bisbal wrote:
>
> Eric,
>
> I was suspecting that might be the case, but the explanations in the other
> articles were way too vague to be sure of that. The NextPlatform provided
> much better pictures. If that's the case, this thing operates like a
> direct-expansion (DX) refrigeration system, where the refrigerant is air
> and does not change state from liquid to gas, unlike a typical DX
> refrigeration system, and the induced-draft fan provides the shaft work,
> and those tiny channels that allegedly line up the molecules act as many
> tiny orifices for the throttling process. Based on the pictures in the Next
> Platform article, here is a crude drawing of a cross-section of one of these
> devices that I drew in Google Draw. It should help you understand what's
> going inside this thing:
>
>
> https://docs.google.com/document/d/1UK94PxVlQtVSb2ns5TbCqHjPJ1vYSOmkGSeSorvHyaM/edit?usp=sharing
>
> Given this design, you can only have an induced-draft fan on the outlet. A
> forced-draft fan on the inlet would compress the air, heating it up and
> negating the throttling (or Joule-Thomson) effect on the low-pressure
> side.
>
> At the end of the day, thermodynamics still says X amount of shaft work
> has to be done to provide Y amount of cooling through this process, so I'm
> still skeptical of it, especially at scale.
>
> And for those of you looking for something really boring to read rather
> than work, here are the related patents. I haven't read them myself.
>
> https://patents.google.com/patent/US8414847
>
> https://patents.google.com/patent/US8986627B2
>
> https://patents.google.com/patent/US10113774B2
>
> Prentice
>
> On 1/25/19 2:26 PM, Eric Moore wrote:
>
> Actually, it looks like Joule-Thomson cooling to me (especially given the
> "Joule Force" name). You've got the air intake (ambient), then an expansion
> nozzle, into a low-pressure region, which is created by the fan at the end.
> So the outlet velocity of the air (and thus its kinetic energy) is higher
> than the inlet velocity, which would lower the internal energy, and thus
> the temperature. Instead the fins/nozzle/heatsink transfer heat to the
> expanding gas, which exits a little above ambient temperature. I imagine
> the drawback is you really need to get rid of that high velocity hot air,
> and can't recirculate it, or the kinetic energy would be converted back to
> thermal energy, and mess it all up. The descriptions do all involve the
> exhaust air being ducted to the outside. This article has the most
> technical detail:
> https://www.nextplatform.com/2018/12/04/the-leading-edge-of-air-cooled-servers-leads-to-the-edge/
>
> On Fri, Jan 25, 2019 at 11:33 AM Prentice Bisbal via Beowulf <
> beowulf@beowulf.org> wrote:
>
>> You all know how much I like talking about heat transfer and server
>> cooling, so I decided to do some research on this product:
>>
>> Here's their website:
>>
>> https://forcedphysics.com
>>
>> and here's their YouTube channel with 5 videos:
>>
>> https://www.youtube.com/channel/UClwWeahYGuNl0THWVz1Hyow/videos
>>
>> This is really nothing more than an air-cooled heatsink. I'm afraid I'm
>> going to have to call BS on this technology for the following reasons:
>>
>> 1. It still uses air as the primary cooling medium. I just don't think
>> air has adequate thermal conductivity or thermal capacity to serve modern
>> processors, no matter what you do to it.
>>
>> 2. In the videos, they present highly idealized tests with no control to
>> use for comparison. How do I know I wouldn't get the same results doing the
>> same 

Re: [Beowulf] A Cooler Cloud: A Clever Conduit Cuts Data Centers’ Cooling Needs by 90 Percent

2019-01-28 Thread John Hearns via Beowulf
Thinking about it, if they are sucking in air through very narrow slots
then sending it through an expanding chamber, it will make a heck of a
noise. I wonder if you could tune each expansion pipe to a particular note,
and construct a mighty pipe organ on your data centre?
Tunes are produced as jobs go through each step in the algorithm, so
monitoring jobs is as easy as listening at the door of the data centre.
A lovely Mozart melody indicates all is well. Beethoven's Ninth with the
cannon shot at the end indicates a crashed job.

ps. If you guys read this do drop me a note   john.hea...@cgg.com

On Sat, 26 Jan 2019 at 19:30, Jonathan Engwall <
engwalljonathanther...@gmail.com> wrote:

> I checked my math. :(
> A human hair is as narrow as 17 microns...and the propulsion design was
> for a SCRAM jet. Dubbed the Aurora I think. It was to have essentially no
> moving parts.
>
> On Fri, Jan 25, 2019, 9:47 PM Jonathan Engwall <
> engwalljonathanther...@gmail.com wrote:
>
>> Hello,
>> I looked at one of the patents. It is weird.
>> Quite weird ideas about airflow or as they say 'gas' moving through a
>> channel only 500 square microns. If I am off saying that is the width and
>> height of 22 human hairs, tell me where I went wrong. And that is the
>> _intake_!
>>
>> Jonathan Engwall
>>
>>>
>>> https://patents.google.com/patent/US8414847
>>>
>>> https://patents.google.com/patent/US8986627B2
>>>
>>> https://patents.google.com/patent/US10113774B2
>>>
>>> Prentice
>>>
>>> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] A Cooler Cloud: A Clever Conduit Cuts Data Centers’ Cooling Needs by 90 Percent

2019-01-25 Thread John Hearns via Beowulf
Sorry, their videos do have a fan at one end.
In the video though they do say "enables ten times the server density" - as
opposed to what?
I am keeping an open mind though.
Forced Physics guys - hint I work somewhere which has lots of servers.


On Fri, 25 Jan 2019 at 17:15, John Hearns  wrote:

> 3. Using this technology means a complete redesign of your server hardware
> and possibly your racks.
> It does say it fits in standard OpenCompute racks. But I guess the racks
> are the only thing you get to keep.
>
> I think I understand what they are getting at - that shape will cause
> expansion of the air volume, and hence cooling.
> I guess like SR-71 engine spikes or something.
> But how the heck do they move the air fast enough to do the cooling
> without fans?
>
> They keep referring to external air. Which is fine. But if you ever want
> to do this make sure the external air is WELL filtered.
>
>
>
>
>
> On Fri, 25 Jan 2019 at 16:33, Prentice Bisbal via Beowulf <
> beowulf@beowulf.org> wrote:
>
>> You all know how much I like talking about heat transfer and server
>> cooling, so I decided to do some research on this product:
>>
>> Here's their website:
>>
>> https://forcedphysics.com
>>
>> and here's their YouTube channel with 5 videos:
>>
>> https://www.youtube.com/channel/UClwWeahYGuNl0THWVz1Hyow/videos
>>
>> This is really nothing more than an air-cooled heatsink. I'm afraid I'm
>> going to have to call BS on this technology for the following reasons:
>>
>> 1. It still uses air as the primary cooling medium. I just don't think
>> air has adequate thermal conductivity or thermal capacity to serve modern
>> processors, no matter what you do to it.
>>
>> 2. In the videos, they present highly idealized tests with no control to
>> use for comparison. How do I know I wouldn't get the same results doing the
>> same experiment but using a similar duct fashioned out of sheet metal.
>>
>> 3. Using this technology means a complete redesign of your server
>> hardware and possibly your racks.
>>
>> 4. None of the information in the videos or on their website really
>> explains how this technology works, and what really differentiates it from
>> any other air-cooled heat sink. Most people with a good invention are
>> usually excited to tell you how it works. Since they brag about 30
>> international patents for this, there's no need to try to protect a trade
>> secret.
>>
>> 5. This statement:
>>
>> The fins work like teeth in a comb, neatly orienting air molecules to
>> point in the same direction and arranging them into columns.
>>
>> Based on my education, this statement seems to be completely devoid of
>> science.
>>
>> This statement seems to defy the laws of physics. Last time I checked,
>> unless an atom or molecule is at absolute zero, it has movement, whether
>> it's spinning or vibrating, or both, so how can they get air molecules to
>> line up all in neat little rows, where the molecules are all pointing the
>> same way?
>>
> This also implies very laminar flow.  As fluid velocity increases and
> the diameter of the channel decreases, the Reynolds number increases. As
> the Reynolds number goes up, turbulence increases, so mathematically I
> would expect this flow to be turbulent, and not laminar. From my classes on
>> heat transfer, turbulent flow around the heat transfer surface increases
>> heat transfer, so laminar flow in this case wouldn't be a good thing.
>>
>> Until they can provide better comparisons with real servers in real data
>> center environments, I'm going to classify this as "snake oil"
>>
>> https://en.wikipedia.org/wiki/Snake_oil
>>
>> Prentice
>>
>> On 1/24/19 3:54 PM, chuck_pet...@selinc.com wrote:
>>
>> Well, this is interesting.
>>
>> "According to Forced Physics’ <https://forcedphysics.com/
>> [forcedphysics.com]
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__forcedphysics.com_=DwMFAw=-_uRSsrpJskZgEkGwdW-sXvhn_FXVaEGsm0EI46qilk=fawF3TRTwCqlaBkoLcxYCr4F4NRwCc64hmEgi9rHPpE=zr6lAlVphGxOQTXSElww9hGpqb9IZPik0_MN2v8Fqjs=lb4Hi9X8NKIYWe_e1RU3Cw4gr9Uz_B7n5pnCNY0ss3U=>>
>> chief technology officer, David Binger, the company’s conductor can help a
>> typical data center eliminate its need for water or refrigerants and shrink
>> its 22-MW load by 7.72 MW, which translates to an annual reduction of 67.6
>> million kWh. That data center could also save a total of US $45 million a
>> year on infrastructure, operating, and energy costs with the new syst
