Re: [Beowulf] How to know if infiniband network works?

2017-08-03 Thread John Hearns via Beowulf
Faraz, I think that you have got things sorted out. However I think that the number of options in OpenMPI is starting to confuse you. But do not lose heart! I have been in the same place myself many times. Specifically I am thinking of one time when a customer asked me to benchmark the latency

Re: [Beowulf] How to know if infiniband network works?

2017-08-03 Thread John Hearns via Beowulf
Faraz, do you mean the IPoIB TCP network, i.e. the ib0 interface? Good question. I would advise joining the OpenMPI list. They are very friendly over there. I have always seen polite and helpful replies even to dumb questions there (such as the ones I ask). I actually had to do something similar

Re: [Beowulf] Supercomputing comes to the Daily Mail

2017-08-14 Thread John Hearns via Beowulf
it again > and then pull it apart to look for changes. > > Jeff > > > On Mon, Aug 14, 2017 at 4:30 AM, John Hearns via Beowulf < > beowulf@beowulf.org> wrote: > >> The Daily Mail is (shall we say) a rather right-wing daily newspaper in >> the UK. It m

Re: [Beowulf] How to debug slow compute node?

2017-08-10 Thread John Hearns via Beowulf
Faraz, I think you might have to buy me a virtual coffee. Or a beer! Please look at the hardware health of that machine. Specifically the DIMMs. I have seen this before! If you have some DIMMs which are faulty and are generating ECC errors, then if the mcelog service is enabled an interrupt is

Re: [Beowulf] How to debug slow compute node?

2017-08-10 Thread John Hearns via Beowulf
Another thing to perhaps look at. Are you seeing messages about thermal throttling events in the system logs? Could that node have a piece of debris caught in its air intake? I don't think that will produce a 30% drop in performance. But I have caught compute nodes with pieces of packaging sucked

Re: [Beowulf] How to debug slow compute node?

2017-08-10 Thread John Hearns via Beowulf
, of course.) Less likely, but possible: + Different BIOS configuration w.r.t. the other nodes. + Poorly seated memory, IB card, etc, or cable connections. + IPMI may need a hard reset. Power down, remove the power cable, wait several minutes, put the cable back, power on. Gus Correa On 08/10/2017

Re: [Beowulf] How to debug slow compute node?

2017-08-10 Thread John Hearns via Beowulf
ps. Look at 'watch cat /proc/interrupts' also. You might get a qualitative idea of a huge rate of interrupts. On 10 August 2017 at 16:59, John Hearns wrote: > Faraz, >I think you might have to buy me a virtual coffee. Or a beer! > Please look at the hardware
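A minimal sketch of what that qualitative check can look like on the suspect node ('watch -d' highlights counters that change between refreshes; the EDAC sysfs paths assume the edac driver is loaded, so treat them as an assumption):

    # refresh every second and highlight the interrupt counters that are climbing
    watch -d -n 1 cat /proc/interrupts
    # corrected / uncorrected ECC error counts, if the EDAC driver is loaded
    grep . /sys/devices/system/edac/mc/mc*/ce_count 2>/dev/null
    grep . /sys/devices/system/edac/mc/mc*/ue_count 2>/dev/null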

[Beowulf] Supercomputing comes to the Daily Mail

2017-08-14 Thread John Hearns via Beowulf
The Daily Mail is (shall we say) a rather right-wing daily newspaper in the UK. It may give some flavour if I tell you that its most famous/infamous headline is "Hurrah for the Blackshirts" (1934). A surprisingly good article on using HPC and a visualisation wall to model ocean currents.

Re: [Beowulf] Hyperthreading and 'OS jitter'

2017-07-26 Thread John Hearns via Beowulf
Scott, Evan, Nathan those are some really thought provoking answers. Evan's point about cloud providers is especially interesting - as we are seeing more and more OpenStack being used in private clouds, to create clusters 'on demand', we should be aware of this. And regarding the point "I'm not an

[Beowulf] Hyperthreading and 'OS jitter'

2017-07-22 Thread John Hearns via Beowulf
Several times in the past I have jokingly asked if there should be another lower powered CPU core in a system to run OS tasks (hello Intel - are you listening?) Also in the past there was advice to get best possible throughput on AMD Bulldozer CPUs to run only on every second core (as they share

[Beowulf] Cluster Hat

2017-08-04 Thread John Hearns via Beowulf
Reading the Register article on the IBM IData system which was moved from Daresbury Labs to Durham Uni, one of the comments flagged this up: http://climbers.net/sbc/clusterhat-review-raspberry-pi-zero/ That's rather neat I think! So how many of these can we get in a 42U rack ;-)

Re: [Beowulf] Poor bandwith from one compute node

2017-08-18 Thread John Hearns via Beowulf
Joe - Leela? I did not know you were a Dr Who fan. Faraz, you really should log into your switch and look at the configuration of the ports. Find the port to which that compute node is connected by listing the MAC address table. (If you are using Bright there is an easy way to do this). Look at
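The switch-side commands depend entirely on the vendor CLI, but a rough node-side complement (interface name eth0 is a placeholder) is to confirm the negotiated speed and look for error counters:

    # negotiated link speed and duplex
    ethtool eth0 | grep -iE 'speed|duplex'
    # NIC statistics - look for errors, drops and CRC counts
    ethtool -S eth0 | grep -iE 'err|drop|crc'
    # kernel's own RX/TX error and drop totals
    ip -s link show eth0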

Re: [Beowulf] Poor bandwith from one compute node

2017-08-17 Thread John Hearns via Beowulf
Faraz, I really suggest you examine the Intel Cluster Checker. I guess that you cannot take down a production cluster to run an entire Cluster Checker run, however these are the types of faults which ICC is designed to find. You can define a small set of compute nodes to run on, including this

Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-14 Thread John Hearns via Beowulf
Prentice, as I understand it the problem here is that with the same OS and IB drivers, there is a big difference in performance between stateful and NFS root nodes. Throwing my hat into the ring, try looking to see if there is an excessive rate of interrupts in the nfsroot case, coming from

Re: [Beowulf] Julia Language

2017-09-19 Thread John Hearns via Beowulf
re > fundamentally unstable under the lens of extreme scale computing. I am > always interested in the extremes when the envelope MTBF (mean time between > failures) can be pushed. > > Justin > > On Mon, Sep 18, 2017 at 4:22 AM, John Hearns via Beowulf < > beowulf@beow

Re: [Beowulf] Julia Language

2017-09-19 Thread John Hearns via Beowulf
ch Julia coding? Can you talk about your experience? >> >> I have threatened to learn it for a while but your post has prompted me >> to finally start learning Julia :) >> >> Thanks! >> >> Jeff >> >> >> On Wed, Sep 13, 2017 at 7:43 AM, J

Re: [Beowulf] Julia Language

2017-09-19 Thread John Hearns via Beowulf
for a while but your post has prompted me to > finally start learning Julia :) > > Thanks! > > Jeff > > > On Wed, Sep 13, 2017 at 7:43 AM, John Hearns via Beowulf < > beowulf@beowulf.org> wrote: > >> I see HPCwire has an article on Julia. I am a big fan of Julia

Re: [Beowulf] Julia Language

2017-09-18 Thread John Hearns via Beowulf
> > Thanks! > > Justin > > On Wed, Sep 13, 2017 at 7:43 AM, John Hearns via Beowulf < > beowulf@beowulf.org> wrote: > >> I see HPCwire has an article on Julia. I am a big fan of Julia, so >> though it worth pointing out. >> https://www.hpcwire.com/off-the-wire/j

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread John Hearns via Beowulf
Regarding Rocks clusters, permit me to vent a little. In my last employ we provided Rocks clusters. Rocks is firmly embedded in the Redhat 6 era with out of date 2.6 kernels. It uses kickstart for installations (which is OK). However with modern generations of Intel processors you get a

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread John Hearns via Beowulf
Fusty? Lachlan - you really are from the Western Isles aren't you? Another word: 'oose' - the fluff which collects under the bed. Or inside servers. On 5 September 2017 at 08:57, Carsten Aulbert wrote: > Hi > > On 09/05/17 08:43, Stu Midgley wrote: > >

Re: [Beowulf] RAID5 rebuild, remount with write without reboot?

2017-09-05 Thread John Hearns via Beowulf
David, I have never been in that situation. However I have configured my fair share of LSI controllers so I share your pain! (I reserve my real tears for device mapper RAID). How about a 'mount -o remount'? Did you try that before rebooting? I am no expert here - in the past when I have had
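For reference, the remount in question is a one-liner; a sketch, with /data standing in for the affected mount point:

    # re-enable writes in place once the RAID rebuild has completed
    mount -o remount,rw /data
    # confirm the flags actually changed
    grep ' /data ' /proc/mounts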

[Beowulf] Julia Language

2017-09-13 Thread John Hearns via Beowulf
I see HPCwire has an article on Julia. I am a big fan of Julia, so thought it worth pointing out. https://www.hpcwire.com/off-the-wire/julia-joins-petaflop-club/ Though the source of this seems old news - it is a presentation from this year's JuliaCon. JuliaCon 2018 will be taking place at UCL in

Re: [Beowulf] What is rdma, ofed, verbs, psm etc?

2017-09-24 Thread John Hearns via Beowulf
Jon, RoCE is commonly used. We run GPFS over RoCE and plenty of other sites do also. To answer questions on what network RoCE needs, I guess you could run it on a 1 Gbps network with office grade network switches. What it really needs is a lossless network. Dare I say the Mellanox word I

Re: [Beowulf] help with cables

2017-11-23 Thread John Hearns via Beowulf
My experience - go to your local pharmacy and purchase a dental probe. You use it to threaten the switches as in Dustin Hoffman in Marathon Man. Seriously, the dental probe is used for releasing the latches on those cables in tight spaces. On 23 November 2017 at 00:24, Jonathan Engwall <

Re: [Beowulf] Thoughts on git?

2017-12-19 Thread John Hearns via Beowulf
Faraz - a shorter answer. If you already have a git repository, try using Atom https://atom.io/ On 19 December 2017 at 17:40, John Hearns wrote: > Faraz, I use git every day. > We have Bitbucket here, and have linked the repositories to Jira for our > sprint planning

Re: [Beowulf] Thoughts on git?

2017-12-19 Thread John Hearns via Beowulf
Faraz, I use git every day. We have Bitbucket here, and have linked the repositories to Jira for our sprint planning and kanban. Anyway - you say something very relevant "I have never had a need to go back to an older version of my script." It is not only about rollback to older versions. If you

Re: [Beowulf] Thoughts on git?

2017-12-20 Thread John Hearns via Beowulf
Nathan, Sir - you are a prize Git. Abusive retorts aside, there is another very good use for Git. As a fan of the Julia language, I report that Julia packages are held as repositories on Github. If you want to work with an unregistered package (which is usually a development project) you bring

Re: [Beowulf] Bright Cluster Manager

2018-05-09 Thread John Hearns via Beowulf
> All of a sudden simple “send the same command to all nodes” just doesn’t work. And that’s what will inevitably be the case as we scale up in the HPC world – there will always be dead or malfunctioning nodes. Jim, this is true. And 'we' should be looking to the webscale generation for the

[Beowulf] Alternatives To MPI Workshop

2018-05-09 Thread John Hearns via Beowulf
As a fan of the Julia language, I just saw this announcement on the Julia forum. Sounds mighty interesting! https://discourse.julialang.org/t/cfp-parallel-applications-workshop-alternatives-to-mpi-supercomputing-2018/10762 http://sourceryinstitute.github.io/PAW/ Higher-level parallel

Re: [Beowulf] Fault tolerance & scaling up clusters (was Re: Bright Cluster Manager)

2018-05-17 Thread John Hearns via Beowulf
Roland, the OpenHPC integration IS interesting. I am on the OpenHPC list and look forward to the announcement there. On 17 May 2018 at 15:00, Roland Fehrenbacher wrote: > > "J" == Lux, Jim (337K) writes: > > J> The reason I hadn't looked at

[Beowulf] Project Natick

2018-06-06 Thread John Hearns via Beowulf
https://www.bbc.com/news/technology-44368813 https://natick.research.microsoft.com/ I must admit my first thoughts on hearing an item about this on Radio Scotland is that now that humans have laid waste to the surface of the Earth we are going to boil the oceans. My second thought is for the

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-11 Thread John Hearns via Beowulf
Skylar Thomson wrote: >Unfortunately we don't have a mechanism to limit >network usage or local scratch usage, but the former is becoming less of a >problem with faster edge networking, and we have an opt-in bookkeeping mechanism >for the latter that isn't enforced but works well enough to keep

[Beowulf] Clearing out scratch space

2018-06-12 Thread John Hearns via Beowulf
In the topic on avoiding fragmentation Chris Samuel wrote: >Our trick in Slurm is to use the slurmdprolog script to set an XFS project >quota for that job ID on the per-job directory (created by a plugin which >also makes subdirectories there that it maps to /tmp and /var/tmp for the >job) on the
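A rough sketch of the xfs_quota calls such a prolog might make; the directory layout, project id and 100g limit are illustrative placeholders, not Chris's actual script, and the filesystem must be mounted with the prjquota option:

    JOBDIR=/local/scratch/job_${SLURM_JOB_ID}     # hypothetical per-job directory
    mkdir -p ${JOBDIR}
    # tie the directory to a project id (here simply the job id)
    xfs_quota -x -c "project -s -p ${JOBDIR} ${SLURM_JOB_ID}" /local/scratch
    # cap the space that project may use
    xfs_quota -x -c "limit -p bhard=100g ${SLURM_JOB_ID}" /local/scratch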

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-12 Thread John Hearns via Beowulf
> However, I do think Scott's approach is potentially very useful, by directing > jobs < full node to one end of a list of nodes and jobs that want full nodes > to the other end of the list (especially if you use the partition idea to > ensure that not all nodes are accessible to small jobs).

Re: [Beowulf] OT, X11 editor which works well for very remote systems

2018-06-08 Thread John Hearns via Beowulf
> VNC takes over the console on the remote machine. What if somebody else is using that, or there isn't one (headless server)? David, are you sure about that? I did a lot of work in F1 on VNC to workstations... as I remember VNC sessions are not on the 'root window' by default. I did a lot of

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread John Hearns via Beowulf
Chris, good question. I can't give a direct answer there, but let me share my experiences. In the past I managed SGI ICE clusters and a large memory UV system with PBSPro queuing. The engineers submitted CFD solver jobs using scripts, and we only allowed them to use a multiple of N cpus, in fact

Re: [Beowulf] Fwd: Project Natick

2018-06-07 Thread John Hearns via Beowulf
The report interestingly makes a comparison to cruise lines and the US Navy having large IT infrastructures at sea. I guess cruise ships of course have servers plus satcomms, as do warships. But the thought of the SOSUS sonar chain comes to mind... then again those electronics will be down a lot

Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread John Hearns via Beowulf
Bill's question re. "the cluster is slow" is fantastic. That covers people skills in addition to technical skills. ps. best of luck. You never know who you might end up working with ;-) On 13 June 2018 at 20:46, Andrew Latham wrote: > Bill's question is good and I have heard it many times.

Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread John Hearns via Beowulf
Jonathan, if you have taken the interest to join this list then there is no need to be terrified. I have learned that people who are enthusiastic are quite rare. Also interviews are a two way street - this is your opportunity to find out about the role. Hopefully you will be enthused about it and

Re: [Beowulf] Clearing out scratch space

2018-06-12 Thread John Hearns via Beowulf
ata set could never be re-analyzed in that level of detail, but the important aspects for the physics are stored long term. On 12 June 2018 at 15:41, Ellis H. Wilson III wrote: > On 06/12/2018 04:06 AM, John Hearns via Beowulf wrote: > >> In the topic on avoiding fr

Re: [Beowulf] Fwd: Project Natick

2018-06-10 Thread John Hearns via Beowulf
Stuart Midgley works for DUG? They are currently recruiting for an HPC manager in London... Interesting... On 7 June 2018 at 23:32, Chris Samuel wrote: > On Friday, 8 June 2018 1:38:11 AM AEST John Hearns via Beowulf wrote: > > > The report interestingly makes a comparison to

Re: [Beowulf] Working for DUG, new thead

2018-06-19 Thread John Hearns via Beowulf
>If a sys admin position involves shell programming/scripting, knowing the details of a specific programming language or processor are secondary, but thinking like a >programmer is a skill not everyone has or can develop. Just last week I wrote a Lua script without knowing a thing about Lua. I

Re: [Beowulf] batch systems connection

2018-05-29 Thread John Hearns via Beowulf
Mikhail, if it was me I would choose which batch system I liked most.. then install that on both clusters. As you already have an installation on one cluster then cloning that should be a lot less effort. You can then configure a routing queue between the clusters. On 29 May 2018 at 09:58,
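As a sketch of the routing-queue idea in Torque/PBS terms (queue and server names are placeholders; other schedulers have their own equivalents):

    # a route queue on the 'front' server that forwards jobs to either cluster
    qmgr -c "create queue crossroute queue_type=route"
    qmgr -c "set queue crossroute route_destinations = batch@clusterA"
    qmgr -c "set queue crossroute route_destinations += batch@clusterB"
    qmgr -c "set queue crossroute enabled = true"
    qmgr -c "set queue crossroute started = true"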

Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-07 Thread John Hearns via Beowulf
> Makes me think. 1st workshop. Can’t ever be the first time this question has been asked. Also David, absolutely not OT. Very much on topic. Maybe... For my contribution, if you use Windows then MobaXterm is an excellent tool. It wraps up PuTTY, VNC, a Cygwin X server etc. etc. in one package. For

Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-07 Thread John Hearns via Beowulf
As I am on the subject, it can be hard to assess exactly what the problem is with remote graphics. Remember that a squeaky wheel gets more attention. I was involved with one link to a site in Europe which was using CAD remotely. I really was never sure whether or not the users were just unhappy

Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-07 Thread John Hearns via Beowulf
Tony, not to be rude but not really. Teradici is more than thin terminals. They apply smart compression, which I am told compresses textual parts of the screen differently to graphics. They also have 'build to lossless' for slower links - so if you rotate a model it is blurry then sharpens up to

Re: [Beowulf] Project Natick

2018-06-06 Thread John Hearns via Beowulf
formance per watt than a > general processor, reducing the heatload in the capsule vs. doing the same > workload with only x86 processors. Does any one know what the intended > workload of this system is? > > On 06/06/2018 08:16 AM, John Hearns via Beowulf wrote: > > https:/

Re: [Beowulf] Project Natick

2018-06-07 Thread John Hearns via Beowulf
Thinking about submarines, I mentioned a UK secure site on another thread. That site may or may not have been something to do with submarines. I have never been on board a submarine, however if I was faced with the problem of cooling on board one I would shy away from what that article implies, ie

Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-07 Thread John Hearns via Beowulf
> > On 2018-06-07 09:29, John Hearns via Beowulf wrote: > > Tony, not to be rude but not really. > > Teradici is more than thin terminals. They apply smart compression, > > which I am told compresses textual parts of the screen differently to > > graphics. > >

Re: [Beowulf] FPGA storage accelerator

2018-06-06 Thread John Hearns via Beowulf
> It’s so interesting to look at what was old being new again though. Lovely insight as always Joe! Indeed. Ideas always come around again in computing. My aphorism - always follow the herd. Look at what everyone is buying and implementing. Don't delve too deeply into the technical minutiae of a

Re: [Beowulf] batch systems connection

2018-05-29 Thread John Hearns via Beowulf
Prentice - duuuh of course. As I remember Redhat changed the place where a source RPM places its source files. It was under /usr/src; I think it now unpacks locally under root's home directory. One can also unpack the files from an RPM using: rpm2cpio packagename.rpm | cpio -idv I can't
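A sketch of both approaches, assuming a stock rpmbuild setup where %_topdir defaults to ~/rpmbuild:

    # installing a source RPM unpacks it under the invoking user's home directory
    rpm -i somepackage.src.rpm
    ls ~/rpmbuild/SOURCES ~/rpmbuild/SPECS
    # or extract any RPM's payload in place without installing it
    rpm2cpio somepackage.rpm | cpio -idmv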

Re: [Beowulf] Working for DUG, new thead

2018-06-19 Thread John Hearns via Beowulf
This thread is going fast! Prentice Bisbal wrote: > I often wonder if that misleading marketing is one of the reasons why the Xeon Phi has already been canned. I know a lot of people who were excited for the Xeon Phi, but > I don't know any who ever bought the Xeon Phis once they came out. In

Re: [Beowulf] Working for DUG, new thead

2018-06-19 Thread John Hearns via Beowulf
Jim Lux wrote: > I've been intrigued recently about using GPUs for signal processing kinds of things.. There's not much difference between calculating vertices of triangles and doing FIR filters. Rather than look at hardware per se, how about learning about the Julia language for this task? I

Re: [Beowulf] Working for DUG, new thead

2018-06-19 Thread John Hearns via Beowulf
I should do my research... The Celeste project is the poster child for Julia https://www.nextplatform.com/2017/11/28/julia-language-delivers-petascale-hpc-performance/ They use up to 8092 Xeon Phi nodes at NERSC with threads... The per thread runtime graph is interesting there. Only a small

Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread John Hearns via Beowulf
Robert, I have had a great deal of experience with Bright Cluster Manager and I am happy to share my thoughts. My experience with Bright has been as a system integrator in the UK, where I deployed Bright for a government defence client, for a university in London and on our in-house cluster

Re: [Beowulf] Bright Cluster Manager

2018-05-02 Thread John Hearns via Beowulf
Chris Samuel says: >I've not used it, but I've heard from others that it can/does supply > schedulers like Slurm, but (at least then) out of date versions. Chris, this is true to some extent. When a new release of Slurm or, say, Singularity is out you need to wait for Bright to package it up and

Re: [Beowulf] Bright Cluster Manager

2018-05-03 Thread John Hearns via Beowulf
Jorg, I did not know that you used Bright. Or I may have forgotten! I thought you were a Debian fan. Of relevance, Bright 8 now supports Debian. You commented on the Slurm configuration file being changed. I found during the install at Greenwich, where we put in a custom slurm.conf, that

Re: [Beowulf] Bright Cluster Manager

2018-05-03 Thread John Hearns via Beowulf
Regarding storage, Chris Dagdigian comments: >And you know what? After the Isilon NAS was deployed the management of *many* petabytes of single-namespace storage was now handled by the IT Director in his 'spare time' -- And the five engineers who used to do nothing > >but keep ZFS from falling

Re: [Beowulf] Bright Cluster Manager

2018-05-03 Thread John Hearns via Beowulf
I agree with Doug. The way forward is a lightweight OS with containers for the applications. I think we need to learn from the new kids on the block - the webscale generation. They did not go out and look at how massive supercomputer clusters are put together. No, they went out and built scale-out

Re: [Beowulf] nVidia revealed as evil

2018-01-06 Thread John Hearns via Beowulf
Tim, I hear that just before Christmas the Sanger had a wardrobe installed, filled with fur coats. Also a supply of Turkish Delight. On 3 January 2018 at 17:17, Tim Cutts wrote: > I am henceforth renaming my datacentre the “magical informatics cupboard” > > Tim > > On

Re: [Beowulf] [upgrade strategy] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

2018-01-06 Thread John Hearns via Beowulf
Disabling branch prediction - that in itself will have an effect on performance. One thing I read about the hardware is that the table which holds the branch predictions is shared between processes running on the same CPU core. That is part of the attack process - the malicious process has

[Beowulf] Variable precision arithmetic

2018-01-06 Thread John Hearns via Beowulf
I did not want to hijack the thread on Nvidia cards. Doug Eadline as usual makes a very relevant point: "BTW, I find it interesting one of the most popular codes run on Nvidia GPUs is Amber (MD). It has been optimized to use SP when it can and many Amber users turn off ECC because it slows down

Re: [Beowulf] [upgrade strategy] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

2018-01-06 Thread John Hearns via Beowulf
I guess we have all seen this: https://access.redhat.com/articles/3307751 If not, 'HPC Workloads' (*) such as HPL are 2-5% affected. However as someone who recently installed a lot of NVMe drives for a fast filesystem, the 8-19% performance hit on random IO to NVMe drives is not pleasing. (*)

Re: [Beowulf] What is a "dark market"

2018-01-10 Thread John Hearns via Beowulf
Jonathan, Marcus Hutchins is the British security expert who shut down the WannaCry ransomware, which was ravaging across the British National Health Service amongst many other organisations. He decoded the packets it was sending, then registered the unusual domain name which acted as a 'kill

Re: [Beowulf] [upgrade strategy] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

2018-01-05 Thread John Hearns via Beowulf
mplain about perfs... So another question : what is your global > > strategy about upgrades on your clusters ? Do you upgrade it as often as > > you can ? One upgrade every X months (due to the downtime issue) ... ? > > > > Thanks, > > Best regardsRémy. > > > &g

Re: [Beowulf] cluster authentication part II

2018-01-15 Thread John Hearns via Beowulf
Jorg, I do not have the answer for you. One comment I have is that the GUI login will use different PAM modules from the command line ssh login. If you are looking for differences between your CentOS machine and Ubuntu I would also start by listing the PAM modules. I speak as someone who has a

[Beowulf] Infiniband switch topology display

2018-01-16 Thread John Hearns via Beowulf
I may have asked this question in the past. Does anyone know of a utility to display Infiniband switch topology and the links between switches? I am familiar with the command line tools, I more mean taking the output of these tools and making a graphical plot. John H

Re: [Beowulf] Infiniband switch topology display

2018-01-19 Thread John Hearns via Beowulf
nox.com/docs/DOC-2379 > >> <https://community.mellanox.com/docs/DOC-2379>

Re: [Beowulf] Large Dell, odd IO delays

2018-02-14 Thread John Hearns via Beowulf
Hmmm... I will also chip in with my favourite tip. Look at the sysctl for min_free_kbytes. It is often set very low. Increase this substantially. It will do no harm to your system (unless you set it to an absurd value!) You should be looking at the vm dirty ratios etc. also On 15 February 2018
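For reference, the knobs being referred to (the 1 GB figure is purely illustrative, not a recommendation):

    # reserve the VM keeps free for atomic allocations, in kilobytes
    sysctl vm.min_free_kbytes
    # raise it substantially, e.g. to 1 GB on a large-memory node
    sysctl -w vm.min_free_kbytes=1048576
    # writeback thresholds worth inspecting at the same time
    sysctl vm.dirty_background_ratio vm.dirty_ratio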

Re: [Beowulf] Heterogeneity in a tiny (two-system cluster)?

2018-02-16 Thread John Hearns via Beowulf
Ted, I would go for the more modern system. You say yourself the first system is two years old. In one or two years it will be out of warranty, and if a component breaks you will have to decide to buy that component or just junk the system. Actually, having said that you should look at the

Re: [Beowulf] Heterogeneity in a tiny (two-system cluster)?

2018-02-16 Thread John Hearns via Beowulf
Oh, and while you are at it, do a bit of investigation on how the FVCOM model is optimised for use with AVX vectorisation. Hardware and clock speeds alone don't cut it. On 16 February 2018 at 09:39, John Hearns wrote: > Ted, > I would go for the more modern system. you

Re: [Beowulf] Theoretical vs. Actual Performance

2018-02-22 Thread John Hearns via Beowulf
Oh, and use the Adaptive computing HPL calculator to get your input file. Thanks Adaptive guys! On 22 February 2018 at 16:44, Michael Di Domenico wrote: > i can't speak to AMD, but using HPL 2.1 on Intel using the Intel > compiler and the Intel MKL, i can hit 90% without

Re: [Beowulf] Theoretical vs. Actual Performance

2018-02-22 Thread John Hearns via Beowulf
Prentice, I echo what Joe says. When doing benchmarking with HPL or SPEC benchmarks, I would optimise the BIOS settings to the highest degree I could. Switch off processor C-states. As Joe says you need to look at what the OS is running in the background. I would disable the Bright cluster manager
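A hedged sketch of the host-side preparation meant here, using the kernel's cpupower tool (the exact list of services to stop, Bright health checks included, is site specific):

    # pin the frequency governor and keep cores out of deep C-states
    cpupower frequency-set -g performance
    cpupower idle-set -D 0        # disable idle states with latency above 0 us
    # see what is running in the background before the benchmark starts
    top -b -n 1 | head -30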

Re: [Beowulf] Cluster Authentication (LDAP,NIS,AD)

2017-12-28 Thread John Hearns via Beowulf
Skylar, I admit my ignorance. What is a program map? Where I work now extensively uses automounter maps for bind mounts. I may well learn something useful here. On 28 December 2017 at 15:28, Skylar Thompson wrote: > We are an AD shop, with users, groups, and
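For what it is worth, the bind-mount style of automounter map mentioned above looks roughly like this (paths are illustrative), and as I understand it a 'program map' is simply a map file that is executable: autofs runs it with the lookup key as its argument and uses whatever entry it prints:

    # /etc/auto.master - a direct map
    /-    /etc/auto.direct
    # /etc/auto.direct - bind /export/projects onto /projects on demand
    /projects    -fstype=bind    :/export/projects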

Re: [Beowulf] allinea

2017-12-28 Thread John Hearns via Beowulf
Emmm.. you have contacted Allinea, right? Good bunch of guys. Debugger: (Irish) The person who sold you the system. (with due credit to the Devil's Data Processing Dictionary) On 28 December 2017 at 17:19, Michael Di Domenico wrote: > i'm having trouble with allinea

Re: [Beowulf] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

2018-01-03 Thread John Hearns via Beowulf
Thanks Chris. In the past there have been Intel CPU 'bugs' trumpeted, but generally these are fixed with a microcode update. This looks different, as it is a fundamental part of the chip's architecture. However the Register article says: "It allows normal user programs – to discern to some extent

Re: [Beowulf] Intel motherboard BMC

2018-06-21 Thread John Hearns via Beowulf
ld not find anything we were doing > wrong > at the time. Quite often I found it is a really minor, but important bit > one > is missing out and later you think: why did I miss that. > > Thanks for your suggestions! > > Jörg > > Am Donnerstag, 21. Juni 2018, 12:07:48 BST

Re: [Beowulf] HPE iLO4 BMC authentication bypass

2018-06-21 Thread John Hearns via Beowulf
Oh, I just love that hacker with the black mask on hunched over the laptop (page 6). That's a fail straight away. As soon as you see someone on your campus with a black mask on you know he/she is up to no good. Regarding separate physical IPMI networks I have seen it done both ways. One site I

Re: [Beowulf] Intel motherboard BMC

2018-06-21 Thread John Hearns via Beowulf
tries to > boot > > > from > > > there and shuts off the other NIC which could explain that behaviour. > > > However, even disabling it in the BIOS did not solve the problem. > > > > > > I guess I will need to do some debugging here but without some go

Re: [Beowulf] Intel motherboard BMC

2018-06-21 Thread John Hearns via Beowulf
Jorg, recalling my experience with Intel. I did not come across the problem with IP address versus Hostname which you have. However I do recall that I had to configure the Admin user and the privilege level for that user on the LAN interface. In that case the additional BMC modules were being

Re: [Beowulf] Intel motherboard BMC

2018-06-21 Thread John Hearns via Beowulf
Hello Jorg. As you know I have worked a lot with Supermicro machines. I also installed Intel machines for Greenwich University, so I have experience of setting up IPMI on them. I will take time to try to understand your problem! Also Intel provides excellent documentation for all its products.

Re: [Beowulf] Intel motherboard BMC

2018-06-21 Thread John Hearns via Beowulf
B network. I was just wondering whether for some reason at one >> stage >> > > of >> > > the boot process the kernel recognises the IB card and then tries to >> boot >> > > from >> > > there and shuts off the other NIC which could expla

Re: [Beowulf] Jupyter and EP HPC

2018-07-29 Thread John Hearns via Beowulf
Cough. Julia. Cough. On Fri, 27 Jul 2018 8:47 pm Lux, Jim (337K), wrote: > I’ve just started using Jupyter to organize my Pythonic ramblings.. > > > > What would be kind of cool is to have a high level way to do some > embarrassingly parallel python stuff, and I’m sure it’s been done, but my >

Re: [Beowulf] Jupyter and EP HPC

2018-07-29 Thread John Hearns via Beowulf
https://github.com/JuliaParallel/ClusterManagers.jl Sorry for the terse reply. Warm evening sitting beside the Maschsee in Hannover. Modelling beer evaporation. On Fri, 27 Jul 2018 8:47 pm Lux, Jim (337K), wrote: > I’ve just started using Jupyter to organize my Pythonic ramblings.. > > > >

Re: [Beowulf] Lustre Upgrades

2018-07-27 Thread John Hearns via Beowulf
Jörg, then the days of the Tea Clipper Races should be revived. We have just the ship for it already. Powered by green energy, and built in Scotland of course. https://en.wikipedia.org/wiki/Cutty_Sark Just fill her hold with hard drives and set sail. Aaar me hearties. I can just see HPC types

Re: [Beowulf] Lustre Upgrades

2018-07-26 Thread John Hearns via Beowulf
ient memory). > You > are in trouble if it requires both. :-) > > All the best from a still hot London > > Jörg > > Am Dienstag, 24. Juli 2018, 17:02:43 BST schrieb John Hearns via Beowulf: > > Paul, thanks for the reply. > > I would like to ask, if I may. I ra

Re: [Beowulf] Lustre Upgrades

2018-07-27 Thread John Hearns via Beowulf
Jim, thank you for that link. It is quite helpful! I have a poster accepted for the Julia Conference in two weeks time. My proposal is to discuss computers just like that - on the Manhattan project etc. Then to show how Julia can easily be used to solve the equation for critical mass from the Los

Re: [Beowulf] shared compute/storage WAS: Re: Lustre Upgrades

2018-07-26 Thread John Hearns via Beowulf
>in theory you could cap the performance interference using VM's and >cgroup controls, but i'm not sure how effective that actually is (no >data) in HPC. I looked quite heavily at performance capping for RDMA applications in cgroups about a year ago. It is very doable, however you need a recent
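For reference, the mechanism in question is the cgroup-v2 rdma controller; note it limits the number of HCA handles and objects a group may hold rather than bandwidth as such. A minimal sketch, with the device name and limits as placeholders, assuming a unified cgroup-v2 hierarchy:

    # make the rdma controller available to child groups
    echo +rdma > /sys/fs/cgroup/cgroup.subtree_control
    mkdir /sys/fs/cgroup/hpcjob
    # cap the RDMA handles/objects processes in this group may create on mlx5_0
    echo "mlx5_0 hca_handle=2 hca_object=2000" > /sys/fs/cgroup/hpcjob/rdma.max
    echo $$ > /sys/fs/cgroup/hpcjob/cgroup.procs
    cat /sys/fs/cgroup/hpcjob/rdma.current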

Re: [Beowulf] shared compute/storage WAS: Re: Lustre Upgrades

2018-07-26 Thread John Hearns via Beowulf
For VM substitute 'container' - since containerisation is intimately linked with cgroups anyway. Google 'CEPH Docker' and there is plenty of information. Someone I work with tried out CEPH on Docker the other day, and got into some knots regarding access to the actual hardware devices. He then

Re: [Beowulf] shared compute/storage WAS: Re: Lustre Upgrades

2018-07-26 Thread John Hearns via Beowulf
t 9:30 AM, John Hearns via Beowulf > wrote: > >>in theory you could cap the performance interference using VM's and > >>cgroup controls, but i'm not sure how effective that actually is (no > >>data) in HPC. > > > > I looked quite heavily at performance cappi

Re: [Beowulf] Lustre Upgrades

2018-07-26 Thread John Hearns via Beowulf
workload. BeeOND > is a really cool product, although the focus seems to be on more making it > easy to get "quick-and-dirty" BeeGFS system running on the compute nodes > than on maximum performance. > > > On Thu, Jul 26, 2018 at 3:53 AM, John Hearns via Beowulf < &g

Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-18 Thread John Hearns via Beowulf
Rather more seriously, this is a topic which is well worth discussing, What are best practices on patching HPC systems? Perhaps we need a separate thread here. I will throw in one thought, which I honestly do not want to see happening. I recently took a trip to Bletchley Park in the UK. On

Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-18 Thread John Hearns via Beowulf
*To patch, or not to patch, that is the question:* Whether 'tis nobler in the mind to suffer The loops and branches of speculative execution, Or to take arms against a sea of exploits And by opposing end them. To die—to sleep, No more; and by a sleep to say we end The heart-ache and the thousand

Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-23 Thread John Hearns via Beowulf
https://www.theregister.co.uk/2018/08/21/intel_cpu_patch_licence/ https://perens.com/2018/08/22/new-intel-microcode-license-restriction-is-not-acceptable/ On Tue, 21 Aug 2018 at 16:18, Lux, Jim (337K) wrote: > > > On 8/21/18, 1:37 AM, "Beowulf on behalf of Chris Samuel" < >

Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-23 Thread John Hearns via Beowulf
My bad. The license has been updated now https://www.theregister.co.uk/2018/08/23/intel_microcode_license/ On Thu, 23 Aug 2018 at 20:11, John Hearns wrote: > https://www.theregister.co.uk/2018/08/21/intel_cpu_patch_licence/ > > >

Re: [Beowulf] New Spectre attacks - no software mitigation - what impact for HPC?

2018-07-17 Thread John Hearns via Beowulf
This article is well worth a read, on European Exascale projects https://www.theregister.co.uk/2018/07/17/europes_exascale_supercomputer_chips/ The automotive market seems to have got mixed in there also! The main thrust is dual: ARM based and RISC-V. Also I like the plexiglass air shroud pictured

Re: [Beowulf] New Spectre attacks - no software mitigation - what impact for HPC?

2018-07-17 Thread John Hearns via Beowulf
I guess I am not going to explain myself very clearly here. Maybe I won't make a coherent point. I think I read on The Next Platform at the time a comment along the lines of - "as CPU MHz speeds cannot continue to rise, smart engineers who design CPUs have had to come up with mechanisms to increase

Re: [Beowulf] New Spectre attacks - no software mitigation - what impact for HPC?

2018-07-17 Thread John Hearns via Beowulf
In a partial answer to my own question it looks like Likwid can count Issued versus Retired instructions https://github.com/RRZE-HPC/likwid/wiki/TutorialStart On 17 July 2018 at 10:10, John Hearns wrote: > I guess I am not going to explain myself very clearly here. Maybe I wont > make a

[Beowulf] ServerlessHPC

2018-07-24 Thread John Hearns via Beowulf
All credit goes to Pim Schravendijk for coining a new term on Twitter today https://twitter.com/rdwrt https://twitter.com/rdwrt/status/1021761796498182144?s=03 We will all be doing it in six months time. ___ Beowulf mailing list, Beowulf@beowulf.org

Re: [Beowulf] SSD performance

2018-07-21 Thread John Hearns via Beowulf
You don't say for which purpose the SSDs are being used. In your laptop? As system disks in HPC compute nodes? Journalling drives in a parallel filesystem? Data storage drives in a parallel filesystem? Over on the CEPH mailing list there are regular topics on choice of SSDs. I would advise going
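For a journal or metadata device the usual Ceph-list advice boils down to measuring small synchronous writes, for which a hedged fio sketch (destructive - point it only at a scratch device) looks like:

    # 4k O_DSYNC direct writes, the pattern a journal/WAL device actually sees
    fio --name=synctest --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --iodepth=1 --numjobs=1 \
        --runtime=60 --time_based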
