Re: [Beowulf] Fwd: Project Natick

2018-06-08 Thread Jonathan Aquilina
If I’m not mistaken, the person mentioned is a subscriber on this list Sent from my iPhone > On 08 Jun 2018, at 15:52, Prentice Bisbal wrote: > > >> On 06/07/2018 05:32 PM, Chris Samuel wrote: >>> On Friday, 8 June 2018 1:38:11 AM AEST John Hearns via Beowulf wrote: >>> >>> The report

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread Bill Abbott
We set PriorityFavorSmall=NO and PriorityWeightJobSize to some appropriately large value in slurm.conf, which helps. We also used to limit the number of total jobs a single user could run to something like 30% of the cluster, so a user could run a single mpi job that takes all nodes, but
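A minimal slurm.conf sketch of the settings described above; the weight value and the per-user cap are placeholders, not the poster's actual numbers:

    PriorityFavorSmall=NO          # do not boost small jobs in the priority calculation
    PriorityWeightJobSize=100000   # weight job size heavily so wide jobs rank higher
    # A per-user cap of roughly 30% of the cluster could be expressed as a QOS limit, e.g.
    #   sacctmgr modify qos normal set MaxTRESPerUser=node=30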

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread David Mathog
This isn't quite the same issue, but several times I have observed a large multi-CPU machine lock up because the accounting records associated with a zillion tiny, rapidly launched jobs made an enormous /var/account/pacct file and filled the small root filesystem. Actually it wasn't usually
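For context, one hedged way to keep process accounting from filling the root filesystem; the target path is only an example, not what the poster actually did:

    # check how large the process-accounting file has grown
    du -h /var/account/pacct
    # redirect accounting output to a filesystem with more headroom
    accton /scratch/account/pacct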

Re: [Beowulf] Project Natick

2018-06-08 Thread Prentice Bisbal
On 06/07/2018 05:21 PM, Chris Samuel wrote: On Friday, 8 June 2018 1:03:06 AM AEST Prentice Bisbal wrote: And I'm definitely not an expert on Orkney, but I did work with a guy from Scotland, and I'm pretty sure he had stories about how sparsely populated Orkney was due to the rugged terrain

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread Paul Edmon
Yeah, this one is tricky.  In general we take the wild-west approach here, but I've had users use --contiguous and their job takes forever to run. I suppose one method would be to enforce that each job takes a full node and that parallel jobs always use --contiguous.  As I recall Slurm will
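A minimal sbatch sketch of the whole-node policy being floated here; the node count and application name are placeholders:

    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --exclusive     # each job gets its nodes to itself
    #SBATCH --contiguous    # ask for adjacent nodes; as noted above, this can leave jobs queued a long time
    srun ./solver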

Re: [Beowulf] Fwd: Project Natick

2018-06-08 Thread Prentice Bisbal
On 06/07/2018 05:32 PM, Chris Samuel wrote: On Friday, 8 June 2018 1:38:11 AM AEST John Hearns via Beowulf wrote: The report interestingly makes a comparison to cruise lines and the US Navy having large IT infrastructures at sea. Some oil & gas companies have HPC systems onboard their survey

Re: [Beowulf] OT, X11 editor which works well for very remote systems

2018-06-08 Thread Stu Midgley
I run resilio sync to provide my own cloud-like sync of all my scripts/code to all my offices/machines. It is fast enough that I edit with TextMate on my mac and by the time I switch to a terminal window, it's already synced etc. -- Dr Stuart Midgley sdm...@gmail.com

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread Andrew Mather
Hi Chris, > Message: 2 > Date: Fri, 08 Jun 2018 17:21:56 +1000 > From: Chris Samuel > To: beowulf@beowulf.org > Subject: [Beowulf] Avoiding/mitigating fragmentation of systems by > small jobs? > Message-ID: <2427060.afPWsf2KXH@quad> > Content-Type: text/plain; charset="us-ascii" > >

[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread Chris Samuel
Hi all, I'm curious to know what (if anything) sites do, and how, to try and reduce the impact of fragmentation of resources by small/narrow jobs on systems where you also have to cope with large/wide parallel jobs? For my purposes a small/narrow job is anything that will fit on one node (whether a

Re: [Beowulf] OT, X11 editor which works well for very remote systems

2018-06-08 Thread John Hearns via Beowulf
> VNC takes over the console on the remote machine. What if somebody else is using that, or there isn't one (headless server)? David, are you sure about that? I did a lot of work in F1 on VNC to workstations... as I remember VNC sessions are not on the 'root window' by default. I did a lot of
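To illustrate the point about separate sessions, a hedged TigerVNC-style example; the display number and hostname are arbitrary:

    # on the remote machine: start a new virtual X display (:1), independent of the console
    vncserver :1
    # on the local machine: connect a viewer to that virtual display
    vncviewer remotehost:1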

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-08 Thread John Hearns via Beowulf
Chris, good question. I can't give a direct answer there, but let me share my experiences. In the past I managed SGI ICE clusters and a large-memory UV system with PBSPro queuing. The engineers submitted CFD solver jobs using scripts, and we only allowed them to use a multiple of N cpus, in fact
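A hedged PBSPro illustration of the "multiple of N cpus" rule mentioned above, with N=16; the chunk count and placement are illustrative only:

    #PBS -l select=4:ncpus=16:mpiprocs=16
    #PBS -l place=scatter:excl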

Re: [Beowulf] Project Natick

2018-06-08 Thread Lux, Jim (337K)
I imagine it would have to be filtered, too, to keep small marine life and debris from clogging up the piping. Filtered through a Millipore filter? Keeping things from growing on your underwater stuff is tough. Of course, you could just make it out of copper, good thermal

Re: [Beowulf] Fwd: Project Natick

2018-06-08 Thread Lux, Jim (337K)
On 6/7/18, 8:27 AM, "Beowulf on behalf of Joe Landman" wrote: On 06/07/2018 11:18 AM, Douglas Eadline wrote: > -snip- >> i'm not sure i see a point in all this anyhow, it's a neat science >> experiment, but what's the ROI on sinking a container full of servers

Re: [Beowulf] FPGA storage accelerator

2018-06-08 Thread Lux, Jim (337K)
ATM… I remember it well - it was going to be: “100 Mbps+ to the desktop!” so you don’t have to rely on an ISDN PRI or a 10Mbps ethernet. The next step past ISDN (“It still doesn’t network”), although I had a BRI to my apartment, then house, for several years. We had ISDN phones at JPL for a

Re: [Beowulf] Fwd: Project Natick

2018-06-08 Thread Lux, Jim (337K)
On 6/7/18, 7:48 AM, "Beowulf on behalf of Michael Di Domenico" wrote: On Thu, Jun 7, 2018 at 10:20 AM, Prentice Bisbal wrote: > > I imagine it would have to be filtered, too, to keep small marine life and > debris from clogging up the piping. I wonder if any forms of marine

Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-08 Thread Lux, Jim (337K)
From: Beowulf on behalf of Tim Cutts Date: Thursday, June 7, 2018 at 8:03 AM To: James Cuff Cc: "beowulf@beowulf.org" , mathog Subject: Re: [Beowulf] OT, X11 editor which works well for very remote systems? I think X11 was a fairly good idea, in theory, for how to deal with this problem.

Re: [Beowulf] Project Natick

2018-06-08 Thread Chris Samuel
On Saturday, 9 June 2018 6:36:05 AM AEST Lux, Jim (337K) wrote: > Where, exactly, is this... I spent a week in Orkney a few years ago (we > drove up from Glasgow, also an interesting proposition) - it's worth a > visit for the prehistoric archaeology (largest prehistoric settlement still >

Re: [Beowulf] Fwd: Project Natick

2018-06-08 Thread Lux, Jim (337K)
That is a most excellent book. And it brings to mind some of the more complex aspects of maintenance for that cluster. I think that’s actually an important area for cluster development – most of the work, to date, has been in building high performance computing in environments that are easily

Re: [Beowulf] Project Natick

2018-06-08 Thread Lux, Jim (337K)
Where, exactly, is this... I spent a week in Orkney a few years ago (we drove up from Glasgow, also an interesting proposition) - it's worth a visit for the prehistoric archaeology (largest prehistoric settlement still preserved at Skara Brae, for instance, but realistically the dig at the Ness

Re: [Beowulf] FPGA storage accelerator

2018-06-08 Thread Chris Samuel
On Saturday, 9 June 2018 5:57:02 AM AEST Lux, Jim (337K) wrote: > ATM… I remember it well - it was going to be: “100 Mbps+ to the desktop!” > so you don’t have to rely on an ISDN PRI or a 10Mbps ethernet. Ah yes, I had an OC3 link to my DEC Alpha from the network we built, 155Mbps to my desktop