Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread Renfro, Michael
As the definitely-not-proud owner of a 2016 purchase of a 60-bay disk shelf 
attached to a single server with an InfiniBand connection back to 54 compute 
nodes, I can say that NFS on spinning disks can definitely handle five 40-core 
jobs, but your particular setup really can’t. Mine has hit its limits at times 
as well, but the limits come from the IOPS of the disk array, the speed of the 
SAS cable connecting the disk shelf to the server, everything *but* NFS itself.

Swapping to NVMe should make a world of difference on its own, as long as you 
don’t have a 1 Gb Ethernet bottleneck between your storage and your compute 
nodes.
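
For a rough sense of scale, here is a back-of-the-envelope sketch; the 
per-device numbers are generic assumptions, not measurements from either of 
our setups:

    # Back-of-the-envelope limits: shared network bandwidth and disk IOPS.
    # All per-device figures below are rough, assumed values for illustration.
    GBE_MB_S  = 1000 / 8       # 1 Gb Ethernet: ~125 MB/s usable, at best
    HDD_IOPS  = 150            # one 7,200 rpm spindle, random I/O
    NVME_IOPS = 400_000        # one datacenter NVMe drive, random I/O

    tasks    = 5 * 40          # 5 nodes x 40 cores, as in the original post
    spindles = 2               # the 2x4TB striped HDD pool
    nvme     = 2               # the planned 2x4TB NVMe pool

    print(f"1 GbE per task:     {GBE_MB_S / tasks:.2f} MB/s")
    print(f"HDD pool per task:  {spindles * HDD_IOPS / tasks:.1f} IOPS")
    print(f"NVMe pool per task: {nvme * NVME_IOPS / tasks:.0f} IOPS")

Even with a perfect NFS server, 200 tasks sharing two spindles or a single 
1 GbE link leaves almost nothing per task.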

From: Beowulf  on behalf of leo camilo 

Date: Thursday, August 10, 2023 at 3:04 PM
To: Jeff Johnson 
Cc: Bernd Schubert , Beowulf@beowulf.org 

Subject: Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster


Awesome, thanks for the info!
Best,

leo

On Thu, 10 Aug 2023 at 22:01, Jeff Johnson <jeff.john...@aeoncomputing.com> wrote:
Leo,

Both BeeGFS and Lustre require a backend file system on the disks themselves, 
and both support ZFS as that backend.
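
For illustration, a minimal sketch of the ZFS side of that, driven from 
Python. The device names, pool name, and dataset name are assumptions, and 
the BeeGFS/Lustre target setup itself is left as a pointer to the docs rather 
than guessed at:

    import subprocess

    # Assumed device names and pool layout; adjust to the real hardware.
    DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1"]
    POOL = "scratch"

    def run(cmd):
        """Echo and run a command, failing loudly; these calls need root."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Listing the devices with no vdev keyword gives a striped (RAID-0) pool.
    run(["zpool", "create", POOL] + DEVICES)

    # A dataset to hand to the parallel file system as its backend directory.
    run(["zfs", "create", f"{POOL}/pfs_storage"])

    # From here, point the BeeGFS storage service (or a Lustre OST) at
    # /scratch/pfs_storage, following the BeeGFS or Lustre documentation.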

--Jeff


On Thu, Aug 10, 2023 at 1:00 PM leo camilo <lhcam...@gmail.com> wrote:
Hi there,
thanks for your response.

BeeGFS indeed looks like a good option, though realistically I can only 
afford to use a single node/server for it.
Would it be feasible to use ZFS as the volume manager coupled with BeeGFS for 
the shares, or should I write ZFS off altogether?
thanks again,
best,
leo

On Thu, 10 Aug 2023 at 21:29, Bernd Schubert <bernd.schub...@fastmail.fm> wrote:


On 8/10/23 21:18, leo camilo wrote:
> Hi everyone,
>
> I was hoping to get some sage advice from you guys.
>
> At my department we have built this small prototyping cluster with 5
> compute nodes, 1 name node and 1 file server.
>
> Up until now, the name node contained the scratch partition, which
> consisted of 2x4TB HDDs forming an 8 TB striped ZFS pool. The pool is
> shared to all the nodes over NFS. The compute nodes and the name node
> are connected with both Cat 6 Ethernet cable and InfiniBand. Each
> compute node has 40 cores.
>
> Recently I attempted to launch a computation from each node (40 tasks
> per node, so 1 computation per node), and the performance was abysmal.
> I reckon I might have reached the limits of NFS.
>
> I then realised that this was due to very poor performance from NFS. I
> am not using stateless nodes, so each node has about 200 GB of local SSD
> storage, and running directly from there was a lot faster.
>
> So, to solve the issue, I reckon I should replace NFS with something
> better. I have ordered 2x4TB NVMe drives for the new scratch and I was
> thinking of:
>
>   * using the 2x4TB NVMe drives in a striped ZFS pool with a single-node
> GlusterFS instance to replace NFS
>   * using the 2x4TB NVMe drives with GlusterFS in a distributed
> arrangement (still single node)
>
> Some people told me to use Lustre, but I reckon that might be overkill,
> and I would only use a single file server machine (1 node).
>
> Could you guys give me some sage advice here?
>

So GlusterFS uses FUSE, which doesn't have the best performance
reputation (although hopefully not for much longer; feel free to search
for "fuse" + "uring").

If you want to avoid the complexity of Lustre, maybe look into BeeGFS.
Well, I would recommend looking into it anyway (as a former developer
I'm biased ;) ).
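
Whichever way you go, it may be worth confirming first where the pain
actually is. A minimal sketch (assuming Python 3 on a compute node, and that
/scratch is the NFS mount while /local is the node's SSD; both paths are
assumptions) that times many small synchronous writes in each place:

    import os
    import tempfile
    import time

    def time_small_files(directory, count=2000, size=4096):
        """Write and fsync `count` small files under directory; return seconds."""
        payload = os.urandom(size)
        start = time.time()
        with tempfile.TemporaryDirectory(dir=directory) as tmp:
            for i in range(count):
                with open(os.path.join(tmp, f"f{i}"), "wb") as f:
                    f.write(payload)
                    os.fsync(f.fileno())
        return time.time() - start

    # Assumed mount points; adjust to the real paths on the node.
    for mount in ("/scratch", "/local"):
        print(f"{mount}: {time_small_files(mount):.1f} s")

A big gap between the two numbers points at the shared storage path (disks,
server, network) rather than at the compute nodes themselves.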


Cheers,
Bernd


--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage


Re: [Beowulf] Data Destruction

2021-09-29 Thread Renfro, Michael
I have to wonder if the intent of the DUA is to keep physical media from 
winding up in the wrong hands. If so, and if the servers hosting the parallel 
filesystem (or a normal single file server) are physically secured in a data 
center, and the drives are destroyed on decommissioning, that might satisfy 
the requirements.
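
For the selective-deletion question in the quoted thread below, a minimal
sketch of an overwrite-then-unlink pass follows, with the loud caveat that on
copy-on-write or parallel filesystems (ZFS, Lustre, Ceph) the overwrite can
land in fresh blocks or leave snapshots and replicas behind, so this alone
does not amount to NIST 800-88 sanitization:

    import os

    def overwrite_and_unlink(path, passes=1):
        """Best-effort scrub of one file: overwrite in place, sync, unlink.

        Caveat: on copy-on-write filesystems (ZFS, btrfs) and on parallel
        filesystems that stripe, replicate, or snapshot (Lustre, Ceph),
        overwritten data may survive elsewhere; this is NOT, by itself,
        NIST 800-88 compliant sanitization.
        """
        size = os.path.getsize(path)
        chunk = b"\0" * (1 << 20)              # 1 MiB of zeros
        with open(path, "r+b") as f:
            for _ in range(passes):
                f.seek(0)
                remaining = size
                while remaining > 0:
                    n = min(len(chunk), remaining)
                    f.write(chunk[:n])
                    remaining -= n
                f.flush()
                os.fsync(f.fileno())
        os.remove(path)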

From: Beowulf  on behalf of Paul Edmon via Beowulf 

Date: Wednesday, September 29, 2021 at 9:15 AM
To: Scott Atchley 
Cc: Beowulf Mailing List 
Subject: Re: [Beowulf] Data Destruction


The former. We are curious how to selectively delete data from a parallel 
filesystem. For example, we commonly use Lustre, Ceph, and Isilon in our 
environment. That said, if other types allow for easier destruction of 
selective data, we would be interested in hearing about it.

-Paul Edmon-
On 9/29/2021 10:06 AM, Scott Atchley wrote:
Are you asking about selectively deleting data from a parallel file system 
(PFS) or destroying drives after removal from the system either due to failure 
or system decommissioning?

For the latter, DOE does not allow us to send any non-volatile media offsite 
once it has had user data on it. When we are done with drives, we have a very 
big shredder.

On Wed, Sep 29, 2021 at 9:59 AM Paul Edmon via Beowulf <beowulf@beowulf.org> wrote:
Occasionally we get DUA (Data Use Agreement) requests for sensitive
data that require data destruction (e.g. NIST 800-88). We've been
struggling with how to handle this in an era of distributed filesystems
and disks. We were curious: how do other people handle requests like this?
What types of filesystems do people generally use for this, and how do
people ensure destruction? Do these types of DUAs preclude certain
storage technologies from consideration, or are there creative ways to
comply using more common scalable filesystems?

Thanks in advance for the info.

-Paul Edmon-



Re: [Beowulf] HPC for community college?

2020-02-22 Thread Renfro, Michael
Late to the party, but I’m working with some XSEDE folks on adapting their 
XSEDE Compatible Basic Cluster (XCBC), based on Warewulf and OpenHPC, to 
environments that don’t have 24/7 compute hardware.

The goal is to have an OHPC installation running on regular Windows lab 
computers when possible, dual-booting the PCs between a regular Windows 
install and a diskless OHPC setup, while staying almost entirely non-invasive 
to a Windows-centric IT environment.

We’re waiting on approval for some DHCP server changes before we do a test in 
one of our real labs.

On Feb 22, 2020, at 1:54 AM, John Hearns via Beowulf wrote:

Thinking about the applications to be run at a community college, the concept 
of a local weather forecast has been running around in my head lately.
The concept would be to install and run WRF, perhaps overnight, and produce a 
weather forecast in the morning.
I suppose this hinges on WRF having a sufficiently small scale for local 
forecasting and on being able to download
input data every day.

Your thoughts please?






On Sat, 22 Feb 2020 at 03:43, Douglas Eadline <deadl...@eadline.org> wrote:

That is the idea behind the Limulus systems -- a personal (or group) small
turn-key cluster that can deliver local HPC performance.
Users can learn HPC software, administration, and run production
codes on performance hardware.

I have been calling these "No Data Center Needed"
computing systems (or as is now the trend "Edge" computing).
These systems have a different power/noise/heat envelope
than a small pile of data center servers (i.e. you can use
them next to your desk, in a lab or classroom, at home etc.)

Performance is optimized to fit within an ambient power/noise/heat
envelope. Basement Supercomputing recently started shipping
updated systems with uATX blades and 65W Ryzen processors
(with ECC); more details are on the data sheet (the web page has not
been updated for the new systems just yet):

https://www.basement-supercomputing.com/download/limulus-data-sheets/Limulus_ALL.pdf

Full disclosure, I work with Basement Supercomputing.

--
Doug

>
> Is there a role for a modest HPC cluster at the community college?
>


--
Doug






Re: [Beowulf] [EXTERNAL] Re: Have machine, will compute: ESXi or bare metal?

2020-02-10 Thread Renfro, Michael
At least in my case, I don’t do anything VM-specific for my setups, and treat 
them as close to bare metal as I can:

- I start with a router VM (pfSense, shorewall, etc,)
- I set up one or more dumb layer 2 switch interconnects among the router and 
other nodes as needed
- I start provisioning management and other nodes: setting up DHCP and PXE (a 
rough sketch of that piece follows this list), no cloning of installed VMs, etc.
- I work over ssh as soon as it’s available
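
For the DHCP/PXE step, a rough sketch of writing out a minimal dnsmasq
configuration from Python; the interface name, address range, and TFTP paths
are all assumptions to adapt:

    # Assumed interface, subnet, and boot file; adapt to the actual network.
    conf_lines = [
        "interface=eth1",
        "dhcp-range=10.0.0.100,10.0.0.200,12h",
        "dhcp-boot=pxelinux.0",
        "enable-tftp",
        "tftp-root=/srv/tftp",
    ]

    with open("/etc/dnsmasq.d/cluster-pxe.conf", "w") as f:
        f.write("\n".join(conf_lines) + "\n")

    # Restart dnsmasq afterwards, and put pxelinux.0 plus a kernel/initrd
    # under /srv/tftp so the nodes have something to boot.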

On Feb 10, 2020, at 7:55 AM, Lux, Jim (US 337K) via Beowulf wrote:

One comment on “building a cluster with VMs”

Part of bringing up a cluster is learning how to manage the interconnects, and 
loading software into the nodes, and then finding the tools to manage a bunch 
of different machines simultaneously, as well as issues around shared network 
drives, boot images, etc.

I would think (but have not tried) that the multi-VM approach is a bit too 
unrealistically easy. I assume you can do MPI between VMs, so you could 
certainly practice with parallel coding. But it seems that spinning up 
identical instances, all of which can see the same host resources, on the same 
machine with the same display and keyboard, kind of bypasses a lot of the hard 
stuff.
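
On the parallel-coding side, a minimal mpi4py sketch of the sort of thing
that runs fine across VMs (assuming mpi4py and an MPI library are installed
in each VM, and a hostfile lists their addresses):

    # hello_mpi.py: assumes mpi4py and an MPI implementation in every VM.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank reports where it landed, a quick check that ranks really are
    # spread across the VMs rather than piled onto one host.
    print(f"rank {rank} of {size} on {MPI.Get_processor_name()}")

    # Something slightly more cluster-like: a reduction across all ranks.
    total = comm.allreduce(rank, op=MPI.SUM)
    if rank == 0:
        print("sum of ranks:", total)

Launched with something like mpirun -np 8 -hostfile hosts python3 hello_mpi.py.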

OTOH, if you want a cheap experience of getting the booting working, 
controlling multiple machines, learning pdsh, etc., you could just get 3 or 4 
Raspberry Pis or BeagleBones and face all the problems of a real cluster 
(including managing a rat’s nest of wires and cables).



From: Beowulf  on behalf of 
"jaquil...@eagleeyet.net" 
Date: Sunday, February 9, 2020 at 10:30 PM
To: "Renfro, Michael" , "beowulf@beowulf.org" 

Subject: [EXTERNAL] Re: [Beowulf] Have machine, will compute: ESXi or bare 
metal?

Hi guys, just piggybacking on this thread.

I am considering upgrading my PC to 64 GB of RAM and setting it up as a 
Windows 10 based Hyper-V host. Would you say this is a good way to learn how to 
put a cluster together without the need to invest in a small number of servers? 
My PC has a Ryzen 5 3600 (6 cores, 12 threads) on an MSI B450 Tomahawk Max 
gaming motherboard, currently with 32 GB of DDR4-3200, upgradable to 64 GB.

Let me know your thoughts.

Regards,
Jonathan Aquilina

EagleEyeT
Phone +356 20330099
Sales – sa...@eagleeyet.net
Support – supp...@eagleeyet.net

From: Beowulf  On Behalf Of Renfro, Michael
Sent: Monday, 10 February 2020 03:17
To: beowulf@beowulf.org
Subject: Re: [Beowulf] Have machine, will compute: ESXi or bare metal?

No reason you can’t, especially if you’re not interested in benchmark runs 
(there’s a chance that if you ran a lot of heavily-loaded VMs, there could be 
CPU contention on the host).

Any cluster development work I’ve done lately has used VMware VMs exclusively.



On Feb 9, 2020, at 7:10 PM, Mark Kosmowski wrote:


I purchased a Cisco UCS C460 M2 (4 x 10-core Xeons, 128 GB total RAM) for $115 
in my local area. If I use ESXi (free license), I am limited to 8 vCPUs per 
VM. Could I make a virtual Beowulf cluster out of some of these VMs? I'm 
thinking this way I can learn cluster admin without paying the power bill for 
my ancient Opteron boxes, and also scratch my illumos itch while computing on 
Linux.

Thank you!


Re: [Beowulf] Have machine, will compute: ESXi or bare metal?

2020-02-09 Thread Renfro, Michael
No reason you can’t, especially if you’re not interested in benchmark runs 
(there’s a chance that if you ran a lot of heavily-loaded VMs, there could be 
CPU contention on the host).

Any cluster development work I’ve done lately has used VMware VMs exclusively.

On Feb 9, 2020, at 7:10 PM, Mark Kosmowski wrote:




I purchased a Cisco UCS C460 M2 (4 x 10-core Xeons, 128 GB total RAM) for $115 
in my local area. If I use ESXi (free license), I am limited to 8 vCPUs per 
VM. Could I make a virtual Beowulf cluster out of some of these VMs? I'm 
thinking this way I can learn cluster admin without paying the power bill for 
my ancient Opteron boxes, and also scratch my illumos itch while computing on 
Linux.

Thank you!



Re: [Beowulf] HPC demo

2020-01-13 Thread Renfro, Michael
The homepage for your company specifically advertises HPC services and 
expertise. Which upper management would need the demo, and are there any 
applications they’re interested in?
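
If it helps, the demo that usually lands well is a strong-scaling curve: time
the same job on 1, 2, 4, ... 36 nodes and plot measured speedup against the
ideal line. A minimal plotting sketch (assuming matplotlib is available; the
timings below are placeholders, not real measurements):

    # speedup_plot.py: turn (nodes, wall-clock seconds) pairs into a plot.
    import matplotlib
    matplotlib.use("Agg")              # render to a file, no display needed
    import matplotlib.pyplot as plt

    # Placeholder timings; replace with measurements from the cluster.
    runs = [(1, 3600), (2, 1900), (4, 1000), (9, 470), (18, 260), (36, 150)]

    nodes = [n for n, _ in runs]
    speedup = [runs[0][1] / t for _, t in runs]

    plt.plot(nodes, nodes, "--", label="ideal")
    plt.plot(nodes, speedup, "o-", label="measured")
    plt.xlabel("nodes")
    plt.ylabel("speedup vs. 1 node")
    plt.title("Strong scaling, fixed problem size")
    plt.legend()
    plt.savefig("speedup.png", dpi=150)

Pairing the curve with an application the audience recognizes tends to work
better than a synthetic benchmark.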

On Jan 13, 2020, at 6:35 PM, John McCulloch  wrote:
I recently inherited management of a cluster, and my knowledge is limited to a 
bit of Red Hat. I need to figure out a demo for upper management, graphically 
demonstrating the speedup of running a parallel app on one x86 node versus 
multiple nodes, up to 36. The nodes have dual Gold 6132 processors and a 
Mellanox EDR interconnect. Any suggestions would be appreciated.

Respectfully,
John McCulloch | PCPC Direct, Ltd.


Re: [Beowulf] 2 starting questions on how I should proceed for a correct first micro-cluster (2-nodes) building

2019-03-02 Thread Renfro, Michael
Heterogeneous is possible, but the slower system will be a bottleneck if you 
have calculations that require both systems to work in parallel and synchronize 
with each other periodically. You might also find bottlenecks with your network 
interconnect, even on homogeneous systems.
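
As a toy illustration with made-up numbers: if two nodes split the work
evenly and synchronize every step, each step takes as long as the slower
node allows.

    # Made-up per-step times, purely to illustrate the synchronization penalty.
    fast, slow = 1.0, 1.8     # seconds per step on each machine
    steps = 1000

    mixed_pair = steps * max(fast, slow)   # lock-step: wait for the slower node
    fast_pair  = steps * fast              # two identical fast nodes
    print(f"mixed pair: {mixed_pair:.0f} s   identical pair: {fast_pair:.0f} s")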

I’ve never used ROCKS, and OSCAR doesn’t look to have been updated in a few 
years (maybe it doesn’t need to be). OpenHPC is a similar product, more 
recently updated. But except for the cluster I manage now, I have always just 
gone with a base operating system for the nodes and added HPC libraries and 
services as required.

> On Mar 2, 2019, at 7:34 AM, Marco Ippolito  wrote:
> 
> Hi all,
> 
> I'm developing an application which needs to use tools and other applications 
> that excel in a distributed environment: 
> - HPX ( https://github.com/STEllAR-GROUP/hpx ),
> - Kafka ( http://kafka.apache.org/ ),
> - a blockchain tool.
> This is why I'm eager to learn how to deploy a Beowulf cluster.
> 
> I've read some info here:
> - https://en.wikibooks.org/wiki/Building_a_Beowulf_Cluster
> - https://www.linux.com/blog/building-beowulf-cluster-just-13-steps
> - 
> https://www-users.cs.york.ac.uk/~mjf/pi_cluster/src/Building_a_simple_Beowulf_cluster.html
> 
> And I have 2 starting questions in order to clarify how I should proceed for 
> a correct cluster building:
> 
> 1) My starting point is the PC I'm working with at the moment, with these 
> features:
>   - Corsair DDR3 RAM, PC1600, 32 GB, CL10
>   - Intel Core i7-4790K CPU (boxed, socket 1150), 4.00 GHz
>   - Samsung MZ-76E500B 860 EVO internal SSD, 500 GB, 2.5" SATA III, 
> black/grey
>   - ASUS H97-PLUS motherboard
>   - DVD-RW drive
> 
>   I'm using Ubuntu 18.04.01 Server Edition as the OS.
> 
> On one side I read that it is better to put the same type of hardware in the 
> same cluster (PCs of the same type), but on the other side heterogeneous 
> hardware (servers or PCs) can also be deployed.
> So which hardware should I take into consideration for the second node, if 
> the features of the very first "node" are the ones above?
> 
> 2) I read that some software (Rocks, OSCAR) would make the cluster 
> configuration easier and smoother. But I also read that using the same OS, 
> at exactly the same version, on all nodes, in my case Ubuntu 18.04.01 Server 
> Edition, could be a safe start.
> So... is it strictly necessary to use Rocks or OSCAR to correctly configure 
> the node network?
> 
> Looking forward to your kind hints and suggestions.
> Marco
> 
> 
