[Beowulf] Storage Best Practices

2018-02-19 Thread Richter, Brian J {BIS}
Hey All, I was hoping to get some recommendations for Storage. Last year we set up our first HPC and I'm looking for a good strategy moving forward for Storage. We set up a dedicated space on the cluster for Storage that has 5.5 TB of space. This space can be quickly chewed up depending on the

Re: [Beowulf] Storage Best Practices

2018-02-19 Thread Skylar Thompson
For our larger groups, we'll meet with them regularly to discuss their space usage (and other IT needs). Even that's unlikely to be frequent enough, so we direct usage alerts to their designated "data manager" if they're getting close to running out of space. Aside from regularly clearing out
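A minimal sketch of the usage-alert idea, in Python. It assumes a 90% threshold, which is our choice and not a number from the thread. It checks the whole filesystem holding a path with the standard library's `shutil.disk_usage`; per-group or per-project quotas would need the filesystem's own quota tooling instead. `usage_fraction`, `should_alert` and `check_path` are hypothetical names, and actually sending the message to a group's data manager is left out.

```python
import shutil
from typing import Optional

# Hypothetical threshold: alert once the space is 90% full (an assumption,
# not a figure from the thread).
ALERT_FRACTION = 0.90


def usage_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of capacity in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def should_alert(used_bytes: int, total_bytes: int,
                 threshold: float = ALERT_FRACTION) -> bool:
    """True when usage has reached or passed the alert threshold."""
    return usage_fraction(used_bytes, total_bytes) >= threshold


def check_path(path: str, threshold: float = ALERT_FRACTION) -> Optional[str]:
    """Check the filesystem holding `path`; return an alert message or None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if should_alert(usage.used, usage.total, threshold):
        pct = 100 * usage_fraction(usage.used, usage.total)
        return f"{path} is {pct:.1f}% full; notify the group's data manager"
    return None


if __name__ == "__main__":
    msg = check_path("/")
    print(msg or "usage below threshold")
```

Run from cron, a check like this only catches the "getting close" case; it does not replace the regular conversations with the group.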

Re: [Beowulf] Varying performance across identical cluster nodes.

2018-02-19 Thread Prentice Bisbal
I know this is an old topic. I'm catching up on months' worth of mailing list mail right now. On 09/17/2017 09:09 PM, Christopher Samuel wrote: On 15/09/17 04:45, Prentice Bisbal wrote: I'm happy to announce that I finally found the cause of this problem: numad. Very interesting, it sounds

Re: [Beowulf] Storage Best Practices

2018-02-19 Thread Tim Cutts
No policy here. People can keep stuff as long as they like. I don’t agree with that lack of policy, but that’s where we are. We did propose a 90 day limit, about 10 years ago. It lasted about, er, 90 days, before faculty started screaming. ☺ Tim On 19/02/2018, 15:31, "Beowulf on behalf of

Re: [Beowulf] What is rdma, ofed, verbs, psm etc?

2018-02-19 Thread Prentice Bisbal
On 09/19/2017 12:24 PM, Peter Kjellström wrote: On Tue, 19 Sep 2017 09:27:55 -0600 Faraz Hussain wrote: I have never understood what these acronyms are. I've been involved with HPC on the applications side for many years and hear these terms pop up now and then. I've

Re: [Beowulf] Varying performance across identical cluster nodes.

2018-02-19 Thread Prentice Bisbal
Finally catching up on months and months of Beowulf e-mails. On 09/18/2017 05:20 AM, Håkon Bugge wrote: On 18 Sep 2017, at 03:09, Christopher Samuel wrote: On 15/09/17 04:45, Prentice Bisbal wrote: I'm happy to announce that I finally found the cause of this problem: numad.

Re: [Beowulf] Storage Best Practices

2018-02-19 Thread Adam DeConinck
The best success I've seen is a mix of strategies: some amount of storage that's considered "permanent" with aggressive quota limits (per user or per project), plus a larger no-quota space that's cleaned up based on some retention policy. I've used this basic scheme in small environments, and also
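The cleanup half of that scheme can be sketched as a dry-run scan of the no-quota space. This is a minimal sketch under stated assumptions: a hypothetical `/scratch` mount, a retention window measured against file modification time (mtime), and a 90-day window used only as an example. Sites often prefer access time instead, but atime is unreliable on filesystems mounted with `noatime` or `relatime`. `find_expired` is a hypothetical helper that only lists candidates and deletes nothing.

```python
import os
import stat
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def find_expired(root: str, max_age_days: float,
                 now: Optional[float] = None) -> List[str]:
    """Return regular files under `root` not modified within `max_age_days`.

    Only lists candidates; deletion is left to the administrator.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks out of scratch
            except FileNotFoundError:
                continue  # file vanished during the scan
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                expired.append(path)
    return sorted(expired)


if __name__ == "__main__":
    # Hypothetical scratch path and example 90-day retention window.
    for p in find_expired("/scratch", max_age_days=90):
        print(p)
```

Keeping the scan separate from deletion makes it easy to publish the candidate list to users before anything is removed.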