[Beowulf] position adverts?

2024-02-22 Thread Joe Landman

Hi fellow beowulfers

  I don't know if it's bad form to post job adverts here.  Day job 
(@AMD) is looking for lots of HPC (and AI) folks, think 
debugging/support/etc.  Happy to talk with anyone about this.


  Regards

Joe

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Your thoughts on the latest RHEL drama?

2023-06-26 Thread Joe Landman
 RHEL. There used to be Scientific Linux (SL), which was 
maintained by the DOE at FermiLab, but FermiLab decided that the world 
didn't need both SL and CentOS, since they were essentially the same 
thing. Not long after, Red Hat eliminated CentOS as a competitor by 
changing it to "CentOS Stream". CentOS Stream is now a development 
version of sorts for RHEL, but I thought that was exactly what Fedora 
was for.


5. When Alma and Rocky pop up to fill the void created by the killing 
of CentOS, RH does what it can to cut off their access to RHEL 
source code so they can't be competitors to RHEL, which brings us to 
today.


Somewhere around event #3 is when I started viewing RH as the 
MS of the Linux world for obvious reasons. It seems that RH is 
determined to make RHEL a monopoly of the "Enterprise Linux" market. 
Yes, I know there's Ubuntu and SLES, but Ubuntu is viewed as a desktop 
more than a server OS (IMO), and SLES hasn't really caught on, at 
least not in the US.


I feel that every time the open-source community ratchets up efforts 
to preserve free alternatives to RHEL, RH ratchets up its efforts to 
eliminate any competition. Trying to stick with a free alternative 
to RHEL is ultimately going to be futile, so now is a good time to 
consider changing to a different line of Linux distro.


The cost of RHEL subscriptions isn't the only concern. 
Besides cost, one of the reasons Linux became the de facto OS for 
HPC was how quickly/easily/cheaply it could be ported to new hardware. 
Don Becker wrote or modified many of the Linux Ethernet drivers that 
existed in the mid/late 90s so they could be used for Beowulf 
clusters, for example. When the Itanium processor came out, I remember 
reading that a Linux developer was able to port Linux to the Itanium 
and get it running in only a matter of hours.


With RH (and IBM?) so focused on market dominance/profits, it's not a 
stretch to think they'll eventually "say no" to supporting 
anything other than x86 and POWER processors, since the other 
processors don't have enough market share to make it profitable, or 
compete with IBM's offerings.  I mean, right now it's extremely rare 
to find any commercial application that supports anything other than 
x86_64 (other than Mac applications that now support Apple's M 
processors, which is a relatively new development).


My colleagues here agree with my conclusions about the future of RHEL, 
and we are giving the idea of moving away from RHEL some 
serious consideration, but it's certainly not going to be cheap or 
easy. What are you thinking/doing about this?


--
Prentice

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] Re: old sm/sgi bios

2023-03-23 Thread Joe Landman
I remember in 1999/2000 I did a simple BLAST benchmark on my SGI Pentium 
workstation running linux, and one on my R10k box. Similar clock 
speeds.  Pentium was 2x faster in Int heavy calcs. Did similar things 
with FP, and discussed the results with my team and engineering.


None of them were happy.

I left SGI in 2001, after a discussion with the SVP of engineering, 
Warren Pratt, who insisted that the future was not linux clusters, but 
large beefy many cpu shared memory servers.  I put my money/future where 
my mouth was.


Today, I run on rackmount 1 and 2U boxes containing 80-128 CPU cores 
and 2TB of physical RAM.


So yeah.  He was right.

But they run linux, and are x86_64 arch.


On 3/23/23 15:08, Prentice Bisbal via Beowulf wrote:


Yeah, that whole situation was frustrating. From 2004-2007, I was 
working for a pharmaceutical startup supporting their Computer-Aided 
Drug Discovery (CADD) team. Had I been hired before the director of 
CADD, it would have been a 100% Linux shop. Instead, as soon as he was 
hired he started insisting, and circulated a memo, stating that Linux 
was still a toy for hobbyists and "not ready for primetime" (he used 
that exact quote). So we spent tens of thousands of dollars on two 
Octanes and the 8-way Origin 350. I got a Linux workstation as a 
proof-of-concept, and that HP Workstation running Linux that cost only 
a few thousand dollars ran circles around those SGI boxes, and when 
cost was factored in FLOPS/$ was like 10x better than the SGI hardware 
at that point. And all of that hardware was bought used ("remarketed") 
from SGI, so new hardware would have compared significantly worse in 
terms of value.


Also, it turned out the director of CADD owned a nontrivial amount of 
SGI stock, so not only was he an over-the-hill curmudgeon afraid of 
new technology, there was also a pretty clear conflict of interest for 
him to be pushing SGI, even though I'm sure our small purchase did 
nothing to improve SGI stock value.


On 3/23/23 2:58 PM, Joe Landman wrote:


They had laid off all the good people doing workstations by then, I 
think they outsourced design/production to ODMs by that time.  MIPS 
processors were long in the tooth in 1999, never mind 2007.


Having been at SGI from 1995-2001, I can tell you the reason MIPS 
sucked wind at that point was that the good ship Itanic sank the Alien 
and Beast processors.  Those design teams left, and we didn't have much 
for post-R10k, other than respins and shrinks of the R10k, which were 
renamed R12k, R14k ...


Beast would have been relevant (near EOL though) in 2007.


On 3/23/23 14:53, Prentice Bisbal via Beowulf wrote:


Between 2003 and 2007, I worked with a lot of O2s, Octanes, an 8-way 
Origin 350, and even a Tezro. I don't miss those days.


I always felt like the design of their workstations was done by the 
same people who design Playskool toys rather than professional 
hardware.


Prentice Bisbal
Senior HPC Engineer
Computational Sciences Department
Princeton Plasma Physics Laboratory
Princeton, NJ
https://cs.pppl.gov
https://www.pppl.gov
On 3/23/23 1:08 PM, Ryan Novosielski via Beowulf wrote:
Seriously. I have an Indy and an Octane2 laying around. That’s not 
even an SGI. :-P


--
#BlackLivesMatter

|| \\UTGERS,     |---*O*---
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'

On Mar 23, 2023, at 13:07, Michael DiDomenico 
 wrote:


ack, irix flashbacks... :) fortunately, this machine isn't quite that
old, circa 2013

On Thu, Mar 23, 2023 at 1:02 PM Darren Wise via Beowulf
 wrote:


Hello,

I don't personally have such myself, but for anything SGI, even with 
today's exotics, Ian's about the most knowledgeable I know of globally, 
and that kind of knowledge is seriously dying out within this sector. I 
hazard a guess your SM board would be considered quite new compared to 
other systems.

http://www.sgidepot.co.uk/sgidepot/


Kind Regards,
Darren Wise
Research Engineer, Mechatronics
https://wisecorp.co.uk, .us & .ru

On 23/03/2023 16:51, Michael DiDomenico wrote:

does anyone happen to have an old sgi / supermicro bios for an
X9DRG-QF+ motherboard squirreled away somewhere?  sgi is long gone,
hpe might have something still but who knows where.  i reached 
out to

supermicro, but i suspect they'll say no.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] Re: old sm/sgi bios

2023-03-23 Thread Joe Landman
They had laid off all the good people doing workstations by then, I 
think they outsourced design/production to ODMs by that time. MIPS 
processors were long in the tooth in 1999, never mind 2007.


Having been at SGI from 1995-2001, I can tell you the reason MIPS sucked 
wind at that point was that the good ship Itanic sank the Alien and Beast 
processors.  Those design teams left, and we didn't have much for post-R10k, 
other than respins and shrinks of the R10k, which were renamed R12k, 
R14k ...


Beast would have been relevant (near EOL though) in 2007.


On 3/23/23 14:53, Prentice Bisbal via Beowulf wrote:


Between 2003 and 2007, I worked with a lot of O2s, Octanes, an 8-way 
Origin 350, and even a Tezro. I don't miss those days.


I always felt like the design of their workstations was done by the 
same people who design Playskool toys rather than professional hardware.


Prentice Bisbal
Senior HPC Engineer
Computational Sciences Department
Princeton Plasma Physics Laboratory
Princeton, NJ
https://cs.pppl.gov
https://www.pppl.gov
On 3/23/23 1:08 PM, Ryan Novosielski via Beowulf wrote:
Seriously. I have an Indy and an Octane2 laying around. That’s not 
even an SGI. :-P


--
#BlackLivesMatter

|| \\UTGERS,     |---*O*---
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'

On Mar 23, 2023, at 13:07, Michael DiDomenico 
 wrote:


ack, irix flashbacks... :) fortunately, this machine isn't quite that
old, circa 2013

On Thu, Mar 23, 2023 at 1:02 PM Darren Wise via Beowulf
 wrote:


Hello,

I don't personally have such myself, but for anything SGI, even with 
today's exotics, Ian's about the most knowledgeable I know of globally, 
and that kind of knowledge is seriously dying out within this sector. I 
hazard a guess your SM board would be considered quite new compared to 
other systems.

http://www.sgidepot.co.uk/sgidepot/


Kind Regards,
Darren Wise
Research Engineer, Mechatronics
https://wisecorp.co.uk, .us & .ru

On 23/03/2023 16:51, Michael DiDomenico wrote:

does anyone happen to have an old sgi / supermicro bios for an
X9DRG-QF+ motherboard squirreled away somewhere?  sgi is long gone,
hpe might have something still but who knows where.  i reached out to
supermicro, but i suspect they'll say no.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] Re: old sm/sgi bios

2023-03-23 Thread Joe Landman
Nah ... that honor belongs to HPUX or AIX. If you used them, you would 
understand.


Irix was my daily driver for ~7 years, until 1999 when I switched to 
Linux (and stayed there up to this day).


Work setup is a mix of MacOS and Linux machines.  Home machines are 70-30 
linux to mac.  No windows.



On 3/23/23 14:50, Prentice Bisbal via Beowulf wrote:

irix

Worst. Unix. Ever.

On 3/23/23 1:07 PM, Michael DiDomenico wrote:

ack, irix flashbacks... :) fortunately, this machine isn't quite that
old, circa 2013

On Thu, Mar 23, 2023 at 1:02 PM Darren Wise via Beowulf
 wrote:

Hello,

I don't personally have such myself, but for anything SGI, even with 
today's exotics, Ian's about the most knowledgeable I know of globally, 
and that kind of knowledge is seriously dying out within this sector. I 
hazard a guess your SM board would be considered quite new compared to 
other systems.

http://www.sgidepot.co.uk/sgidepot/


Kind Regards,
Darren Wise
Research Engineer, Mechatronics
https://wisecorp.co.uk, .us & .ru

On 23/03/2023 16:51, Michael DiDomenico wrote:

does anyone happen to have an old sgi / supermicro bios for an
X9DRG-QF+ motherboard squirreled away somewhere?  sgi is long gone,
hpe might have something still but who knows where.  i reached out to
supermicro, but i suspect they'll say no.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] old sm/sgi bios

2023-03-23 Thread Joe Landman

See https://www.supermicro.com/products/archive/motherboard/x9drg-qf

On 3/23/23 12:51, Michael DiDomenico wrote:

does anyone happen to have an old sgi / supermicro bios for an
X9DRG-QF+ motherboard squirreled away somewhere?  sgi is long gone,
hpe might have something still but who knows where.  i reached out to
supermicro, but i suspect they'll say no.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] milan and rhel7

2022-06-29 Thread Joe Landman
Egads ... if you are still running a 3-series kernel in production, 
backports or not ...  The user space stack around that kernel 
(RHEL/CentOS 7) is positively prehistoric.


I'm quite serious about this.  The 4.x series are old now.

New hardware comes out requiring new support in the kernel all the 
time.  If you don't provide a sufficiently up-to-date kernel with the 
associated drivers, and the changed kernel structures needed to correctly 
use these things, I'd say it's a crapshoot as to whether or not it will 
work at all, never mind at its full capability.
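
If you do end up running Milan on el7 anyway, a quick (and by no means 
exhaustive) sanity check I'd do on a node before trusting it with jobs 
is something like:

    uname -r                                    # which kernel actually booted
    lscpu | grep -E 'Model name|Flags'          # does the kernel identify the CPU correctly?
    dmesg | grep -iE 'edac|mce|unsupported'     # complaints about unsupported/partially supported hardware
    journalctl -k -p warning --no-pager | tail  # recent kernel warnings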



On 6/29/22 09:52, Kilian Cavalotti wrote:

On Wed, Jun 29, 2022 at 3:54 AM Mikhail Kuzminsky  wrote:

Yes, RHEL requires upgrading to 8.3 or later to work with EPYC 7003
https://access.redhat.com/articles/5899941. Officially CentOS 7
doesn't support this hardware either.

And yet, Red Hat silently backports Milan-specific bits to 7.9 kernels, like:
- Rudimentary support for AMD Milan - Call init_amd_zn() on Family 19h
processors (BZ#2019218)
in kernel-3.10.0-1160.53.1 (https://access.redhat.com/errata/RHSA-2022:0063)

So yes, in practice, el7 distributions run perfectly fine on Milan
CPUs. You won't have complete support for things like EDAC, but as far
as booting and running the kernel, it works fine:

# uname -r
3.10.0-1160.53.1.el7.x86_64

# lscpu -y
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    32
Socket(s):             1
NUMA node(s):          2
Vendor ID:             AuthenticAMD
CPU family:            25
Model:                 1
Model name:            AMD EPYC 7543 32-Core Processor
Stepping:              1
CPU MHz:               2794.847
BogoMIPS:              5589.69
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              32768K
NUMA node0 CPU(s):     0-15
NUMA node1 CPU(s):     16-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl
nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor
ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx
f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core
perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 invpcid_single hw_pstate
sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep
bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt
xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale
vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq overflow_recov
succor smca

Cheers,


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Anaconda distribution sowing FUD to generate sales?

2022-04-13 Thread Joe Landman
I've got general negative thoughts about conda, based upon $dayjob's use 
of it.  I always enjoy trying to build something which depends upon a 
conda-ized library which has been poorly built/packaged ... yeah, good times.



As for their bait and switch, they do need to cover network costs, and 
if they are making the mistake of using cloud storage for this, then 
their egress/storage costs are likely significant. If you have to use 
them, and really have no choice in the matter, it is better to support 
them and enable them to stay in business, than let them wither and 
die.  The latter guarantees some future flag days where you have to 
start switching out quickly.



Hence a point about a plan B ...
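
A minimal sketch of one such plan B, assuming conda-forge covers the 
packages you care about (it won't be a drop-in for everything):

    conda config --add channels conda-forge
    conda config --remove channels defaults
    conda config --set channel_priority strict

Or skip the Anaconda installer entirely and bootstrap from Miniforge, 
which points at conda-forge out of the box.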


On 4/13/22 12:11, Prentice Bisbal via Beowulf wrote:


Recently, one of my users go this e-mail from a commercial account rep 
at anaconda.com:



Hi [User]
I'm reaching out because I've noticed we are one of [Employer's 
Name]'s preferred tools and also to offer guidance in navigating our 
new Anaconda Terms of Service, as there are changes for the 
commercial use of Anaconda. Based off my research, [Employer's 
Name] is mirroring quite a few packages in the past few months.


We remain deeply dedicated to OSS, and that cost is funded by the 
long tail of our enterprise products and users. In short, we changed 
our Terms of Service to prohibit commercial use of our Public Facing 
Repo (repo.anaconda.com) channel without a 
paid license.


We'd like to discuss how your organization can remain compliant and 
discuss some options moving forward.
Are you or someone in your IT department available to chat? Book time 
with me [link to online scheduling service removed]

Cheers,
[salesperson's name]


Have any of you received an e-mail like this?

Since I work at an academic, government research site, I don't think 
we fall into the commercial category, so I'm pretty sure we're safe, 
but I still don't like this attempt to monetize open-source software. 
I'm not an open-source zealot like RMS, but I don't like it when 
people take open-source software and try to monetize it like this.


What's interesting is their approach here - they are not trying to 
keep open-source software from you directly - they're saying you 
can't use their *repo* to get that software. So you can have your 
open-source software, but to get it from the dealer to your house, you 
need to pay a toll to use the roads.


I don't like this because many people now rely on conda, and conda 
only has value because of the repo. If people using conda knew that 
this might be a problem, perhaps they would have stuck with the 
python.org distribution of Python and pip.


The other thing I don't like is that you can't find any of this 
information on the anaconda.com website. Even after knowing these 
terms and conditions applied, I couldn't find any warnings about this 
on the product pages for the Anaconda Distribution. It's as if they're 
deliberately hiding this information from potential downloaders of 
Anaconda. I only found it by going directly to 
https://repo.anaconda.com, where they do have links prominently 
displayed.


This seems like a trap to me. You download Anaconda, completely 
unaware of these terms and conditions, and then use conda to install 
the packages you need, unknowingly violating their license.


Your thoughts?

Prentice


--
Prentice

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] SC21 Beowulf Bash

2021-11-09 Thread Joe Landman

Pid1 game ... heh !

On 11/9/21 2:13 PM, Douglas Eadline wrote:

Here is the Hybrid Beowulf Bash info

  https://beowulfbash.com/



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Joe Landman

On 6/21/21 9:20 AM, Jonathan Engwall wrote:

I have followed this thinking "square peg, round hole."
You have got it again, Joe. Compilers are your problem.



Erp ... did I mess up again?

System architecture has been a problem ... making a processing unit 
10-100x as fast as its support components means you have to code with 
that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily 
generate optimal code for the system (but I swear ... -O3 ... it says 
it on the package!)
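
Something like this (hedged; the flags and their payoff vary by code 
and by gcc version) at least tells the compiler about the machine it is 
running on, and reports what it actually managed to vectorize:

    gfortran -O3 -march=native -mtune=native -funroll-loops \
        -fopt-info-vec mycode.f -o mycode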


Way back at Scalable, our secret sauce was largely increasing IO 
bandwidth and lowering IO latency while coupling computing more tightly 
to this massive IO/network pipe set, combined with intelligence in the 
kernel on how to better use the resources.  It was simply a better 
architecture.  We used the same CPUs.  We simply exploited the design 
better.


End result was codes that ran on our systems with off-cpu work (storage, 
networking, etc.) could push our systems far harder than competitors.  
And you didn't have to use a different ISA to get these benefits.  No 
recompilation needed, though we did show the folks who were interested, 
how to get even better performance.


Architecture matters, as does implementation of that architecture.  
There are costs to every decision within an architecture.  For AVX512, 
along comes lots of other baggage: downclocking, etc.  You have to do 
a cost-benefit analysis on whether the benefits you get are worth 
paying for that baggage.  Some folks have made that decision in favor 
of AVX512, and have been enjoying the benefits (i.e. they are willing 
to pay the costs).  For the general audience, these costs represent a 
(significant) hurdle one must overcome.
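
A quick (and admittedly crude) way to see whether a given build even 
emits AVX512 instructions, and hence whether that baggage applies to 
you at all (a count of zero means no zmm register use; mycode is just 
the hypothetical binary from the sketch above):

    objdump -d ./mycode | grep -c 'zmm'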


Here's where awesome compiler support would help.  FWIW, gcc isn't that 
great a compiler.  It's not performance-minded for HPC.  It's a reasonable 
general purpose, standards compliant (for some subset of standards) 
compilation system.  LLVM is IMO a better compiler system, and its 
clang/flang are developing nicely, albeit still not really HPC focused.  
Then you have variants built on that, like the Cray compiler, Nvidia 
compiler and AMD compiler.  These are HPC focused, and actually do quite 
well with some codes (though the AMD version lags the Cray and Nvidia 
compilers).  You've got the Intel compiler, which would be a good general 
compiler if it wasn't more of a marketing vehicle for Intel processors 
and their features (hey, you got an AMD chip?  you will take the slowest 
code path even if you support the features needed for the high 
performance code path).


Maybe, someday, we'll get a great HPC compiler for C/Fortran.


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] AMD and AVX512

2021-06-20 Thread Joe Landman
 still take time, so I can see
situations where for large amounts of data using CPUs would be preferred
over GPUs.

AMD's latest chips support PCI 4 while Intel is still stuck on PCI 3,
which may or may not mean a difference.


It does.  IO and memory bandwidth/latency are very important, and oft 
overlooked aspects of performance.  If you have a choice of doubling IO 
and memory bandwidth at lower latency (usable by everyone) vs adding an 
AVX512 unit or two (usable by a small fraction of a percent of all 
users), which would net you, as an architect, the best "bang for the buck"?




But what despite all of the above and the other replies, it is AMD who
has been winning the HPC contracts of late, not Intel.


There's a reason for that.  I will admit I have a devil of a time trying 
to convince people that higher clock frequency for computing matters 
only to a small fraction of operations, especially ones waiting on 
(slow) RAM and (slower) IO.  Make the RAM and IO faster (lower latency, 
higher bandwidth), and the system will be far more performant.




--

Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] head node abuse

2021-03-26 Thread Joe Landman
Near term, could you make sure their groups are covered in 
/etc/security/limits.conf  ?


Something like

    @users    soft    cpu    5 # 5 min of cpu time

    @users    hard    cpu    10 # 10 min of cpu time

?
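
If memory on the head node is the bigger issue, you could also cap 
per-process address space there (hedged: the value is in KB, roughly 
8 GB here, and some legitimate interactive tools may need more):

    @users    hard    as    8388608    # ~8 GB address space per process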

Way ... way back when I helped others do this stuff, I set up isolated 
login VMs.  Everyone got their own.  I could limit them to 1 core, 4GB 
ram, and they had queue access (this was in the PBS days).  I lit up 
like 20 of these small VMs per "login node", each with their own IP.  
The VMs could idle fairly nicely.  We could rate limit their (virtual) 
networks using tc, and throttle their disk IO.


Haven't done that in a while.  These days, you might look at login 
containers for the same thing.  Just set up limits on the 
memory/network/cpu for those and off to the races you go (SMOP :D )
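
A minimal sketch of that idea with plain docker (podman takes the same 
flags); "login-env" here is a hypothetical image you'd build yourself:

    # pin the login session to 1 core and 4 GB of RAM, with no extra swap
    docker run -it --rm \
        --cpus=1 --memory=4g --memory-swap=4g \
        -v /home/$USER:/home/$USER \
        login-env /bin/bash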



On 3/26/21 9:56 AM, Michael Di Domenico wrote:

does anyone have a recipe for limiting the damage people can do on
login nodes on rhel7.  i want to limit the allocatable cpu/mem per
user to some low value.  that way if someone kicks off a program but
forgets to 'srun' it first, they get bound to a single core and don't
bump anyone else.

i've been poking around the net, but i can't find a solution, i don't
understand what's being recommended, and/or i'm implementing the
suggestions wrong.  i haven't been able to get them working.  the most
succinct answer i found is that per user cgroup controls have been
implemented in systemd v239/240, but since rhel7 is still on v219
that's not going to help.  i also found some wonkiness that runs a
program after a user logs in and hacks at the cgroup files directly,
but i couldn't get that to work.

supposedly you can override the user-{UID}.slice unit file and jam in
the cgroup restrictions, but I have hundreds of users clearly that's
not maintainable

i'm sure others have already been down this road.  any suggestions?
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] [External] RIP CentOS 8 [EXT]

2020-12-08 Thread Joe Landman
I've built clusters with many of these: Debian, Ubuntu, RHEL, CentOS, 
SUSE, etc.  I got the least pain using Debian, while SUSE was the 
hardest, though RHEL was right behind it.


On 12/8/20 4:55 PM, Tim Cutts wrote:

We did use Debian at Sanger for several years.  The main reason for switching 
away from it (I’m talking about 2008 here) was a desire to have a common OS 
across desktops and servers.  Debian’s extremely purist stance on open source 
device drivers made it a pain on desktops and laptops, because it just didn’t 
work with most of the latest hardware as a result.  So we used Ubuntu instead, 
which allowed closed source drivers.

I thought of Ubuntu, at the time, as “Debian with added pragmatism”

Tim


On 8 Dec 2020, at 21:50, Jörg Saßmannshausen  
wrote:

Dear all,

what I never understood is: why are people not using Debian?

I've done some cluster installations (up to 100 or so nodes) with Debian, more
or less out of the box, and I did not have any issues with it. I admit I might
have missed something I don't know about, the famous unknown-unknowns, but
by and large the clusters were running rock solid with no unusual problems.
I did not use Lustre or GPFS etc. on them; I only played around a bit with BeeGFS
and some GlusterFS at a small scale.

Just wondering, as people mentioned Ubuntu.

All the best from a dark London

Jörg

Am Dienstag, 8. Dezember 2020, 21:12:02 GMT schrieb Christopher Samuel:

On 12/8/20 1:06 pm, Prentice Bisbal via Beowulf wrote:

I wouldn't be surprised if this causes Scientific Linux to come back
into existence.

It sounds like Greg K is already talking about CentOS-NG (via the ACM
SIGHPC syspro Slack):

https://www.linkedin.com/posts/gmkurtzer_centos-project-shifts-focus-to-centos-stream-activity-6742165208107761664-Ng4C

All the best,
Chris



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf





--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] perl with OpenMPI gotcha?

2020-11-20 Thread Joe Landman


On 11/20/20 4:43 PM, David Mathog wrote:

[...]



Also, searching turned up very little information on using MPI with perl.
(Lots on using MPI with other languages of course.)
The Parallel::MPI::Simple module is itself almost a decade old.
We have a batch manager but I would prefer not to use it in this case.
Is there some library/method other than MPI which people typically use 
these days for this sort of compute cluster process control with Perl 
from the head node?



I can't say I've ever used Perl and MPI.  I suppose it is doable, but if 
you were doing it, I'd recommend encapsulating it with FFI::Platypus 
(https://metacpan.org/pod/FFI::Platypus).


This, however, doesn't seem to be your problem per se.  Your problem 
sounds like "how do I launch a script on N compute nodes at once, and 
wait for it to complete".


If I have that correct, then you want to learn about pdsh 
(https://github.com/chaos/pdsh and info here: 
https://www.rittmanmead.com/blog/2014/12/linux-cluster-sysadmin-parallel-command-execution-with-pdsh/ 
).
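
For reference, a typical invocation looks something like this 
(nodes.txt is a hypothetical file with one hostname per line):

    # run the same command on every node, coalescing identical output
    pdsh -w ^nodes.txt 'grep o2ib /proc/mounts' | dshbak -c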


I write most of my admin scripts in perl, and you can use pdsh as a 
function within them.


However ...

MCE::Loop is your friend.

Combine that with something like this:

    $mounts=`ssh -o ConnectTimeout=20 $node grep o2ib /proc/mounts`;

and you can get pdsh-like control directly in Perl without invoking pdsh.

The general template looks like this:

   #!/usr/bin/env perl

   use strict;
   use MCE::Loop;

   MCE::Loop->init(
   max_workers => 25, chunk_size => 1
   );

   my $nfile=shift;

   # grab file contents into @nodes array
   my @nodes;
   chomp(@nodes = split(/\n/,`cat $nfile`));

   # looping over nodes, max_workers at a time
   mce_loop {
   my ($mce, $chunk_ref, $chunk_id) = @_;
   # do stuff to node $_
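   # e.g. (one possibility) check each node's Lustre (o2ib) mount via ssh,
   # as in the snippet above:
   #   my $mounts = `ssh -o ConnectTimeout=20 $_ grep o2ib /proc/mounts`;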
   } @nodes;


This will run 25 copies (max_workers) of the loop body over the @nodes 
array.  Incorporate the ssh bit above in the #do stuff area, and you get 
basically what I think you are asking for.
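
Invocation would then be something like this (hypothetical names: save 
the script as run_on_nodes.pl, with one hostname per line in nodes.txt):

    perl run_on_nodes.pl nodes.txt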


FWIW, I've been using this pattern for a few years, most recently on 
large supers over the past few months.






Thanks,

David Mathog



___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Julia on POWER9?

2020-10-16 Thread Joe Landman
1.5.x is a definite speed improvement over the 1.4.x.  That said, there 
is some weirdness I am running into with the whole CUDA vs CuArrays and 
other dependencies in other modules (DiffEqGPU).



On 10/16/20 10:18 AM, Scott Atchley wrote:


% hostname -f
login1.summit.olcf.ornl.gov

% module avail |& grep julia

forge/19.0.4    ibm-wml-ce/1.6.1-1    julia/1.4.2 (E)    ppt/2.4.0-beta2 (D)    vampir/9.5.0 (D)

[atchley@login1] ~ % module avail julia

 /sw/summit/modulefiles/core 

julia/1.4.2 (E)

Where:
 E: Experimental



On Thu, Oct 15, 2020 at 5:02 PM Prentice Bisbal via Beowulf 
<beowulf@beowulf.org> wrote:


So while you've all been discussing Julia, etc., I've been trying to
build and get it running on POWER9 for a cluster of AC922 nodes
(same as
Summit, but with 4 GPUs per node). After doing a combination of
Google
searching and soul-searching, I was able to get a functional
version of
Julia to build for POWER9. However, I'm not 100% sure my build is
fully
functional, as when I did 'make testall' some of the tests failed.

Is there anyone on this list using or supporting the latest
version of
Julia, 1.5.2, on POWER9? If so, I'd like to compare notes. I imagine
someone from OLCF is on this list.

Based on my Internet searching, as of August 2019 Julia was being
used
on Summit on thousands of cores, but I've also seen posts from the
Julia
devs saying they can't support the POWER architecture anymore because
they no longer have access to POWER hardware. Most of this
information
comes from the Julia GitHub or Julia Discourse conversations.

-- 
Prentice


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Julia on POWER9?

2020-10-15 Thread Joe Landman

Cool (shiny!)

On 10/15/20 5:02 PM, Prentice Bisbal via Beowulf wrote:
So while you've all been discussing Julia, etc., I've been trying to 
build and get it running on POWER9 for a cluster of AC922 nodes (same 
as Summit, but with 4 GPUs per node). After doing a combination of 
Google searching and soul-searching, I was able to get a functional 
version of Julia to build for POWER9. However, I'm not 100% sure my 
build is fully functional, as when I did 'make testall' some of the 
tests failed.


Is there anyone on this list using or supporting the latest version of 
Julia, 1.5.2, on POWER9? If so, I'd like to compare notes. I imagine 
someone from OLCF is on this list.


Based on my Internet searching, as of August 2019 Julia was being used 
on Summit on thousands of cores, but I've also seen posts from the 
Julia devs saying they can't support the POWER architecture anymore 
because they no longer have access to POWER hardware. Most of this 
information comes from the Julia GitHub or Julia Discourse conversations.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] First cluster in 20 years - questions about today

2020-02-03 Thread Joe Landman


On 2/1/20 10:21 PM, Mark Kosmowski wrote:


Should I consider Solaris or illumos?


No.

I've tried to use modern tooling on Illumos (OpenSolaris fork) in the 
form of SmartOS, and it just isn't supported in most build 
environments.  Solaris is dead.  The level of effort to build basic 
tools is herculean.  There are some packages in various package 
managers, but as noted, you will find almost no community support for 
these, and updates, if any, will be spotty at best.


The *BSDs have similar issues, albeit with a slightly larger user base 
than Illumos/SmartOS.  Porting to them has been (for me anyway) a 
maddening affair.


These systems are highly opinionated, and their opinions don't align 
with current software tooling.  Which means you spend significant 
amounts of time chasing idiosyncrasies versus getting real work done.


It definitely is heartbreaking, but sometimes you have to let some 
platforms go.



I do plan on using ZFS, especially for the data node, but I want as 
much redundancy as I can get, since I'm going to be using used 
hardware.  Will the fancy Solaris cluster tools be useful?


No.  Remember that ZFS doesn't like to share RAM, so you'd have to work 
fairly hard to contain its caching effort.  It can be done on Linux, but 
generally, you want to be careful with system architecture if you are 
going to combine storage/compute and leverage ZFS.
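
As one example (hedged: the value is in bytes, pick a cap that fits 
your node's RAM, and you need to reload the zfs module or reboot for it 
to take effect), capping the ARC at 8 GB on Linux looks like:

    # /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=8589934592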



Also, once I get running, while I'm getting current with theory and 
software may I inquire here about taking on a small, low priority 
academic project to make sure the cluster side is working good?


Thank you all for still being here!

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] 10G and rsync

2020-01-02 Thread Joe Landman


On 1/2/20 10:39 AM, Michael Di Domenico wrote:

On Thu, Jan 2, 2020 at 10:35 AM Chris Dagdigian  wrote:

- I noticed you did not test small file / metadata operations. My past
experience has found that this was the #1 cause of slowness in rsync and
other file transfers. iperf and IOR tests are all well and good but you
should run something like MDTEST to hammer the system on metadata and
small file handling. If you are moving lots of tiny files or hundreds of
thousands of directories etc this could be your problem
- Single stream over 10gig has never been great for me doing big data
movement. I get way more throughput by using rsync in parallel to
migrate multi-stream either from a single 10gig connected host or a
cluster of them

thanks.  this is definitely not a metadata issue, i'm only moving
100-200 files in total.  this is strictly a single stream transfer
problem with rsync

if i parallel the rsync, i can in fact increase the performance, but
the process that's doing the transfer (part of a bigger system) can't
handle that.

it just seemed unfathomable to me that rsync can't transfer at wire
past 1G speeds..


Hrm ... at home, with my little arista 10GbE switch and two machines, I 
regularly hit about 180 MB/s between machines (with SATA drives) for 
large xfers with rsync.  This is about the actual read/write speed of 
the drives in this case.


Have you measured this?  The slowest portion of the chain is likely to 
be your bottleneck.  Disk reads often can be that, especially 
if you are sharing actively used disks on one side or the other...
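
A quick, crude check of the read side (hedged; adjust the path, and 
note iflag=direct bypasses the page cache so you see the disks rather 
than RAM):

    dd if=/dir1/one_of_the_big_files of=/dev/null bs=1M count=4096 iflag=direct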





___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] 10G and rsync

2020-01-02 Thread Joe Landman


On 1/2/20 10:26 AM, Michael Di Domenico wrote:

does anyone know or has anyone gotten rsync to push wire speed
transfers of big files over 10G links?  i'm trying to sync a directory
with several large files.  the data is coming from local disk to a
lustre filesystem.  i'm not using ssh in this case.  i have 10G
ethernet between both machines.   both end points have more then
enough spindles to handle 900MB/sec.

i'm using 'rsync -rav --progress --stats -x --inplace
--compress-level=0 /dir1/ /dir2/' but each file (which is 100's of
GB's) is getting choked at 100MB/sec


A few thoughts

1) are you sure your traffic is traversing the high bandwidth link?  
Always good to check ...


2) how many files are you xfering?  Are these generally large files or 
many small files, or a distribution with a long tail towards small 
files?  The latter two will hit your metadata system fairly hard, and in 
the case of Lustre, performance will depend critically upon the MDS/MDT 
architecture and implementation. FWIW, on the big system I was working 
on setting up late last year, we hit MIOP (millions of IOPs) level 
reads/writes, but then again, that was architected correctly.


3) wire speed xfers are generally the exception unless you are doing 
large sequential single files.  There are tricks you can do to enable 
this, but they are often complex.  You can use an array of 
writers/readers and leverage parallelism, but you risk invoking 
congestion/pause throttling on your switch.
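
If the single-stream constraint can ever be relaxed, one hedged way to 
fan the copy out, assuming GNU parallel is installed and the top-level 
entries of /dir1 split the data reasonably evenly:

    cd /dir1 && ls -1 | parallel -j 8 rsync -a --inplace --compress-level=0 {} /dir2/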





running iperf and dd between the client and the lustre hits 900MB/sec,
so i fully believe this is an rsync limitation.

googling around hasn't lent any solid advice, most of the articles are
people that don't check the network first...

with the prevalence of 10G these days, i'm surprised this hasn't come
up before, or my google-fu really stinks.  which doesn't bode well
given its the first work day of 2020 :(
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] 40kw racks?

2019-10-21 Thread Joe Landman
FWIW, have a look at ScaleMatrix rack enclosures.  I saw them last week.  
They can get to 50kW as far as I understand.


Disclosure: I met with them last week as part of the day job.  No financial 
relationship with them.  Just interesting tech.


On October 21, 2019 11:30:16 AM Michael Di Domenico 
 wrote:



On Mon, Oct 21, 2019 at 2:16 PM Jeff Johnson
 wrote:


You should look at DDC’s racks. Self contained up to 52KW.
https://ddcontrol.com/s-series/dynamic-density-control/


very interesting, but they look big.  i'm not sure we have the space for those
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] Brief OT: Open positions

2019-07-26 Thread Joe Landman

Hi folks

  A brief note, one of my colleagues has 2 open positions in the US; 
one in Houston TX, the other in Vicksburg MS.  These are 
hardware/software maintenance on a number of mid sized supercomputers, 
clusters, and storage.


  I have some cloud HPC needs (compute, storage, networking) in my 
group as well.  More standard "cloudy" things there (yes, $dayjob does 
cloud!).


  Please ping me on my email in .sig or at $dayjob.  Email there is my 
first initial + last name at cray dot com.  Thanks, and back to your 
regularly scheduled cluster/super ... :D


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] help for metadata-intensive jobs (imagenet)

2019-06-28 Thread Joe Landman


On 6/28/19 1:47 PM, Mark Hahn wrote:

Hi all,
I wonder if anyone has comments on ways to avoid metadata bottlenecks
for certain kinds of small-io-intensive jobs.  For instance, ML on 
imagenet,

which seems to be a massive collection of trivial-sized files.

A good answer is "beef up your MD server, since it helps everyone".
That's a bit naive, though (no money-trees here.)

How about things like putting the dataset into squashfs or some other 
image that can be loop-mounted on demand?  sqlite?  perhaps even a format

that can simply be mmaped as a whole?

personally, I tend to dislike the approach of having a job stage tons of
stuff onto node storage (when it exists) simply because that guarantees a
waste of cpu/gpu/memory resources for however long the stagein takes...


I'd suggest something akin to a collection of ramdisks using zram 
distributed across your nodes.  Then put a beegfs file system atop 
those.  Stage in the images.  Run.


This is cheap compared to building the storage you actually need for 
this workload.
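
A rough sketch of the per-node ramdisk piece (hedged: sizes, filesystem 
choice, and the BeeGFS target setup are all site-specific):

    # carve out one compressed ramdisk per node and put a local filesystem on it
    modprobe zram num_devices=1
    echo 64G > /sys/block/zram0/disksize
    mkfs.xfs /dev/zram0
    mkdir -p /mnt/zram0 && mount /dev/zram0 /mnt/zram0
    # ... then point a BeeGFS storage target on each node at /mnt/zram0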


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] OT: open positions in HPC, Cloud, networking, services and support etc

2019-05-01 Thread Joe Landman

Hi folks,

  Apologies for OT conversation, I'll keep it very brief.  My team is 
looking for excellent HPC/Cloud/networking/support folk. Feel free to 
ping me at email below or jlandman over at cray dot com, and I can point 
you to the URLs.


  And, I forgot to mention, I'm now over at Cray as Director of Cloud 
Services and DevOps !


  Thanks!

Joe

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


[Beowulf] LFortran ... a REPL/Compiler for Fortran

2019-03-24 Thread Joe Landman

See https://docs.lfortran.org/ .   Figured Jeff Layton would like this :D


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Considering BeeGFS for parallel file system

2019-03-18 Thread Joe Landman


On 3/18/19 12:02 PM, Prentice Bisbal via Beowulf wrote:


Will,

Several years ago, when I was at Rutgers, Joe Landman's company, 
Scalable Informatics (RIP), was trying to sell me on BeeGFS over 
Lustre and GPFS. At the time, I was not interested. Why not? BeeGFS 
was still relatively new, and Lustre and GPFS had larger install 
bases, and




Yeah ... those were the days ...


therefore bigger track records. I was the only system admin in a group 
with aspirations to be the one-stop shop for a very large research 
institution, and to become a national-level HPC center. As a result, I 
was more risk-averse than I normally would be.  I didn't want to take 
a risk with a relatively unproven system, no matter how good its 
performance was. I also wanted to use a system where there was an 
abundance of other sys admins with expertise I could lean on if I 
needed to.


Fast forward 4-5 years, and the situation has completely changed. At 
SC18, it seemed every booth was using or promoting BeeGFS, and 
everyone was saying good things about it. If were in the same 
situation today, I wouldn't hesitate to consider BeeGFS.


In fact, I feel bad for not giving it a closer look at the time, 
because it's clear Joe and his team at his late company were on to 
something and were clearly ahead of their time with promoting BeeGFS.




Thanks.  I seem to be hearing this from many corners these days.

In simple terms, there are a few different risks.  One is making a 
choice on a smaller entity, a smaller market share participant providing 
the services/support you need.  Another risk is following the herd and 
giving up the advantages that an alternative choice might bring.


BeeGFS is not a market risk.  It is an established player, it works 
well, the team is excellent, the code is excellent.  Note: I have no 
financial interest in BeeGFS's success/failure in the market as I am not 
working on selling high performance storage and computing systems.


The smaller companies selling integrated packages of BeeGFS atop 
reasonably designed hardware should likely be some of your first stops 
in considering your options.  They are far more focused/dependent upon 
the business than the big shops, and this translates into usually far 
superior service.  I'd suggest avoiding the majority of resellers, as 
their business strategy is to steer you in a particular direction versus 
working with you to design what you need (the smaller shops do this).


If you want to do this yourself atop your existing kit, go for it.  It's 
not hard to set up/configure.
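
Roughly, from memory (check the current BeeGFS quick-start docs for the 
exact flags before trusting any of this), the moving parts look like:

    # management, metadata and storage services (one or more hosts)
    beegfs-setup-mgmtd   -p /data/beegfs/mgmtd
    beegfs-setup-meta    -p /data/beegfs/meta    -s 1 -m mgmt-host
    beegfs-setup-storage -p /data/beegfs/storage -s 1 -i 101 -m mgmt-host
    # clients
    beegfs-setup-client  -m mgmt-host
    # start whichever of these services live on a given host
    systemctl start beegfs-mgmtd beegfs-meta beegfs-storage beegfs-helperd beegfs-client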



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] Large amounts of data to store and process

2019-03-05 Thread Joe Landman


On 3/4/19 8:00 PM, Lux, Jim (337K) via Beowulf wrote:

I'm munging through not very much satellite telemetry (a few GByte), using 
sqlite3..
Here's some general observations:
1) if the data is recorded by multiple sensor systems, the clocks will *not* 
align - sure they may run NTP, but
2) Typically there's some sort of raw clock being recorded with the data (in 
ticks of some oscillator, typically) - that's what you can use to put data from 
a particular batch of sources into a time order.  And then you have the problem 
of reconciling the different clocks.
3) Watch out for leap seconds in time stamps - some systems have them (UTC), 
some do not (GPS, TAI) - a time of 23:59:60 may be legal.
4) you need to have a way to deal with "missing" data, whether it's time tags, or actual 
measurements - as well as "gaps in the record"
5) Be aware of the need to de-dupe data - same telemetry records from multiple 
sources.


Being satellite data, I am assuming you have relativistic corrections to 
the time, depending upon orbit, accuracy of the clock, data analysis 
needs, etc. [1][2]


Missing data of various types may be handled by data frame packages.  
R, Julia, and I think Python can all handle this without too much pain.


[1] https://gssc.esa.int/navipedia/index.php/Relativistic_Clock_Correction

[2] http://www.astronomy.ohio-state.edu/~pogge/Ast162/Unit5/gps.html

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Joe Landman


On 3/4/19 1:55 AM, Jonathan Aquilina wrote:

Hi Tony,

Sadly I cant go into much detail due to me being under an NDA. At this point 
with the prototype we have around 250gb of sample data but again this data is 
dependent on the type of air craft. Larger aircraft and longer flights will 
generate a lot more data as they have  more sensors and will log more data than 
the sample data that I have. The sample data is 250gb for 35 aircraft of the 
same type.



You need to return your answers in ~10m or 600s, with an assumed data 
set size of 250GB or more (assuming you meant GB and not Gb).  How you 
get there depends upon the nature of the calculation: whether or not you 
can perform the calculations on subsets, or whether it requires multiple 
passes through the data.


I've noticed some recommendations popping up ahead of an understanding 
of what the rate-limiting factors are for returning results from 
calculations on this data set.  I'd suggest focusing on the analysis 
needs first, as this will provide some level of guidance on the 
system(s) design required to meet your objectives.


First off, do you know whether your code will meet this 600s response 
time with this 250GB data set?  I am assuming this is unknown at this 
moment, but if you have response time data for smaller data sets, you 
could construct a rough scaling study and build a simple predictive model.
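
For instance, a rough scaling fit in a few lines of Python (the sizes 
and times below are invented placeholders; substitute your own 
measurements):

  import numpy as np

  size_gb   = np.array([10.0, 20.0, 40.0, 80.0])    # measured input sizes (GB)
  runtime_s = np.array([28.0, 55.0, 112.0, 230.0])  # measured response times (s)

  # Fit t = a * size^b in log-log space; b near 1 means roughly linear scaling.
  b, log_a = np.polyfit(np.log(size_gb), np.log(runtime_s), 1)
  pred_250 = np.exp(log_a) * 250.0 ** b
  print(f"scaling exponent ~ {b:.2f}; "
        f"predicted runtime at 250 GB ~ {pred_250:.0f} s (budget is 600 s)")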


Second, do you need the entire bolus of data, all 250GB, in order to 
generate a response to within required accuracy?  If not, great, and 
what size do you need?


Third, will this data set grow over time (looking at your writeup, it 
looks like this is a definite "yes")?


Fourth, does the code require physical access to all of the data bolus 
(what is needed for the calculation) locally in order to correctly operate?


Fifth, will the data access patterns for the code be streaming, 
searching, or random?  In only one of these cases would a database (SQL 
or noSQL) be a viable option.


Sixth, is your working data set size comparable to the bolus size (e.g. 
250GB)?


Seventh, can your code work correctly with sharded data (variation on 
second point)?



Now some brief "data physics".

a) (data on durable storage) 250GB @ 1GB/s -> 250s to read, once, 
assuming large block sequential read.  For a 600s response time, that 
leaves you with 350s to calculate.  Is this enough time?  Is a single 
pass (streaming) workable?


b) (data in ram) 250GB @ 100GB/s -> 2.5s to walk through once in 
parallel amongst multiple cores.  If multiple/many passes through the 
data are required, this strongly suggests a large memory machine (512GB 
or larger).


c) if your data is shardable, and you can distribute it amongst N 
machines, the above analyses still hold, replacing the 250GB with the 
size of the shards.  If you can do this, how much information does your 
code need to share amongst the worker nodes in order to effect the 
calculation?  This will provide guidance on interconnect choices.
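
Those estimates in a few lines, if you want to play with them (the 
bandwidth figures are assumptions; plug in your own measured rates):

  # Back-of-the-envelope "data physics" for the 250GB case.
  DATA_GB   = 250.0
  DISK_GBPS = 1.0     # large-block sequential read from durable storage
  MEM_GBPS  = 100.0   # aggregate memory bandwidth across cores
  BUDGET_S  = 600.0

  disk_read_s    = DATA_GB / DISK_GBPS        # ~250 s for one full pass
  compute_left_s = BUDGET_S - disk_read_s     # ~350 s left to compute
  mem_pass_s     = DATA_GB / MEM_GBPS         # ~2.5 s per in-memory pass

  print(f"one pass from disk: {disk_read_s:.0f} s, leaving {compute_left_s:.0f} s")
  print(f"one pass from RAM : {mem_pass_s:.1f} s")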



Basically, I am advocating focusing on the analysis needs, how they 
scale/grow, and your near/medium/long term goals with this, before you 
commit to a specific design/implementation.  Avoid the "if all you have 
is a hammer, every problem looks like a nail" view as much as possible.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Anybody here still use SystemImager?

2019-02-28 Thread Joe Landman


On 2/27/19 9:08 PM, David Mathog wrote:

Joe Landman wrote:

[...]


I'm about 98% of the way there now, with a mashup of parts from boel 
and Centos 7.

The initrd is pretty large though.

Wasted most of a day on a mysterious issue with "sh" (busybox) not 
responding to the keyboard with a 3.10.108 kernel built starting from 
the boel config, but it would respond using the same initrd and a 
stock Centos 7 kernel.  So 3.10.108 was recompiled with the Centos 7 
config (which makes WAY too many modules for an initrd) with the 
network drivers built into the kernel.  This fixes that problem but I 
could not tell you why.


This is a driver issue.  Likely you aren't including the hid components 
in your initramfs, or building them into the kernel.


lsmod | grep hid
mac_hid    16384  0
hid_generic    16384  0
usbhid 49152  0
hid   118784  2 usbhid,hid_generic

You should make sure hid, usbhid, and hid_generic are all included/loaded.



The last thing to overcome is that in this environment the SATA disk 
is not seen/mounted, even though tty* and numerous other things are.


  modprobe sd_mod

puts sd_mod in lsmod, but no /dev/sd* show up.  Hardware detection in 
Linux has been done and redone so many times I have no idea what to 
use in a 3.*.* kernel, and the web is littered with descriptions of 
methods which no longer work.  The lspci from busybox doesn't give 
device names for humans, which isn't helping.  BOEL used
modules.pcimap for this, and that is one of the things which no longer 
exist.

The init script tries to set things up with mdev (not udev) this way:


This is, again, a driver issue.  You need to know which SATA/SAS card 
you have (including motherboard versions).


For example, for the system I am on now:

lsmod | grep sas
mpt3sas   241664  16
raid_class 16384  1 mpt3sas
scsi_transport_sas 40960  2 ses,mpt3sas

and another pure SATA system, looking at dmesg output,

[    2.133951] ahci :00:11.0: version 3.0
[    2.134248] ahci :00:11.0: AHCI 0001.0200 32 slots 4 ports 6 Gbps 
0xf impl SATA mode
[    2.134250] ahci :00:11.0: flags: 64bit ncq sntf ilck pm led clo 
pmp pio slum part


This is the ahci driver.  Most motherboards I've run into use it for 
basic SATA.





    echo /sbin/mdev > /proc/sys/kernel/hotplug || shellout
    /sbin/mdev -s || shellout

which puts a lot of things in /dev, just not the SATA.  This is on a 
Dell poweredge T110, maybe there is some driver for the SATA 
controller which isn't loading.


In both cases, it is a driver issue.  For large initramfs, it varies 
from about 710MB for everything and the kitchen sink in debian9, to 
about 1.5GB for CentOS7.


root@zoidberg:/data/tiburon/diskless/images/nyble# ls -alF centos7/
total 2736520
drwxr-xr-x 2 root root    138 Jun 15  2018 ./
drwxr-xr-x 4 root root 36 Apr 25  2018 ../
-rw-r--r-- 1 root root 1436202727 Jun  5  2018 initramfs-4.16.13.nlytiq.img
-rw-r--r-- 1 root root 1356007691 Jun 15  2018 initramfs-4.16.15.nlytiq.img
-rw-r--r-- 1 root root    5023504 Jun  5  2018 vmlinuz-4.16.13.nlytiq
-rw-r--r-- 1 root root    4953872 Jun 15  2018 vmlinuz-4.16.15.nlytiq

root@zoidberg:/data/tiburon/diskless/images/nyble# ls -alF debian9/
total 2607756
drwxr-xr-x 2 root root    212 Sep 15 14:53 ./
drwxr-xr-x 4 root root 36 Apr 25  2018 ../
-rw-r--r-- 1 root root 1002775823 Jun  5  2018 initramfs-ramboot-4.16.13.nlytiq
-rw-r--r-- 1 root root  908767337 Sep 15 14:53 initramfs-ramboot-4.18.5.nlytiq
-rw-r--r-- 1 root root  744269030 May 29  2018 initramfs-ramboot-4.9.0-6-amd64

-rw-r--r-- 1 root root    5019408 Jun  5  2018 vmlinuz-4.16.13.nlytiq
-rw-r--r-- 1 root root    5269360 Sep 15 14:53 vmlinuz-4.18.5.nlytiq
-rw-r--r-- 1 root root    4224800 May 29  2018 vmlinuz-4.9.0-6-amd64


I PXE boot all of these.  Takes ~10s over 1GbE, much less over faster 
networks.  You should see the thing boot over 100GbE. Sadly I don't have 
100GbE at home.


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Anybody here still use SystemImager?

2019-02-26 Thread Joe Landman


On 2/26/19 12:56 PM, David Mathog wrote:

[...]

What I really need is a version of Boel which will chroot into a 
Centos 7 system.
Anybody have one?  Failing that, is there another small PXE bootable 
linux distro, more or less like Boel but with a kernel near 3.10, with 
an initrd which loads a target script (by nodename) into busybox and 
runs it?



I've never used system imager or Boel, so I can't necessarily comment on 
how to fix that.   I can point out though, that the project I built 
(https://github.com/joelandman/nyble) turns your standard CentOS7, 
Debian9, or Ubuntu18.04 distro into a PXE (or USB or local disk) 
bootable RAMdisk mounted system.  I used this (previous versions of 
this) at Scalable Informatics to boot/run infrastructure we shipped.


You can hand the system a boot-time argument pointing at a script to run 
after it comes up.  I used boot options of


        root=ram rootfstype=ramdisk 
runscript=http://path/to/simple/shell/script.sh simplenet=1 verbose 
console=ttyS1,115200n8


I do need to update the readme file on that, as it is out of date.  You 
can see all the boot options in the 
https://github.com/joelandman/nyble/blob/master/nyble-ramboot-init.d 
file; they are parsed with tests of the form 'grep -q option= /proc/cmdline'.


I use this for doing all my booting of immutable images.  Just need the 
kernel, and the initramfs.  I can build one for you if you want, and you 
can play with it.


--

Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] USB flash drive bootable distro to check cluster health.

2019-01-11 Thread Joe Landman


On 1/11/19 7:59 AM, Richard Chang wrote:

Hi,
I would like to know if we have or can make( or prepare) a USB 
bootable OS that we can boot in a cluster and its nodes to test all 
its functionality.


The purpose of this is to boot a new or existing cluster to check its 
health, including Infiniband network,  any cards, local hard disks, 
memory etc, so that I don't have to disturb the existing OS and its 
configuration.


If possible, it would be nice to boot the compute nodes from the 
master node.


Anyone knows of any pre-existing distribution that will do the job ? 
Or know how to do it with Centos or Ubuntu ?


FWIW: this is one of the use cases of 
https://github.com/joelandman/nyble .  It works with CentOS, Debian, and 
Ubuntu (though I've not pushed the 18.04.1 changes yet).


I have a rudimentary USB target I was going to clean up soon, and the 
images can be centrally booted from a PXE server and pull/run scripts 
post-boot.


Runs in RAM, and you can modify the distributions to your heart's 
content.  I have a few private repos here which have NVidia + MLNX + 
other drivers and related bits already built in.


I've set up many systems with this, tying it together with 
https://github.com/joelandman/tiburon for boot control.   This was 
originally used at Scalable Informatics when we were alive, and has 
evolved significantly since then.


If you want a simple pure USB distro for this, try SystemRescueCD, 
though I don't think it does Infiniband, or most drivers.



--

Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Fortran is Awesome

2018-11-29 Thread Joe Landman


On 11/29/18 10:22 AM, Stu Midgley wrote:



But yeah, C can do anything Fortran can do, and then some. People do
not write operating systems in Fortran for a reason. 



I've written a fortran-like scripting language (and the bones of a 
basic compiler) in Fortran...  everything you can do in C you can do 
in Fortran.


People often use the lack of pointers as a reason to NOT use Fortran, 
which is rubbish.  Just allocate the whole address space and go to 
town with your own pointers. Which... if you really think about it is 
all that C does. In theory the concept of a SIGSEG is only an OS 
limitation on C.  You "can" in theory just allocate any address you 
want without allocation and pre-allocation.



VMS comes to mind as a Fortran programmable OS.  I seem to remember 
other grad students ... er ... patching things ... with negative array 
indexes on Vaxen.  Though that's a while ago, and I might be suffering 
from ENOTENOUGHCOFFEE.


I wrote a command-line argument processor for my Fortran code like 
30-ish years ago (eek!) so I could at least pass arguments in "easily".  
I remember that was one of the things that caused me to look at C 
originally.


I love the "lets allocate all of memory and work in this giant 
heap-o-stuff" approach in Fortran.  Works great, until you have a 
routine with a slightly different view of how the memory is mapped.  
Then you get C-like pointer aliasing problems.  And debugging issues.  
Yeah, one giant heap, a memory map and a debugger.  Fun times (I had 
done quite a bit of that spelunking in the past).  I'd much rather leave 
the days of huge global common blocks alone.


Modern fortran appears to be much better at allocation and management of 
memory than C (where it is absolutely explicit). Likely it is far 
smarter on layouts as well, with various NUMA and heterogeneous 
processing systems.



--
Dr Stuart Midgley
sdm...@gmail.com <mailto:sdm...@gmail.com>

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Fortran is Awesome

2018-11-28 Thread Joe Landman


On 11/28/18 11:29 AM, Stu Midgley wrote:

I agree 100% .  You can't beat bash and fortran.



Heh ... for me it was Perl and Fortran, circa 1992-1995.  I automated 
some of my work flows, which was something rare back then.  Turns out 
leveraging automation for a parametric scan on long-running code (long 
running back then; it's fast as heck these days) is a very good thing.


One of my first projects in the late 80's early 90's was trying to use a 
self-consistent field code developed in the early 60's or so.  I 
literally transcribed it from the text in the library, into my editor, 
and then corrected some of the glaringly antiquated bits.  Like the tape 
rewind command.


I wound up developing my own code later, using more modern techniques.  
All in Fortran.


In some ways, I miss using it.

These days, I use Julia, Perl, C, and some Python for most of my stuff, 
though I dabble a little in go (all the cool kids are using it).


But yeah, Fortran is awesome.



On Wed, Nov 28, 2018 at 9:02 AM Paul Edmon <mailto:ped...@cfa.harvard.edu>> wrote:


Fortran is and remains an awesome language.  More people should
use it:

https://wordsandbuttons.online/fortran_is_still_a_thing.html

-Paul Edmon-

___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf



--
Dr Stuart Midgley
sdm...@gmail.com <mailto:sdm...@gmail.com>

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] More about those underwater data centers

2018-11-08 Thread Joe Landman


On 11/8/18 10:46 AM, Prentice Bisbal via Beowulf wrote:


One comment - my dissertation below is specifically about 
non-ebullient immersion cooling. As Jim Lux pointed out in a later 
e-mail, in ebullient cooling, some kind of surface feature to promote 
nucleation could be beneficial. Ebbulient cooling is a whole different 
beast from normal (non-ebullient) immersive cooling, since in that 
case you have changes of state and gas bubbles flowing through a liquid.


However, in all of the live and video demonstrations I've seen of 
Novec, the processors were completely bare, bubbles were forming at a 
pretty rapid rate, so again I think creating some sort of heat sink 
for this would add cost with no significant benefit.



I get to use physics ... whee!

Short version ... most (all?) heats of vaporization (the energy you have 
to pour into a liquid to turn it from a liquid to a gas at its boiling 
point temperature/pressure) are (much) higher than the energy you 
deposit into the same mass of liquid to bring it from just above 
freezing to boiling.


Cv for water is about 4.186 J/(gram * C), so 1 gram of water, going from 
just above 0 C (freezing point) to 100 C (boiling point at sea level) 
means Q = m Cv delta_T = 418.6 J.


Take that 1g of water at 100 C, and turn it into vapor at 100 C, and you 
get Q = m Hv = 2256 J.


Put another way, evaporation cooling allows you to absorb more heat 
(about 5x in this case) for the same mass.
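
The same arithmetic in a couple of lines, if you want to play with the 
numbers (values as cited above):

  cv = 4.186     # J/(g*C), specific heat of liquid water
  hv = 2256.0    # J/g, heat of vaporization at 100 C
  m, dT = 1.0, 100.0

  q_heat = m * cv * dT   # ~418.6 J to take 1 g from ~0 C to 100 C
  q_vap  = m * hv        # 2256 J to boil that gram off at 100 C
  print(q_heat, q_vap, q_vap / q_heat)   # ratio ~5.4x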


That said, convective cooling is a somewhat different beast. Immersion 
cooling is, I believe, primarily convective in nature.


I think DUG is primarily convective based, and it is sufficient for 
their use case (please correct me if I am wrong).


There is a fundamental danger with evaporative cooling, in that one has 
to make sure one does not ignore the vapor, or potential heat induced 
reaction products of the vapor.  Fluorinert has some issues:  
https://en.wikipedia.org/wiki/Fluorinert#Toxicity if you overcook it ...


--

Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] SC18

2018-11-06 Thread Joe Landman
It feels weird attending SC18, and not being an exhibitor. Definitely 
looking forward to it.


Beobash will (of course) be fun ... and I'm looking forward to (finally) 
being able to attend talks, poster sessions, panels.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-30 Thread Joe Landman


On 10/30/18 10:46 AM, Prentice Bisbal via Beowulf wrote:

TIL: A lot of you old-timers are really salty about systemd. ;)



Some parts I like (integrated restart).

Some parts are terrible (integrated restart).

I especially like it when a dependency is in the slightly wrong order 
and it takes forever to boot/shutdown.


Systemd is much like the RH anaconda/kickstart installer.  It is best to 
spend as little time under its control as possible, and to postpone much 
startup and shutdown work to scripting outside of systemd's control.  
Gee, just like the init of days of old.


I've had to learn its ins and outs over the last few years, and while it 
improves some things, it makes a complete hash of others.  It's highly 
opinionated, and seeks to impose its opinion whenever possible.  
Thankfully, some of its opinions can be (for now) controlled via the 
/etc/systemd/*.conf files.


I'll paraphrase Churchill here:  Systemd is the worst, except for all 
the rest.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-30 Thread Joe Landman


On 10/30/18 10:35 AM, Prentice Bisbal via Beowulf wrote:


On 10/29/2018 07:40 PM, Douglas Eadline wrote:


[...]

Yeah, after I posted that, I read more on the topic, and the whole 
cloud thing was cited repeatedly. I just find it hard to believe IBM 
is going that way, as it seems antithetical to the direction they 
moved in a few years ago: selling off the x86 business because they 
wanted to get out of the commodity computing arena where there was too 
much competition and low margins and move into more specialized 
high-margin products. I consider cloud computing to be the ultimate 
form of commodity computing.



Which problem is IBM solving by buying RH?  Basically it's a customer 
base and market penetration, a set of standard software tools that are 
the business defaults, and a cash-cow support revenue stream.


IMO, this was a wise move for them.  It was a good move for RH.

I know there are many who think IBM will (eventually) destroy/mess up 
RH.  This may (eventually) be.


However ... IBM would be loath to kill the goose that lays predictable 
golden eggs every quarter.  More likely than not, RH will internally 
"take over" some sections of IBM itself, to grow this cash cow.  IBM 
management has been having a tough time turning a profit over the last 
few years, even as they migrate more of their staff positions to India 
and other lower-cost geographies.


The impact on CentOS will likely, initially, be minimal.  Though, I 
fully expect that IBM will actually start enforcing the license language 
that says that CentOS cannot be distributed in a commercial context.  
They are not going to leave money on the table.


And yes, those problems could have been solved for less than $34B, but 
how quickly and easily? It's not uncommon for one business to buy 
another to address a deficiency quickly, rather than building their 
own team from the ground up.


I disagree that they could have solved the problems for less.  There is 
always a cost to business decisions ... if you decide you really want to 
solve this quickly, and acquisition is the right path, you have to 
accept that there are market forces that will help decide pricing for 
you.  Bid too low, and someone will swoop in.  Bid too high, and your 
board/shareholders will revolt and sue you.


All this said, Ubuntu LTS may be the only other (really) viable 
alternative.  And I expect that their phone has been ringing with at 
least 3 players I can imagine.







--
Prentice


--
Doug



Prentice

On 10/29/2018 02:43 PM, Jörg Saßmannshausen wrote:

Hi all,

it is not only that, but I saw something about Mellanox today as well.

Not good news in my humble opinion. :-(

All the best

Jörg


On Monday, 29 October 2018 at 07:42:48 GMT, Tony Brian Albers wrote:
https://www.reuters.com/article/us-red-hat-m-a-ibm/ibm-to-acquire-software-company-red-hat-for-34-billion-idUSKCN1N20N3

I wonder where that places us in the not too distant future..

I've worked for Big Blue, and I'm not sure the company cultures are
compatible to say the least.

/tony

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing

To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing

To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf







___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-29 Thread Joe Landman


On 10/29/18 12:44 PM, David Mathog wrote:

[...]


It turns out that getting up to date compilers and libraries has become

quite important for those working on large distributed code bases.


Libraries are harder.  Try to build a newer one than ships with CentOS 
and it is not uncommon to end up having to build many other libraries 
(recursive dependencies) or to hit a brick wall when a kernel 
dependency surfaces.



This was my point about building things in a different tree.  I do this 
with tools I use in https://github.com/joelandman/nlytiq-base , which 
gives me a consistent set of tools regardless of the platform.


Unfortunately, some of the software integrates Conda, which makes it 
actually harder to integrate what you need.  Julia, for all its 
benefits, is actually hard to build packages for such that they don't 
use Conda.



In biology apps of late there is a distressing tendency for software 
to only be supported in a distribution form which is essentially an 
entire OS worth of libraries packaged with the one (often very small) 
program I actually want to run.  (See "bioconda".)  Most of these 
programs will build just fine from source even on CentOS 6, but often 
the only way to download a binary for them is to accept an additional 
1Gb (or more) of other stuff.



Yeah, this has become common across many fields.  Containers become the 
new binaries, so you don't have to live with/accept the platform based 
restrictions.  This was another point of mine.  And Greg K @Sylabs is 
getting free exposure here :D



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-29 Thread Joe Landman


On 10/29/18 12:21 PM, Ryan Novosielski wrote:

[...]


So, yeah, the LTS is based upon bleeding edge.  This said, I've not
seen much broken from Ubuntu recently, though I personally dislike
netplan.io.  YAML based configuration is not a feature, rather it
is a bug.

Ubuntu 17.10 shipped with something broken related to
DNS/systemd-resolved that wasn't fixed for the entire release.
systemd-resolve will claim that a certain DNS server is in use, direct
queries to that DNS server work, but queries to the systemd-resolved
resolver return NXDOMAIN. Clearing the cache doesn't help.

Yes, you can turn that off, but I'm counting that as something broken.



Interesting.  I was unaware of it, as I usually prevent systemd from 
handling DNS.


The big ones I know of are the gcc 2.96 from RH, and the broken perl RH 
shipped for about a decade.  Doesn't surprise me that ubuntu non-LTS are 
potentially broken (bleeding edge).



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-29 Thread Joe Landman


On 10/29/18 11:04 AM, Robert G. Brown wrote:

On Mon, 29 Oct 2018, Tony Brian Albers wrote:


I've worked for Big Blue, and I'm not sure the company cultures are
compatible to say the least.


I think it will be all right (and yes, look, I'm alive, I'm alive!).



Glad to see that!


[...]



Robert G. Brown   http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525 email: r...@phy.duke.edu

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Oh.. IBM eats Red Hat

2018-10-29 Thread Joe Landman


On 10/29/18 6:54 AM, INKozin via Beowulf wrote:
exactly my thoughts (even though i have not worked there, talking to 
its employees was enough).

its attitude towards open source is not exactly promising.
the recent github deal comes to mind but at least MS is declaring to 
be more open towards open source.

and at least there is an alternative in that case - gitlab.
what would be an alternative to RH? certainly not a single one.


Well, Ubuntu/Canonical is a viable option if you want to be able to pay 
for support like you do with RH.  Possibly even SuSE.


FWIW, I've largely moved most of my systems to debian, as they seem to 
be the least likely to go away any time soon, and their license does not 
preclude shipping a system preloaded in a for-profit scenario.  CentOS 
does disallow this.


This said, distributions are less important these days.  You need good 
kvm and container systems for many workloads.  You don't necessarily 
need the distro distribution radius [1], which is a form of vendor lock 
in.  Basically you choose your kernel, userspace and support model to 
fit your hardware and production requirements.  Nothing in that equation 
is locked to a distro. With tools like warewulf[2] and nyble[3], 
distributions can be chosen for each job if you need, with different 
(versions of the same or different) distributions only a boot away.


As tools like Singularity[4] gain in adoption, I expect distros to focus 
on minimal cores to be a substrate for containers and VMs.


Thus the RH acquisition is a "meh" to me.


[1] 
https://scalability.org/2018/04/distribution-package-dependency-radii-or-why-distros-may-be-doomed/


[2] http://warewulf.lbl.gov/

[3] https://github.com/joelandman/nyble

[4] https://www.sylabs.io/




On Mon, 29 Oct 2018 at 07:43, Tony Brian Albers <mailto:t...@kb.dk>> wrote:


https://www.reuters.com/article/us-red-hat-m-a-ibm/ibm-to-acquire-software-company-red-hat-for-34-billion-idUSKCN1N20N3

I wonder where that places us in the not too distant future..

I've worked for Big Blue, and I'm not sure the company cultures are
compatible to say the least.

/tony

-- 
-- 
Tony Albers

Systems Architect
Systems Director, National Cultural Heritage Cluster
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] If I were specifying a new custer...

2018-10-12 Thread Joe Landman



On 10/12/2018 09:38 AM, Gerald Henriksen wrote:

On Fri, 12 Oct 2018 09:24:18 +0100, you wrote:


I think the ARM/Cavium Thunder is going to see a lot of attention.
I saw a report recently from the Bristol/Cray Brunel cluster - they are
offering a range of chemistry codes and OpenFOAM,
compiled up for ARM.
Poke me and I will search for the report - I saw it on a twitter feed.

ARM essentially has 2 problems.


I'd say 3, including what you wrote.

#3  End users are generally loath to re-compile applications for a new 
processor/architecture unless it gives them substantial benefit.  GPU 
rewrites were virtually guaranteed, once people got over the learning 
curve, as early (minimal) efforts yielded 5-10x performance bumps.  More 
work, and a rethinking of the application, gave significant benefit.


ARM doesn't, and as far as I can tell won't, have this advantage.  The 
only advantage it potentially brings is power consumption per cycle.  
And this advantage evaporates once you start looking at the high 
computational power chips.


Recycling an old joke on this, ARM is the CPU of the future, and always 
will be.


It's not ABI compatible, nor ISA compatible.  It's not "blow the doors 
off" faster.  It's not (in the performance configurations) lower power.


Exactly what is the market draw of these processors?  What niche are 
they seeking to fill, and what unique advantages do they bring?  These 
are not apparent.


Just my thoughts, but I've worked with some ARM product builders in the 
past, and have been burned by the misalignment between reality and rhetoric.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] its going to be big

2018-10-12 Thread Joe Landman

Very nice!


On 10/12/2018 02:56 AM, Stu Midgley wrote:

https://www.datacenterknowledge.com/supercomputers/skybox-build-houston-data-center-massive-oil-and-gas-supercomputer

https://www.dug.com/blog/dug-announces-unique-cloud-service-geophysics-industry/

https://www.businesswire.com/news/home/20181011005476/en/Australia’s-DownUnder-GeoSolutions-Selects-Skybox-Datacenters-Houston



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] SIMD exception kernel panic on Skylake-EP triggered by OpenFOAM?

2018-09-09 Thread Joe Landman
el_rapl iosf_mbi mgag200 ttm 
drm_kms_helper irqbypass syscopyarea sysfillrect crc32_pclmul 
sysimgblt iTCO_wdt fb_sys_fops ib_ucm iTCO_vendor_support 
ghash_clmulni_intel rdma_ucm dm_mod dcdbas drm ib_uverbs aesni_intel 
lrw gf128mul glue_helper ablk_helper cryptd mei_me sg lpc_ich i2c_i801 
shpchp ib_umad mei ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm 
tpm_crb acpi_pad acpi_power_meter binfmt_misc overlay(OET) osc(OE) 
mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) 
ko2iblnd(OE) rdma_cm iw_cm ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) 
ib_ipoib ib_cm sd_mod sr_mod cdrom crc_t10dif crct10dif_generic hfi1 
rdmavt i2c_algo_bit ahci i2c_core crct10dif_pclmul libahci 
crct10dif_common crc32c_intel ib_core libata megaraid_sas pps_core 
libcrc32c [last unloaded: pcspkr]
2018-09-07 22:37:16 [201527.264079] CPU: 17 PID: 32227 Comm: 
shuangTwoPhaseE Tainted: P    W  OE   T 
3.10.0-862.9.1.el7.x86_64 #1
2018-09-07 22:37:16 [201527.275789] Hardware name: Dell Inc. PowerEdge 
R740/06G98X, BIOS 1.4.8 05/21/2018
2018-09-07 22:37:16 [201527.284045] task: 9e345a42eeb0 ti: 
9e2f88a0c000 task.ti: 9e2f88a0c000
2018-09-07 22:37:16 [201527.292302] RIP: 0010:[] 
[] apic_timer_interrupt+0x141/0x170
2018-09-07 22:37:16 [201527.301978] RSP: :9e345c006200 EFLAGS: 
00010082
2018-09-07 22:37:16 [201527.308091] RAX: 9e2f88a0ff70 RBX: 
7fffe0d21d78 RCX: 0090
2018-09-07 22:37:16 [201527.316032] RDX:  RSI: 
9e345c006200 RDI: 9e2f88a0ff70
2018-09-07 22:37:16 [201527.323969] RBP: 7fffe0d21d78 R08: 
0001e800 R09: 07a0
2018-09-07 22:37:16 [201527.331906] R10:  R11: 
02818868 R12: 02d20790
2018-09-07 22:37:16 [201527.339839] R13: 7fffe0d159d0 R14: 
7fffe0d15b40 R15: 7fffe0d15a20
2018-09-07 22:37:16 [201527.347772] FS:  2b835d26da00() 
GS:9e345c00() knlGS:
2018-09-07 22:37:16 [201527.356659] CS:  0010 DS:  ES:  CR0: 
80050033
2018-09-07 22:37:16 [201527.363209] CR2: 03a6ff88 CR3: 
002fdd8f6000 CR4: 007607e0
2018-09-07 22:37:16 [201527.371144] DR0:  DR1: 
 DR2: 
2018-09-07 22:37:16 [201527.379079] DR3:  DR6: 
fffe0ff0 DR7: 0400

2018-09-07 22:37:16 [201527.387010] PKRU: 5554
2018-09-07 22:37:16 [201527.390523] Call Trace:
2018-09-07 22:37:16 [201527.393780] Code: 48 39 cc 77 2f 48 8d 81 00 
fe ff ff 48 39 e0 77 23 57 48 29 e1 65 48 8b 3c 25 78 0e 01 00 48 83 
c7 28 48 29 cf 48 89 f8 48 89 e6  a4 48 89 c4 5f 48 89 e6 65 ff 04 
25 60 0e 01 00 65 48 0f 44
2018-09-07 22:37:16 [201527.415810] RIP [] 
apic_timer_interrupt+0x141/0x170

2018-09-07 22:37:16 [201527.423189]  RSP 
2018-09-07 22:37:16 [201527.428646] ---[ end trace a6a14aed798e889f ]---
2018-09-07 22:37:17 [201527.477875] Kernel panic - not syncing: Fatal 
exception
2018-09-07 22:37:17 [201527.484041] Kernel Offset: 0x2500 from 
0x8100 (relocation range: 
0x8000-0xbfff)


--8< snip snip 8<------


All the best!
Chris


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

2018-08-18 Thread Joe Landman
FWIW: it looks like this is the CVE that keeps on giving.  Yesterday 
some of the mitigation hit, and this morning a new rev of the kernel 
with a single CVE patch came out.  Don't know when it might show up in 
distro kernels, but it's already in mine.


We are not done with Spectre/Meltdown vulns by any stretch (no insider 
info, just a hypothesis).



On 08/18/2018 03:19 PM, Jeff Johnson wrote:
With the spate of security flaws over the past year and the impacts 
their fixes have on performance and functionality it might be 
worthwhile to just run airgapped.



On Thu, Aug 16, 2018 at 22:48 Chris Samuel <mailto:ch...@csamuel.org>> wrote:


Hi all,

Just a heads up that the 3.10.0-862.11.6.el7.x86_64 kernel from
RHEL/CentOS
that was released to address the most recent Intel CPU problem
"L1TF" seems to
break RDMA (found by a colleague here at Swinburne).   The
discovery came
about when testing the new kernel on a system running Lustre.

https://jira.whamcloud.com/browse/LU-11257

Stanford have reported it to Red Hat, but the BZ entry is locked
due to its
relationship with L1TF.

https://bugzilla.redhat.com/show_bug.cgi?id=1618452

Hope this helps folks out there..

All the best,
Chris
-- 
 Chris Samuel  : http://www.csamuel.org/ :  Melbourne, VIC




___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf

--
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com <mailto:jeff.john...@aeoncomputing.com>
www.aeoncomputing.com <http://www.aeoncomputing.com>
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Jupyter and EP HPC

2018-07-27 Thread Joe Landman



On 07/27/2018 02:47 PM, Lux, Jim (337K) wrote:


I’ve just started using Jupyter to organize my Pythonic ramblings..

What would be kind of cool is to have a high level way to do some 
embarrassingly parallel python stuff, and I’m sure it’s been done, but 
my google skills appear to be lacking (for all I know there’s someone 
at JPL who is doing this, among the 6000 people doing stuff here).


What I’m thinking is this:

I have a high level python script that iterates through a set of data 
values for some model parameter, and farms out running the model to 
nodes on a cluster, but then gathers the results back.


So, I’d have N copies of the python model script on the nodes.

Almost like a pythonic version of pdsh.

Yeah, I’m sure I could use lots of subprocess() and execute() stuff 
(heck, I could shell pdsh), but like with all things python, someone 
has probably already done it before and has all the nice hooks into 
the Ipython kernel.




I didn't do this with ipython or python ... but this was effectively the 
way I parallelized NCBI BLAST in 1998-1999 or so.  Wrote a perl script 
to parse args, construct jobs, move data, submit/manage jobs, recover 
results, reassemble output.  SGI turned that into a product.
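
Something along these lines should work for what you describe (an 
untested sketch using ipyparallel; it assumes a controller with engines 
already running on the nodes, and the model function and parameter list 
are placeholders):

  # Farm an embarrassingly parallel parameter sweep out to cluster engines
  # from a notebook, then gather the results back.
  import ipyparallel as ipp

  def run_model(param):
      # placeholder for the real model; this runs on a remote engine
      return param, param ** 2

  rc = ipp.Client()                  # connect to the running controller
  view = rc.load_balanced_view()     # simple dynamic load balancing

  params = [0.1 * i for i in range(100)]
  results = view.map_sync(run_model, params)   # blocks until engines finish

  for p, out in results:
      print(p, out)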


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Lustre Upgrades

2018-07-25 Thread Joe Landman



On 07/25/2018 04:36 PM, Prentice Bisbal wrote:


Paging Dr. Joe Landman, paging Dr. Landman...



My response was

"I'd seen/helped build/benchmarked some very nice/fast CephFS based 
storage systems in $dayjob-1.  While it is a neat system, if you are 
focused on availability, scalability, and performance, its pretty hard 
to beat BeeGFS.  We'd ($dayjob-1) deployed several very large/fast file 
systems with it on our spinning rust, SSD, and NVMe units."


at the bottom of the post.

Yes, BeeGFS compares very favorably to Lustre across performance, 
management, resiliency dimensions.  Distributed replicated metadata and 
data is possible, atop zfs, xfs, etc.  We sustained >  40GB/s in a 
single rack of spinning disk in 2014 at a customer site using it, no 
SSD/cache implicated, and using 56Gb IB throughout.  Customer wanted to 
see us sustain 46+GB/s writes, and we did.


These are some of our other results with it:

https://scalability.org/2014/05/massive-unapologetic-firepower-2tb-write-in-73-seconds/

https://scalability.org/2014/10/massive-unapologetic-firepower-part-2-the-dashboard/
(that was my first effort with Grafana, and look at the writes ... the 
vertical scale is in 10k MB/s, aka 10GB/s, increments).


W.r.t. BeeGFS, very easy to install, you can set it up trivially on 
extra hardware to see it in action.  Won't be as fast as my old stuff, 
but that's the price people pay for not buying the good stuff when it 
was available.



Prentice
On 07/24/2018 10:19 PM, James Burton wrote:
Does anyone have any experience with how BeeGFS compares to Lustre? 
We're looking at both of those for our next generation HPC storage 
system.


Is CephFS a valid option for HPC now? Last time I played with CephFS 
it wasn't ready for prime time, but that was a few years ago.


On Tue, Jul 24, 2018 at 10:58 AM, Joe Landman <mailto:joe.land...@gmail.com>> wrote:




On 07/24/2018 10:31 AM, John Hearns via Beowulf wrote:

Forgive me for saying this, but the philosophy for software
defined storage such as CEPH and Gluster is that forklift
style upgrades should not be necessary.
When a storage server is to be retired the data is copied
onto the new server then the old one taken out of service.
Well, copied is not the correct word, as there are
erasure-coded copies of the data. Rebalanced is probaby a
better word.


This ^^

I'd seen/helped build/benchmarked some very nice/fast CephFS
based storage systems in $dayjob-1.  While it is a neat system,
if you are focused on availability, scalability, and performance,
its pretty hard to beat BeeGFS.  We'd ($dayjob-1) deployed
several very large/fast file systems with it on our spinning
rust, SSD, and NVMe units.


    -- 
Joe Landman

e: joe.land...@gmail.com <mailto:joe.land...@gmail.com>
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
<https://www.linkedin.com/in/joelandman>


___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>




--
James Burton
OS and Storage Architect
Advanced Computing Infrastructure
Clemson University Computing and Information Technology
340 Computer Court
Anderson, SC 29625
(864) 656-9047


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Joe Landman



On 07/24/2018 11:06 AM, John Hearns via Beowulf wrote:

Joe, sorry to split the thread here. I like BeeGFS and have set it up.
I have worked for two companies now who have sites around the world, 
those sites being independent research units. But HPC facilities are 
in headquarters.
The sites want to be able to drop files onto local storage yet have it 
magically appear on HPC storage, and same with the results going back 
the other way.


One company did this well with GPFS and AFM volumes.
For the current company, I looked at gluster and Gluster 
geo-replication is one way only.
What do you know of the BeeGFS mirroring? Will it work over long 
distances? (Note to me - find out yourself you lazy besom)


This isn't the use case for most/all cluster file systems.   This is 
where distributed object systems and buckets rule.


Take your file, dump it into an S3 like bucket on one end, pull it out 
of the S3 like bucket on the other.  If you don't want to use get/put 
operations, then use s3fs/s3ql.  You can back this up with replicating 
EC minio stores (will take a few minutes to set up ... compare that to 
others).
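
As a sketch of the put/get side (the endpoint, bucket name, and 
credentials below are placeholders, and the bucket is assumed to already 
exist), boto3 against a minio endpoint looks roughly like:

  import boto3

  s3 = boto3.client(
      "s3",
      endpoint_url="http://minio.example.org:9000",
      aws_access_key_id="ACCESS_KEY",
      aws_secret_access_key="SECRET_KEY",
  )

  # site A: drop the result file into the bucket
  s3.upload_file("results/run42.tar", "hpc-transfer", "run42.tar")

  # site B: pull it back out
  s3.download_file("hpc-transfer", "run42.tar", "/scratch/incoming/run42.tar")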


The down side to this is that minio has limits of about 16TiB last I 
checked.   If you need more, replace minio with another system (igneous, 
ceph, etc.).  Ping me offline if you want to talk more.


[...]

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Joe Landman



On 07/24/2018 10:31 AM, John Hearns via Beowulf wrote:
Forgive me for saying this, but the philosophy for software defined 
storage such as CEPH and Gluster is that forklift style upgrades 
should not be necessary.
When a storage server is to be retired the data is copied onto the new 
server then the old one taken out of service. Well, copied is not the 
correct word, as there are erasure-coded copies of the data. 
Rebalanced is probaby a better word.


This ^^

I'd seen/helped build/benchmarked some very nice/fast CephFS based 
storage systems in $dayjob-1.  While it is a neat system, if you are 
focused on availability, scalability, and performance, its pretty hard 
to beat BeeGFS.  We'd ($dayjob-1) deployed several very large/fast file 
systems with it on our spinning rust, SSD, and NVMe units.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Working for DUG, new thead

2018-06-19 Thread Joe Landman



On 6/19/18 2:47 PM, Prentice Bisbal wrote:


On 06/13/2018 10:32 PM, Joe Landman wrote:


I'm curious about your next gen plans, given Phi's roadmap.


On 6/13/18 9:17 PM, Stu Midgley wrote:
low level HPC means... lots of things.  BUT we are a huge Xeon Phi 
shop and need low-level programmers ie. avx512, careful cache/memory 
management (NOT openmp/compiler vectorisation etc).


I played around with avx512 in my rzf code. 
https://github.com/joelandman/rzf/blob/master/avx2/rzf_avx512.c .  
Never really spent a great deal of time on it, other than noting that 
using avx512 seemed to downclock the core a bit on Skylake.


If you organize your code correctly, and call the compiler with the 
right optimization flags, shouldn't the compiler automatically handle 
a good portion of this 'low-level' stuff? 


I wish it would do it well, but it turns out it doesn't do a good job.   
You have to pay very careful attention to almost all aspects of making 
it simple for the compiler, and then constraining the directions it 
takes with code gen.


I explored this with my RZF stuff.  It turns out that with -O3, gcc (5.x 
and 6.x) would convert a library call for the power function into an FP 
instruction.  But it would use 1/8 - 1/4 of the XMM/YMM register width, 
not automatically unroll loops, or leverage the vector nature of the 
problem.


Basically, not much has changed in 20+ years ... you annotate your code 
with pragmas and similar, or use instruction primitives and give up on 
the optimizer/code generator.


When it comes down to it, compilers aren't really as smart as many of us 
would like.  Converting idiomatic code into efficient assembly isn't 
what they are designed for.  Rather correct assembly.  Correct doesn't 
mean efficient in many cases, and some of the less obvious optimizations 
that we might think to be beneficial are not taken. We can hand modify 
the code for this, and see if these optimizations are beneficial, but 
the compilers often are not looking at a holistic problem.


I understand that hand-coding this stuff usually still give you the 
best performance (See GotoBLAS/OpenBLAS, for example), but does your 
average HPC programmer trying to get decent performance need to 
hand-code that stuff, too?


Generally, yes.  Optimizing serial code for GPUs doesn't work well. 
Rewriting for GPUs (e.g. taking into account the GPU data/compute flow 
architecture) does work well.


--

Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread Joe Landman

I'm curious about your next gen plans, given Phi's roadmap.


On 6/13/18 9:17 PM, Stu Midgley wrote:
low level HPC means... lots of things.  BUT we are a huge Xeon Phi 
shop and need low-level programmers ie. avx512, careful cache/memory 
management (NOT openmp/compiler vectorisation etc).


I played around with avx512 in my rzf code. 
https://github.com/joelandman/rzf/blob/master/avx2/rzf_avx512.c  . Never 
really spent a great deal of time on it, other than noting that using 
avx512 seemed to downclock the core a bit on Skylake.


Which dev/toolchain are you using for Phi?  I set up the MPSS bit for a 
customer, and it was pretty bad (2.6.32 kernel, etc.).  Flaky control 
plane, and a painful host->coprocessor interface.  Did you develop your 
own?  Definitely curious.







On Thu, Jun 14, 2018 at 1:08 AM Jonathan Engwall 
<engwalljonathanther...@gmail.com> wrote:


John Hearne wrote:
> Stuart Midgley works for DUG?  They are currently
> recruiting for an HPC manager in London... Interesting...

Recruitment at DUG wants to call me about Low Level HPC. I have at
least until 6pm.
I am excited but also terrified. My background is C and now
JavaScript, mostly online course work and telnet MUDs.
Any suggestions are very much needed.
What must a "low level HPC" know on day 1???
Jonathan Engwall
engwalljonathanther...@gmail.com
<mailto:engwalljonathanther...@gmail.com>

___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf



--
Dr Stuart Midgley
sdm...@gmail.com <mailto:sdm...@gmail.com>


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Working for DUG, new thead

2018-06-13 Thread Joe Landman



On 6/13/18 9:39 PM, Stu Midgley wrote:

seismic processing - oil'n'gas not hard rock.


But ... you work with heavy metal (supercomputers that is) :D

Good to see you active here, BTW!



On Thu, Jun 14, 2018 at 3:32 AM Jonathan Engwall 
<engwalljonathanther...@gmail.com> wrote:


Yes. Very good idea, it is a mining company in Australia.



--
Dr Stuart Midgley
sdm...@gmail.com <mailto:sdm...@gmail.com>


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Fwd: Project Natick

2018-06-07 Thread Joe Landman



On 06/07/2018 11:18 AM, Douglas Eadline wrote:

  -snip-

i'm not sure i see a point in all this anyhow, it's a neat science
experiment, but what's the ROI on sinking a container full of servers
vs just pumping cold seawater from 100ft down


I had the same thought. You could even do a salt water/clear water
heat exchange and not have the salt water near the servers.

 From a risk perspective, failure under 100 ft of sea water
would seem to be much more catastrophic vs failure on land and
cooling with pumped water (maybe I read too much N.N. Taleb).


Imagine 100kW or so ... suddenly discovering that the neat little hole 
in the pipe enables this highly conductive ionic fluid to short ... 
somewhere between 1V and 12V DC.  10's to 100's of thousands of Amps.  I 
wouldn't wanna be anywhere near that when it lets go.







--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] FPGA storage accelerator

2018-06-06 Thread Joe Landman

Ha! "Then maybe, just maybe, perhaps 640 PB ought to be enough. Maybe."

Back in 2005 or so, we had these little USB-connected FPGAs that we were 
using for various annotation tools.  I remember one for BLAST, and doing 
HMMer on GPUs.  I seem to remember shocking the person from NVidia at 
SC2006 on the performance of the whole GPU HMMer app.  They were busy 
talking about kernels getting 10-30x performance gain, and we had a 
whole app that did that.


That was the tail end of my time trying to raise money to build 
accelerators.  Good times.



On 06/06/2018 09:31 PM, James Cuff wrote:


Hi team,

Stumbled across this tech the other day and wrote a little piece about 
it.  Feels a bit like the old days to me. What do we think? It’s kinda 
fascinating, hope to catch up with a few ‘wulfers at ISC later this 
month.


https://www.nextplatform.com/2018/06/06/thanks-for-the-memories/

Best,

J.

--

--
Dr. James Cuff
The Next Platform
https://www.nextplatform.com/author/jamescuff/
https://linkedin.com/in/jamesdotcuff
https://twitter.com/jamesdotcuff


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-06 Thread Joe Landman

typed on a phone, so auto-co-wrecked ... VNC


On 06/06/2018 05:56 PM, David Mathog wrote:

Thanks for all the responses.

On 06-Jun-2018 14:40, Joe Landman wrote:

When I absolutely need a gui for something like this, I'll light up
BBC over an ssh session.  Performance has been good even crossing the big
pond.


What is BBC?  Google wasn't much help given the British Broadcasting 
Company's enormous web footprint.


Yes, nedit.  It is old but it still works (especially for column 
operations, which I use a lot).  Admittedly it is useless with utf 
encoded text, but 99.999% of what I do is ANSI, so that is rarely an 
issue.


Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-06 Thread Joe Landman
Wait ... nedit?  I wrote my thesis with that (LaTeX) some (mumble) decades 
ago ...


On June 6, 2018 5:28:30 PM David Mathog  wrote:


Off Topic.

I need to do some work on a system 3000 miles away.  No problem
connecting to it with ssh or setting X11 forwarding, but the delays are
such that my usual editor (nedit) spends far too much time redrawing to
be useful.  Resizing a screen is particularly painful.

Are there any X11 GUI editors that are less sensitive to these issues?

If not I will just use nano or vim.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] OT, X11 editor which works well for very remote systems?

2018-06-06 Thread Joe Landman
When I absolutely need a gui for something like this, I'll light up BBC 
over an ssh session.  Performance has been good even crossing the big pond.


This said, vim handles this nicely as well.

On June 6, 2018 5:28:30 PM David Mathog  wrote:


Off Topic.

I need to do some work on a system 3000 miles away.  No problem
connecting to it with ssh or setting X11 forwarding, but the delays are
such that my usual editor (nedit) spends far too much time redrawing to
be useful.  Resizing a screen is particularly painful.

Are there any X11 GUI editors that are less sensitive to these issues?

If not I will just use nano or vim.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Bright Cluster Manager

2018-05-03 Thread Joe Landman
 of machines is, as you always need an interconnect fabric and storage, 
so why not have the same for both types of tasks.
Maybe one further quip to stimulate some conversation. Silicon is 
cheap. No, really it is.
Your friendly Intel salesman may wince when you say that. After all 
those lovely Xeon CPUs cost north of 1000 dollars each.

But again I throw in some talking points:

power and cooling costs the same if not more than your purchase cost 
over several years


are we exploiting all the capabilities of those Xeon CPUs


On 3 May 2018 at 15:04, Douglas Eadline <deadl...@eadline.org> wrote:




Here is where I see it going

1. Computer nodes with a base minimal generic Linux OS
   (with PR_SET_NO_NEW_PRIVS in kernel, added in 3.5)

2. A Scheduler (that supports containers)

3. Containers (Singularity mostly)

All "provisioning" is moved to the container. There will be edge
cases of course, but applications will be pulled down from
a container repo and "just run"

--
Doug


> I never used Bright.  Touched it and talked to a salesperson at a
> conference but I wasn't impressed.
>
> Unpopular opinion: I don't see a point in using "cluster managers"
> unless you have a very tiny cluster and zero Linux experience. 
These
> are just Linux boxes with a couple applications (e.g. Slurm) running on
> them.  Nothing special. xcat/Warewulf/Scyld/Rocks just get in
the way
> more than they help IMO.  They are mostly crappy wrappers
around free
> software (e.g. ISC's dhcpd) anyway.  When they aren't it's
proprietary
> trash.
>
> I install CentOS nodes and use
> Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my
configs and
> software.  This also means I'm not suck with "node images" and can
> instead build everything as plain old text files (read: write SaltStack
> states), update them at will, and push changes any time.  My "base
> image" is CentOS and I need no "baby's first cluster" HPC software to
> install/PXEboot it.  YMMV
>
>
> Jeff White
>
> On 05/01/2018 01:57 PM, Robert Taylor wrote:
>> Hi Beowulfers.
>> Does anyone have any experience with Bright Cluster Manager?
>> My boss has been looking into it, so I wanted to tap into the
>> collective HPC consciousness and see
>> what people think about it.
>> It appears to do node management, monitoring, and provisioning,
so we
>> would still need a job scheduler like lsf, slurm,etc, as well.
Is that
>> correct?
>>
>> If you have experience with Bright, let me know. Feel free to
contact
>> me off list or on.
>>
>>
>>
>> ___
>> Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>
>


-- 
Doug




___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf



Re: [Beowulf] Theoretical vs. Actual Performance

2018-02-22 Thread Joe Landman
ACML is hand coded assembly.  Not likely that OpenBLAS will be much 
better.  Could be similar.  c.f. 
http://gcdart.blogspot.co.uk/2013/06/fast-matrix-multiply-and-ml.html




On 02/22/2018 05:48 PM, Prentice Bisbal wrote:
Just rebuilt OpenBLAS 0.2.20 locally on the test system with GCC 
6.1.0, and I'm only getting 91 GFLOPS. I'm pretty sure OpenBLAS 
performance should be close to ACML performance, if not better. I'll 
have to dig into this later. For now, I'm going to continue my testing 
using the ACML-based build and revisit the OpenBLAS performance later.


Prentice

On 02/22/2018 05:27 PM, Prentice Bisbal wrote:
So I just rebuilt HPL using the ACML 6.1.0 libraries with GCC 6.1.0, 
and I'm now getting 197 GFLOPS, so clearly there's a problem with my 
OpenBLAS build. I'm going to try building OpenBLAS without the 
dynamic arch support on the machine where I plan on running my tests, 
and see if that version of the library is any better.


Prentice

On 02/22/2018 09:37 AM, Prentice Bisbal wrote:

Beowulfers,

In your experience, how close does the actual performance of your 
processors match up to their theoretical performance? I'm 
investigating a performance issue on some of my nodes. These are 
older systems using AMD Opteron 6274 processors. I found literature 
from AMD stating the theoretical performance of these processors is 
282 GFLOPS, and my LINPACK performance isn't coming close to that (I 
get approximately ~33% of that).  The number I often hear mentioned 
is that actual performance should be ~85% of theoretical performance. 
Is that a realistic number in your experience?


I don't want this to be a discussion of what could be wrong at this 
point, we will get to that in future posts, I assure you!






___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Theoretical vs. Actual Performance

2018-02-22 Thread Joe Landman

which compiler are you using, and what options are you compiling it with?


On 02/22/2018 11:48 AM, Prentice Bisbal wrote:

On 02/22/2018 10:44 AM, Michael Di Domenico wrote:

i can't speak to AMD, but using HPL 2.1 on Intel using the Intel
compiler and the Intel MKL, i can hit 90% without issue.  no major
tuning either

if you're at 33% i would be suspect of your math library


I'm using OpenBLAS 0.29 with dynamic architecture support,  but I'm 
thinking of switching to using ACML for this test, to remove the 
possibility that it's a problem with my OpenBLAS build.


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Theoretical vs. Actual Performance

2018-02-22 Thread Joe Landman



On 02/22/2018 09:37 AM, Prentice Bisbal wrote:

Beowulfers,

In your experience, how close does the actual performance of your 
processors match up to their theoretical performance? I'm 
investigating a performance issue on some of my nodes. These are 
older systems using AMD Opteron 6274 processors. I found literature 
from AMD stating the theoretical performance of these processors is 
282 GFLOPS, and my LINPACK performance isn't coming close to that (I 
get approximately ~33% of that).  The number I often hear mentioned is 
that actual performance should be ~85% of theoretical performance. Is 
that a realistic number in your experience?


85% makes the assumption that you have the systems configured in an 
optimal manner, that the compiler doesn't do anything wonky, and that, 
to some degree, you isolate the OS portion of the workload off of most 
of the cores to reduce jitter.   Among other things.


At Scalable, I'd regularly hit 60-90 % of theoretical max computing 
performance, with progressively more heroic tuning.   Storage, I'd 
typically hit 90-95% of theoretical max (good architectures almost 
always beat bad ones).  Networking, fairly similar, though tuning per 
use case mattered significantly.
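
A hedged back-of-envelope sketch (illustrative only; the 8 DP FLOPs per core 
per cycle is an assumption about how the vendor counted): the 282 GFLOPS 
figure for a 16-core, 2.2 GHz Opteron 6274 is the usual cores x clock x 
FLOPs-per-cycle product, which also makes the ~33% observation easy to 
sanity-check against the 91 GFLOPS OpenBLAS number reported elsewhere in 
the thread.

/* peak_sketch.c -- back-of-envelope for the Opteron 6274 numbers above. */
#include <stdio.h>

int main(void)
{
    const double cores           = 16.0;  /* Opteron 6274                     */
    const double clock_ghz       = 2.2;   /* base clock                       */
    const double flops_per_cycle = 8.0;   /* assumed vendor counting          */
    const double measured_gflops = 91.0;  /* OpenBLAS HPL run from the thread */

    double peak = cores * clock_ghz * flops_per_cycle;   /* in GFLOPS */

    printf("theoretical peak : %.1f GFLOPS\n", peak);     /* ~281.6 */
    printf("measured / peak  : %.0f%%\n", 100.0 * measured_gflops / peak);
    return 0;
}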




I don't want this to be a discussion of what could be wrong at this 
point, we will get to that in future posts, I assure you!




--
Joe Landman
t: @hpcjoe
w: https://scalability.org

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

2018-01-03 Thread Joe Landman
Looks like it will respond to a 'nopti' boot option (at least the 
patches I've seen from 4-Dec)




On 01/03/2018 12:57 PM, Ellis H. Wilson III wrote:

On 01/03/2018 12:47 PM, Lux, Jim (337K) wrote:

I suppose the down side is that if they do kernel mods to fix this
for the 99.9%, it adversely affects the performance for the 0.1%
(that is, us).


We've been discussing this extensively at my workplace, and the 
overwhelming expectation is that at least in Linux the fix should be 
configurable such that those operating in non-multitenant systems 
(such as scale-out storage appliances) can disable it.


If this ends up not being the case, I would expect it in the 
short-term to lock us out of upgrading to newer kernels where the fix 
and resultant overheads come into play until we're on newer CPUs where 
the architecture deficiency is resolved.  This latter part (the 
expectation of Intel fixing it in their newer HW) is all the more 
reason I'm inclined to believe the fix will be delivered as a tunable.


Best,

ellis



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Openlava down?

2017-12-23 Thread Joe Landman



On 12/23/2017 05:49 PM, Jeffrey Layton wrote:
I tried it but it doesn't come up as the job scheduler - just 
capabilities of a company. Hmm..


FYI:  https://soylentnews.org/article.pl?sid=16/11/06/0254233

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Julia Language

2017-09-19 Thread Joe Landman
It is very good.  Think "python done right".  Have a look at this:  
https://learnxinyminutes.com/docs/julia/


And some stuff I wrote: 
https://scalability.org/2017/06/on-hackerrank-and-julia/ and 
https://scalability.org/2017/06/the-birthday-problem-allocation-collisions-for-networks-and-mac-addresses/




On 09/19/2017 04:10 PM, Jeffrey Layton wrote:

John,

Have you done much Julia coding? Can you talk about your experience?

I have threatened to learn it for a while but your post has prompted 
me to finally start learning Julia :)


Thanks!

Jeff


On Wed, Sep 13, 2017 at 7:43 AM, John Hearns via Beowulf 
<beowulf@beowulf.org> wrote:


I see HPCwire has an article on Julia.  I am a big fan of Julia,
so though it worth pointing out.
https://www.hpcwire.com/off-the-wire/julia-joins-petaflop-club/
<https://www.hpcwire.com/off-the-wire/julia-joins-petaflop-club/>
Though the source of this seems old news - it is a presentation
from this year's JuliaCon

JuliaCon 2018 will be talking place at UCL in London so mark your
diaries. Yours truly should be there.


___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Weird blade performs worse as more cpus are used?

2017-09-14 Thread Joe Landman



On 09/14/2017 11:34 AM, Faraz Hussain wrote:


Earlier I had posted about one of our blades running 30-50% slower 
than other  ones despite having identical hardware and OS. I followed 
the suggestions and compared cpu temperature, memory, dmesg and 
sysctl. Everything looks the same.


I then used "perf stat" to compare speed of pigz ( parralel gzip ). 
The results are quite interesting. Using one cpu, the slow blade is as 
fast as the rest! But as I use more cpus, the speed decreases linearly 
from 3.1Ghz to 0.4 Ghz. See snippets from "perf stat" command below. 
All tests were on /tmp to eliminate any nfs issue. And same behavior 
is observed with any multi-threaded program.


What does numastat report?  /tmp is a ramdisk or tmpfs?  Are the 
nodes/cpus otherwise idle?  What does lscpu on a good/bad node report?


If it decreases on a 1/Ncpu curve, then you have a fixed-size shared 
resource bandwidth contention issue you are fighting.  The question is what.
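
A toy model of that 1/Ncpu signature (illustrative sketch only; the 3.1 is 
just the single-worker figure from the pigz runs quoted above):

/* contention_sketch.c -- N workers sharing one fixed-bandwidth resource.
 * Per-worker rate falls as B/N while the aggregate stays flat, which is
 * the "slower per core as more cpus are used" pattern described above. */
#include <stdio.h>

int main(void)
{
    const double shared_budget = 3.1;   /* e.g. effective GHz seen with 1 worker */

    for (int n = 1; n <= 16; n *= 2) {
        double per_worker = shared_budget / n;
        printf("%2d workers: %.2f per worker, %.2f aggregate\n",
               n, per_worker, per_worker * n);
    }
    return 0;
}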




--

Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-14 Thread Joe Landman
   fine, and when I installed CentOS 6 on a local disk,
the nodes worked fine.

Any ideas where to look or what to tweak to fix this?
Any idea why this is only occuring with RHEL 6 w/ NFS
root OS?


___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe)
visit http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>




-- 
- Andrew "lathama" Latham lath...@gmail.com

<mailto:lath...@gmail.com> http://lathama.com
<http://lathama.org> -

___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe)
visit http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>





___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-13 Thread Joe Landman
FWIW:  I gave up on NFS boot a while ago, due in part to performance 
problems that were hard to track down.  The environment I created to do 
complete ramboot at scale allows me to pivot to NFS if desired (boot-time 
switch), but I rarely use that.  Pure ramboot has been a joy to work with 
compared to NFS.



On 09/13/2017 01:48 PM, Prentice Bisbal wrote:
Okay, based on the various responses I've gotten here and on other 
lists, I feel I need to clarify things:


This problem only occurs when I'm running our NFSroot based version of 
the OS (CentOS 6). When I run the same OS installed on a local disk, I 
do not have this problem, using the same exact server(s).  For testing 
purposes, I'm using LINPACK, and running the same executable  with the 
same HPL.dat file in both instances.


Because I'm testing the same hardware using different OSes, this 
(should) eliminate the problem being in the BIOS, and faulty hardware. 
This leads me to believe it's most likely a software configuration 
issue, like a kernel tuning parameter, or some other software 
configuration issue.


These are Supermicro servers, and it seems they do not provide CPU 
temps. I do see a chassis temp, but not the temps of the individual 
CPUs. While I agree that should be the first thing I look at, it's not 
an option for me. Other tools like FLIR and Infrared thermometers 
aren't really an option for me, either.


What software configuration, either a kernel parameter, the 
configuration of numad or cpuspeed, or some other setting, could 
affect this?


Prentice

On 09/08/2017 02:41 PM, Prentice Bisbal wrote:

Beowulfers,

I need your assistance debugging a problem:

I have a dozen servers that are all identical hardware: SuperMicro 
servers with AMD Opteron 6320 processors. Every since we upgraded to 
CentOS 6, the users have been complaining of wildly inconsistent 
performance across these 12 nodes. I ran LINPACK on these nodes, and 
was able to duplicate the problem, with performance varying from ~14 
GFLOPS to 64 GFLOPS.


I've identified that performance on the slower nodes starts off fine, 
and then slowly degrades throughout the LINPACK run. For example, on 
a node with this problem, during first LINPACK test, I can see the 
performance drop from 115 GFLOPS down to 11.3 GFLOPS. That constant, 
downward trend continues throughout the remaining tests. At the start 
of subsequent tests, performance will jump up to about 9-10 GFLOPS, 
but then drop to 5-6 GLOPS at the end of the test.


Because of the nature of this problem, I suspect this might be a 
thermal issue. My guess is that the processor speed is being 
throttled to prevent overheating on the "bad" nodes.


But here's the thing: this wasn't a problem until we upgraded to 
CentOS 6. Where I work, we use a read-only NFSroot filesystem for our 
cluster nodes, so all nodes are mounting and using the same exact 
read-only image of the operating system. This only happens with these 
SuperMicro nodes, and only with the CentOS 6 on NFSroot. RHEL5 on 
NFSroot worked fine, and when I installed CentOS 6 on a local disk, 
the nodes worked fine.


Any ideas where to look or what to tweak to fix this? Any idea why 
this is only occuring with RHEL 6 w/ NFS root OS?




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-08 Thread Joe Landman


On 09/08/2017 02:41 PM, Prentice Bisbal wrote:


But here's the thing: this wasn't a problem until we upgraded to 
CentOS 6. Where I work, we use a read-only NFSroot filesystem for our 
cluster nodes, so all nodes are mounting and using the same exact 
read-only image of the operating system. This only happens with these 
SuperMicro nodes, and only with the CentOS 6 on NFSroot. RHEL5 on 
NFSroot worked fine, and when I installed CentOS 6 on a local disk, 
the nodes worked fine.


Any ideas where to look or what to tweak to fix this? Any idea why 
this is only occuring with RHEL 6 w/ NFS root OS?




Sounds suspiciously like a network or other driver running hard in a 
tight polling mode causing a growing number of CSW/Ints over time. Since 
these are opteron (really? still in use?)  chances are you might have a 
firmware issue on the set of slower nodes, that had been corrected on 
the other nodes.   With NFS root, if you have a node locking a 
particular file that the other nodes want to write to, the node can 
appear slow while it waits on the IO.


You might try running dstat and saving output into a file from boot 
onwards.  Then run the tests, and see if the int or CSW are being driven 
very high.  Pay attention to the usr/idl and other percentages.


You can also grab temperature stats.  Helps if you have ipmi.

ipmitool sdr

 ipmitool sdr | grep Temp
CPU1 Temp| 35 degrees C  | ok
CPU2 Temp| 35 degrees C  | ok
System Temp  | 35 degrees C  | ok
Peripheral Temp  | 38 degrees C  | ok
PCH Temp | 43 degrees C  | ok

If not, sensors

sensors
Package id 1:  +35.0°C  (high = +82.0°C, crit = +92.0°C)
Core 0:+35.0°C  (high = +82.0°C, crit = +92.0°C)
Core 1:+35.0°C  (high = +82.0°C, crit = +92.0°C)
Core 2:+33.0°C  (high = +82.0°C, crit = +92.0°C)
Core 3:+34.0°C  (high = +82.0°C, crit = +92.0°C)
...



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] RAID5 rebuild, remount with write without reboot?

2017-09-05 Thread Joe Landman
files were not entirely 1:1, 
so there are certainly going to be some files on this system which 
have no match on the other.


Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Joe Landman
 wonderfully.  You can replicate 
images to local image servers if you wish, replicate config servers, 
load balance the whole thing to whatever scale you need.





Thanks.



--
Dr Stuart Midgley
sdm...@gmail.com <mailto:sdm...@gmail.com>


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Poor bandwith from one compute node

2017-08-17 Thread Joe Landman



On 08/17/2017 02:02 PM, Scott Atchley wrote:

I would agree that the bandwidth points at 1 GigE in this case.

For IB/OPA cards running slower than expected, I would recommend 
ensuring that they are using the correct amount of PCIe lanes.


Turns out, there is a really nice open source tool that does this for 
you ...


https://github.com/joelandman/pcilist

:D



On Thu, Aug 17, 2017 at 12:35 PM, Joe Landman 
<joe.land...@gmail.com> wrote:




On 08/17/2017 12:00 PM, Faraz Hussain wrote:

I noticed an mpi job was taking 5X longer to run whenever it
got the compute node lusytp104 . So I ran qperf and found the
bandwidth between it and any other nodes was ~100MB/sec. This
is much lower than ~1GB/sec between all the other nodes. Any
tips on how to debug further? I haven't tried rebooting since
it is currently running a single-node job.

[hussaif1@lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
tcp_lat:
latency  =  17.4 us
tcp_bw:
bw  =  118 MB/sec
[hussaif1@lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
tcp_lat:
latency  =  20.4 us
tcp_bw:
bw  =  1.07 GB/sec

This is separate issue from my previous post about a slow
compute node. I am still investigating that per the helpful
replies. Will post an update about that once I find the root
cause!


Sounds very much like it is running over gigabit ethernet vs
Infiniband.  Check to make sure it is using the right network ...


___
Beowulf mailing list, Beowulf@beowulf.org
<mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
<http://www.beowulf.org/mailman/listinfo/beowulf>




--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Poor bandwith from one compute node

2017-08-17 Thread Joe Landman



On 08/17/2017 12:00 PM, Faraz Hussain wrote:
I noticed an mpi job was taking 5X longer to run whenever it got the 
compute node lusytp104 . So I ran qperf and found the bandwidth 
between it and any other nodes was ~100MB/sec. This is much lower than 
~1GB/sec between all the other nodes. Any tips on how to debug 
further? I haven't tried rebooting since it is currently running a 
single-node job.


[hussaif1@lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
tcp_lat:
latency  =  17.4 us
tcp_bw:
bw  =  118 MB/sec
[hussaif1@lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
tcp_lat:
latency  =  20.4 us
tcp_bw:
bw  =  1.07 GB/sec

This is separate issue from my previous post about a slow compute 
node. I am still investigating that per the helpful replies. Will post 
an update about that once I find the root cause!


Sounds very much like it is running over gigabit ethernet vs 
Infiniband.  Check to make sure it is using the right network ...
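
As a quick sanity check on why ~118 MB/sec points at gigabit (sketch only, 
assuming qperf's decimal units and nominal link rates):

/* linkrate_sketch.c -- convert nominal link rates into the MB/s numbers
 * qperf reports.  1 GbE tops out at 125 MB/s raw, and ~118 MB/s is what a
 * saturated TCP stream over it typically delivers; anything on a faster
 * fabric should be an order of magnitude higher. */
#include <stdio.h>

int main(void)
{
    const struct { const char *name; double gbit_per_s; } links[] = {
        { "1 GbE",           1.0 },
        { "10 GbE",         10.0 },
        { "FDR InfiniBand", 54.5 },  /* 56 Gb/s signalling, 64b/66b coding */
    };

    for (int i = 0; i < 3; i++) {
        double mb_per_s = links[i].gbit_per_s * 1000.0 / 8.0;
        printf("%-16s ~%6.0f MB/s raw\n", links[i].name, mb_per_s);
    }
    return 0;
}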




___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] How to know if infiniband network works?

2017-08-03 Thread Joe Landman



On 08/03/2017 09:21 AM, Faraz Hussain wrote:

I ran the qperf command between two compute nodes ( b4 and b5 ) and got:

[hussaif1@lustwzb5 ~]$ qperf lustwzb4 -t 30 rc_lat rc_bi_bw
rc_lat:
    latency  =  7.73 us
rc_bi_bw:
    bw  =  9.06 GB/sec

If I understand correctly, I would need to enable ipoib and then rerun 
test? It would then show ~40GB/sec I assume.


No.  9 GB/s is about 80 Gb/s, so Infiniband is working.  Looks like you 
might have a dual-rail IB setup, or you were doing a bidirectional/full-
duplex test.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] How to know if infiniband network works?

2017-08-02 Thread Joe Landman



On 08/02/2017 01:50 PM, Faraz Hussain wrote:
Thanks Joe. Here is the output from the commands you suggested. We 
have open mpi built from Intel mpi compiler. Is there some benchmark 
code I can compile so that we are all comparing the same code?


[hussaif1@lustwzb4 test]$ ibv_devinfo
hca_id: mlx4_0
transport:  InfiniBand (0)
fw_ver: 2.11.550
node_guid:  f452:1403:0016:3b70
sys_image_guid: f452:1403:0016:3b73
vendor_id:  0x02c9
vendor_part_id: 4099
hw_ver: 0x0
board_id:   DEL0A4028
phys_port_cnt:  2
port:   1
state:  PORT_ACTIVE (4)


Port 1 on the machine is up.  This is the link level activity that the 
subnet manager (OpenSM or a switch level version) enables.


For OpenMPI, my recollection is that they expect the IB ports to have 
ethernet addresses as well (and will switch to RDMA after initialization).


What does

ifconfig -a

report?



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] How to know if infiniband network works?

2017-08-02 Thread Joe Landman

start with

ibv_devinfo

ibstat

ibstatus


and see what (if anything) they report.

Second, how did you compile/run your MPI code?


On 08/02/2017 12:44 PM, Faraz Hussain wrote:
I have inherited a 20-node cluster that supposedly has an infiniband 
network. I am testing some mpi applications and am seeing no 
performance improvement with multiple nodes. So I am wondering if the 
Infiband network even works?


The output of ifconfig -a shows an ib0 and ib1 network. I ran ethtool 
ib0 and it shows:


Speed: 4Mb/s
Link detected: no

and for ib1 it shows:

Speed: 1Mb/s
Link detected: no

I am assuming this means it is down? Any idea how to debug further and 
restart it?


Thanks!

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] Anyone know whom to speak with at Broadcom about NIC drivers?

2017-07-05 Thread Joe Landman

Hi folks

  I am trying to find contacts at Broadcom to speak to about NIC 
drivers.  All my networking contacts seem to have moved on.  Does anyone 
have a recommendation as to someone to speak with?


  Thanks!

Joe

---

Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] BeeGFS usage question

2017-06-28 Thread Joe Landman



On 06/28/2017 09:39 PM, Christopher Samuel wrote:

Hi all,

A few folks were chatting about HPC distributed filesystems over on the
Australian HPC sysadmin Slack and the question arose about whether
anyone is using BeeGFS for non-scratch (persistent) storage.

So, is anyone doing that?


I have a number of (former) customers using it as primary storage. You 
set up mirrored metadata, mirrored data, and you get a nice HA-like 
system (they have some issues with HA daemons, but I understand this is 
being worked on and will be delivered soon).


It is IMO excellent for this.  Back it with zfs on the units, distribute 
your metadata and mirror it.




Also, is anyone doing that with CephFS too?


Can be done, but I know very few people using it (again some former 
prospective customers).  CephFS is somewhat behind for this use case, 
though it should be reasonably stable/performant.




All the best,
Chris


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Register article on Epyc

2017-06-21 Thread Joe Landman
Yeah, they should make very sweet storage units (single socket sku).  
Dual socket is also nice, as you'll have 64x lanes of fabric between 
sockets, as well as 64 from each socket to peripherals.


I'd love to see the QPI contention issue just go away.  This looks like 
it pushes back the problem quite a bit.   Doesn't solve it, but changes 
some of the limits.



On 06/21/2017 11:27 AM, Scott Atchley wrote:
The single socket versions make sense for storage boxes that can use 
RDMA. You can have two EDR ports out the front using 16 lanes each. 
For the storage, you can have 32-64 lanes internally or out the back 
for NVMe. You even have enough lanes for two ports of HDR, when it is 
ready, and 48-64 lanes for the storage.


On Wed, Jun 21, 2017 at 8:39 AM, John Hearns wrote:


https://www.theregister.co.uk/2017/06/20/amd_epyc_launch/


Interesting to see that these are being promoted as single socket
systems.
For a long time the 'sweet spot' for HPC has been the dual socket
Xeons.
I would speculate about single socket AMD systems, with a smaller
form factor motherboard, maybe with onboard Infiniband.  Put a lot
of these cards in a chassis and boot them disklessly and you get a
good amount of compute power.

Also regarding compute power, it would be interesting to see a
comparison of a single socket of these versus Xeon Phi rather than
-v4 or -v5 Xeon.

The encrypted RAM modes are interesting, however I can't see any
use case for HPC.
Unless you are running a cloudy cluster where your customers are
VERY concerned about security.  Of course there are such customers!

___
Beowulf mailing list, Beowulf@beowulf.org
 sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf





___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] GPFS and failed metadata NSD

2017-05-21 Thread Joe Landman



On 05/19/2017 10:33 AM, John Hanks wrote:

There is a potentially lost PhD project there which was to be defended
this month for which the person may simply give up and another project


Egads ...

I went in the complete opposite (almost paranoid) direction.  I made 
copies of my thesis/run data everywhere.  20+ years ago, but still the 
lessons hold true.


Most important of them is "Software has bugs and will actively work to 
nuke your data and confound your recovery".


Second is "hardware fails".

Third is "RAID is not a backup".  To this day ... I know many people 
who mistakenly think it is, and do not back up their important data. 
I've seen file system crashes take out parallel file systems atop them, 
so ... it's even more true today than ever.


Fourth should be "cloud storage can go away/fail/go out of business."

There are others, but ... wow ... losing Ph.D. project data.

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Not OT, but a quick link to an article on InsideHPC

2017-03-24 Thread Joe Landman



On 03/24/2017 12:02 PM, C Bergström wrote:

On Fri, Mar 24, 2017 at 11:48 PM, Joe Landman <joe.land...@gmail.com> wrote:

On 3/23/17 5:27 PM, C Bergström wrote:


[...]


No issue, and I am sorry to see this happen.  I enjoyed my time using the
PathScale compilers.

Its sad that an ecosystem chooses not to support innovators.


I must admit that my rough around the edges personality may play into
this. I don't mince words and when smart people occasionally
(frequently?) do stupid things - there's a right way to say it and a
blunt way.. and then there's times when it's just a matter of
opinions.. So many people lately think LLVM is this magical panacea
that will fix all their problems.. That camp of people is growing and
any time you try to explain how craptastic it is on a whole category
of problems you're met with disbelief at best..


Heh ... I started adding llvm (really clang and hey, no real fortran ... 
grrr) support into my nlytiq project on github, basically to play with 
it.  Compiling Rust with an external llvm?  You don't want to go there. 
On a fast box, with many cores, lots of SSD and ram ... many hours.


I've been curious about how well/poorly it does at code generation as a 
base for other tools.  Specifically I'm interested in julia and making 
things easy to tie in to julia.  And R, python, Perl (5 and 6), octave, etc.


I've not done serious testing with it yet, but, for the moment I've left 
it the non-default option.



Maybe I'm wrong and it will "just work out in the end"..


Some things seem to work well, and it has some level of interest from a 
broader community.  This gets to what I think the core of marketing 
efforts are/should be for small folks like us.


Target ubiquity.  Make it bloody easy to install, use, operate.  For a 
compiler, this should be little more than a path change, and maybe some 
flags.  For an appliance, it should just work.


Making things so simple that they just work, and do so correctly, 
consistently, reliably ... is a non-trivial task.  You had the joy of 
many ABIs, each with the variants of bugs.  I had the joy of dealing 
with (*&(*&^$^$^$** kernel changes, *^^&^^$%*(& driver bugs ... and 
don't even get me started on firmware, *&^&%$$#%%$ OFED versions, 
RPM-only/RedHat specific functions/code/installation mechanisms.


If I get to SC17 this year, definitely gonna need to grab a bunch of 
folks for a commiseration night ...




#rant



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Not OT, but a quick link to an article on InsideHPC

2017-03-24 Thread Joe Landman

On 3/23/17 5:27 PM, C Bergström wrote:

Tiz the season for HPC software to die?
https://www.hpcwire.com/2017/03/23/hpc-compiler-company-pathscale-seeks-life-raft/

(sorry I don't mean to hijack your thread, but timing of both
announcements is quite overlapping)


No issue, and I am sorry to see this happen.  I enjoyed my time using 
the PathScale compilers.


Its sad that an ecosystem chooses not to support innovators.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] Not OT, but a quick link to an article on InsideHPC

2017-03-23 Thread Joe Landman

For those who I've not talked with yet ...

http://insidehpc.com/2017/03/scalable-informatics-closes-shop/


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Suggestions to what DFS to use

2017-02-14 Thread Joe Landman



On 02/14/2017 08:02 PM, Bogdan Costescu wrote:

I can second the recommendation for BeeGFS. We have it in use for ~4
years with very good results, by now on 3 different FSes. We also run


I'll freely admit to being biased here, but BeeGFS is definitely 
something you should be evaluating/using.  Even for this case, with 
HDFS, there is a connector:


http://www.beegfs.com/wiki/HadoopConnector

We've had excellent results with BeeGFS on spinning rust:


https://scalability.org/2014/05/massive-unapologetic-firepower-2tb-write-in-73-seconds/

and the same system at the customer site


https://scalability.org/2014/10/massive-unapologetic-firepower-part-2-the-dashboard/

(look closely at the plot, and note the vertical axes ... I had messed 
up the scale on it, but that is in thousands of MB/s).


as well as with an NVMe unit



https://scalability.org/2016/03/not-even-breaking-a-sweat-10gbs-write-to-single-node-forte-unit-over-100gb-net-realhyperconverged-hpc-storage/

Excellent performance, ease of configuration is what you should expect 
from BeeGFS.


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] solaris?

2017-02-14 Thread Joe Landman



On 02/14/2017 04:28 PM, Michael Di Domenico wrote:

just out of morbid curiosity, does Solaris even have a stake in HPC
anymore?  I've not heard boo about it in quite awhile and there
doesn't appear to be even one system on the top500 running it.


Solaris is effectively dead (IMO).

SmartOS is its logical progression (https://smartos.org) with the caveat 
of a tight hardware compatibility list.  If what you have works within 
it, it is a nice way to run things*.  You can combine it with Fifo 
(https://project-fifo.net/) and build a nice container friendly data 
center.  Not typical HPC workloads, but quite nice for the modern 
container centric view of the world.


SmartOS has what are called "lx branded zones" which enable you to run 
(some) linux binary code 'natively'.  That is, the zone handles syscall 
emulation.


* I'm biased as I helped set up some infrastructure with this before, 
and it was relatively painless to use.  Think of it as a predecessor to 
CoreOS, RancherOS, and others.



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Mobos for portable use

2017-01-20 Thread Joe Landman



On 01/20/2017 02:38 PM, Douglas Eadline wrote:



That is related to the end of Moore's Law.  The shrinking of the transistor
stopped increasing CPU speed in 2005, which brought about the release of
multi-core CPUs; the fastest CPU ever released was at 4.5 GHz in 2004.  The
newer i5 and i7 are quite a bit slower per core than the single core from
the early 2000s, by almost half.  Any single-threaded algorithm today will
suffer as core counts increase and frequency decreases.  This is creating
a very strong market for technologies like the FPGA that accelerate
single-threaded logic operations.  Just look at a CPU history chart: we are
slowing down the core substantially, making multithreading a requirement
for the future, and yet we are failing to train programmers with the skills
for multithreading.


Learn/teach Julia


+1 for this.  Julia is an excellent language.

(shameless off-topic plug) https://github.com/joelandman/nyltiq-base is 
an initial step at building a modern analytical toolchain for people to 
use great tools like Julia/IJulia (and many others).


As for FPGA ... sorta like the joke about Gallium Arsenide being the 
material of the future, and always will be (think about it) ... I've had 
the same sense of FPGAs.  Mostly because of a lack of real standards for 
them, costly, non-portable, non-open development tools, etc.  These 
problems may eventually be solved.  But this is what I said 14 years ago 
when we started building accelerators.


Joe's first rule of market dominance:  target ubiquity.  GPUs have done 
a tremendous job there. FPGA, not so much.


But the on-topic complaint is that programmers are not being trained on 
multithreading.  Doug's point was that tools like julia enable you to 
avoid worrying about that in many cases, as it does the right thing.


OpenMP and other tools enable you to build multi-threaded/parallel code 
fairly easily if you prefer to stay with C/C++, Fortran, others.
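
For a concrete (illustrative) sketch of how little it can take in C, 
assuming gcc with -fopenmp:

/* omp_sketch.c -- one pragma turns a hot serial loop into a multi-threaded
 * one without restructuring the code.  Build: gcc -O2 -fopenmp omp_sketch.c */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    const long n = 100000000L;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum; the reduction
     * combines them at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= n; i++)
        sum += 1.0 / ((double)i * (double)i);   /* converges to pi^2/6 */

    printf("max threads = %d, sum = %.12f\n", omp_get_max_threads(), sum);
    return 0;
}

Profiling first (as noted below) tells you which loop actually deserves the 
pragma.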


This said, it's hard enough now for developers without much experience to 
properly reason about their code and understand how to 
multi-thread/parallelize.  I've seen people gleefully write on HN and 
other places how they chase lock-free data structure development, though 
this is rarely what they really need.


Always start with an understanding of where your code spends time, why 
it spends time there, and then see if you can move upwards from there to 
leverage multi-threading/parallelization.


Julia makes this quite easy BTW.  It's built into the (macro) language.




--
Doug



Scott



Sent via the Samsung Galaxy S7, an AT&T 4G LTE smartphone


 Original message 
From: Lukasz Salwinski <luk...@mbi.ucla.edu>
Date: 1/19/17 8:43 PM (GMT-06:00)
To: beowulf@beowulf.org
Subject: Re: [Beowulf] Mobos for portable use

On 01/19/2017 02:09 PM, Lux, Jim (337C) wrote:



-Original Message-
From: Beowulf [mailto:beowulf-boun...@beowulf.org] On Behalf Of Andrew
M.A. Cater
Sent: Thursday, January 19, 2017 12:49 PM
To: beowulf@beowulf.org
Subject: Re: [Beowulf] Mobos for portable use

[...]

(I just found that at least a while ago, Xilinx supported clusters for
some of  their design tools.. Since right now the design I'm working
with takes an hour to synthesize (on a single machine), I'm going to
look further - it has been a real rate limiter in the lab, because it
makes the test, new design, load, test cycle a lot longer.)


it looks like the current (Vivado 16.4) synthesis program hasn't been
parallelized - it's strictly single-threaded and so uses just one
core... :o/  I've recently benchmarked a few i5 & i7 workstations
- there seems to be very little difference (maybe 10-20%) between
CPUs released over the last ~4-5 years :o/

lukasz


--
-
  Lukasz Salwinski      PHONE: 310-825-1402
  UCLA-DOE Institute    FAX:   310-206-3914
  UCLA, Los Angeles     EMAIL: luk...@mbi.ucla.edu
-
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf






--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
c: +1 734 612 4615
w: https://scalability.org
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] non-stop computing

2016-10-26 Thread Joe Landman



On 10/26/2016 10:20 AM, Prentice Bisbal wrote:

How so? By only having a single seat or node-locked license?


Either ... for licensed code this is a non-starter.  It's a shame 
that we are still talking about node-locked/single-seat licensing in 2016.



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] non-stop computing

2016-10-26 Thread Joe Landman

Licensing might impede this ...  Usually does.


On 10/26/2016 09:50 AM, Prentice Bisbal wrote:

There is an amazing beauty in this simplicity.

Prentice

On 10/25/2016 02:46 PM, Gavin W. Burris wrote:

Hi, Michael.

What if the same job ran on two separate nodes, with IO to local 
scratch?  What are the odds that both nodes would fail in that three-week 
period?  No special hardware / software required.  Simple.  Done.


Cheers.

On Tue 10/25/16 02:24PM EDT, Michael Di Domenico wrote:
Here's an interesting thought exercise and a real problem I have to 
tackle.


I have researchers who want to run magma codes for three weeks or
so at a time.  The process is unfortunately sequential in nature, and
magma doesn't support checkpointing (as far as I know; I don't
know much about magma).

So the question is:

what kind of a system could one design/buy, using any combination of
hardware/software, that would guarantee that this program would run for
3 weeks or so and not fail?

And by "fail" I mean some system-type error, i.e. memory faulted,
CPU faulted, network IO slipped (NFS timeout), as opposed to "there's a
bug in magma", which has already bitten us a few times.

There's probably some commercial or "unreleased" commercial product on
the market that might fill this need, but I'm also looking for
something "creative" as well.

Three weeks isn't a big stretch compared to some of the other codes
I've heard of around the DOE that run for months, but it's still pretty
painful to have a run go for three weeks, fail 2.5 weeks in,
and have to restart.  Most modern-day hardware would probably support
this without issue, but I'm looking for more of a guarantee than a
prayer.

Double bonus points for anything that runs at high clock speeds (>3 GHz).

Any thoughts?
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] non-stop computing

2016-10-25 Thread Joe Landman

On 10/25/2016 02:24 PM, Michael Di Domenico wrote:

Here's an interesting thought exercise and a real problem I have to tackle.

I have researchers who want to run magma codes for three weeks or
so at a time.  The process is unfortunately sequential in nature, and
magma doesn't support checkpointing (as far as I know; I don't
know much about magma).

So the question is:

what kind of a system could one design/buy, using any combination of
hardware/software, that would guarantee that this program would run for
3 weeks or so and not fail?

And by "fail" I mean some system-type error, i.e. memory faulted,
CPU faulted, network IO slipped (NFS timeout), as opposed to "there's a
bug in magma", which has already bitten us a few times.


You'd need to design an HA network and storage system to handle the 
possibility of external failure.  For internal failure, you'd want to 
run this in a KVM virtual machine very close to the metal, and 
snapshot/checkpoint the VM every so often to local/remote VERY FAST storage.


This said, it would help to start with a system that can handle 
hard/heavy load for that period of time w/o failure.  We have units at 
various places around the world that sustain many GB/s continuously of 
IO for more than a year of operations, under fairly intense loads.


Choose your systems wisely, and don't let brand names decide the outcome.


There's probably some commercial or "unreleased" commercial product on
the market that might fill this need, but I'm also looking for
something "creative" as well.


Start with good hardware.  If you ping me about our burn-in test case, I'll 
be happy to send it over.  It runs y-cruncher to do burn-in on all 
CPUs/RAM continuously, and it's pretty good at catching bad MB/CPU/RAM. 
Previously, I had a GAMESS run I used for this (also very good).




Three weeks isn't a big stretch compared to some of the other codes
I've heard of around the DOE that run for months, but it's still pretty
painful to have a run go for three weeks, fail 2.5 weeks in,
and have to restart.  Most modern-day hardware would probably support
this without issue, but I'm looking for more of a guarantee than a
prayer.

Double bonus points for anything that runs at high clock speeds (>3 GHz).


See above.  This is fairly *easy* for various definitions of easy.



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Underwater data centers -- the future?

2016-09-09 Thread Joe Landman

On 09/09/2016 07:20 AM, Tim Cutts wrote:

2. Surely, we heat up
the oceans regardless of whether it's directly by cooling with the
sea or indirectly by cooling in air, and atmospheric warming slowly
warming the oceans.  Ultimately it will all come to equilibrium (with
possible disastrous consequences) whichever way we do it.  There was


Actually, it is already at equilibrium (or we'd be in serious trouble). 
Equilibrium doesn't mean uniform, but that gains - losses == constant 
(at least to first order).


Heat gains:  insolation, radiation from decaying elements
Heat losses: thermal radiation (Stefan-Boltzmann), which goes as the 4th power 
of temperature (assuming the earth is a "black body" in the physicists' 
language).
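
(For reference, the loss term in question is the usual grey-body radiation,

    P = \epsilon \sigma A T^{4}, \qquad
    \sigma \approx 5.67 \times 10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}

with emissivity \epsilon and radiating area A; the T^4 is what makes the 
loss side respond so strongly to small temperature changes.)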


AGW would effectively change the reflective properties of the atmosphere 
(perversely providing a higher albedo to reflect insolation which would 
be a "loss" or cooling mechanism), while potentially altering the 
effective heat capacity (Cv) of the atmosphere.  The latter is tiny 
compared to the heat capacity of the ocean ( Cv("air") << Cv(ocean) ).


This said, if FB dumped all of their data centers into the ocean, and we 
all started posting cat videos, and then doing analytics on them ... 
doom would be impending ...



quite a nice point made in the late David MacKay's book pointing out
that  it doesn't really matter even if we found an abundant source of
almost free energy; the atmosphere still has a finite surface area
from which to radiate the waste heat to space, which imposes a
significant upper limit on how much power the human race can safely
use.


True ... but these limits are way ... way out there, and the loss 
mechanisms are quite efficient.  We are in a pretty good equilibrium right 
now, even if we shift it a little each way ... I have my doubts as to whether 
or not we could seriously dump enough waste heat energy into the 
ocean/atmosphere to significantly impact the equilibrium.


Happy to be shown to be wrong, but I think we are (many orders of 
magnitude) away from that point where it really starts to become a concern.




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Parallel programming for Xeon Phis

2016-08-24 Thread Joe Landman



On 08/24/2016 09:51 AM, Prentice Bisbal wrote:


This is an old article, but it's relevant to the recent discussion on 
programming for Xeon Phis, 'code modernization', and the speedups 
'code modernization' can provide.


https://www.hpcwire.com/2015/08/24/cosmos-team-achieves-100x-speedup-on-cosmology-code/ 





Nice to see that the algorithmic shift was given the level of exposure it 
deserves here.


Basically, when you need to extract more performance out of a code base, 
you need to see where in your code it is spending the time. Then ask 
yourself why you are doing things that way.
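
Even a crude wall-clock bracket around the suspect regions will tell you 
that much before you reach for a real profiler.  A minimal C sketch 
(stage_one/stage_two are placeholders doing dummy work, not anything from 
the COSMOS code):

/* Minimal sketch: bracket suspected hot spots with a monotonic wall clock.
 * The two stages below are stand-ins; swap in the real code paths.
 * Build with:  gcc -O2 stages.c -lm
 */
#include <math.h>
#include <stdio.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static double stage_one(void)          /* e.g. setup / IO */
{
    double s = 0.0;
    for (int i = 1; i < 2000000; i++) s += 1.0 / i;
    return s;
}

static double stage_two(void)          /* e.g. the main solver loop */
{
    double s = 0.0;
    for (int i = 1; i < 20000000; i++) s += sqrt((double)i);
    return s;
}

int main(void)
{
    double t0 = now();
    double a = stage_one();
    double t1 = now();
    double b = stage_two();
    double t2 = now();

    printf("stage_one %.3f s, stage_two %.3f s (sums %.3g %.3g)\n",
           t1 - t0, t2 - t1, a, b);
    return 0;
}

Whichever stage dominates is where the "why are we doing it this way?" 
question starts.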


Sometimes the answer is pretty interesting, other times, it has to do 
with development inertia ("we've always done it that way").


Not quite technical debt, but more of an algorithmic debt accumulation.

Note that system architectures tend to change over decade long time 
spans, so large vector machines gave way to large SMPs which gave way to 
clusters of small SMPs which gave way to ...


There is no single way to optimize for all of these, and algorithms that 
might work well on large vector machines won't work so well on small SMP 
clusters ... etc.





--
Prentice

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Recent HPE support missive

2016-08-23 Thread Joe Landman

On 08/23/2016 10:01 AM, Peter St. John wrote:

HPE is in the process of being bought by CSC.


???


On the scale of 12 months
you will be contracting with CSC.


I thought they were spinning out their services organization to CSC ... 
not the whole kit and kaboodle ...


http://www.csc.com/investor_relations/press_releases/137152-csc_announces_merger_with_enterprise_services_segment_of_hewlett_packard_enterprise_to_create_global_it_services_leader

This is IMO different from what Tim is concerned with.  Tim's issue is 
support contracts for HPE software (what ... Ibrix or similar?); services 
are typically outsourced managed services.



Peter

On Tue, Aug 23, 2016 at 6:47 AM, Tim Cutts wrote:

Really not very impressed with HPE's missive yesterday changing the
software support contracts so that now it's going to be impossible
to reinstate a software support contract if you let it lapse for
more than 12 months.

The existing system was expensive but reasonable (you have to pay to
reinstate, including all the time in between when the contract
expired and the time of reinstatement).

But to now just bluntly state that if it's lapsed for more than 12
months you can never reinstate it seems somewhat draconian.

Tim

--
Head of Scientific Computing
Wellcome Trust Sanger Institute

___
Beowulf mailing list, Beowulf@beowulf.org
 sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf





___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] bring back 2012?

2016-08-17 Thread Joe Landman

On 08/17/2016 11:50 AM, Kilian Cavalotti wrote:

On Wed, Aug 17, 2016 at 7:10 AM, Prentice Bisbal  wrote:

When Intel first started marketing the Xeon Phi, they emphasized that you
wouldn't need to rewrite your code to use the Xeon Phi. This was a marketing
move to differentiate the Xeon Phi from the NVIDIA CUDA processors. That
may have been a true statement, but it didn't mention anything about the
performance of that existing code, and was, frankly, very misleading. The
truth is, if you don't rewrite your code, you're not going to see much
(relatively speaking) of a performance improvement, and when you do rewrite
your code to optimize it for the Xeon Phi, you'll also see amazing speed-ups
on regular Xeon processors.


Well, this is generally true of every "new" architecture.  You don't 
*need* to rewrite your code.  It will run.  Just not as well as if you 
took the time to optimize it carefully.



I've seen several presentations where speed-ups of 5x, 10x, etc., were achieved
on regular Xeons just through optimizing the code to be more thread- and
vector-friendly. Some improvements were so significant, they make you ask if
the Xeon Phi was even needed. [...]


I've been setting up a Phi environment for a customer on one of our machines. 
It's non-trivial to get it to a workable/realistic state.  This, and the 
effort required to get good performance out of this system (12GB of RAM is 
not large for many codes), mean that this is more of an experimental 
platform for them.



If you pay attention to Intel's marketing and the industry news over the past
couple of years, you will have noticed that Intel has been promoting "code
modernization" efforts, saying all codes need to be modernized to take
advantage of newer processors. While that is certainly true, "code
modernization" is just a euphemism for "rewrite your code". This is Intel
backpedaling on their earlier statements that you don't need to rewrite your
code to take advantage of a Xeon Phi, without actually admitting it.


Can't agree more, very well described.


Generally, algorithm shifts, better coding, and better code/memory layout 
will buy you quite a bit.  You will lose quite a bit if you go crazy 
with pointer-chasing code, deep object factories, and other 
(anti)patterns for performance.
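
As a toy illustration of the layout point (made-up example, not from any 
production code -- and note that in a real program the list nodes end up 
scattered all over the heap, which is where the cache misses really come 
from):

/* Toy sketch: the same reduction over a contiguous array vs. a linked list.
 * Same arithmetic; very different memory access patterns.
 */
#include <stdio.h>
#include <stdlib.h>

struct node { double v; struct node *next; };

int main(void)
{
    const long n = 1L << 22;                     /* ~4M elements, illustrative */

    double *a = malloc((size_t)n * sizeof *a);   /* contiguous layout  */
    struct node *head = NULL;                    /* one malloc per node */
    if (!a) return 1;

    for (long i = 0; i < n; i++) {
        a[i] = (double)i;
        struct node *nd = malloc(sizeof *nd);
        if (!nd) return 1;
        nd->v = (double)i;
        nd->next = head;
        head = nd;
    }

    double s1 = 0.0, s2 = 0.0;
    for (long i = 0; i < n; i++)                 /* streams through memory */
        s1 += a[i];
    for (struct node *p = head; p; p = p->next)  /* chases pointers        */
        s2 += p->v;

    printf("array sum %.0f, list sum %.0f\n", s1, s2);
    return 0;
}

The compiler, the vector units, and the prefetcher can do a great deal with 
the first loop; they can do almost nothing with the second.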


About a decade ago, I hacked on HMMer and rewrote one of the core 
expensive routines in a simpler manner for a 2x immediate improvement in 
30 lines of C or so.  It did require a rethinking of a few aspects of 
the code, but generally, keeping performance in mind when coding 
requires thinking about how the calculation will flow, and what will 
impede it.


For massively parallel compute engines like Phi/KNL, memory bandwidth 
(keeping the cores fed) will be a problem.  So you code assuming that is 
the resource that you need to manage more carefully.


And at a higher level, industry/technology changes over time which 
renders code that runs quickly on one platform, running slowly on some 
future platform as the platform design tradeoffs are different.


I hate to say "back in the day", but the first major computing project I 
worked on in the (mumble)1980s(mumble), ram was the expensive resource 
and disk and CPU were relatively "free".  So while working on our large 
simulation code (Fortran of course), we spilled our matrix to disk, as 
we didn't have enough ram to keep all of it in memory.  It made our 
matrix multiple ... interesting ... but it worked.  These days, spilling 
to disk (or swap) is a really bad idea ... just build an MPI version 
that can keep the whole thing in RAM with enough nodes, or 
partition/shard your data so that you can do parallel IO to/from a big 
fast parallel storage platform.
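
A minimal sketch of that kind of partitioning (a hypothetical row-block 
split across MPI ranks, one double per row just to keep it short -- this 
is obviously not the old Fortran code):

/* Hypothetical sketch: split N rows across MPI ranks so each rank only
 * allocates (and would read/write) its own block.
 * Build with:  mpicc -O2 rowsplit.c -o rowsplit
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long N = 1000000;                       /* illustrative row count */
    long base  = N / nranks;
    long rem   = N % nranks;
    long rows  = base + (rank < rem ? 1 : 0);     /* this rank's share      */
    long first = rank * base + (rank < rem ? rank : rem);

    /* Each rank holds only its slice, so the aggregate problem fits in the
     * cluster's total RAM rather than in one node's. */
    double *slice = malloc((size_t)rows * sizeof *slice);
    if (!slice) MPI_Abort(MPI_COMM_WORLD, 1);

    printf("rank %d of %d holds rows [%ld, %ld)\n",
           rank, nranks, first, first + rows);

    free(slice);
    MPI_Finalize();
    return 0;
}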


It's all the same issue (optimizing to use as little of the expensive 
resource as you can), but the expensive resource keeps changing. 
The trick Intel and the other folks are trying to make work is to make 
initial use of the platform as easy as possible, so your code "just 
works".  But if you really want to take advantage of the resource, you 
need to understand the expensive and "free" aspects of the platform, in 
order to optimize to exploit the free and manage the expensive.




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] HP Enterprise to buy SGI

2016-08-12 Thread Joe Landman

On 08/12/2016 10:46 AM, Douglas Eadline wrote:


I remember when the old HP bought Convex. More like 1 + 1 = .2
in that case. And, then in recent years many of the old
Convex crew emerged as Convey which was then bought by
Micron last year.




Maybe I am biased, but I see actual (strong) value in big data 
analytical appliance companies.


HP hasn't had a great run on acquisitions, and if you look at the 
financials, this won't likely make a noticeable impact on revenues or 
profits.


But from a strategic point of view, it makes sense to prevent 
Cisco/Lenovo/others from grabbing SGI for themselves.


Sometimes the reasons for an acquisition have less to do with the 
details of the company, and more to do with the potential 
strategy/competitive landscape.



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


[Beowulf] Any pointers to the what the fields in the sysfs for infiniband are?

2016-06-12 Thread Joe Landman
I am working on extracting meaningful data on various components for our 
monitoring tools, and realized that I don't have a good writeup anywhere 
(other than the source) of what the fields are.  Anyone have or know of 
such a writeup?


For example:

root@n01:/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/infiniband/mlx5_0/ports/1/counters_ext# pwd
/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/infiniband/mlx5_0/ports/1/counters_ext
root@n01:/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/infiniband/mlx5_0/ports/1/counters_ext# ls -alF

total 0
drwxr-xr-x 2 root root0 Jun 12 21:19 ./
drwxr-xr-x 7 root root0 Jun 12 21:19 ../
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_multicast_rcv_packets
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_multicast_xmit_packets
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_rcv_data_64
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_rcv_packets_64
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_unicast_rcv_packets
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_unicast_xmit_packets
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_xmit_data_64
-r--r--r-- 1 root root 4096 Jun 12 21:19 port_xmit_packets_64

I can infer a number of things from the names, but I'd like to be sure 
of what I am looking at.  I'll look at the source if I can't find a 
summary ... hence my question.
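
In the meantime, what I'm polling looks roughly like this (a minimal C 
sketch; the device/port and counter name are just the ones from the listing 
above, reached via /sys/class/infiniband, so adjust for your HCA):

/* Minimal sketch: read one 64-bit sysfs counter and print it.
 * The path assumes mlx5_0 port 1, as in the listing above; adjust as needed.
 */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/class/infiniband/mlx5_0/ports/1/"
                       "counters_ext/port_xmit_data_64";

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    uint64_t value = 0;
    if (fscanf(f, "%" SCNu64, &value) != 1) {
        fprintf(stderr, "could not parse %s\n", path);
        fclose(f);
        return 1;
    }
    fclose(f);

    /* My understanding is that the IB data counters are in 4-byte words
     * rather than bytes -- exactly the kind of detail I'd like a writeup
     * to confirm before converting this to a rate. */
    printf("%s = %" PRIu64 "\n", path, value);
    return 0;
}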


Thanks in advance!

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Thoughts on IB EDR and Intel OmniPath

2016-04-29 Thread Joe Landman



On 04/29/2016 06:18 PM, Hutcheson, Mike wrote:

Hi.  We are looking at EDR and OmniPath for a new cluster we are looking to 
purchase this summer.  I am interested in what the Beowulf community has to say 
regarding the pros and cons for each of the technologies.  I am certainly not 
trying to start a flame war here, just looking for unbiased observations based 
on knowledge and experience.


I can talk more from the vendor side of this (and we integrate both 
vendors' gear into our systems, so take this with whatever amount of 
NaCl you think it needs).  Both are nice for a fairly widely overlapping 
range of needs.


We've been using the EDR and 100GbE side for a while now for our storage 
and hyperconverged systems.  I like that we can drive full bandwidth 
over the IB fabric without crushing the CPU.  With apologies to 
everyone, a link to a post I did about a month ago:


https://scalability.org/2016/03/not-even-breaking-a-sweat-10gbs-write-to-single-node-forte-unit-over-100gb-net-realhyperconverged-hpc-storage/

We plan to try a similar test with OPA at some point, and preliminary 
tests others have done have suggested that similar performance is 
achievable.


Basically I think the choice comes down to the specifics of the codes 
you'll be running, and how you will be implementing the storage side.   
Both systems are nice, but you may find the polled mechanism of OPA 
might match your code base differently than the offload mechanism of EDR.


I would recommend staying away from 100GbE if you have the option to do 
EDR/OPA.  While we really enjoy 100GbE (and have shown some pretty 
terrific performance with it), I don't think the technology for 
congestion mitigation is quite up to snuff for handling these data rates, as 
compared to EDR/OPA.  Even with RoCEv2, some of the testing we did 
demonstrated very significant congestion-related slowdowns that we 
couldn't easily tune for (with PFC and the other bits that RoCE needs).


I've used iWARP in the dim and distant past, and it was much better than 
plain old gigabit on the same systems (with Ammasso cards). But I'd 
recommend going with EDR/OPA if you have the choice.  You can always 
have your storage or other nodes handle ethernet gatewaying if needed.


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Broadwell HPL performance

2016-04-21 Thread Joe Landman



On 04/21/2016 08:56 AM, Douglas Eadline wrote:

On 20/04/16 16:52, John Hearns wrote:
  


[...]



Basically, I don't trust these numbers. I assume the rest of the
data are equally wrong. Being an old-school
kind of dude, a link to the raw output is always nice,
and some run-time data like HT (on/off), number of threads,
etc. is helpful.


Not specific for HPL, but in general, we find that people don't release 
all the information around their tests, so they aren't quite replicable.


Then there are the tests which are outliers for some reason or another, 
which are not repeatable even on the same rig, yet get used as the 
"actual" result.


Our operational theory is, if you report something, it ought to be the 
same as what the user would measure if they did the test.



Others from Boston in the UK:

https://www.boston.co.uk/blog/2016/04/06/intel-xeon-e5-2600-v4-codename-broadwell-launch-and-preliminary-bench.aspx

They report 859 GFLOPS for the E5-2650, again above peak, but they
seem to state that HT is on. How many threads do they use for the test?

I assumed that enabling HT hurt HPL numbers (at least in my MPI tests it
does). Is it possible that for these tests, HT helps performance (a bit),
but in the case of the Dell blog, including the HT "cores" means doubling
the peak number which would make the result look bad?

Sigh.


HT is a mixed bag for computational workloads.

The bigger issue is, as Doug notes, that if you don't quite understand the 
platform details, and what/how you are measuring, your results will be 
difficult (at best) to interpret correctly.  It's easy to mess up both 
performance measurements and peak theoretical numbers.  It helps to 
show all the inputs into your assumptions, so that others can check as 
well.  Showing raw data is even better, though a summarized data set 
(with a detailed description of how you summarized it) is also helpful.
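
As a reminder, the peak everyone is dividing by is just

    R_{\mathrm{peak}} = N_{\mathrm{sockets}} \times N_{\mathrm{cores/socket}}
                        \times f_{\mathrm{clock}} \times N_{\mathrm{FLOP/cycle}}

where N_FLOP/cycle depends on the vector width and FMA units (16 
double-precision FLOP/cycle for an AVX2+FMA core of this generation), 
f_clock should arguably be the AVX base clock rather than the advertised 
base clock, and only physical cores count -- counting HT "cores" doubles 
the denominator and makes a sane result look bad.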


For marketing docs, this is rarely done.  The engineering white papers 
that feed into it?  It should be done.


Real benchmarking done right is actually quite hard.  Discerning useful 
information from these efforts is a challenge.


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] [OT] acoustic engineers around?

2016-03-14 Thread Joe Landman

On 03/14/2016 02:36 PM, C Bergström wrote:

This is the smartest list I subscribe to - I'm not sure if anyone has
the knowledge/time, but I'm trying to solve a general problem.

I'm moving condos, and the new location is *very* noise sensitive. I feel
fairly confident I can spam absorbent material as needed to reduce the
dB from high-frequency sound going out the door, but bass is going to
be more of a challenge.

My crazy ideas:

#1 Raise the subs 1-1.5 meters off the ground and decouple them from the floor
#2 Put some highly dense material directly under the subs.
#3 Low-pass filter so I'm only dealing with 80Hz and not even more
problematic lower frequencies


Not an acoustic engineer, so ... use lots of salt with this reply.

I've used (in the dim/distant past) things like this to support my 
speakers: http://www.amazon.com/10-Solid-Rubber-Stopper/dp/B00BTMWV8W


Good decoupling from the ground, and anti-slip too.

This said, I've not had good speakers in like 25+ years



I'm a little fuzzy on the details of whether subwoofers typically produce
directional sound, and/or whether it depends on the box.

"Bass traps" appear to be more marketing hype than actually
functionally useful for my case.

(lowering the volume to the subs and adding "buttkickers" is an
option, but that's more for fun and less about sound quality/general
problem)
--
My walls are at least 150mm thick concrete fwiw.


Look at "sound proofing" wall tile or similar.  Not aesthetically 
pleasing in most cases, but solves some set of problems.


150mm thick concrete?  Solid or cinderblock?





Thanks
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] [OT] MPI-haters

2016-03-11 Thread Joe Landman

On 03/11/2016 11:23 AM, Prentice Bisbal wrote:

On 03/11/2016 05:02 AM, John Hanks wrote:

I remember rgb, although not any Vincent, who must have appeared in the
intervening years when I didn't follow the list. My first cluster was 8
discarded IBM PS/2 machines with MCA buses and tiny hard drives and,
yes, 3.5" floppies, circa 1996. I originally joined the beowulf list
around 1996/1997 and, although I'd have sworn I posted more, I don't
seem to have much of anything in the archive. This confirms my
lifelong status as a lurker. Which is probably good; no one wants to
hear those stories about hard-coding your IRQ in the ne2000 driver
source, pushing packets between nodes by hand, in the snow, uphill
both ways over coax cables. Now it's all "container this, as a
service that, blah, blah, blah." Kids today.



Stay off my lawn with your  damn containers, you damn whippersnappers!


root@borg:~# docker run ubuntu /bin/echo 'Resistance is futile ... you 
will be assimilated into the container collective'


Resistance is futile ... you will be assimilated into the container 
collective



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


Re: [Beowulf] Most common cluster management software, job schedulers, etc?

2016-03-09 Thread Joe Landman

On 03/09/2016 02:20 PM, Prentice Bisbal wrote:


On 03/08/2016 05:40 PM, Christopher Samuel wrote:

On 09/03/16 08:40, Paul McIntosh wrote:


FYI - Good info from SC13 Sys admin BOF - http://isaac.lsu.edu/sc13/
- be nice to have this updated on a yearly basis

You could always make the suggestion to the sysadmin bof cabal. Er, who
don't exist, no.. really they don't..  but if they did you might find
them here:

http://emac3.hpc.lsu.edu/mailman/listinfo/sc-admin



Way to blow our, er, their cover, Samuels! Why don't you just teach him
the secret handshake? If there is one, that is.



The first rule of HPC devops club is ... there is no HPC devops club ...


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: land...@scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


  1   2   3   4   5   6   7   8   9   >