Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Douglas Eadline


> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
>
>>The answer given, and I'm
>>not making this up, is that AMD listens to their users and gives the
>>users what they want, and right now they're not hearing any demand for
>>AVX512.
>>
>>Personally, I call BS on that one. I can't imagine anyone in the HPC
>>community saying "we'd like processors that offer only 1/2 the floating
>>point performance of Intel processors".
>
> I suspect that is marketing speak, which roughly translates to not
> that no one has asked for it, but rather requests haven't reached a
> threshold where the requests are viewed as significant enough.
>

Exactly, or "Right now cloud based servers are the biggest market.
These customers need as many cores/threads as possible on die with
"adequate" memory bandwidth. Oh, and they buy them by the boatload.
What did you say you do again?"


-- 
Doug

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Brian Dobbins
Hi all,

> This is, in my humble opinion, also the big problem CPUs are facing. They
> are built to tackle all possible scenarios, from simple integer to
> floating point, from in-memory to disc I/O. In some respects it would have
> been better to stick with a separate math unit which could then be
> selected according to the workload you want to run on that server. I
> guess this is where the GPUs are trying to fit in here, or maybe ARM.
>
>

  I recall a few years ago the rumors that the Argonne "A18" system was
going to use the 'Configurable Spatial Accelerators' that Intel was
developing, with the idea being you *could* reconfigure based on the needs
of the code.  In principle, it sounds like the Holy Grail, but in practice
it seems quite difficult, and I don't believe I've heard much more about
the CSA approach since.

WikiChip on the CSA:
https://en.wikichip.org/wiki/intel/configurable_spatial_accelerator
NextPlatform article:
https://www.nextplatform.com/2018/08/30/intels-exascale-dataflow-engine-drops-x86-and-von-neuman/

  I have to imagine that research hasn't gone fully quiet, especially with
Intel's moves towards oneAPI and their FPGA experiences, but I haven't seen
anything about it in a while.  Of course


> I also agree with the compiler "problem". If you push some compilers too
> hard, the code runs very fast but the results are simply wrong. Again, in
> an ideal world we would have a compiler suited to the given hardware and
> to the job you want to run.
>

 ... It exacerbates the compiler issues, *I think*.  I hesitate to say it
does so definitively, since the patent write-up talks about how the CSA
architecture uses a representation very similar to what the (now old) Intel
compilers created as an IR (intermediate representation).  In my opinion,
having a compiler that can 'do everything' is like having an AI that can do
everything - we're good at very, *very* specific use-cases, but not
generality.  So configurable systems are a big challenge.  (I'm *way* out
of my depth on compilers, though - maybe they're improving massively?)


> Maybe the whole climate problem will finally push HPC into more bespoke
> systems where the components are fit for the job in question, say weather
> modeling for example, simply as that would be more energy efficient and
> faster.
>

  I can't speak to whether climate research will influence hardware, but
back to the *original* theme of this thread, I actually had some data -very
*limited* data, mind you!- on how NCAR's climate model, CESM, run in an
'F2000climo' case (one of many, many cases, and very atmosphere-focused)
at 2-degree atmosphere resolution (*very* coarse) on a 36-core Xeon Skylake,
performs across AVX2, AVX512 and AVX512+FMA.  By default, FMA is turned off
in these cases due to numerical sensitivity.  So, that's a *very* specific
case, but on the off chance people are curious, here's what it looks like -
note that this is *noisy* data, because the model also does a lot of I/O,
which is why I tend to look at the median times below:

SKX (AWS C5N.18xlarge) Performance Comparison
CESM Case: F2000climo @ f19_g17 resolution
(36 cores each component / 10 model day run, skipping 1st and last)
Flags    AVX2 (no FMA)   AVX512 (no FMA)   AVX512 + FMA
Min      60.18           60.24             59.16
Max      66.26           60.47             59.40
Median   60.28           60.38             59.32
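For what it's worth, the relative change in those medians is easy to compute; a quick sketch in Python, with the numbers copied straight from the table above:

```python
# Median wall-clock times (seconds) from the CESM runs in the table above.
medians = {
    "AVX2 (no FMA)":   60.28,
    "AVX512 (no FMA)": 60.38,
    "AVX512 + FMA":    59.32,
}

baseline = medians["AVX2 (no FMA)"]
for name, t in medians.items():
    # Positive means faster than the AVX2 baseline.
    gain = (baseline - t) / baseline * 100.0
    print(f"{name:16s} {gain:+.2f}% vs AVX2")
```

The AVX512 + FMA median comes out roughly 1.6% faster than AVX2 - well inside the min/max spread shown above, which is the point.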

  The take-away?  We're not really benefiting *at all* (at this resolution,
for this compset, etc) from AVX512 here.  Maybe at higher resolution?
Maybe with more vertical levels, or chemistry, or something like that?
*Maybe*, but differences seem indistinguishable from noise here, and
possibly negative!  Now, give us more *memory bandwidth*, and that's
fantastic.  Could this code be rewritten to take better advantage of larger
vectors?  Sure, and some *really* capable people do work on that sort of
stuff, and it helps, but as an *evolution* in performance, not a revolution
in it.

  (Also, I'm always horrified by presenting one-off tests as examples of
anything, but it's the only data I have on-hand!  Other cases may indeed
vary.)

> Before somebody comes along with: but but but it costs! Think about how
> much money is being spent simply to kill people, or on other wasteful
> projects like Brexit etc.
>

One can only hope.  When it comes to spending on research, I recall the
quote:
   "If you think education is expensive, try ignorance!"

  Cheers,
  - Brian


Am Montag, 21. Juni 2021, 14:46:30 BST schrieb Joe Landman:
> > On 6/21/21 9:20 AM, Jonathan Engwall wrote:
> > > I have followed this thinking "square peg, round hole."
> > > You have got it again, Joe. Compilers are your problem.
> >
> > Erp ... did I mess up again?
> >
> > System architecture has been a problem ... making a processing unit
> > 10-100x as fast as its support components means you have to code with
> > that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily
> > generate optimal code for the system ( but I swear ... -O3 ... it says
> > it on the package!)

Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Jörg Saßmannshausen
Dear all

> System architecture has been a problem ... making a processing unit
> 10-100x as fast as its support components means you have to code with
> that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily
> generate optimal code for the system ( but I swear ... -O3 ... it says
> it on the package!)

From a computational chemist's perspective, I agree. In an ideal world, you want 
to get the right hardware for the program you want to use. Some codes run 
entirely in memory; others use disc space for offloading files. 

This is, in my humble opinion, also the big problem CPUs are facing. They are 
built to tackle all possible scenarios, from simple integer to floating point, 
from in-memory to disc I/O. In some respects it would have been better to stick 
with a separate math unit which could then be selected according to the 
workload you want to run on that server. I guess this is where the GPUs are 
trying to fit in here, or maybe ARM. 

I also agree with the compiler "problem". If you push some compilers too hard, 
the code runs very fast but the results are simply wrong. Again, in an ideal 
world we would have a compiler suited to the given hardware and to the job you 
want to run. 

The problem here is not whether it is possible; the problem is more: how much 
does it cost? From what I understand, some big server farms are actually not 
using commodity HPC stuff but are designing what they need themselves. 

Maybe the whole climate problem will finally push HPC into more bespoke 
systems where the components are fit for the job in question, say weather 
modeling for example, simply as that would be more energy efficient and 
faster. 
Before somebody comes along with: but but but it costs! Think about how much 
money is being spent simply to kill people, or on other wasteful projects like 
Brexit etc. 

My 2 shillings for what it is worth! :D

Jörg

Am Montag, 21. Juni 2021, 14:46:30 BST schrieb Joe Landman:
> On 6/21/21 9:20 AM, Jonathan Engwall wrote:
> > I have followed this thinking "square peg, round hole."
> > You have got it again, Joe. Compilers are your problem.
> 
> Erp ... did I mess up again?
> 
> System architecture has been a problem ... making a processing unit
> 10-100x as fast as its support components means you have to code with
> that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily
> generate optimal code for the system ( but I swear ... -O3 ... it says
> it on the package!)
> 
> Way back at Scalable, our secret sauce was largely increasing IO
> bandwidth and lowering IO latency while coupling computing more tightly
> to this massive IO/network pipe set, combined with intelligence in the
> kernel on how to better use the resources.  It was simply a better
> architecture.  We used the same CPUs.  We simply exploited the design
> better.
> 
> End result was codes that ran on our systems with off-cpu work (storage,
> networking, etc.) could push our systems far harder than competitors. 
> And you didn't have to use a different ISA to get these benefits.  No
> recompilation needed, though we did show the folks who were interested,
> how to get even better performance.
> 
> Architecture matters, as does implementation of that architecture. 
> There are costs to every decision within an architecture.  For AVX512,
> along comes lots of other baggage associated with downclocking, etc. 
> You have to do a cost-benefit analysis on whether or not it is worth
> paying for that baggage, with the benefits you get from doing so.  Some
> folks have made that decision towards AVX512, and have been enjoying the
> benefits of doing so (e.g. willing to pay the costs).  For the general
> audience, these costs represent a (significant) hurdle one must overcome.
> 
> Here's where awesome compiler support would help.  FWIW, gcc isn't that
> great a compiler.  It's not performance-minded for HPC. It's a reasonable
> general purpose standards compliant (for some subset of standards)
> compilation system.  LLVM is IMO a better compiler system, and its
> clang/flang are developing nicely, albeit still not really HPC focused. 
> Then you have variants built on that.  Like the Cray compiler, Nvidia
> compiler and AMD compiler. These are HPC focused, and actually do quite
> well with some codes (though the AMD version lags the Cray and Nvidia
> compilers). You've got the Intel compiler, which would be a good general
> compiler if it wasn't more of a marketing vehicle for Intel processors
> and their features (hey you got an AMD chip?  you will take the slowest
> code path even if you support the features needed for the high
> performance code path).
> 
> Maybe, someday, we'll get a great HPC compiler for C/Fortran.




Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Jonathan Engwall
AVX-512 is SIMD, and in that respect compiled Intel routines will run almost
automatically on Intel processors.
It's not that I was answering the question; I realize, or perhaps
under-appreciate, the implementation problems. You would need to do a
side-by-side comparison of the die.

On Mon, Jun 21, 2021, 7:47 AM Andrew M.A. Cater  wrote:

> On Mon, Jun 21, 2021 at 09:46:30AM -0400, Joe Landman wrote:
> > On 6/21/21 9:20 AM, Jonathan Engwall wrote:
> > > I have followed this thinking "square peg, round hole."
> > > You have got it again, Joe. Compilers are your problem.
> >
> >
> > Erp ... did I mess up again?
> >
> > Here's where awesome compiler support would help.  FWIW, gcc isn't that
> > great a compiler.  It's not performance-minded for HPC. It's a reasonable
> > general purpose standards compliant (for some subset of standards)
> > compilation system.  LLVM is IMO a better compiler system, and its
> > clang/flang are developing nicely, albeit still not really HPC focused.
> > Then you have variants built on that.  Like the Cray compiler, Nvidia
> > compiler and AMD compiler. These are HPC focused, and actually do quite
> well
> > with some codes (though the AMD version lags the Cray and Nvidia
> compilers).
> > You've got the Intel compiler, which would be a good general compiler if
> it
> > wasn't more of a marketing vehicle for Intel processors and their
> features
> > (hey you got an AMD chip?  you will take the slowest code path even if
> you
> > support the features needed for the high performance code path).
> >
> > Maybe, someday, we'll get a great HPC compiler for C/Fortran.
> >
> The problem is that, maybe, the HPC market is still not _quite_ big enough
> to merit a dedicated set of compilers and is diverse enough in its problem
> sets that we still need a dozen or more specialist use cases to work well.
>
> You would think there would be a cross-over point where massively parallel
> scalable cloud infrastructure would intersect with HPC but that doesn't
> seem to be happening. Parallelisation is the great bugbear anyway.
>
> Most of the experts I know on all of this are the regulars on this list:
> paging Greg Lindahl ...
>
> All the best,
>
> Andy Cater
>
> >
> > --
> > Joe Landman
> > e: joe.land...@gmail.com
> > t: @hpcjoe
> > w: https://scalability.org
> > g: https://github.com/joelandman
> > l: https://www.linkedin.com/in/joelandman
> >
>
>
>


Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Andrew M.A. Cater
On Mon, Jun 21, 2021 at 09:46:30AM -0400, Joe Landman wrote:
> On 6/21/21 9:20 AM, Jonathan Engwall wrote:
> > I have followed this thinking "square peg, round hole."
> > You have got it again, Joe. Compilers are your problem.
> 
> 
> Erp ... did I mess up again?
> 
> Here's where awesome compiler support would help.  FWIW, gcc isn't that
> great a compiler.  It's not performance-minded for HPC. It's a reasonable
> general purpose standards compliant (for some subset of standards)
> compilation system.  LLVM is IMO a better compiler system, and its
> clang/flang are developing nicely, albeit still not really HPC focused. 
> Then you have variants built on that.  Like the Cray compiler, Nvidia
> compiler and AMD compiler. These are HPC focused, and actually do quite well
> with some codes (though the AMD version lags the Cray and Nvidia compilers).
> You've got the Intel compiler, which would be a good general compiler if it
> wasn't more of a marketing vehicle for Intel processors and their features
> (hey you got an AMD chip?  you will take the slowest code path even if you
> support the features needed for the high performance code path).
> 
> Maybe, someday, we'll get a great HPC compiler for C/Fortran.
> 
The problem is that, maybe, the HPC market is still not _quite_ big enough
to merit a dedicated set of compilers, and is diverse enough in its problem 
sets that we still need a dozen or more specialist use cases to work well.

You would think there would be a cross-over point where massively parallel
scalable cloud infrastructure would intersect with HPC but that doesn't
seem to be happening. Parallelisation is the great bugbear anyway.

Most of the experts I know on all of this are the regulars on this list:
paging Greg Lindahl ... 

All the best,

Andy Cater

> 
> -- 
> Joe Landman
> e: joe.land...@gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman
> 




Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Joe Landman

On 6/21/21 9:20 AM, Jonathan Engwall wrote:

> I have followed this thinking "square peg, round hole."
> You have got it again, Joe. Compilers are your problem.



Erp ... did I mess up again?

System architecture has been a problem ... making a processing unit 
10-100x as fast as its support components means you have to code with 
that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily 
generate optimal code for the system ( but I swear ... -O3 ... it says 
it on the package!)


Way back at Scalable, our secret sauce was largely increasing IO 
bandwidth and lowering IO latency while coupling computing more tightly 
to this massive IO/network pipe set, combined with intelligence in the 
kernel on how to better use the resources.  It was simply a better 
architecture.  We used the same CPUs.  We simply exploited the design 
better.


The end result was that codes with off-CPU work (storage, networking, 
etc.) could push our systems far harder than competitors' could.  And you 
didn't have to use a different ISA to get these benefits.  No 
recompilation needed, though we did show the folks who were interested 
how to get even better performance.


Architecture matters, as does the implementation of that architecture.  
There are costs to every decision within an architecture.  With AVX512 
comes a lot of other baggage: downclocking, etc.  
You have to do a cost-benefit analysis on whether or not it is worth 
paying for that baggage, given the benefits you get from doing so.  Some 
folks have made that decision towards AVX512, and have been enjoying the 
benefits (i.e. they are willing to pay the costs).  For the general 
audience, these costs represent a (significant) hurdle to overcome.
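That cost-benefit analysis can be sketched with back-of-the-envelope peak-FLOPS arithmetic. A toy model only: the core counts and clock speeds below are hypothetical round numbers, not vendor specs for any real part.

```python
# Toy model of the AVX512 downclocking trade-off.  Core counts and
# frequencies are hypothetical round numbers, not measurements.
def peak_gflops(cores, ghz, doubles_per_vector, fma_units=2):
    # 2 flops per FMA (multiply + add), per lane, per FMA unit, per cycle
    return cores * ghz * doubles_per_vector * fma_units * 2

avx2   = peak_gflops(cores=28, ghz=2.5, doubles_per_vector=4)  # 256-bit vectors
avx512 = peak_gflops(cores=28, ghz=2.0, doubles_per_vector=8)  # 512-bit, downclocked

print(f"AVX2 peak:   {avx2:6.0f} GFLOPS")
print(f"AVX512 peak: {avx512:6.0f} GFLOPS ({avx512 / avx2:.2f}x, not the naive 2x)")
```

Doubling the vector width while losing 20% of the clock yields 1.6x here, not 2x - and that is before pipeline stalls or any frequency penalty paid by scalar code on neighboring cores.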


Here's where awesome compiler support would help.  FWIW, gcc isn't that 
great a compiler.  It's not performance-minded for HPC. It's a reasonable 
general-purpose, standards-compliant (for some subset of standards) 
compilation system.  LLVM is IMO a better compiler system, and its 
clang/flang are developing nicely, albeit still not really HPC-focused.  
Then you have variants built on that, like the Cray, Nvidia, and AMD 
compilers. These are HPC-focused, and actually do quite well with some 
codes (though the AMD version lags the Cray and Nvidia compilers). You've 
got the Intel compiler, which would be a good general compiler if it 
weren't more of a marketing vehicle for Intel processors and their 
features (hey, you got an AMD chip?  you will take the slowest code path 
even if you support the features needed for the high performance code 
path).


Maybe, someday, we'll get a great HPC compiler for C/Fortran.


--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman



Re: [Beowulf] AMD and AVX512

2021-06-21 Thread Jonathan Engwall
I have followed this thinking "square peg, round hole."
You have got it again, Joe. Compilers are your problem.

On Sun, Jun 20, 2021, 10:21 AM Joe Landman  wrote:

> (Note:  not disagreeing at all with Gerald, actually agreeing strongly ...
> also, correct address this time!  Thanks Gerald!)
>
>
> On 6/19/21 11:49 AM, Gerald Henriksen wrote:
>
> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
>
>
> The answer given, and I'm
> not making this up, is that AMD listens to their users and gives the
> users what they want, and right now they're not hearing any demand for
> AVX512.
>
> More accurately, there is call for it.  From a very small segment of the
> market.  Ones who buy small quantities of processors (under 100k volume per
> purchase).
>
> That is, not a significant enough portion of the market to make a huge
> difference to the supplier (Intel).
>
> And more to the point, AI and HPC joining forces has put the spotlight on
> small matrix multiplies, often with lower precision.  I'm not sure (haven't
> read much on it recently) if AVX512 will be enabling/has enabled support
> for bfloat16/FP16 or similar.  These tend to go to GPUs and other
> accelerators.
>
> Personally, I call BS on that one. I can't imagine anyone in the HPC
> community saying "we'd like processors that offer only 1/2 the floating
> point performance of Intel processors".
>
> I suspect that is marketing speak, which roughly translates to not
> that no one has asked for it, but rather requests haven't reached a
> threshold where the requests are viewed as significant enough.
>
> This, precisely.  AMD may be losing the AVX512 users to Intel.  But that's
> a small/minuscule fraction of the overall users of its products.  The
> demand for this is quite constrained.  Moreover, there are often
> significant performance consequences to using AVX512 (downclocking,
> pipeline stalls, etc.) whereby the cost of enabling and using it far
> outweighs the benefit of providing it, for the vast, overwhelming portion
> of the market.
>
> And, as noted above on the accelerator side, this use case (large vectors)
> is better handled by the accelerators.  There is a cost (engineering, code
> design, etc.) to using accelerators as well.  But it won't directly impact
> the CPUs.
>
> Sure, AMD can offer more cores,
> but with only AVX2, you'd need twice as many cores as Intel processors,
> all other things being equal.
>
> ... or you run the GPU versions of the code, which are likely getting more
> active developer attention.  AVX512 applies to only a minuscule number of
> codes/problems.  It's really not a panacea.
>
> More to the point, have you seen how "well" compilers use AVX2/SSE
> registers and do code gen?  It's not pretty in general.  Would you want the
> compilers to purposefully spit out AVX512 code the way they do AVX2/SSE
> code now?  I've found one has to work very hard with intrinsics to get good
> performance out of AVX2, never mind AVX512.
>
> Put another way, we've been hearing about "smart" compilers for a while,
> and in all honesty, most can barely implement a standard correctly, never
> mind generate reasonably (near) optimal code for the target system.  This
> has been a problem my entire professional life, and while I wish they were
> better, at the end of the day, this is where human intelligence fits into
> the HPC/AI narrative.
>
> But of course all other things aren't equal.
>
> AVX512 is a mess.
>
> Understated, and yes.
>
> Look at the Wikipedia page(*) and note that AVX512 means different
> things depending on the processor implementing it.
>
> I made comments previously about which ISA ARM folks were going to write
> to.  That is, different processors, likely implementing different
> instructions, differently ... you won't really have 1 equally good compiler
> for all these features.  You'll have a compiler that implements common
> denominators reasonably well.  Which mitigates the benefits of the
> ISA/architecture.
>
> Intel has the same problem with AVX512.  I know, I know ... feature flags
> on the CPU (see last line of lscpu output).  And how often have certain
> (ahem) compilers ignored the flags, and used a different mechanism to
> determine CPU feature support, specifically targeting their competitor
> offerings to force (literally) low performance paths for those CPUs?
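(An editorial aside: the feature flags mentioned here are easy to check honestly in code. A minimal sketch, Linux-specific since it parses the same "flags" field of /proc/cpuinfo that lscpu summarizes; the sample string is a trimmed, hypothetical excerpt.)

```python
# Minimal sketch: detect a CPU feature flag the way lscpu does, by parsing
# the "flags" line of /proc/cpuinfo (Linux-specific).
def has_flag(cpuinfo_text: str, flag: str) -> bool:
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flag in line.split(":", 1)[1].split()
    return False

# Trimmed, hypothetical cpuinfo excerpt for illustration:
sample = "flags\t\t: fpu sse sse2 avx avx2 avx512f avx512dq"
print(has_flag(sample, "avx512f"))  # True for this sample
# On a live machine: has_flag(open("/proc/cpuinfo").read(), "avx512f")
```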
>
>
> So what does the poor software developer target?
>
> Lowest common denominator.  Make the code work correctly first.  Then make
> it fast.  If fast is platform specific, ask how often with that platform be
> used.
>
>
> Or that it can for heat reasons cause CPU frequency reductions,
> meaning real world performance may not match theoretical - thus easier
> to just go with GPUs.
>
> The result is that most of the world is quite happily (at least for
> now) ignoring AVX512 and going with GPUs as necessary - particularly
> given the convenient libraries that Nvidia offers.
>
> Yeah ... like it or not, that battle is over (for now).
>
> [...]

Re: [Beowulf] AMD and AVX512

2021-06-20 Thread Jörg Saßmannshausen
Dear all,

same here, I should have joined the discussion earlier but currently I am 
recovering from surgery on a trapped ulnar nerve, so long typing is something 
I need to avoid.
As it is quite apt I think, I would like to inform you about this upcoming 
talk (copy):

**
*Performance Optimizations & Best Practices for AMD Rome and Milan CPUs in HPC 
Environments*
- date & time: Fri July 2nd 2021 - 16:00-17:30 UTC
- speakers: Evan Burness and Jithin Jose (Principal Program Managers for High-
Performance Computing in Microsoft Azure)

More information available at https://github.com/easybuilders/easybuild/wiki/
EasyBuild-tech-talks-IV:-AMD-Rome-&-Milan

The talk will be presented via a Zoom session, which registered attendees can 
join, and will be streamed (+ recorded) via the EasyBuild YouTube channel.
Q&A via the #tech-talks channel in the EasyBuild Slack.

Please register (free of charge) if you plan to attend, via:
https://webappsx.ugent.be/eventManager/events/ebtechtalkamdromemilan
The Zoom link will only be shared with registered attendees.
**

These talks are really tech talks and not sales talks, and all of the ones I 
have been to were very informative and friendly. So it might be a good idea to 
ask some questions there.

All the best

Jörg

Am Sonntag, 20. Juni 2021, 18:28:25 BST schrieb Mikhail Kuzminsky:
> I apologize - I should have written earlier, but I don't always work
> with my broken right hand. It seems to me that a reasonable basis for
> discussing AMD EPYC performance could be the specified performance
> data in the Daresbury benchmark from M.Guest. Yes, newer
> versions of AMD EPYC and Xeon Scalable processors have appeared since
> then, and new compiler versions. However, Intel already had AVX-512
> support, and AMD - AVX-256.
> Of course, peak performance is not as important as application
> performance. There are applications where performance is not limited
> by working with vectors - there AVX-512 may not be needed. And in AI
> tasks, working with vectors is very relevant - and GPUs are often used there.
> For AI, the Daresburg benchmark, on the other hand, is less relevant.
> And in Zen 4, AMD seemed to be going to support 512 bit vectors. But
> performance of linear algebra does not always require work with GPU.
> In quantum chemistry, you can get acceleration due to vectors on the
> V100, let's say a 2 times - how much more expensive is the GPU?
> Of course, support for 512 bit vectors is a plus, but you really need
> to look to application performance and cost (including power
> consumption). I prefer to see to the A64FX now, although there may
> need to be rebuild applications. Servers w/A64FX sold now, but the
> price is very important.
> 
> In message from John Hearns  (Sun, 20 Jun 2021 06:38:06 +0100):
> > Regarding benchmarking real world codes on AMD, every year Martyn Guest
> > presents a comprehensive set of benchmark studies to the UK Computing
> > Insights Conference.
> > I suggest a Sunday afternoon with the beverage of your choice is a good
> > time to settle down and take time to read these or watch the presentation.
> > 
> > 2019
> > https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf
> > 
> > 2020 Video session
> > https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000
> > 
> > Skylake / Cascade Lake / AMD Rome
> > 
> > The slides for 2020 do exist - as I remember all the slides from all talks
> > are grouped together, but I cannot find them.
> > Watch the video - it is an excellent presentation.
> > 
> > On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen  wrote:
> >> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
> >> >The answer given, and I'm
> >> >not making this up, is that AMD listens to their users and gives the
> >> >users what they want, and right now they're not hearing any demand for
> >> >AVX512.
> >> >
> >> >Personally, I call BS on that one. I can't imagine anyone in the HPC
> >> >community saying "we'd like processors that offer only 1/2 the floating
> >> >point performance of Intel processors".
> >> 
> >> I suspect that is marketing speak, which roughly translates to not
> >> that no one has asked for it, but rather requests haven't reached a
> >> threshold where the requests are viewed as significant enough.
> >> 
> >> > Sure, AMD can offer more cores,
> >> >but with only AVX2, you'd need twice as many cores as Intel processors,
> >> >all other things being equal.
> >> 
> >> But of course all other things aren't equal.
> >> 
> >> AVX512 is a mess.
> >> 
> >> Look at the Wikipedia page(*) and note that AVX512 means different
> >> things depending on the processor implementing it.
> >> 
> >> So what does the poor software developer target?

Re: [Beowulf] AMD and AVX512

2021-06-20 Thread Mikhail Kuzminsky

I apologize - I should have written earlier, but I don't always work
with my broken right hand. It seems to me that a reasonable basis for
discussing AMD EPYC performance could be the performance data specified
in the Daresbury benchmark from M.Guest. Yes, newer
versions of AMD EPYC and Xeon Scalable processors have appeared since
then, and new compiler versions. However, Intel already had AVX-512
support, and AMD - AVX-256.
Of course, peak performance is not as important as application
performance. There are applications where performance is not limited
by working with vectors - there AVX-512 may not be needed. And in AI
tasks, working with vectors is very relevant - and GPUs are often used there.
For AI, the Daresbury benchmark, on the other hand, is less relevant.
And in Zen 4, AMD seems to be going to support 512-bit vectors. But
the performance of linear algebra does not always require a GPU.
In quantum chemistry, you can get acceleration due to vectors on the
V100 of, let's say, 2 times - but how much more expensive is the GPU?
Of course, support for 512-bit vectors is a plus, but you really need
to look at application performance and cost (including power
consumption). I prefer to look at the A64FX now, although applications
may need to be rebuilt. Servers with A64FX are sold now, but the
price is very important.
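The GPU cost question above can be framed as simple performance-per-dollar arithmetic. A toy sketch only: the 2x speedup is the example from the text, while both prices are made-up placeholders, not quotes for any real hardware.

```python
# Toy performance-per-dollar framing of the GPU cost question above.
# The 2x speedup is from the text; both prices are hypothetical placeholders.
def perf_per_dollar(relative_speed, price_usd):
    return relative_speed / price_usd

cpu_node = perf_per_dollar(relative_speed=1.0, price_usd=8000)           # CPU-only node
gpu_node = perf_per_dollar(relative_speed=2.0, price_usd=8000 + 10000)   # node + one V100-class card

print(f"GPU node wins on perf per dollar: {gpu_node > cpu_node}")
```

At these placeholder prices a 2x speedup does not pay for itself (the break-even card price here would be $8000), which is exactly the kind of arithmetic - plus power consumption - being suggested.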

In message from John Hearns  (Sun, 20 Jun 2021
06:38:06 +0100):
Regarding benchmarking real world codes on AMD , every year Martyn 
Guest

presents a comprehensive set of benchmark studies to the UK Computing
Insights Conference.
I suggest a Sunday afternoon with the beverage of your choice is a 
good
time to settle down and take time to read these or watch the 
presentation.


2019
https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf


2020 Video session
https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000

Skylake / Cascade Lake / AMD Rome

The slides for 2020 do exist - as I remember all the slides from all talks
are grouped together, but I cannot find them.
Watch the video - it is an excellent presentation.

On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen wrote:

On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:

>The answer given, and I'm
>not making this up, is that AMD listens to their users and gives the
>users what they want, and right now they're not hearing any demand for
>AVX512.
>
>Personally, I call BS on that one. I can't imagine anyone in the HPC
>community saying "we'd like processors that offer only 1/2 the floating
>point performance of Intel processors".

I suspect that is marketing speak, which roughly translates to not
that no one has asked for it, but rather requests haven't reached a
threshold where the requests are viewed as significant enough.

> Sure, AMD can offer more cores,
>but with only AVX2, you'd need twice as many cores as Intel processors,
>all other things being equal.

But of course all other things aren't equal.

AVX512 is a mess.

Look at the Wikipedia page(*) and note that AVX512 means different
things depending on the processor implementing it.

So what does the poor software developer target?

Or that it can for heat reasons cause CPU frequency reductions,
meaning real world performance may not match theoretical - thus easier
to just go with GPU's.

The result is that most of the world is quite happily (at least for
now) ignoring AVX512 and going with GPU's as necessary - particularly
given the convenient libraries that Nvidia offers.

> I compared a server with dual AMD EPYC 7H12 processors (128 cores) to a
> server with quad Intel Xeon 8268 processors (96 cores).

> From what I've heard, the AMD processors run much hotter than the Intel
>processors, too, so I imagine a FLOPS/Watt comparison would be even less
>favorable to AMD.

Spec sheets would indicate AMD runs hotter, but then again you
benchmarked twice as many Intel processors.

So, per the spec sheets for your processors above:

AMD - 280W - 2 processors means system 560W
Intel - 205W - 4 processors means system 820W

(and then you also need to factor in purchase price).

>An argument can be made that calculations that lend themselves to
>vectorization should be done on GPUs, instead of the main processors, but
>the last time I checked, GPU jobs are still memory limited, and
>moving data in and out of GPU memory can still take time, so I can see
>situations where for large amounts of data using CPUs would be preferred
>over GPUs.

AMD's latest chips support PCI 4 while Intel is still stuck on PCI 3,
which may or may not mean a difference.

But despite all of the above and the other replies, it is AMD who
has been winning the HPC contracts of late, not Intel.

* - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

Re: [Beowulf] AMD and AVX512 [EXT]

2021-06-20 Thread Gerald Henriksen
On Sun, 20 Jun 2021 06:51:58 +0100, you wrote:

>That is a very interesting point! I never thought of that.
>Also mobile drives ARM development - yes I know the CPUs in Isambard and
>Fugaku will not be seen in your mobile phone but the ecosystem is propped
>up by having a diverse market and also the power saving priorities of
>mobile will influence HPC ARM CPUs.

I think the danger is in thinking of ARM (or going forward RISC-V) in
the same way that we have traditionally considered CPU families like
the x86 / x64 / Power families.

One of the things hobbling x64 is that it is effectively one design that Intel
(and to a lesser extent AMD) try to fit into multiple roles - often
without success.  Consider the now abandoned attempts to get Intel
chips into phones and tablets.

ARM has no such constraints - they are quite happy to develop new
designs for specific markets that are entirely unsuitable for their
existing strengths.

Hence, as part of the ARM push into HPC, the new Neoverse V1 - a
design for HPC that probably won't appear in phones.

https://www.arm.com/blogs/blueprint/neoverse-v1

Or consider that the ARM ecosystem has shunned making multiple-bitness
CPUs/SOCs - they essentially made a clean break with 64-bit only chips
that sit alongside the 32-bit only chips - vendors choose the hardware
for their needs and don't carry along legacy stuff that eats up
silicon space and power.

ARM is about taking ARM IP and creating custom designs for specific
markets.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-20 Thread Stu Midgley
we should be up to about EV12 by now...

On Sun, Jun 20, 2021 at 1:38 PM John Hearns  wrote:

> Regarding benchmarking real world codes on AMD , every year Martyn Guest
> presents a comprehensive set of benchmark studies to the UK Computing
> Insights Conference.
> I suggest a Sunday afternoon with the beverage of your choice is a good
> time to settle down and take time to read these or watch the presentation.
>
> 2019
>
> https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf
>
>
> 2020 Video session
>
> https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000
>
> Skylake / Cascade Lake / AMD Rome
>
> The slides for 2020 do exist - as I remember all the slides from all talks
> are grouped together, but I cannot find them.
> Watch the video - it is an excellent presentation.
>


-- 
Dr Stuart Midgley
sdm...@gmail.com
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512 [EXT]

2021-06-19 Thread John Hearns
That is a very interesting point! I never thought of that.
Also mobile drives ARM development - yes I know the CPUs in Isambard and
Fugaku will not be seen in your mobile phone but the ecosystem is propped
up by having a diverse market and also the power saving priorities of
mobile will influence HPC ARM CPUs.

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-19 Thread John Hearns
Regarding benchmarking real world codes on AMD , every year Martyn Guest
presents a comprehensive set of benchmark studies to the UK Computing
Insights Conference.
I suggest a Sunday afternoon with the beverage of your choice is a good
time to settle down and take time to read these or watch the presentation.

2019
https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf


2020 Video session
https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000

Skylake / Cascade Lake / AMD Rome

The slides for 2020 do exist - as I remember all the slides from all talks
are grouped together, but I cannot find them.
Watch the video - it is an excellent presentation.

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512 [EXT]

2021-06-19 Thread Tim Cutts
I think that’s a major important point.  Even if the whole of the HPC market 
were clamouring for it (which they’re not, judging by this discussion) that’s 
still a very small proportion of the worldwide CPU market.  We have to remember 
that we in the HPC community are a niche market.  I recall at SC a couple of 
years ago someone from Intel pointing out that mobile devices and IoT were what 
was driving IT technology; the volume dwarfs everything else.  Hence the drive 
to NVRAM - not to make things faster for HPC (although that was the benefit 
being presented through that talk), but the fundamental driver was to increase 
phone battery life.

Tim

--
Tim Cutts
Head of Scientific Computing
Wellcome Sanger Institute


On 19 Jun 2021, at 16:49, Gerald Henriksen wrote:

I suspect that is marketing speak, which roughly translates to not
that no one has asked for it, but rather requests haven't reached a
threshold where the requests are viewed as significant enough.




-- 
The Wellcome Sanger Institute is operated by Genome Research Limited, a
charity registered in England with number 1021457 and a company registered
in England with number 2742969, whose registered office is 215 Euston Road,
London, NW1 2BE.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-19 Thread Gerald Henriksen
On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:

>The answer given, and I'm 
>not making this up, is that AMD listens to their users and gives the 
>users what they want, and right now they're not hearing any demand for 
>AVX512.
>
>Personally, I call BS on that one. I can't imagine anyone in the HPC 
>community saying "we'd like processors that offer only 1/2 the floating 
>point performance of Intel processors".

I suspect that is marketing speak, which roughly translates to not
that no one has asked for it, but rather requests haven't reached a
threshold where the requests are viewed as significant enough.

> Sure, AMD can offer more cores, 
>but with only AVX2, you'd need twice as many cores as Intel processors, 
>all other things being equal.

But of course all other things aren't equal.

AVX512 is a mess.

Look at the Wikipedia page(*) and note that AVX512 means different
things depending on the processor implementing it.

So what does the poor software developer target?

Or that it can for heat reasons cause CPU frequency reductions,
meaning real world performance may not match theoretical - thus easier
to just go with GPU's.

The result is that most of the world is quite happily (at least for
now) ignoring AVX512 and going with GPU's as necessary - particularly
given the convenient libraries that Nvidia offers.

> I compared a server with dual AMD EPYC 7H12 processors (128 cores) to a
> server with quad Intel Xeon 8268 processors (96 cores).

> From what I've heard, the AMD processors run much hotter than the Intel 
>processors, too, so I imagine a FLOPS/Watt comparison would be even less 
>favorable to AMD.

Spec sheets would indicate AMD runs hotter, but then again you
benchmarked twice as many Intel processors.

So, per the spec sheets for your processors above:

AMD - 280W - 2 processors means system 560W
Intel - 205W - 4 processors means system 820W

(and then you also need to factor in purchase price).

>An argument can be made that calculations that lend themselves to
>vectorization should be done on GPUs, instead of the main processors, but
>the last time I checked, GPU jobs are still memory limited, and
>moving data in and out of GPU memory can still take time, so I can see
>situations where for large amounts of data using CPUs would be preferred
>over GPUs.

AMD's latest chips support PCI 4 while Intel is still stuck on PCI 3,
which may or may not mean a difference.

But despite all of the above and the other replies, it is AMD who
has been winning the HPC contracts of late, not Intel.

* - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-16 Thread Stu Midgley
I've told AMD brass that we need AVX512 many many times.

I've also told them that we need more memory bandwidth and that adding
dimms is not the answer.  We don't need more capacity - just more bandwidth.

We have a stack load of KNL systems and have invested heavily in AVX512
(writing with intrinsics) and shifting those codes away from it would be
considerable work.

Bring on Sapphire Rapids :)


On Thu, Jun 17, 2021 at 1:16 AM Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:

> Did anyone else attend this webinar panel discussion with AMD hosted by
> HPCWire yesterday? It was titled "AMD HPC Solutions: Enabling Your
> Success in HPC"
>
> https://www.hpcwire.com/amd-hpc-solutions-enabling-your-success-in-hpc/
>
> I attended it, and noticed there was no mention of AMD supporting
> AVX512, so during the question and answer portion of the program, I
> asked when AMD processors will support AVX512. The answer given, and I'm
> not making this up, is that AMD listens to their users and gives the
> users what they want, and right now they're not hearing any demand for
> AVX512.
>
> Personally, I call BS on that one. I can't imagine anyone in the HPC
> community saying "we'd like processors that offer only 1/2 the floating
> point performance of Intel processors". Sure, AMD can offer more cores,
> but with only AVX2, you'd need twice as many cores as Intel processors,
> all other things being equal.
>
> Last fall I evaluated potential new cluster nodes for a large cluster
> purchase using the HPL benchmark. I compared a server with dual AMD EPYC
> 7H12 processors (128 cores) to a server with quad Intel Xeon 8268
> processors (96 cores). I measured 5,389 GFLOPS for the Xeon 8268, and
> only 3,446 GFLOPS for the AMD 7H12. That's a LINPACK score that is only
> 64% of the Xeon 8268 system's, despite the AMD system having 33% more cores.
>
>  From what I've heard, the AMD processors run much hotter than the Intel
> processors, too, so I imagine a FLOPS/Watt comparison would be even less
> favorable to AMD.
>
> An argument can be made that calculations that lend themselves to
> vectorization should be done on GPUs, instead of the main processors, but
> the last time I checked, GPU jobs are still memory limited, and
> moving data in and out of GPU memory can still take time, so I can see
> situations where for large amounts of data using CPUs would be preferred
> over GPUs.
>
> Your thoughts?
>
> --
> Prentice
>
> ___
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>


-- 
Dr Stuart Midgley
sdm...@gmail.com
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-16 Thread Scott Atchley
On Wed, Jun 16, 2021 at 1:15 PM Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:


AMD has studied this quite a bit in DOE's FastForward-2 and PathForward. I
think Carlos' comment is on track. Having a unit that cannot be fed data
quick enough is pointless. It is application dependent. If your working set
fits in cache, then the vector units work well. If not, you have to move
data which stalls compute pipelines. NERSC saw only a 10% increase in
performance when moving from low core count Xeon CPUs with AVX2 to Knights
Landing with many cores and AVX-512 when it should have seen an order of
magnitude increase. Although Knights Landing had MCDRAM (Micron's not-quite
HBM), other constraints limited performance (e.g., lack of enough memory
references in flight, coherence traffic).

Fujitsu's ARM64 chip with 512b SVE in Fugaku does much better than Xeon
with AVX-512 (or Knights Landing) because of the High Bandwidth Memory
(HBM) attached and I assume a larger number of memory references in flight.
The downside is the lack of memory capacity (only 32 GB per node). This
shows that it is possible to get more performance with a CPU with a 512b
vector engine. That said, it is not clear that even this CPU design can
extract the most from the memory bandwidth. If you look at the increase in
memory bandwidth from Summit to Fugaku, one would expect performance on
real apps to increase by that amount as well. From the presentations that I
have seen, that is not always the case. For some apps, the GPU
architecture, with its coherence on demand rather than with every
operation, can extract more performance.

AMD will add 512b vectors if/when it makes sense on real apps.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-16 Thread Michael Di Domenico
AMD's argument is a little unsalesmanlike, but I'd buy it as an
explanation.  AVX512 uptake isn't as profound as Intel would lead you
to believe, and the push to GPUs for vectors will probably remove the
need for most of these high end vectors sooner or later (but that's my
opinion, some chip changes need to happen first).

I also think your HPL numbers on the AMD chip are low; you should be
above 4000, which would put you closer to Intel, but Intel will still edge
out just because it has a higher base clock.

On Wed, Jun 16, 2021 at 1:15 PM Prentice Bisbal via Beowulf
 wrote:
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


Re: [Beowulf] AMD and AVX512

2021-06-16 Thread Carlos Bederián
On Wed, Jun 16, 2021 at 2:16 PM Prentice Bisbal via Beowulf <
beowulf@beowulf.org> wrote:

> Last fall I evaluated potential new cluster nodes for a large cluster
> purchase using the HPL benchmark. I compared a server with dual AMD EPYC
> 7H12 processors (128) cores to a server with quad Intel Xeon 8268
> processors (96 cores). I measured 5,389 GFLOPS for the Xeon 8268, and
> only 3,446.00 GFLOPS for the AMD 7H12. That's LINPACK score that only
> 64% of the Xeon 8268 system, despite having 33% more cores.
>

Most of the workloads we see on our clusters have arithmetic intensities
much lower than LINPACK's, so all that extra compute gets starved by lack
of memory bandwidth.
___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf