Re: [Beowulf] AMD and AVX512
> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
>> The answer given, and I'm not making this up, is that AMD listens to their users and gives the users what they want, and right now they're not hearing any demand for AVX512.
>>
>> Personally, I call BS on that one. I can't imagine anyone in the HPC community saying "we'd like processors that offer only 1/2 the floating point performance of Intel processors".
>
> I suspect that is marketing speak, which roughly translates not to "no one has asked for it", but rather that requests haven't reached a threshold where they are viewed as significant enough.

Exactly, or: "Right now cloud-based servers are the biggest market. These customers need as many cores/threads as possible on die with 'adequate' memory bandwidth. Oh, and they buy them by the boatload. What did you say you do again?"

-- Doug

___
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
Re: [Beowulf] AMD and AVX512
Hi all,

> This is, in my humble opinion, also the big problem CPUs are facing. They are built to tackle all possible scenarios, from simple integer to floating point, from in-memory to disc I/O. In some respect it would have been better to stick with a separate math unit which could then be selected according to the workload you want to run on that server. I guess this is where the GPUs are trying to fit in here, or maybe ARM.

I recall a few years ago the rumors that the Argonne "A18" system was going to use the 'Configurable Spatial Accelerator' that Intel was developing, with the idea being you *could* reconfigure based on the needs of the code. In principle, it sounds like the Holy Grail, but in practice it seems quite difficult, and I don't believe I've heard much more about the CSA approach since.

WikiChip on the CSA: https://en.wikichip.org/wiki/intel/configurable_spatial_accelerator
NextPlatform article: https://www.nextplatform.com/2018/08/30/intels-exascale-dataflow-engine-drops-x86-and-von-neuman/

I have to imagine that research hasn't gone fully quiet, especially with Intel's moves toward oneAPI and their FPGA experience, but I haven't seen anything about it in a while. Of course:

> I also agree with the compiler "problem". If you start to push some compilers too much, the code runs very fast but the results are simply wrong. Again, in an ideal world we have a compiler for the job for the given hardware, which also depends on the job you want to run.

It exacerbates the compiler issues, *I think*. I hesitate to say it does so definitively, since the patent write-up talks about how the CSA architecture uses a representation very similar to what the (now old) Intel compilers created as an IR (intermediate representation). In my opinion, having a compiler that can 'do everything' is like having an AI that can do everything - we're good at very, *very* specific use-cases, but not generality.
So configurable systems are a big challenge. (I'm *way* out of my depth on compilers, though - maybe they're improving massively?)

> Maybe the whole climate problem will finally push HPC into more bespoke systems where the components are fit for the job in question, say weather modeling for example, simply as that would be more energy efficient and faster.

I can't speak to whether climate research will influence hardware, but back to the *original* theme of this thread: I actually had some data - very *limited* data, mind you! - on how NCAR's climate model, CESM, run in an 'F2000climo' case (one of many, many cases, and very atmosphere-focused) at 2-degree atmosphere resolution (*very* coarse) on a 36-core Xeon Skylake, performs across AVX2, AVX512 and AVX512+FMA. By default, FMA is turned off in these cases due to numerical sensitivity.

So, that's a *very* specific case, but on the off chance people are curious, here's what it looks like - note that this is *noisy* data, because the model also does a lot of I/O, hence why I tend to look at the median times:

SKX (AWS c5n.18xlarge) Performance Comparison
CESM Case: F2000climo @ f19_g17 resolution
(36 cores each component / 10 model day run, skipping 1st and last)

Flags     AVX2 (no FMA)   AVX512 (no FMA)   AVX512 + FMA
Min       60.18           60.24             59.16
Max       66.26           60.47             59.40
Median    60.28           60.38             59.32

The take-away? We're not really benefiting *at all* (at this resolution, for this compset, etc.) from AVX512 here. Maybe at higher resolution? Maybe with more vertical levels, or chemistry, or something like that? *Maybe*, but the differences seem indistinguishable from noise here, and possibly negative! Now, give us more *memory bandwidth*, and that's fantastic.

Could this code be rewritten to take better advantage of larger vectors? Sure, and some *really* capable people do work on that sort of stuff, and it helps, but as an *evolution* in performance, not a revolution in it.
(Also, I'm always horrified by presenting one-off tests as examples of anything, but it's the only data I have on hand! Other cases may indeed vary.)

> Before somebody comes along with: but but but it costs! Think about how much money is being spent simply to kill people, or on other wasteful projects like Brexit etc.

One can only hope. When it comes to spending on research, I recall the quote: "If you think education is expensive, try ignorance!"

Cheers,
- Brian

Am Montag, 21. Juni 2021, 14:46:30 BST schrieb Joe Landman:
>> On 6/21/21 9:20 AM, Jonathan Engwall wrote:
>>> I have followed this thinking "square peg, round hole." You have got it again, Joe. Compilers are your problem.
>>
>> Erp ... did I mess up again?
>>
>> System architecture has been a problem ... making a processing unit 10-100x as fast as its support components means you have to code with that in mind. A simple `gfortran -O3 mycode.f` won't necessarily generate optimal code for the system (
Re: [Beowulf] AMD and AVX512
Dear all,

> System architecture has been a problem ... making a processing unit 10-100x as fast as its support components means you have to code with that in mind. A simple `gfortran -O3 mycode.f` won't necessarily generate optimal code for the system (but I swear ... -O3 ... it says it on the package!)

From a computational chemist's perspective I agree. In an ideal world, you want to get the right hardware for the program you want to use. Some codes run entirely in memory, others use disc space for offloading files.

This is, in my humble opinion, also the big problem CPUs are facing. They are built to tackle all possible scenarios, from simple integer to floating point, from in-memory to disc I/O. In some respect it would have been better to stick with a separate math unit which could then be selected according to the workload you want to run on that server. I guess this is where the GPUs are trying to fit in here, or maybe ARM.

I also agree with the compiler "problem". If you start to push some compilers too much, the code runs very fast but the results are simply wrong. Again, in an ideal world we have a compiler for the job for the given hardware, which also depends on the job you want to run. The problem here is not "is that possible?"; the problem is more "how much does it cost?". From what I understand, some big server farms are actually not using commodity HPC stuff but are designing what they need themselves.

Maybe the whole climate problem will finally push HPC into more bespoke systems where the components are fit for the job in question - say weather modeling, for example - simply as that would be more energy efficient and faster. Before somebody comes along with "but but but it costs!": think about how much money is being spent simply to kill people, or on other wasteful projects like Brexit etc.

My 2 shillings for what it is worth! :D

Jörg

Am Montag, 21. Juni 2021, 14:46:30 BST schrieb Joe Landman:
> On 6/21/21 9:20 AM, Jonathan Engwall wrote:
>> I have followed this thinking "square peg, round hole." You have got it again, Joe. Compilers are your problem.
>
> Erp ... did I mess up again?
>
> System architecture has been a problem ... making a processing unit 10-100x as fast as its support components means you have to code with that in mind. A simple `gfortran -O3 mycode.f` won't necessarily generate optimal code for the system (but I swear ... -O3 ... it says it on the package!)
>
> Way back at Scalable, our secret sauce was largely increasing IO bandwidth and lowering IO latency while coupling computing more tightly to this massive IO/network pipe set, combined with intelligence in the kernel on how to better use the resources. It was simply a better architecture. We used the same CPUs. We simply exploited the design better.
>
> The end result was that codes with off-CPU work (storage, networking, etc.) could push our systems far harder than competitors'. And you didn't have to use a different ISA to get these benefits. No recompilation needed, though we did show the folks who were interested how to get even better performance.
>
> Architecture matters, as does the implementation of that architecture. There are costs to every decision within an architecture. With AVX512 comes a lot of other baggage: downclocking, etc. You have to do a cost-benefit analysis on whether or not it is worth paying for that baggage, given the benefits you get from doing so. Some folks have made that decision towards AVX512, and have been enjoying the benefits (i.e. they are willing to pay the costs). For the general audience, these costs represent a (significant) hurdle one must overcome.
>
> Here's where awesome compiler support would help. FWIW, gcc isn't that great a compiler. It's not performance-minded for HPC. It's a reasonable general-purpose, standards-compliant (for some subset of standards) compilation system. LLVM is IMO a better compiler system, and its clang/flang are developing nicely, albeit still not really HPC-focused. Then you have variants built on that, like the Cray compiler, Nvidia compiler and AMD compiler. These are HPC-focused, and actually do quite well with some codes (though the AMD version lags the Cray and Nvidia compilers). You've got the Intel compiler, which would be a good general compiler if it wasn't more of a marketing vehicle for Intel processors and their features (hey, you got an AMD chip? you will take the slowest code path even if you support the features needed for the high-performance code path).
>
> Maybe, someday, we'll get a great HPC compiler for C/Fortran.
Re: [Beowulf] AMD and AVX512
AVX-512 is SIMD, and in that respect compiled Intel routines will run almost automatically on Intel processors. It's not like I was answering the question. I realize, or under-realize, the implementation problems. You need to do a side-by-side comparison of the die.

On Mon, Jun 21, 2021, 7:47 AM Andrew M.A. Cater wrote:
> On Mon, Jun 21, 2021 at 09:46:30AM -0400, Joe Landman wrote:
>> On 6/21/21 9:20 AM, Jonathan Engwall wrote:
>>> I have followed this thinking "square peg, round hole." You have got it again, Joe. Compilers are your problem.
>>
>> Erp ... did I mess up again?
>>
>> Here's where awesome compiler support would help. FWIW, gcc isn't that great a compiler. It's not performance-minded for HPC. It's a reasonable general-purpose, standards-compliant (for some subset of standards) compilation system. LLVM is IMO a better compiler system, and its clang/flang are developing nicely, albeit still not really HPC-focused. Then you have variants built on that, like the Cray compiler, Nvidia compiler and AMD compiler. These are HPC-focused, and actually do quite well with some codes (though the AMD version lags the Cray and Nvidia compilers). You've got the Intel compiler, which would be a good general compiler if it wasn't more of a marketing vehicle for Intel processors and their features (hey, you got an AMD chip? you will take the slowest code path even if you support the features needed for the high-performance code path).
>>
>> Maybe, someday, we'll get a great HPC compiler for C/Fortran.
>
> The problem is that, maybe, the HPC market is still not _quite_ big enough to merit a dedicated set of compilers, and is diverse enough in its problem sets that we still need a dozen or more specialist use cases to work well.
>
> You would think there would be a cross-over point where massively parallel scalable cloud infrastructure would intersect with HPC, but that doesn't seem to be happening. Parallelisation is the great bugbear anyway.
>
> Most of the experts I know on all of this are the regulars on this list: paging Greg Lindahl ...
>
> All the best,
>
> Andy Cater
>
>> --
>> Joe Landman
>> e: joe.land...@gmail.com
>> t: @hpcjoe
>> w: https://scalability.org
>> g: https://github.com/joelandman
>> l: https://www.linkedin.com/in/joelandman
Re: [Beowulf] AMD and AVX512
On Mon, Jun 21, 2021 at 09:46:30AM -0400, Joe Landman wrote:
> On 6/21/21 9:20 AM, Jonathan Engwall wrote:
>> I have followed this thinking "square peg, round hole." You have got it again, Joe. Compilers are your problem.
>
> Erp ... did I mess up again?
>
> Here's where awesome compiler support would help. FWIW, gcc isn't that great a compiler. It's not performance-minded for HPC. It's a reasonable general-purpose, standards-compliant (for some subset of standards) compilation system. LLVM is IMO a better compiler system, and its clang/flang are developing nicely, albeit still not really HPC-focused. Then you have variants built on that, like the Cray compiler, Nvidia compiler and AMD compiler. These are HPC-focused, and actually do quite well with some codes (though the AMD version lags the Cray and Nvidia compilers). You've got the Intel compiler, which would be a good general compiler if it wasn't more of a marketing vehicle for Intel processors and their features (hey, you got an AMD chip? you will take the slowest code path even if you support the features needed for the high-performance code path).
>
> Maybe, someday, we'll get a great HPC compiler for C/Fortran.

The problem is that, maybe, the HPC market is still not _quite_ big enough to merit a dedicated set of compilers, and is diverse enough in its problem sets that we still need a dozen or more specialist use cases to work well.

You would think there would be a cross-over point where massively parallel scalable cloud infrastructure would intersect with HPC, but that doesn't seem to be happening. Parallelisation is the great bugbear anyway.

Most of the experts I know on all of this are the regulars on this list: paging Greg Lindahl ...

All the best,

Andy Cater

> --
> Joe Landman
> e: joe.land...@gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman
Re: [Beowulf] AMD and AVX512
On 6/21/21 9:20 AM, Jonathan Engwall wrote:
> I have followed this thinking "square peg, round hole." You have got it again, Joe. Compilers are your problem.

Erp ... did I mess up again?

System architecture has been a problem ... making a processing unit 10-100x as fast as its support components means you have to code with that in mind. A simple `gfortran -O3 mycode.f` won't necessarily generate optimal code for the system (but I swear ... -O3 ... it says it on the package!)

Way back at Scalable, our secret sauce was largely increasing IO bandwidth and lowering IO latency while coupling computing more tightly to this massive IO/network pipe set, combined with intelligence in the kernel on how to better use the resources. It was simply a better architecture. We used the same CPUs. We simply exploited the design better.

The end result was that codes with off-CPU work (storage, networking, etc.) could push our systems far harder than competitors'. And you didn't have to use a different ISA to get these benefits. No recompilation needed, though we did show the folks who were interested how to get even better performance.

Architecture matters, as does the implementation of that architecture. There are costs to every decision within an architecture. With AVX512 comes a lot of other baggage: downclocking, etc. You have to do a cost-benefit analysis on whether or not it is worth paying for that baggage, given the benefits you get from doing so. Some folks have made that decision towards AVX512, and have been enjoying the benefits (i.e. they are willing to pay the costs). For the general audience, these costs represent a (significant) hurdle one must overcome.

Here's where awesome compiler support would help. FWIW, gcc isn't that great a compiler. It's not performance-minded for HPC. It's a reasonable general-purpose, standards-compliant (for some subset of standards) compilation system. LLVM is IMO a better compiler system, and its clang/flang are developing nicely, albeit still not really HPC-focused. Then you have variants built on that, like the Cray compiler, Nvidia compiler and AMD compiler. These are HPC-focused, and actually do quite well with some codes (though the AMD version lags the Cray and Nvidia compilers). You've got the Intel compiler, which would be a good general compiler if it wasn't more of a marketing vehicle for Intel processors and their features (hey, you got an AMD chip? you will take the slowest code path even if you support the features needed for the high-performance code path).

Maybe, someday, we'll get a great HPC compiler for C/Fortran.

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
Re: [Beowulf] AMD and AVX512
I have followed this thinking "square peg, round hole." You have got it again, Joe. Compilers are your problem.

On Sun, Jun 20, 2021, 10:21 AM Joe Landman wrote:
> (Note: not disagreeing at all with Gerald, actually agreeing strongly ... also, correct address this time! Thanks Gerald!)
>
> On 6/19/21 11:49 AM, Gerald Henriksen wrote:
>> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
>>> The answer given, and I'm not making this up, is that AMD listens to their users and gives the users what they want, and right now they're not hearing any demand for AVX512.
>
> More accurately, there is call for it - from a very small segment of the market, ones who buy small quantities of processors (under 100k volume per purchase). That is, not a significant enough portion of the market to make a huge difference to the supplier (Intel).
>
> And more to the point, AI and HPC joining forces has put the spotlight on small matrix multiplies, often with lower precision. I'm not sure (haven't read much on it recently) if AVX512 will be enabling/has enabled support for bfloat16/FP16 or similar. These tend to go to GPUs and other accelerators.
>
>>> Personally, I call BS on that one. I can't imagine anyone in the HPC community saying "we'd like processors that offer only 1/2 the floating point performance of Intel processors".
>>
>> I suspect that is marketing speak, which roughly translates not to "no one has asked for it", but rather that requests haven't reached a threshold where they are viewed as significant enough.
>
> This, precisely. AMD may be losing the AVX512 users to Intel. But that's a small/minuscule fraction of the overall users of its products. The demand for this is quite constrained. Moreover, there are often significant performance consequences to using AVX512 (downclocking, pipeline stalls, etc.) whereby the cost of enabling and using it far outweighs the benefits of providing it, for the vast, overwhelming portion of the market.
>
> And, as noted above on the accelerator side, this use case (large vectors) is better handled by the accelerators. There is a cost (engineering, code design, etc.) to using accelerators as well. But it won't directly impact the CPUs.
>
>>> Sure, AMD can offer more cores, but with only AVX2, you'd need twice as many cores as Intel processors, all other things being equal.
>
> ... or you run the GPU versions of the code, which are likely getting more active developer attention. AVX512 applies to only a minuscule number of codes/problems. It's really not a panacea.
>
> More to the point, have you seen how "well" compilers use AVX2/SSE registers and do code gen? It's not pretty in general. Would you want the compilers to purposefully spit out AVX512 code the way they do AVX2/SSE code now? I've found one has to work very hard with intrinsics to get good performance out of AVX2, never mind AVX512.
>
> Put another way, we've been hearing about "smart" compilers for a while, and in all honesty, most can barely implement a standard correctly, never mind generate reasonably (near-)optimal code for the target system. This has been a problem my entire professional life, and while I wish they were better, at the end of the day, this is where human intelligence fits into the HPC/AI narrative.
>
>> But of course all other things aren't equal.
>>
>> AVX512 is a mess.
>
> Understated, and yes.
>
>> Look at the Wikipedia page(*) and note that AVX512 means different things depending on the processor implementing it.
>
> I made comments previously about which ISA ARM folks were going to write to. That is, different processors, likely implementing different instructions, differently ... you won't really have one equally good compiler for all these features. You'll have a compiler that implements common denominators reasonably well, which mitigates the benefits of the ISA/architecture.
>
> Intel has the same problem with AVX512. I know, I know ... feature flags on the CPU (see the last line of lscpu output). And how often have certain (ahem) compilers ignored the flags, and used a different mechanism to determine CPU feature support, specifically targeting their competitors' offerings to force (literally) low-performance paths for those CPUs?
>
>> So what does the poor software developer target?
>
> Lowest common denominator. Make the code work correctly first. Then make it fast. If fast is platform-specific, ask how often that platform will be used.
>
>> Or that it can for heat reasons cause CPU frequency reductions, meaning real world performance may not match theoretical - thus easier to just go with GPUs.
>>
>> The result is that most of the world is quite happily (at least for now) ignoring AVX512 and going with GPUs as necessary - particularly given the convenient libraries that Nvidia offers.
>
> Yeah ... like it or not, that battle is over (for now).
>
> [...]
Re: [Beowulf] AMD and AVX512
Dear all,

Same here - I should have joined the discussion earlier, but currently I am recovering from a trapped ulnar nerve OP, so long typing is something I need to avoid.

As it is quite apt I think, I would like to inform you about this upcoming talk (copy):

*Performance Optimizations & Best Practices for AMD Rome and Milan CPUs in HPC Environments*

- date & time: Fri July 2nd 2021 - 16:00-17:30 UTC
- speakers: Evan Burness and Jithin Jose (Principal Program Managers for High-Performance Computing in Microsoft Azure)

More information available at https://github.com/easybuilders/easybuild/wiki/EasyBuild-tech-talks-IV:-AMD-Rome-&-Milan

The talk will be presented via a Zoom session, which registered attendees can join, and will be streamed (+ recorded) via the EasyBuild YouTube channel. Q&A via the #tech-talks channel in the EasyBuild Slack. Please register (free of charge) if you plan to attend, via: https://webappsx.ugent.be/eventManager/events/ebtechtalkamdromemilan - the Zoom link will only be shared with registered attendees.

These talks are really tech talks and not sales talks, and all of the ones I have been to were very informative and friendly. So that might be a good place to ask some questions?

All the best

Jörg

Am Sonntag, 20. Juni 2021, 18:28:25 BST schrieb Mikhail Kuzminsky:
> I apologize - I should have written earlier, but I don't always work with my broken right hand. It seems to me that a reasonable basis for discussing AMD EPYC performance could be the performance data specified in the Daresbury benchmark from M. Guest. Yes, newer versions of AMD EPYC and Xeon Scalable processors have appeared since then, and new compiler versions. However, Intel already had AVX-512 support, and AMD - AVX-256.
>
> Of course, peak performance is not as important as application performance. There are applications where performance is not limited by work with vectors - there AVX-512 may not be needed. And in AI tasks, working with vectors is relevant - and GPUs are often used there. For AI, the Daresbury benchmark, on the other hand, is less relevant. And in Zen 4, AMD seems to be going to support 512-bit vectors. But performance of linear algebra does not always require work with a GPU. In quantum chemistry, you can get acceleration due to vectors on the V100 of, let's say, 2 times - how much more expensive is the GPU?
>
> Of course, support for 512-bit vectors is a plus, but you really need to look at application performance and cost (including power consumption). I prefer to look at the A64FX now, although applications may need to be rebuilt. Servers with A64FX are sold now, but the price is very important.
>
> In message from John Hearns (Sun, 20 Jun 2021 06:38:06 +0100):
>> Regarding benchmarking real world codes on AMD, every year Martyn Guest presents a comprehensive set of benchmark studies to the UK Computing Insights Conference. I suggest a Sunday afternoon with the beverage of your choice is a good time to settle down and take time to read these or watch the presentation.
>>
>> 2019: https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf
>>
>> 2020 video session: https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000
>>
>> Skylake / Cascade Lake / AMD Rome
>>
>> The slides for 2020 do exist - as I remember all the slides from all talks are grouped together, but I cannot find them. Watch the video - it is an excellent presentation.
>>
>> On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen wrote:
>>> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
>>>> The answer given, and I'm not making this up, is that AMD listens to their users and gives the users what they want, and right now they're not hearing any demand for AVX512.
>>>>
>>>> Personally, I call BS on that one. I can't imagine anyone in the HPC community saying "we'd like processors that offer only 1/2 the floating point performance of Intel processors".
>>>
>>> I suspect that is marketing speak, which roughly translates not to "no one has asked for it", but rather that requests haven't reached a threshold where they are viewed as significant enough.
>>>
>>>> Sure, AMD can offer more cores, but with only AVX2, you'd need twice as many cores as Intel processors, all other things being equal.
>>>
>>> But of course all other things aren't equal.
>>>
>>> AVX512 is a mess.
>>>
>>> Look at the Wikipedia page(*) and note that AVX512 means different things depending on the processor implementing it.
>>>
>>> So what does the poor software
Re: [Beowulf] AMD and AVX512
I apologize - I should have written earlier, but I don't always work with my broken right hand. It seems to me that a reasonable basis for discussing AMD EPYC performance could be the specified performance data in the Daresburg University benchmark from M.Guest. Yes, newer versions of AMD EPYC and Xeon Scalable processors have appeared since then, and new compiler versions. However, Intel already had AVX-512 support, and AMD - AVX-256. Of course, peak performanceis is not so important as application performance. There are applications where performance is not limited to working with vectors - there AVX-512 may not be needed. And in AI tasks, working with vectors is actual - and GPUs are often used there. For AI, the Daresburg benchmark, on the other hand, is less relevant. And in Zen 4, AMD seemed to be going to support 512 bit vectors. But performance of linear algebra does not always require work with GPU. In quantum chemistry, you can get acceleration due to vectors on the V100, let's say a 2 times - how much more expensive is the GPU? Of course, support for 512 bit vectors is a plus, but you really need to look to application performance and cost (including power consumption). I prefer to see to the A64FX now, although there may need to be rebuild applications. Servers w/A64FX sold now, but the price is very important. In message from John Hearns (Sun, 20 Jun 2021 06:38:06 +0100): Regarding benchmarking real world codes on AMD , every year Martyn Guest presents a comprehensive set of benchmark studies to the UK Computing Insights Conference. I suggest a Sunday afternoon with the beverage of your choice is a good time to settle down and take time to read these or watch the presentation. 
2019 https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf 2020 Video session https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000 Skylake / Cascade Lake / AMD Rome The slides for 2020 do exist - as I remember all the slides from all talks are grouped together, but I cannot find them. Watch the video - it is an excellent presentation. On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen wrote: On Wed, 16 Jun 2021 13:15:40 -0400, you wrote: >The answer given, and I'm >not making this up, is that AMD listens to their users and gives the >users what they want, and right now they're not hearing any demand for >AVX512. > >Personally, I call BS on that one. I can't imagine anyone in the HPC >community saying "we'd like processors that offer only 1/2 the floating >point performance of Intel processors". I suspect that is marketing speak, which roughly translates to not that no one has asked for it, but rather requests haven't reached a threshold where the requests are viewed as significant enough. > Sure, AMD can offer more cores, >but with only AVX2, you'd need twice as many cores as Intel processors, >all other things being equal. But of course all other things aren't equal. AVX512 is a mess. Look at the Wikipedia page(*) and note that AVX512 means different things depending on the processor implementing it. So what does the poor software developer target? Or that it can for heat reasons cause CPU frequency reductions, meaning real world performance may not match theoritical - thus easier to just go with GPU's. The result is that most of the world is quite happily (at least for now) ignoring AVX512 and going with GPU's as necessary - particularly given the convenient libraries that Nvidia offers. > I compared a server with dual AMD EPYC >7H12 processors (128) > quad Intel Xeon 8268 >processors (96 cores). 
> From what I've heard, the AMD processors run much hotter than the Intel >processors, too, so I imagine a FLOPS/Watt comparison would be even less >favorable to AMD. Spec sheets would indicate AMD runs hotter, but then again you benchmarked twice as many Intel processors. So, per the spec sheets for your processors above: AMD - 280W - 2 processors means system 560W Intel - 205W - 4 processors means system 820W (and then you also need to factor in purchase price). >An argument can be made that calculations that lend themselves to >vectorization should be done on GPUs, instead of the main processors, but >the last time I checked, GPU jobs are still memory limited, and >moving data in and out of GPU memory can still take time, so I can see >situations where for large amounts of data using CPUs would be preferred >over GPUs. AMD's latest chips support PCIe 4 while Intel is still stuck on PCIe 3, which may or may not make a difference. But despite all of the above and the other replies, it is AMD who has been winning the HPC contracts of late, not Intel. * - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
Re: [Beowulf] AMD and AVX512 [EXT]
On Sun, 20 Jun 2021 06:51:58 +0100, you wrote: >That is a very interesting point! I never thought of that. >Also mobile drives ARM development - yes I know the CPUs in Isambard and >Fugaku will not be seen in your mobile phone but the ecosystem is propped >up by having a diverse market and also the power saving priorities of >mobile will influence HPC ARM CPUs. I think the danger is in thinking of ARM (or, going forward, RISC-V) in the same way that we have traditionally considered CPU families like the x86 / x64 / Power families. One of the things hobbling x64 is that it is effectively one design that Intel (and to a lesser extent AMD) try to fit into multiple roles - often without success. Consider the now abandoned attempts to get Intel chips into phones and tablets. ARM has no such constraints - they are quite happy to develop new designs for specific markets that are entirely unsuitable for their existing strengths. Hence, as part of the ARM push into HPC, the new Neoverse V1 - a design for HPC that probably won't appear in phones. https://www.arm.com/blogs/blueprint/neoverse-v1 Or consider that the ARM ecosystem has shunned making multiple-bitness CPUs/SOCs - they essentially made a clean break, with 64-bit-only chips that sit alongside the 32-bit-only chips - vendors choose the hardware for their needs and don't carry along legacy stuff that eats up silicon space and power. ARM is about taking ARM IP and creating custom designs for specific markets.
Re: [Beowulf] AMD and AVX512
we should be up to about EV12 by now... On Sun, Jun 20, 2021 at 1:38 PM John Hearns wrote: > Regarding benchmarking real world codes on AMD, every year Martyn Guest > presents a comprehensive set of benchmark studies to the UK Computing > Insights Conference. > I suggest a Sunday afternoon with the beverage of your choice is a good > time to settle down and take time to read these or watch the presentation. > > 2019 > https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf > > 2020 Video session > https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000 > > Skylake / Cascade Lake / AMD Rome > > The slides for 2020 do exist - as I remember all the slides from all talks > are grouped together, but I cannot find them. > Watch the video - it is an excellent presentation. > > On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen wrote: > >> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote: >> >> >The answer given, and I'm >> >not making this up, is that AMD listens to their users and gives the >> >users what they want, and right now they're not hearing any demand for >> >AVX512. >> > >> >Personally, I call BS on that one. I can't imagine anyone in the HPC >> >community saying "we'd like processors that offer only 1/2 the floating >> >point performance of Intel processors". >> >> I suspect that is marketing speak, which roughly translates to not >> that no one has asked for it, but rather requests haven't reached a >> threshold where the requests are viewed as significant enough. >> >> > Sure, AMD can offer more cores, >> >but with only AVX2, you'd need twice as many cores as Intel processors, >> >all other things being equal. >> >> But of course all other things aren't equal. >> >> AVX512 is a mess. >> >> Look at the Wikipedia page(*) and note that AVX512 means different >> things depending on the processor implementing it.
>> >> So what does the poor software developer target? >> >> Or that, for heat reasons, it can cause CPU frequency reductions, >> meaning real world performance may not match theoretical - thus easier >> to just go with GPUs. >> >> The result is that most of the world is quite happily (at least for >> now) ignoring AVX512 and going with GPUs as necessary - particularly >> given the convenient libraries that Nvidia offers. >> >> > I compared a server with dual AMD EPYC >7H12 processors (128 cores) >> > quad Intel Xeon 8268 >processors (96 cores). >> >> > From what I've heard, the AMD processors run much hotter than the Intel >> >processors, too, so I imagine a FLOPS/Watt comparison would be even less >> >favorable to AMD. >> >> Spec sheets would indicate AMD runs hotter, but then again you >> benchmarked twice as many Intel processors. >> >> So, per the spec sheets for your processors above: >> >> AMD - 280W - 2 processors means system 560W >> Intel - 205W - 4 processors means system 820W >> >> (and then you also need to factor in purchase price). >> >> >An argument can be made that calculations that lend themselves to >> >vectorization should be done on GPUs, instead of the main processors, but >> >the last time I checked, GPU jobs are still memory limited, and >> >moving data in and out of GPU memory can still take time, so I can see >> >situations where for large amounts of data using CPUs would be preferred >> >over GPUs. >> >> AMD's latest chips support PCIe 4 while Intel is still stuck on PCIe 3, >> which may or may not make a difference. >> >> But despite all of the above and the other replies, it is AMD who >> has been winning the HPC contracts of late, not Intel.
>> >> * - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions -- Dr Stuart Midgley sdm...@gmail.com
Re: [Beowulf] AMD and AVX512 [EXT]
That is a very interesting point! I never thought of that. Also mobile drives ARM development - yes I know the CPUs in Isambard and Fugaku will not be seen in your mobile phone but the ecosystem is propped up by having a diverse market and also the power saving priorities of mobile will influence HPC ARM CPUs. On Sun, 20 Jun 2021 at 02:04, Tim Cutts wrote: > I think that’s a major important point. Even if the whole of the HPC > market were clamouring for it (which they’re not, judging by this > discussion) that’s still a very small proportion of the worldwide CPU > market. We have to remember that we in the HPC community are a niche > market. I recall at SC a couple of years ago someone from Intel pointing > out that mobile devices and IoT were what was driving IT technology; the > volume dwarfs everything else. Hence the drive to NVRAM - not to make > things faster for HPC (although that was the benefit being presented > through that talk), but the fundamental driver was to increase phone > battery life. > > Tim > > -- > Tim Cutts > Head of Scientific Computing > Wellcome Sanger Institute > > > On 19 Jun 2021, at 16:49, Gerald Henriksen wrote: > > I suspect that is marketing speak, which roughly translates to not > that no one has asked for it, but rather requests haven't reached a > threshold where the requests are viewed as significant enough. > > > -- The Wellcome Sanger Institute is operated by Genome Research Limited, a > charity registered in England with number 1021457 and a company registered > in England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. 
Re: [Beowulf] AMD and AVX512
Regarding benchmarking real world codes on AMD, every year Martyn Guest presents a comprehensive set of benchmark studies to the UK Computing Insights Conference. I suggest a Sunday afternoon with the beverage of your choice is a good time to settle down and take time to read these or watch the presentation. 2019 https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf 2020 Video session https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000 Skylake / Cascade Lake / AMD Rome The slides for 2020 do exist - as I remember all the slides from all talks are grouped together, but I cannot find them. Watch the video - it is an excellent presentation. On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen wrote: > On Wed, 16 Jun 2021 13:15:40 -0400, you wrote: > > >The answer given, and I'm > >not making this up, is that AMD listens to their users and gives the > >users what they want, and right now they're not hearing any demand for > >AVX512. > > > >Personally, I call BS on that one. I can't imagine anyone in the HPC > >community saying "we'd like processors that offer only 1/2 the floating > >point performance of Intel processors". > > I suspect that is marketing speak, which roughly translates to not > that no one has asked for it, but rather requests haven't reached a > threshold where the requests are viewed as significant enough. > > > Sure, AMD can offer more cores, > >but with only AVX2, you'd need twice as many cores as Intel processors, > >all other things being equal. > > But of course all other things aren't equal. > > AVX512 is a mess. > > Look at the Wikipedia page(*) and note that AVX512 means different > things depending on the processor implementing it. > > So what does the poor software developer target?
> > Or that, for heat reasons, it can cause CPU frequency reductions, > meaning real world performance may not match theoretical - thus easier > to just go with GPUs. > > The result is that most of the world is quite happily (at least for > now) ignoring AVX512 and going with GPUs as necessary - particularly > given the convenient libraries that Nvidia offers. > > > I compared a server with dual AMD EPYC >7H12 processors (128 cores) > > quad Intel Xeon 8268 >processors (96 cores). > > > From what I've heard, the AMD processors run much hotter than the Intel > >processors, too, so I imagine a FLOPS/Watt comparison would be even less > >favorable to AMD. > > Spec sheets would indicate AMD runs hotter, but then again you > benchmarked twice as many Intel processors. > > So, per the spec sheets for your processors above: > > AMD - 280W - 2 processors means system 560W > Intel - 205W - 4 processors means system 820W > > (and then you also need to factor in purchase price). > > >An argument can be made that calculations that lend themselves to > >vectorization should be done on GPUs, instead of the main processors, but > >the last time I checked, GPU jobs are still memory limited, and > >moving data in and out of GPU memory can still take time, so I can see > >situations where for large amounts of data using CPUs would be preferred > >over GPUs. > > AMD's latest chips support PCIe 4 while Intel is still stuck on PCIe 3, > which may or may not make a difference. > > But despite all of the above and the other replies, it is AMD who > has been winning the HPC contracts of late, not Intel.
> > * - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
Re: [Beowulf] AMD and AVX512 [EXT]
I think that’s a major, important point. Even if the whole of the HPC market were clamouring for it (which they’re not, judging by this discussion) that’s still a very small proportion of the worldwide CPU market. We have to remember that we in the HPC community are a niche market. I recall at SC a couple of years ago someone from Intel pointing out that mobile devices and IoT were what was driving IT technology; the volume dwarfs everything else. Hence the drive to NVRAM - not to make things faster for HPC (although that was the benefit being presented through that talk), but the fundamental driver was to increase phone battery life. Tim -- Tim Cutts Head of Scientific Computing Wellcome Sanger Institute On 19 Jun 2021, at 16:49, Gerald Henriksen <ghenr...@gmail.com> wrote: I suspect that is marketing speak, which roughly translates to not that no one has asked for it, but rather requests haven't reached a threshold where the requests are viewed as significant enough. -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Re: [Beowulf] AMD and AVX512
On Wed, 16 Jun 2021 13:15:40 -0400, you wrote: >The answer given, and I'm >not making this up, is that AMD listens to their users and gives the >users what they want, and right now they're not hearing any demand for >AVX512. > >Personally, I call BS on that one. I can't imagine anyone in the HPC >community saying "we'd like processors that offer only 1/2 the floating >point performance of Intel processors". I suspect that is marketing speak, which roughly translates to not that no one has asked for it, but rather requests haven't reached a threshold where the requests are viewed as significant enough. > Sure, AMD can offer more cores, >but with only AVX2, you'd need twice as many cores as Intel processors, >all other things being equal. But of course all other things aren't equal. AVX512 is a mess. Look at the Wikipedia page(*) and note that AVX512 means different things depending on the processor implementing it. So what does the poor software developer target? Or that, for heat reasons, it can cause CPU frequency reductions, meaning real world performance may not match theoretical - thus easier to just go with GPUs. The result is that most of the world is quite happily (at least for now) ignoring AVX512 and going with GPUs as necessary - particularly given the convenient libraries that Nvidia offers. > I compared a server with dual AMD EPYC >7H12 processors (128 cores) > quad Intel Xeon 8268 >processors (96 cores). > From what I've heard, the AMD processors run much hotter than the Intel >processors, too, so I imagine a FLOPS/Watt comparison would be even less >favorable to AMD. Spec sheets would indicate AMD runs hotter, but then again you benchmarked twice as many Intel processors. So, per the spec sheets for your processors above: AMD - 280W - 2 processors means system 560W Intel - 205W - 4 processors means system 820W (and then you also need to factor in purchase price).
>An argument can be made that calculations that lend themselves to >vectorization should be done on GPUs, instead of the main processors, but >the last time I checked, GPU jobs are still memory limited, and >moving data in and out of GPU memory can still take time, so I can see >situations where for large amounts of data using CPUs would be preferred >over GPUs. AMD's latest chips support PCIe 4 while Intel is still stuck on PCIe 3, which may or may not make a difference. But despite all of the above and the other replies, it is AMD who has been winning the HPC contracts of late, not Intel. * - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
Re: [Beowulf] AMD and AVX512
I've told AMD brass that we need AVX512 many many times. I've also told them that we need more memory bandwidth and that adding DIMMs is not the answer. We don't need more capacity - just more bandwidth. We have a stack load of KNL systems and have invested heavily in AVX512 (writing with intrinsics), and shifting those codes away from it would be considerable work. Bring on Sapphire Rapids :) On Thu, Jun 17, 2021 at 1:16 AM Prentice Bisbal via Beowulf < beowulf@beowulf.org> wrote: > Did anyone else attend this webinar panel discussion with AMD hosted by > HPCWire yesterday? It was titled "AMD HPC Solutions: Enabling Your > Success in HPC" > > https://www.hpcwire.com/amd-hpc-solutions-enabling-your-success-in-hpc/ > > I attended it, and noticed there was no mention of AMD supporting > AVX512, so during the question and answer portion of the program, I > asked when AMD processors will support AVX512. The answer given, and I'm > not making this up, is that AMD listens to their users and gives the > users what they want, and right now they're not hearing any demand for > AVX512. > > Personally, I call BS on that one. I can't imagine anyone in the HPC > community saying "we'd like processors that offer only 1/2 the floating > point performance of Intel processors". Sure, AMD can offer more cores, > but with only AVX2, you'd need twice as many cores as Intel processors, > all other things being equal. > > Last fall I evaluated potential new cluster nodes for a large cluster > purchase using the HPL benchmark. I compared a server with dual AMD EPYC > 7H12 processors (128 cores) to a server with quad Intel Xeon 8268 > processors (96 cores). I measured 5,389 GFLOPS for the Xeon 8268, and > only 3,446 GFLOPS for the AMD 7H12. That's a LINPACK score that is only > 64% of the Xeon 8268 system's, despite having 33% more cores.
> > From what I've heard, the AMD processors run much hotter than the Intel > processors, too, so I imagine a FLOPS/Watt comparison would be even less > favorable to AMD. > > An argument can be made that calculations that lend themselves to > vectorization should be done on GPUs, instead of the main processors, but > the last time I checked, GPU jobs are still memory limited, and > moving data in and out of GPU memory can still take time, so I can see > situations where for large amounts of data using CPUs would be preferred > over GPUs. > > Your thoughts? > > -- > Prentice -- Dr Stuart Midgley sdm...@gmail.com
Re: [Beowulf] AMD and AVX512
On Wed, Jun 16, 2021 at 1:15 PM Prentice Bisbal via Beowulf < beowulf@beowulf.org> wrote: > Did anyone else attend this webinar panel discussion with AMD hosted by > HPCWire yesterday? It was titled "AMD HPC Solutions: Enabling Your > Success in HPC" > > https://www.hpcwire.com/amd-hpc-solutions-enabling-your-success-in-hpc/ > > I attended it, and noticed there was no mention of AMD supporting > AVX512, so during the question and answer portion of the program, I > asked when AMD processors will support AVX512. The answer given, and I'm > not making this up, is that AMD listens to their users and gives the > users what they want, and right now they're not hearing any demand for > AVX512. > > Personally, I call BS on that one. I can't imagine anyone in the HPC > community saying "we'd like processors that offer only 1/2 the floating > point performance of Intel processors". Sure, AMD can offer more cores, > but with only AVX2, you'd need twice as many cores as Intel processors, > all other things being equal. > > Last fall I evaluated potential new cluster nodes for a large cluster > purchase using the HPL benchmark. I compared a server with dual AMD EPYC > 7H12 processors (128 cores) to a server with quad Intel Xeon 8268 > processors (96 cores). I measured 5,389 GFLOPS for the Xeon 8268, and > only 3,446 GFLOPS for the AMD 7H12. That's a LINPACK score that is only > 64% of the Xeon 8268 system's, despite having 33% more cores. > > From what I've heard, the AMD processors run much hotter than the Intel > processors, too, so I imagine a FLOPS/Watt comparison would be even less > favorable to AMD. > > An argument can be made that calculations that lend themselves to > vectorization should be done on GPUs, instead of the main processors, but > the last time I checked, GPU jobs are still memory limited, and > moving data in and out of GPU memory can still take time, so I can see > situations where for large amounts of data using CPUs would be preferred > over GPUs.
> > Your thoughts? > > -- > Prentice > AMD has studied this quite a bit in DOE's FastForward-2 and PathForward. I think Carlos' comment is on track. Having a unit that cannot be fed data quick enough is pointless. It is application dependent. If your working set fits in cache, then the vector units work well. If not, you have to move data which stalls compute pipelines. NERSC saw only a 10% increase in performance when moving from low core count Xeon CPUs with AVX2 to Knights Landing with many cores and AVX-512 when it should have seen an order of magnitude increase. Although Knights Landing had MCDRAM (Micron's not-quite HBM), other constraints limited performance (e.g., lack of enough memory references in flight, coherence traffic). Fujitsu's ARM64 chip with 512b SVE in Fugaku does much better than Xeon with AVX-512 (or Knights Landing) because of the High Bandwidth Memory (HBM) attached and I assume a larger number of memory references in flight. The downside is the lack of memory capacity (only 32 GB per node). This shows that it is possible to get more performance with a CPU with a 512b vector engine. That said, it is not clear that even this CPU design can extract the most from the memory bandwidth. If you look at the increase in memory bandwidth from Summit to Fugaku, one would expect performance on real apps to increase by that amount as well. From the presentations that I have seen, that is not always the case. For some apps, the GPU architecture, with its coherence on demand rather than with every operation, can extract more performance. AMD will add 512b vectors if/when it makes sense on real apps.
Re: [Beowulf] AMD and AVX512
AMD's argument is a little unsalesman-like, but I'd buy it as an explanation. AVX512 uptake isn't as profound as Intel would lead you to believe, and the push to GPUs for vectors will probably remove the need for most of these high-end vector units sooner or later (but that's my opinion; some chip changes need to happen first). I also think your HPL numbers on the AMD chip are low - you should be >4000, which would put you closer to Intel, but Intel will still edge out just because it has a higher base clock. On Wed, Jun 16, 2021 at 1:15 PM Prentice Bisbal via Beowulf wrote: > > Did anyone else attend this webinar panel discussion with AMD hosted by > HPCWire yesterday? It was titled "AMD HPC Solutions: Enabling Your > Success in HPC" > > https://www.hpcwire.com/amd-hpc-solutions-enabling-your-success-in-hpc/ > > I attended it, and noticed there was no mention of AMD supporting > AVX512, so during the question and answer portion of the program, I > asked when AMD processors will support AVX512. The answer given, and I'm > not making this up, is that AMD listens to their users and gives the > users what they want, and right now they're not hearing any demand for > AVX512. > > Personally, I call BS on that one. I can't imagine anyone in the HPC > community saying "we'd like processors that offer only 1/2 the floating > point performance of Intel processors". Sure, AMD can offer more cores, > but with only AVX2, you'd need twice as many cores as Intel processors, > all other things being equal. > > Last fall I evaluated potential new cluster nodes for a large cluster > purchase using the HPL benchmark. I compared a server with dual AMD EPYC > 7H12 processors (128 cores) to a server with quad Intel Xeon 8268 > processors (96 cores). I measured 5,389 GFLOPS for the Xeon 8268, and > only 3,446 GFLOPS for the AMD 7H12. That's a LINPACK score that is only > 64% of the Xeon 8268 system's, despite having 33% more cores.
> > From what I've heard, the AMD processors run much hotter than the Intel > processors, too, so I imagine a FLOPS/Watt comparison would be even less > favorable to AMD. > > An argument can be made that calculations that lend themselves to > vectorization should be done on GPUs, instead of the main processors, but > the last time I checked, GPU jobs are still memory limited, and > moving data in and out of GPU memory can still take time, so I can see > situations where for large amounts of data using CPUs would be preferred > over GPUs. > > Your thoughts? > > -- > Prentice
Re: [Beowulf] AMD and AVX512
On Wed, Jun 16, 2021 at 2:16 PM Prentice Bisbal via Beowulf < beowulf@beowulf.org> wrote: > Last fall I evaluated potential new cluster nodes for a large cluster > purchase using the HPL benchmark. I compared a server with dual AMD EPYC > 7H12 processors (128 cores) to a server with quad Intel Xeon 8268 > processors (96 cores). I measured 5,389 GFLOPS for the Xeon 8268, and > only 3,446 GFLOPS for the AMD 7H12. That's a LINPACK score that is only > 64% of the Xeon 8268 system's, despite having 33% more cores. > Most of the workloads we see on our clusters have arithmetic intensities much lower than LINPACK's, so all that extra compute gets starved by lack of memory bandwidth.