Hi, I think, on the contrary, that he did notice the AMD/ARM issue. I suppose you haven't read the text (and I like the fact that there are different opinions on this issue).
Matthieu

2018-01-05 8:23 GMT+01:00 Gilles Gouaillardet <gil...@rist.or.jp>:
> John,
>
> The technical assessment, so to speak, is linked in the article and is available at https://googleprojectzero.blogspot.jp/2018/01/reading-privileged-memory-with-side.html
>
> The long rant against Intel PR blinded you, and you did not notice that AMD and ARM (and, though not mentioned here, Power and SPARC too) are vulnerable to some of these bugs.
>
> Full disclosure: I have no affiliation with Intel, but I am getting pissed with the hysteria around this issue.
>
> Gilles
>
> On 1/5/2018 3:54 PM, John Chludzinski wrote:
>> That article gives the best technical assessment I've seen of Intel's architecture bug. I noted the discussion's subject and thought I'd add some clarity. Nothing more.
>>
>> For the TL;DR crowd: get an AMD chip in your computer.
>>
>> On Thursday, January 4, 2018, r...@open-mpi.org wrote:
>>
>> Yes, please - that was totally inappropriate for this mailing list.
>> Ralph
>>
>>> On Jan 4, 2018, at 4:33 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
>>>
>>> Can we restrain ourselves to talk about Open MPI, or at least technical aspects of HPC communication, on this list and leave the stock market tips for Hacker News and Twitter?
>>> Thanks,
>>>
>>> Jeff
>>>
>>> On Thu, Jan 4, 2018 at 3:53 PM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>>>
>>> From https://semiaccurate.com/2018/01/04/kaiser-security-holes-will-devastate-intels-marketshare/
>>>
>>> Kaiser security holes will devastate Intel's marketshare
>>>
>>> Analysis: This one tips the balance toward AMD in a big way
>>>
>>> Jan 4, 2018 by Charlie Demerjian <https://semiaccurate.com/author/charlie/>
>>>
>>> This latest decade-long critical security hole in Intel CPUs is going to cost the company significant market share. SemiAccurate thinks it is not only consequential but will shift the balance of power away from Intel CPUs for at least the next several years.
>>>
>>> Today's latest crop of gaping security flaws comprises three sets of holes across Intel, AMD, and ARM processors, along with a slew of official statements and detailed analyses. On top of that, the statements from vendors range from detailed and direct to intentionally misleading and slimy. Let's take a look at what the problems are, whom they affect, and what the outcome will be. Those outcomes range from trivial patching to destroying the market share of Intel servers, and no, we are not joking.
>>>
>>> (*Author's Note 1:* For the technical readers: we are simplifying a lot, sorry, we know this hurts. The full disclosure docs are linked; read them for the details.)
>>>
>>> (*Author's Note 2:* For the financially oriented subscribers out there, the parts relevant to you are at the very end, in the section titled *Rubber Meet Road*.)
>>>
>>> *The Problem(s):*
>>>
>>> As we said earlier, there are three distinct security flaws that all fall somewhat under the same umbrella.
All are 'new' in the sense that the class of attacks hasn't been publicly described before, and all are very obscure CPU speculative execution and timing related problems. The extent to which the fixes affect differing architectures also ranges from minor to near-crippling slowdowns. Worse yet, all three flaws aren't bugs or errors; they exploit correct CPU behavior to allow the systems to be hacked.
>>>
>>> The three problems are cleverly labeled Variant One, Variant Two, and Variant Three. Google Project Zero was the original discoverer of them and has labeled the classes Bounds Check Bypass, Branch Target Injection, and Rogue Data Cache Load, respectively. You can read up on the extensive and gory details here <https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html> if you wish.
>>>
>>> If you are the TL;DR type, the very simplified summary is that modern CPUs will speculatively execute operations ahead of the one they are currently running. Some architectures will allow these executions to start even when they violate privilege levels, but those instructions are killed or rolled back, hopefully before they actually complete running.
>>>
>>> Another feature of modern CPUs is virtual memory, which can allow memory from two or more processes to occupy the same physical page. This is a good thing because if you have memory from the kernel and a bit of user code in the same physical page but different virtual pages, changing from kernel to userspace execution doesn't require a page fault. This saves massive amounts of time and overhead, giving modern CPUs a huge speed boost. (For the really technical out there, I know you are cringing at this simplification, sorry.)
>>>
>>> These two things together allow you to do some interesting things, and along with timing attacks they add new weapons to your hacking arsenal.
If you have code executing on one side of a virtual memory page boundary, it can speculatively execute the next few instructions on the physical page that cross the virtual page boundary. This isn't a big deal unless the two virtual pages are mapped to processes that are from different users or different privilege levels. Then you have a problem. (Again, painfully simplified and liberties taken with the explanation; read the Google paper for the full detail.)
>>>
>>> This speculative execution allows you to get a few short (low latency) instructions in before the speculation ends. Under certain circumstances you can read memory from different threads or privilege levels, write those things somewhere, and figure out what addresses other bits of code are using. The latter bit has the nasty effect of potentially blowing through address space randomization defenses, which are a keystone of modern security efforts. It is ugly.
>>>
>>> *Who Gets Hit:*
>>>
>>> So we have three attack vectors and three affected companies: Intel, AMD, and ARM. Each has a different set of vulnerabilities to the different attacks due to differences in underlying architectures. AMD put out a pretty clear statement of what is affected, ARM put out by far the best and most comprehensive description, and Intel obfuscated, denied, blamed others, and downplayed the problem. If this were a contest for misleading with doublespeak and misdirection, Intel won with a gold star; the others weren't even in the game. Let's look at who said what and why.
>>>
>>> *ARM:*
>>>
>>> ARM has a page up <https://developer.arm.com/support/security-update> listing vulnerable processor cores, descriptions of the attacks, and plenty of links to more information. They also put up a very comprehensive white paper that rivals Google's original writeup, complete with code examples and a new 3a variant.
>>> You can find it here <https://developer.arm.com/support/security-update/download-the-whitepaper>. Just for completeness we are putting up ARM's excellent table of affected processors, enjoy.
>>>
>>> ARM Kaiser core table <https://www.semiaccurate.com/assets/uploads/2018/01/ARM_Kaiser_response_table.jpg>
>>>
>>> *Affected ARM cores*
>>>
>>> *AMD:*
>>>
>>> AMD gave us the following table, which lays out their position pretty clearly. The short version is that architecturally speaking they are vulnerable to 1 and 2, but 3 is not possible due to their microarchitecture. More on this in a bit; it is very important. AMD also went on to describe some of the issues and mitigations to SemiAccurate, but again, more in a bit.
>>>
>>> AMD Kaiser response matrix <https://www.semiaccurate.com/assets/uploads/2018/01/AMD_Kaiser_response.jpg>
>>>
>>> *AMD's response matrix*
>>>
>>> *Intel:*
>>>
>>> Intel is continuing to be the running joke of the industry as far as messaging is concerned. Their statement is a pretty awe-inspiring example of saying nothing while desperately trying to minimize the problem. You can find it here <https://newsroom.intel.com/news/intel-responds-to-security-research-findings/>, but it contains zero useful information. SemiAccurate is getting tired of saying this, but Intel should be ashamed of how their messaging is done; not saying anything would do less damage than their current course of action.
>>>
>>> You will notice the line in the second paragraph: "/Recent reports that these exploits are caused by a "bug" or a "flaw" and are unique to Intel products are incorrect./" This is technically true and pretty damning. They are directly saying that the problem is not a bug but is due to *misuse of correct processor behavior*. This is a critical problem because it can't be 'patched' or 'updated' like a bug or flaw without breaking the CPU.
In short, you can't fix it, and this will be important later. Intel mentions this but others don't, for a good reason; again, later.
>>>
>>> Then Intel goes on to say: /"Intel is committed to the industry best practice of responsible disclosure of potential security issues, which is why Intel and other vendors had planned to disclose this issue next week when more software and firmware updates will be available. However, Intel is making this statement today because of the current inaccurate media reports."/ This is simply not true, or at least the part about industry best practices of responsible disclosure is not. Intel sat on the last critical security flaw affecting 10+ years of CPUs, which SemiAccurate exclusively disclosed <https://www.semiaccurate.com/2017/05/01/remote-security-exploit-2008-intel-platforms/>, for 6+ weeks after a patch was released. Why? PR reasons.
>>>
>>> SemiAccurate feels that Intel holding back knowledge of what we believe were flaws being actively exploited in the field, even though there were simple mitigation steps available, is not responsible. Or best practices. Or ethical. Or anything even suggesting goodness. It is simply unethical, and that is if you are feeling kind. Intel does not do the right thing for security breaches and has not even attempted to do so in the 15+ years this reporter has been tracking them on the topic. They are by far the worst major company in this regard, and getting worse.
>>>
>>> *Mitigation:*
>>>
>>> As is described by Google, ARM, and AMD, but not Intel, there are workarounds for the three new vulnerabilities. Since Google first discovered these holes in June 2017, there have been patches pushed up to various Linux kernel and related repositories.
The first one SemiAccurate can find was dated October 2017, and the industry coordinated announcement was set for Tuesday, January 9, 2018, so you can be pretty sure that the patches are in place and ready to be pushed out, if not on your systems already. Microsoft and Apple are said to be at a similar state of readiness too. In short, by the time you read this, it will likely be fixed.
>>>
>>> That said, the fixes do have consequences, and all are heavily workload dependent. For variants 1 and 2 the performance hit is pretty minor, with reports of ~1% performance hits under certain circumstances, but for the most part you won't notice anything if you patch, and you should patch. Basically 1 and 2 are irrelevant from any performance perspective as long as your system is patched.
>>>
>>> The big problem is with variant 3, which ARM claims has a similarly small effect on devices like phones or tablets, i.e. low single digit performance hits, if that. Given the way ARM CPUs are used, the majority of devices don't tend to have the multi-user, multi-tenant, heavily virtualized workloads that servers do. For the few ARM cores that are affected, their users will see a minor, likely unnoticeable performance hit when patched.
>>>
>>> User x86 systems will likely be closer to the ARM model for performance hits. Why? Because while they can run heavily virtualized, multi-user, multi-tenant workloads, most desktop users don't. Even if they do, it is pretty rare that these users are CPU bound for performance; memory and storage bandwidth will hammer performance on these workloads long before the CPU becomes a bottleneck. Why do we bring this up?
>>>
>>> Because in those heavily virtualized, multi-tenant, multi-user workloads that most servers run in the modern world, the patches for 3 are painful. How painful?
>>> SemiAccurate's research has found reports of between 5% and 50% slowdowns, again workload and software dependent, with the average being around 30%. This stands to reason because the fixes we have found essentially force a demapping of kernel code on a context switch.
>>>
>>> *The Pain:*
>>>
>>> This may sound like techno-babble but it isn't, and it happens many thousands of times a second on modern machines, if not more. Because, as Intel pointed out, the CPU is operating correctly and the exploit uses correct behavior, it can't be patched or 'fixed' without breaking the CPU itself. Instead what you have to do is make sure the circumstances that can be exploited don't happen. Consider this a software workaround or avoidance mechanism, not a patch or bug fix; the underlying problem is still there and exploitable, there is just nothing to exploit.
>>>
>>> Since the root cause of 3 is a mechanism that delivers a huge performance benefit by avoiding a few thousand, or perhaps millions of, page faults a second, at the very least you now have to take the hit of those page faults. Worse yet, the fix, from what SemiAccurate has gathered so far, has to unload the kernel pages from virtual memory maps on a context switch. So with the patch not only do you have to take the hit you previously avoided, but you also have to do a lot of work copying/scrubbing virtual memory every time you do. This explains the hit of ~1/3rd of your total CPU performance quite nicely.
>>>
>>> Going back to user x86 machines and ARM devices: they aren't doing nearly as many context switches as the servers are, but likely have to do the same work when doing a switch. In short, if you do a theoretical 5% of the switches, you take 5% of that 30% hit.
It isn't this simple, but you get the idea: it is unlikely to cripple a consumer desktop PC or phone but will probably cripple a server. Workload dependent, we meant it.
>>>
>>> *The Knife Goes In:*
>>>
>>> So x86 servers are in deep trouble: what was doable on two racks of machines now needs three if you apply the patch for 3. If not, well, customers have lawyers; will you risk it? Worse yet, would you buy cloud services from someone who didn't apply the patch? Think about this for the economics of the megadatacenters: if you are buying 100K+ servers a month, you now need closer to 150K, not a trivial added outlay for even the big guys.
>>>
>>> But there is one big caveat, and it comes down to the part we said we would get to later. Later is now. Go back and look at that AMD chart near the top of the article, specifically their vulnerability to Variant 3 attacks. Note the bit about "/Zero AMD vulnerability or risk because of AMD architecture differences./" See an issue here?
>>>
>>> What AMD didn't spell out in detail is a minor difference in microarchitecture between Intel and AMD CPUs. When a CPU speculatively executes and crosses a privilege level boundary, any idiot would probably say that the CPU should see this crossing and not execute the following instructions that are out of its privilege level. This isn't rocket science, just basic common sense.
>>>
>>> AMD's microarchitecture sees this privilege level change, throws the microarchitectural equivalent of a hissy fit, and doesn't execute the code. Common sense wins out. Intel's implementation does execute the following code across privilege levels, which sounds on the surface like a bit of a face-palm implementation, but it really isn't.
>>> What saves Intel is that the speculative execution goes on but, to the best of our knowledge, is unwound when the privilege level changes a few instructions later. Since Intel CPUs in the wild don't crash or violate privilege levels, it looks like that mechanism works properly in practice. What these new exploits do is slip in a few very short instructions that can read data from the other user or privilege level before the context change happens. If crafted correctly, the instructions are unwound but the data can be stashed in a place that is persistent.
>>>
>>> Intel probably gets a slight performance gain from doing this 'sloppy' method, but AMD seems to have done the right thing for the right reasons. That extra bounds check probably takes a bit of time, but in retrospect, doing the right thing was worth it. Since both are fundamentally 'correct' behaviors for their respective microarchitectures, there is no possible fix, just code that avoids scenarios where it can be abused.
>>>
>>> For Intel this avoidance comes with a 30% performance hit on server type workloads, less on desktop workloads. For AMD the problem was avoided by design and the performance hit is zero. Doing the right thing for the right reasons, even if it is marginally slower, seems to have paid off in this circumstance. Mother was right; AMD listened, Intel didn't.
>>>
>>> *Weasel Words:*
>>>
>>> Now you have a bit more context about why Intel's response was, well, a non-response. They blamed others, correctly, for having the same problem, but their blanket statement avoided the obvious issue that the others aren't crippled by the effects of the patches like Intel is. Intel screwed up, badly, and is facing a 30% performance hit going forward for it. AMD did right and is probably breaking out the champagne at HQ about now.
>>> Intel also tried to deflect lawyers by saying they follow industry best practices. They don't, and the AMT hole was a shining example of them putting PR above customer security. Similarly, they sat on the fix for the TXT flaw for *THREE YEARS* <https://www.semiaccurate.com/2016/01/20/intel-puts-out-secure-cpus-based-on-insecurity/> because they didn't want to admit to architectural security blunders and reveal publicly embarrassing policies, until forced to disclose by a governmental agency being exploited by a foreign power; that is another example that shines a harsh light on their 'best practices' line. There are many more like this. Intel isn't to be trusted for security practices or disclosures because PR takes precedence over customer security.
>>>
>>> *Rubber Meet Road:*
>>>
>>> Unfortunately security doesn't sell and rarely affects marketshare. This time, however, is different and will hit Intel where it hurts: in the wallet. SemiAccurate thinks this exploit is going to devastate Intel's marketshare. Why? Read on, subscribers.
>>>
>>> /Note: The following is analysis for professional level subscribers only./
>>>
>>> /Disclosures: Charlie Demerjian and Stone Arch Networking Services, Inc. have no consulting relationships, investment relationships, or hold any investment positions with any of the companies mentioned in this report./
>>>
>>> On Thu, Jan 4, 2018 at 6:21 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>
>>> On 04.01.2018 at 23:45, r...@open-mpi.org wrote:
>>>
>>> > As more information continues to surface, it is clear that the original article that spurred this thread was somewhat incomplete - probably released a little too quickly, before full information was available.
There is still some confusion out there, but the gist from surfing the various articles (and trimming away the hysteria) appears to be:
>>> >
>>> > * there are two security issues, both stemming from the same root cause. The "problem" has actually been around for nearly 20 years, but faster processors are making it much more visible.
>>> >
>>> > * one problem (Meltdown) specifically impacts at least Intel, ARM, and AMD processors. This is the one that the kernel patches address, as it can be corrected via software, albeit with some impact that varies by application. Apps that perform lots of kernel services will see larger impacts than those that don't use the kernel much.
>>> >
>>> > * the other problem (Spectre) appears to impact _all_ processors (including, by some reports, SPARC and Power). This problem lacks a software solution.
>>> >
>>> > * the "problem" is only a problem if you are running on shared nodes - i.e., if multiple users share a common OS instance - as it allows a user to potentially access the kernel information of the other user. So HPC installations that allocate complete nodes to a single user might want to take a closer look before installing the patches. Ditto for your desktop and laptop - unless someone can gain access to the machine, it isn't really a "problem".
>>>
>>> Weren't there some PowerPCs with strict in-order execution which could circumvent this? I find a hint about an "EIEIO" instruction only. Sure, in-order execution might slow down the system too.
>>> -- Reuti
>>>
>>> > * containers and VMs don't fully resolve the problem - the only solution other than the patches is to limit allocations to single users on a node.
>>> >
>>> > HTH
>>> > Ralph
>>> >
>>> >> On Jan 3, 2018, at 10:47 AM, r...@open-mpi.org wrote:
>>> >>
>>> >> Well, it appears from that article that the primary impact comes from accessing kernel services. With an OS-bypass network, that shouldn't happen all that frequently, and so I would naively expect the impact to be at the lower end of the reported scale for those environments. TCP-based systems, though, might be on the other end.
>>> >>
>>> >> Probably something we'll only really know after testing.
>>> >>
>>> >>> On Jan 3, 2018, at 10:24 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
>>> >>>
>>> >>> Out of curiosity, have any of the Open MPI developers tested (or care to speculate) how strongly affected Open MPI based codes (just the MPI part, obviously) will be by the proposed Intel CPU memory-mapping-related kernel patches that are all the rage?
>>> >>>
>>> >>> https://arstechnica.com/gadgets/2018/01/whats-behind-the-intel-design-flaw-forcing-numerous-patches/
>>> >>>
>>> >>> Noam
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users
>>>
>>> --
>>> Jeff Hammond
>>> jeff.scie...@gmail.com
>>> http://jeffhammond.github.io/
--
Quantitative analyst, Ph.D.
Blog: http://blog.audio-tk.com/
LinkedIn: http://www.linkedin.com/in/matthieubrucher
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users