Re: Multi CPU interlock question
I think of the operative term as "doubleword consistency" (it is implied that the doubleword is on a doubleword boundary). That is why LM of two regs from a doubleword gets you "consistency", but "L from 1st word followed by L from 2nd word" does not.

And yes, for the most part, it doesn't have to be 8 bytes, just something that does not cross a doubleword boundary. I never remember whether some of the moves for some lengths <=8 are an exception.

But few things have "quadword consistency" (e.g., LPQ, STPQ). LMG of two regs does not.

Peter Relson
z/OS Core Technology Design
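A rough C analogue of the doubleword point, offered as a sketch and not as z/Architecture code: if both words live in one naturally aligned 8-byte object and are always read and written with a single 8-byte access, a reader cannot end up with one old half and one new half. The names `shared_pair`, `publish_pair`, and `read_pair` are hypothetical, and C11 atomics are an assumed stand-in for the block concurrency that the hardware gives an aligned doubleword.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared pair: two 32-bit halves kept in ONE aligned 8-byte
 * object, so a single access covers both halves. */
static _Atomic uint64_t shared_pair;

static_assert(sizeof(uint64_t) == 8, "the pair must fit in one doubleword");

/* Writer: publish both halves with one 8-byte store. */
static void publish_pair(uint32_t hi, uint32_t lo)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    atomic_store_explicit(&shared_pair, v, memory_order_relaxed);
}

/* Reader: fetch both halves with one 8-byte load, then split them.
 * Two separate 4-byte loads here would lose the consistency. */
static void read_pair(uint32_t *hi, uint32_t *lo)
{
    uint64_t v = atomic_load_explicit(&shared_pair, memory_order_relaxed);
    *hi = (uint32_t)(v >> 32);
    *lo = (uint32_t)v;
}
```

Anything wider than 8 bytes falls outside this pattern, which is where the quadword instructions mentioned above come in.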
Re: Multi CPU interlock question
On 2019-01-15, at 10:48:52, Ngan, Robert wrote:
> If you want to load two doublewords, block concurrency guarantees each
> (aligned) doubleword is consistent, but if task 2 is in process of updating
> both doublewords, using (for example) LMG may result in you loading one
> doubleword from before task 2's change and one after.

Oh, this is tricky! I had to look it up. PoOps says:

    Block-Concurrent References
    For *some* references, the accesses to all bytes within a halfword, word, doubleword, or quadword are specified to appear to be block concurrent as observed by other CPUs and channel programs.

(I emphasized the "some".) But elsewhere:

    ...; the instructions LOAD MULTIPLE (LMG) and STORE MULTIPLE (STMG), when the operand starts on a doubleword boundary; ... access their storage operands in a left-to-right direction, and all bytes accessed within each doubleword appear to be accessed concurrently as observed by other CPUs.

-- gil
Re: Multi CPU interlock question
If you want to load two doublewords, block concurrency guarantees each (aligned) doubleword is consistent, but if task 2 is in process of updating both doublewords, using (for example) LMG may result in you loading one doubleword from before task 2's change and one after.

-----Original Message-----
From: IBM Mainframe Assembler List On Behalf Of Keven
Sent: Monday, January 14, 2019 16:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Shouldn't that be: Protection for readers is only necessary when the storage in question doesn't cross a doubleword boundary?

Keven
Re: Multi CPU interlock question
Shouldn't that be: Protection for readers is only necessary when the storage in question doesn't cross a doubleword boundary?

Keven

On Mon, Jan 14, 2019 at 4:17 PM -0600, "Ngan, Robert" wrote:
> Protection for readers is only necessary when the storage in question is
> larger than a doubleword. For quadwords, you can use either LPQ or PLO
> function 3 (CLX).
Re: Multi CPU interlock question
Protection for readers is only necessary when the storage in question is larger than a doubleword. For quadwords, you can use either LPQ or PLO function 3 (CLX).

Robert Ngan
HCL Technologies

-----Original Message-----
From: IBM Mainframe Assembler List On Behalf Of Joe Owens
Sent: Thursday, January 10, 2019 04:28
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Yes, your illustration is exactly what I was concerned about. My instinct was CS was just about updaters of storage, and not readers, so there must be some other type of protection for readers.

Thanks, Joe
Re: Multi CPU interlock question
> But in practice there existed no machines with more than one CPU

What was the IBM 9020, chopped liver? Bendix G-21? Burroughs B5000? Burroughs D825? BULL Gamma 60? CDC 6600? GE 635? Arguably, the Honeywell H-800? UNIVAC 1108?

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3
Re: Multi CPU interlock question
On 2019-01-10, at 15:53:59, gah wrote:
>> So two processors both fetch the same 16-bit frame. One
>> updates the even half; the other into the odd half. Both store.
>> Last guy wins (sort of).
>
> With only one processor, you still have I/O to consider.

I understand that on some systems the channels stole microcycles from the CPU, so if CPU instructions were not micro-interruptible, the serialization was provided. Bit-spinning for I/O was frequently used, but risky for the naive.

> And for the 360 and 370, the interval timer. Tradition was to read
> the old value and replace it with a new value with one MVC.

I've heard of that. It required dedicating some storage locations adjacent to the interval timer. Was the interval timer always updated by an interrupt handler?

But I'm still puzzled as to why on some systems with a 16-bit bus NIL and OIL were required for serialization but MVI and STC had no such hazard.

-- gil
Re: Multi CPU interlock question
On Thu, 10 Jan 2019 at 17:54, gah wrote:
> > I remember the 360/30 (used one in high school), and I'm pretty sure
> > it had an 8-bit bus. But I could be wrong.
>
> The 360/30 has an 8 bit bus and 8 bit ALU.

Thanks for confirming my, uh, memory.

> > So two processors both fetch the same 16-bit frame. One
> > updates the even half; the other into the odd half. Both store.
> > Last guy wins (sort of).
>
> With only one processor, you still have I/O to consider.

But I/O is generally not covered by the same strong block concurrency rules that apply to other CPUs. Weaker rules apply in most cases.

> And for the 360 and 370, the interval timer. Tradition was to read
> the old value and replace it with a new value with one MVC.

More than tradition - documented in the S/360 POO as the only certain way of not missing a timer update. But that scheme was not to protect against concurrent storage access by another processor (whether CPU or channel) *during instruction execution*, but to avoid a timer update from occurring *between* instructions, as could happen if, say, Load/Store were used instead of MVC.

Tony H.
Re: Multi CPU interlock question
>> The storage bus has been at least 16 bits wide as long as anyone can
>> remenber.
> I remember the 360/30 (used one in high school), and I'm pretty sure
> it had an 8-bit bus. But I could be wrong.

The 360/30 has an 8 bit bus and 8 bit ALU. The 360/40 has a 16 bit bus, but still 8 bit ALU. Memory writes can be eight or 16 bits.

The 360/20 has 8 bit memory, but can write four bit units. Makes decimal instructions easier. The ALU is four bits wide, but can only add or subtract one. I had one running at the Living Computer Museum a few years ago, but it isn't running now. The museum is working on getting a 360/30 running, but so far it isn't.

> So two processors both fetch the same 16-bit frame. One
> updates the even half; the other into the odd half. Both store.
> Last guy wins (sort of).

With only one processor, you still have I/O to consider.

And for the 360 and 370, the interval timer. Tradition was to read the old value and replace it with a new value with one MVC.
Re: Multi CPU interlock question
On Thu, 10 Jan 2019 at 13:44, Paul Gilmartin <0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:
> Ever since I learned of the NIL and OIL macros, I have wondered why MVI
> and STC did not have the same exposure as OI and NI:
>
> The storage bus has been at least 16 bits wide as long as anyone can remenber.

I remember the 360/30 (used one in high school), and I'm pretty sure it had an 8-bit bus. But I could be wrong.

> So two processors both fetch the same 16-bit frame. One
> updates the even half; the other into the odd half. Both store.
> Last guy wins (sort of).

Yup. But in practice there existed no machines with more than one CPU until the 360/65MP, and I don't think there ever existed a 360 or 370 with more than one processor that had anything smaller than a 64-bit bus. More to the point, I'm not sure there was a strong definition of storage access by multiple processors until fairly late in the S/370 POO days.

> Are NI and OI older than CS?

NI and OI are original with S/360. CS and CDS came only with DAT in S/370. Lynn Wheeler has written extensively here on CS and its origins and the internal battles associated with it.

> Was there then a precursor of NIL using test-and-set?

I don't believe so. (That would be OIL, wouldn't it? TS turns bits ON.)

Tony H.
Re: Multi CPU interlock question
> The storage bus has been at least 16 bits wide as long as anyone can
> remenber.

Well, I can't remenber at all, but I can remember shorter busses. Some of us can remember farther back than others, and there was one subscriber to IBM-MAIN who was prominent in the 1950s.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3
Re: Multi CPU interlock question
On 2019-01-10, at 09:46:38, Peter Relson wrote:
> Many of us grew up with machines where OI and some other instructions were
> done in multiple stages, and at just the wrong point could result in lost
> data if CS/CDS was being used. You could not mix and match.

Ever since I learned of the NIL and OIL macros, I have wondered why MVI and STC did not have the same exposure as OI and NI:

The storage bus has been at least 16 bits wide as long as anyone can remenber. So two processors both fetch the same 16-bit frame. One updates the even half; the other into the odd half. Both store. Last guy wins (sort of).

Are NI and OI older than CS? Was there then a precursor of NIL using test-and-set?

PDP-6 had a peculiar read-pause-write memory cycle, bypassing the restore phase of core memory access. This was a performance benefit and serialized memory updating instructions.

-- gil
Re: Multi CPU interlock question
On Thu, 10 Jan 2019 11:46:38 -0500, Peter Relson wrote:
>If on a new enough z/OS, you can rely on the interlocked access facilities
>being available.

Interlocked access facility 2 was first described in the -9 edition of the z/Architecture Principles of Operation, corresponding to the zEC12. z/OS 2.3 requires at least a z12, and is the first to have that requirement.

--
Tom Marchant
Re: Multi CPU interlock question
On Thu, 10 Jan 2019 at 01:07, Jim Mulder wrote:
> The coordination with other CPUs is in getting exclusive control of
> the storage operand cache line. CS and ST both have to do that, so they
> would be similar in performance in that regard.

But CS and friends carry the perhaps quite high penalty of invoking a serialization operation both before and after. Unlike CS, the result from ST can presumably be delayed indefinitely before being actually updated in storage.

Tony H.
Re: Multi CPU interlock question
On Wed, 9 Jan 2019 at 09:25, Mark Boonie wrote:
> On all z/Architecture CPUs, MVC will appear fullword-concurrent provided
> both the source and target operands are fullword-aligned.

You're not wrong, but the commitments for MVC (and a few similar instructions) are quite a bit stronger than that, and have been so for a very long time.

Tony H.
Re: Multi CPU interlock question
Surely the cost of CS/CDS is far less than the cost of PLO.

In my opinion, if you know that you can use transactional execution (which would be the case if you're on z/OS 2.3 or later *and* you know that you are not running z/OS under z/VM 6.3 or earlier), then you should never use PLO. A TBEGINC transaction, in particular, is so much more understandable and easy to code. (Unlike TBEGIN, assuming you can meet the constraints, you don't need a fallback path.) Unlike PLO, which serializes only against other PLOs (it does not serialize against CS/CDS), a transaction serializes against everything, in effect.

I did not see mention of the interlocked access facilities. These are what makes things like "OI" work without needing to serialize via CS/CDS. Many of us grew up with machines where OI and some other instructions were done in multiple stages, and at just the wrong point could result in lost data if CS/CDS was being used. You could not mix and match. If on a new enough z/OS, you can rely on the interlocked access facilities being available.

Peter Relson
z/OS Core Technology Design
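To make the "OI without CS" remark concrete, here is a small C sketch. It is only an analogy: C11 `atomic_fetch_or` and `atomic_fetch_and` are assumed stand-ins for an interlocked OI and NI, and the names `flags`, `set_flag`, and `clear_flag` are hypothetical. The point it tries to show is that the fetch, the OR, and the store happen as one update, so another CPU's change to a different bit of the same byte cannot be lost in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag byte shared between CPUs. */
static _Atomic uint8_t flags;

static_assert(sizeof(uint8_t) == 1, "flag field is a single byte");

/* Turn bits on as one interlocked update; returns the previous byte. */
static uint8_t set_flag(uint8_t mask)
{
    return atomic_fetch_or(&flags, mask);
}

/* Turn bits off as one interlocked update; returns the previous byte. */
static uint8_t clear_flag(uint8_t mask)
{
    return atomic_fetch_and(&flags, (uint8_t)~mask);
}
```

A multi-stage version (load the byte, OR it in a register, store it back) is the shape that could lose an update made by another CPU between the load and the store.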
Re: Multi CPU interlock question
Whose illustration? (Serious question -- don't know what you are replying to.)

If I understand you correctly there is no "protection" needed for readers. No matter how many readers read a word in storage, they will all retrieve the same value, until it changes.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Joe Owens
Sent: Thursday, January 10, 2019 2:28 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Yes, your illustration is exactly what I was concerned about. My instinct was CS was just about updaters of storage, and not readers, so there must be some other type of protection for readers.

Thanks, Joe
Re: Multi CPU interlock question
Yes, your illustration is exactly what I was concerned about. My instinct was CS was just about updaters of storage, and not readers, so there must be some other type of protection for readers. Thanks, Joe
Re: Multi CPU interlock question
The coordination with other CPUs is in getting exclusive control of the storage operand cache line. CS and ST both have to do that, so they would be similar in performance in that regard.

Jim Mulder
z/OS Diagnosis, Design, Development, Test
IBM Corp. Poughkeepsie NY

> From: "Charles Mills"
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Date: 01/10/2019 01:04 AM
> Subject: Re: Multi CPU interlock question
> Sent by: "IBM Mainframe Assembler List"
>
> Avoid CS if you don't need it (and in this case you don't). CS is
> expensive because every CPU has to sit up and take notice.
Re: Multi CPU interlock question
CS/CDS/CSG/CDSG would be considerably faster than PLO. CS is implemented in hardware. PLO is implemented in millicode. The millicode obtains a spin lock, performs the requested operations, and releases the lock.

Jim Mulder
z/OS Diagnosis, Design, Development, Test
IBM Corp. Poughkeepsie NY

> From: "Charles Mills"
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Date: 01/10/2019 12:59 AM
> Subject: Re: Multi CPU interlock question
> Sent by: "IBM Mainframe Assembler List"
>
> Take your pick of answers:
>
> 1. I don't know.
> 2. It probably depends on the model.
> 3. It probably depends on what exactly is going on with the cache,
>    contention, etc.
> 4. I am going to guess the CS is cheaper than PLO because it is simpler.
>
> Charles
>
> -----Original Message-----
> From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
> On Behalf Of Seymour J Metz
> Sent: Wednesday, January 9, 2019 11:52 AM
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Subject: Re: Multi CPU interlock question
>
> How does the cost of CS/CDS compare to PLO?
Re: Multi CPU interlock question
Take your pick of answers:

1. I don't know.
2. It probably depends on the model.
3. It probably depends on what exactly is going on with the cache, contention, etc.
4. I am going to guess the CS is cheaper than PLO because it is simpler.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Seymour J Metz
Sent: Wednesday, January 9, 2019 11:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

How does the cost of CS/CDS compare to PLO?
Re: Multi CPU interlock question
How does the cost of CS/CDS compare to PLO?

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3
Re: Multi CPU interlock question
Others have given you good answers.

Avoid CS if you don't need it (and in this case you don't). CS is expensive because every CPU has to sit up and take notice.

Answering your question in more detail, if one (or more) updaters is for example alternately storing x'' and x'' then readers will always see one of those two values, never anything like x'', assuming fullword alignment.

One classic use of CS is with multiple updaters of a count: more than one updater doing L/AHI/ST. Doing it that way rather than with a CS loop will cause some increments to get lost, because one CPU's L may interleave with another CPU's update sequence. (And my example is a little out of date: more modern CPUs have a single "increment word in storage with block concurrency" instruction. CS is still valuable for older CPUs and for other applications besides the shared counter, such as a shared queue header.)

In your case I would prefer ST to MVC. For what it's worth, ST has behaved this way "forever"; not sure about MVC if your code ever were to run on a much older model.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Joe Owens
Sent: Wednesday, January 9, 2019 8:11 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Hi Everyone, thanks for the answers. I did read the POP, but obviously not hard enough. (It is never an easy read). The block concurrency section explains it perfectly. The fullword is aligned so all should be good. The question about MVC was just out of interest.
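The shared-counter case can be sketched in C as follows. This is an analogy under stated assumptions, not the HLASM itself: C11 `atomic_compare_exchange_weak` plays the role of the CS loop, `atomic_fetch_add` plays the role of the newer single-instruction increment, and `use_count` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter, one aligned fullword. */
static _Atomic uint32_t use_count;

static_assert(sizeof(use_count) == 4, "counter is expected to be one fullword");

/* CS-style loop: fetch the old value, compute old + 1, and store it only
 * if the word still holds the old value. On a miss the compare hands back
 * the current contents in 'old', so the loop simply retries with that.
 * Returns the new count. */
static uint32_t increment_cs_loop(void)
{
    uint32_t old = atomic_load(&use_count);
    while (!atomic_compare_exchange_weak(&use_count, &old, old + 1)) {
        /* 'old' was refreshed by the failed compare; try again */
    }
    return old + 1;
}

/* One interlocked add, the rough analogue of the single-instruction form.
 * Returns the new count. */
static uint32_t increment_fetch_add(void)
{
    return atomic_fetch_add(&use_count, 1) + 1;
}
```

A plain load, add, and store sequence has no compare step, which is why two updaters running it at the same time can both store the same new value and lose one increment.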
Re: Multi CPU interlock question
> Does CS tell the right story, or does CS itself require alignment?

The updated storage area is required to be fullword aligned, which is why you could/should just skip the CS.

- mb
Re: Multi CPU interlock question
Hi Everyone, thanks for the answers. I did read the POP, but obviously not hard enough. (It is never an easy read). The block concurrency section explains it perfectly. The fullword is aligned so all should be good. The question about MVC was just out of interest. Thanks. Joe
Re: Multi CPU interlock question
>> Does CS tell the right story, or does CS itself require alignment?

Yes, it does. From the POP: "Otherwise, a specification exception is recognized."

Martin
Re: Multi CPU interlock question
On 2019-01-09, at 07:34:27, Mark Boonie wrote:
> Speaking for myself, I would consider it "proper" to align the operands
> and skip the CS -- the architecture guarantees the behavior, so doing the
> update with CS seems like overkill. (I'd probably also add a comment in
> the code for each operand pointing out the reason for the alignment
> requirement).
>
> If alignment can't be ensured (e.g., you're passed some random address by
> a caller) then that's a different story.

Does CS tell the right story, or does CS itself require alignment?

On 2019-01-09, at 05:06:09, Rob van der Heij wrote:
> The CPU cache is the other motivation to stay away from heavy use of shared
> variables but keep things per CPU with a low-frequency distribution
> process. When you keep the per-CPU objects far enough apart, you avoid
> frequent invalidating cache lines on the sibling CPU.

Is there any hardware or software support for this? Operands nicely cache line separated today might be in the same line on next year's model.

-- gil
Re: Multi CPU interlock question
Speaking for myself, I would consider it "proper" to align the operands and skip the CS -- the architecture guarantees the behavior, so doing the update with CS seems like overkill. (I'd probably also add a comment in the code for each operand pointing out the reason for the alignment requirement).

If alignment can't be ensured (e.g., you're passed some random address by a caller) then that's a different story.

- mb

IBM Mainframe Assembler List wrote on 01/09/2019 06:17:29 AM:
> the load (or store) will always do it on a fullword BUT to do it
> proper would require doing it with a CS.
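For the "random address from a caller" case, a check like the one below is one way to find out which story applies. It is a C sketch with a hypothetical helper name, `is_fullword_aligned`, and it assumes the usual conversion of a pointer to `uintptr_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(uint32_t) == 4, "a fullword is four bytes");

/* True when the address is on a 4-byte boundary, i.e. when the low two
 * bits of the address are zero. */
static bool is_fullword_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```

A caller could use the result to choose between the plain aligned access and some other form of serialization when the field turns out to straddle a boundary.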
Re: Multi CPU interlock question
On all z/Architecture CPUs, MVC will appear fullword-concurrent provided both the source and target operands are fullword-aligned.

- mb

IBM Mainframe Assembler List wrote on 01/09/2019 06:17:29 AM:
> MVC might on some CPUs appear to do it 4 byte wise (or other multiples
> thereof) -
Re: Multi CPU interlock question
On Wed, 9 Jan 2019 at 12:17, Martin Truebner wrote:
> Joe,
>
> Rob's answer is already saying everything but let me give you
> some more details.
>
> the load (or store) will always do it on a fullword BUT to do it
> proper would require doing it with a CS.

As long as the operand is properly aligned...

> MVC might on some CPUs appear to do it 4 byte wise (or other multiples
> thereof) -
>
> And again the question: why not do it proper in a code-segment
> that is fully aware of the multi-CPU environment.

The CPU cache is the other motivation to stay away from heavy use of shared variables but keep things per CPU with a low-frequency distribution process. When you keep the per-CPU objects far enough apart, you avoid frequently invalidating cache lines on the sibling CPU.

Rob
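A minimal C sketch of the per-CPU layout described above, with loud assumptions: `CACHE_LINE_BYTES` is set to 256 here as an assumed line size that should be checked for the target model, `MAX_CPUS` is an arbitrary illustration value, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 256  /* assumption: verify for the target model */
#define MAX_CPUS 8            /* hypothetical number of CPUs */

/* One counter per CPU, each padded out to its own cache line. */
struct per_cpu_count {
    alignas(CACHE_LINE_BYTES) _Atomic uint64_t count;
};

static_assert(sizeof(struct per_cpu_count) == CACHE_LINE_BYTES,
              "each slot must occupy exactly one assumed cache line");

static struct per_cpu_count counts[MAX_CPUS];

/* Hot path: a CPU updates only its own slot, so it does not invalidate
 * the line that holds a sibling CPU's slot. */
static void count_event(unsigned cpu)
{
    atomic_fetch_add_explicit(&counts[cpu].count, 1, memory_order_relaxed);
}

/* Low-frequency path: walk every slot and add the values up. */
static uint64_t total_events(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < MAX_CPUS; i++)
        sum += atomic_load_explicit(&counts[i].count, memory_order_relaxed);
    return sum;
}
```

Keeping the line size in one named constant means a later model with a different line size needs a one-line change and a rebuild, though that does not help code that is already deployed.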
Re: Multi CPU interlock question
Joe,

Rob's answer already says everything, but let me give you some more details.

The load (or store) will always do it on a fullword, BUT to do it properly would require doing it with a CS.

MVC might on some CPUs appear to do it 4 bytes at a time (or other multiples thereof) -

And again the question: why not do it properly in a code segment that is fully aware of the multi-CPU environment?

Or am I totally off the track and there is in fact a LOAD AND STORE instruction?

Martin
Re: Multi CPU interlock question
On Wed, 9 Jan 2019 at 11:29, Joe Owens wrote:
> A 4 byte address field in virtual storage has one updater and many readers
>
> If using load and store instructions, will the readers always see a
> complete (valid) address, or could a CPU see a partially updated field
> while a store is in progress on another CPU?
>
> Is the answer any different for other instructions, like MVC?

The terminology in the Principles of Operation to look for is "block-concurrent references" (Chapter 5). That section explains MVC and the conditions under which it appears to do a double word at a time.

Rob