Re: Multi CPU interlock question

2019-01-16 Thread Peter Relson
I think of the operative term as "doubleword consistency" (it is implied 
that the doubleword is on a doubleword boundary).
That is why LM of two regs from a doubleword gets you "consistency", but 
"L from 1st word followed by L from 2nd word" does not.
And yes, for the most part, it doesn't have to be 8 bytes, just something 
that does not cross a doubleword boundary. I never remember whether some 
of the moves for some lengths <=8 are an exception.

But few things have "quadword consistency" (e.g., LPQ, STPQ). LMG of two 
regs does not.

Peter Relson
z/OS Core Technology Design
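
The doubleword-consistency point can be sketched in C11 rather than assembler. This is only an analogy, not a statement about what any compiler generates on z: a single 8-byte atomic access stands in for LM from an aligned doubleword, and two separate 4-byte loads stand in for L followed by L. The names pair_t, pair_store, pair_load, words_store and words_load are hypothetical, and the sketch assumes a platform where an aligned 64-bit atomic is lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct { uint32_t first, second; } pair_t;

/* Both halves live in one naturally aligned doubleword. */
static _Atomic uint64_t packed_pair;

/* Writer: one 8-byte store updates both halves together. */
void pair_store(pair_t p) {
    atomic_store(&packed_pair, ((uint64_t)p.first << 32) | p.second);
}

/* Reader: one 8-byte load, so both halves come from the same store
   (the analogue of LM of two registers from an aligned doubleword). */
pair_t pair_load(void) {
    uint64_t v = atomic_load(&packed_pair);
    pair_t p = { (uint32_t)(v >> 32), (uint32_t)v };
    return p;
}

/* Contrast: the same data kept as two separate words. */
static _Atomic uint32_t word_first, word_second;

void words_store(pair_t p) {
    atomic_store(&word_first, p.first);
    atomic_store(&word_second, p.second);
}

/* Two 4-byte loads (the analogue of L followed by L): a writer may run
   between them, so the two halves can come from different updates. */
pair_t words_load(void) {
    pair_t p;
    p.first = atomic_load(&word_first);
    p.second = atomic_load(&word_second);
    return p;
}
```

Each word returned by words_load is individually whole; what is not guaranteed is that the two words belong together.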


Re: Multi CPU interlock question

2019-01-15 Thread Paul Gilmartin
On 2019-01-15, at 10:48:52, Ngan, Robert wrote:

> If you want to load two doublewords, block concurrency guarantees each 
> (aligned) doubleword is consistent, but if task 2 is in process of updating 
> both doublewords, using (for example) LMG may result in you loading one 
> doubleword from before task 2's change and one after.
>  
Oh, this is tricky!  I had to look it up.  PoOps says:

Block-Concurrent References
For *some* references, the accesses to all bytes within a halfword, word,
doubleword, or quadword are specified to appear to be block concurrent as
observed by other CPUs and channel programs.

(I emphasized the "some".)  But elsewhere:

...; the instructions LOAD MULTIPLE (LMG) and STORE MULTIPLE (STMG), when
the operand starts on a doubleword boundary; ... access their storage
operands in a left-to-right direction, and all bytes accessed within each
doubleword appear to be accessed concurrently as observed by other CPUs.

-- gil


Re: Multi CPU interlock question

2019-01-15 Thread Ngan, Robert
If you want to load two doublewords, block concurrency guarantees each 
(aligned) doubleword is consistent, but if task 2 is in process of updating 
both doublewords, using (for example) LMG may result in you loading one 
doubleword from before task 2's change and one after.
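
One common software pattern for giving readers a consistent view of more than one doubleword is a sequence counter. It is not something proposed in this thread, and it is not a substitute for LPQ or PLO; the C11 sketch below only illustrates the kind of reader protection being discussed. It assumes a single writer, and the names versioned_pair, vp_write and vp_read are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint32_t seq;    /* even: stable, odd: update in progress */
    _Atomic uint64_t lo, hi; /* the two doublewords that belong together */
} versioned_pair;

/* Single writer only: bump seq to odd, update both halves, bump to even. */
void vp_write(versioned_pair *vp, uint64_t lo, uint64_t hi) {
    uint32_t s = atomic_load(&vp->seq);
    atomic_store(&vp->seq, s + 1);
    atomic_store(&vp->lo, lo);
    atomic_store(&vp->hi, hi);
    atomic_store(&vp->seq, s + 2);
}

/* Reader: retry until seq is even and unchanged across both loads,
   which means no update overlapped the two fetches. */
void vp_read(versioned_pair *vp, uint64_t *lo, uint64_t *hi) {
    uint32_t before, after;
    do {
        before = atomic_load(&vp->seq);
        *lo = atomic_load(&vp->lo);
        *hi = atomic_load(&vp->hi);
        after = atomic_load(&vp->seq);
    } while ((before & 1u) != 0 || before != after);
}
```

Each doubleword is still fetched whole on its own; the counter only detects the case where the two fetches straddle an update.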

-Original Message-
From: IBM Mainframe Assembler List On Behalf Of Keven
Sent: Monday, January 14, 2019 16:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Shouldn’t that be: Protection for readers is only necessary when the 
storage in question doesn’t cross a doubleword boundary?
Keven

On Mon, Jan 14, 2019 at 4:17 PM -0600, "Ngan, Robert" wrote:

Protection for readers is only necessary when the storage in question is larger 
than a doubleword.
For quadwords, you can use either LPQ or PLO function 3 (CLX).

Robert Ngan
HCL Technologies

-Original Message-
From: IBM Mainframe Assembler List  On Behalf Of Joe Owens
Sent: Thursday, January 10, 2019 04:28
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Yes, your illustration is exactly what I was concerned about. My instinct was 
CS was just about updaters of storage, and not readers, so there must be some 
other type of protection for readers.

Thanks, Joe

DXC Technology Company - Headquarters: 1775 Tysons Boulevard, Tysons, Virginia 
22102, USA.
DXC Technology Company -- This message is transmitted to you by or on behalf of 
DXC Technology Company or one of its affiliates.  It is intended exclusively 
for the addressee.  The substance of this message, along with any attachments, 
may contain proprietary, confidential or privileged information or information 
that is otherwise legally exempt from disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended recipient 
of this message, you are not authorized to read, print, retain, copy or 
disseminate any part of this message. If you have received this message in 
error, please destroy and delete all copies and notify the sender by return 
e-mail. Regardless of content, this e-mail shall not operate to bind DXC 
Technology Company or any of its affiliates to any order or other contract 
unless pursuant to explicit written agreement or government initiative 
expressly permitting the use of e-mail for such purpose.


Re: Multi CPU interlock question

2019-01-14 Thread Keven
  
  

Shouldn’t that be: Protection for readers is only necessary when the 
storage in question doesn’t cross a doubleword boundary?
Keven

On Mon, Jan 14, 2019 at 4:17 PM -0600, "Ngan, Robert" wrote:

Protection for readers is only necessary when the storage in question is larger 
than a doubleword.
For quadwords, you can use either LPQ or PLO function 3 (CLX).

Robert Ngan
HCL Technologies

-Original Message-
From: IBM Mainframe Assembler List  On Behalf Of Joe Owens
Sent: Thursday, January 10, 2019 04:28
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Yes, your illustration is exactly what I was concerned about. My instinct was 
CS was just about updaters of storage, and not readers, so there must be some 
other type of protection for readers.

Thanks, Joe



Re: Multi CPU interlock question

2019-01-14 Thread Ngan, Robert
Protection for readers is only necessary when the storage in question is larger 
than a doubleword.
For quadwords, you can use either LPQ or PLO function 3 (CLX).

Robert Ngan
HCL Technologies

-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Joe Owens
Sent: Thursday, January 10, 2019 04:28
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Yes, your illustration is exactly what I was concerned about. My instinct was 
CS was just about updaters of storage, and not readers, so there must be some 
other type of protection for readers.

Thanks, Joe



Re: Multi CPU interlock question

2019-01-11 Thread Seymour J Metz
> But in practice there existed no machines with more than one CPU

What was the IBM 9020, chopped liver? Bendix G-21? Burroughs B5000? Burroughs 
D825? BULL Gamma 60? CDC 6600? GE 635? Arguably, the Honeywell H-800? UNIVAC 
1108?


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3




Re: Multi CPU interlock question

2019-01-11 Thread Paul Gilmartin
On 2019-01-10, at 15:53:59, gah wrote:
> 
>> So two processers both fetch the same 16-bit frame. One
>> updates the even half; the other into the odd half. Both store.
>> Last guy wins (sort of).
> 
> With only one processor, you still have I/O to consider.
>  
I understand that on some systems the channels stole microcycles from the CPU,
so if CPU instructions were not micro-interruptable the serialization was
provided.

Bit-spinning for I/O was frequently used, but risky for the naive.

> And for the 360 and 370, the interval timer.  Tradition was to read
> the old value and replace it with a new value with one MVC.
>  
I've heard of that.  It required dedicating some storage locations adjacent
to the interval timer.  Was the interval timer always updated by an
interrupt handler?

But I'm still puzzled as to why on some systems with a 16-bit bus
NIL and OIL were required for serialization but MVI and STC had no
such hazard.

-- gil


Re: Multi CPU interlock question

2019-01-10 Thread Tony Harminc
On Thu, 10 Jan 2019 at 17:54, gah  wrote:

> > I remember the 360/30 (used one in high school), and I'm pretty sure
> > it had an 8-bit bus. But I could be wrong.
>
> The 360/30 has an 8 bit bus and 8 bit ALU.

Thanks for confirming my, uh memory.

> > So two processers both fetch the same 16-bit frame. One
> > updates the even half; the other into the odd half. Both store.
> > Last guy wins (sort of).
>
> With only one processor, you still have I/O to consider.

But I/O is generally not covered by the same strong block concurrency
rules that apply to other CPUs. Weaker rules apply in most cases.

> And for the 360 and 370, the interval timer.  Tradition was to read
> the old value and replace it with a new value with one MVC.

More than tradition - documented in the S/360 POO as the only certain
way of not missing a timer update. But that scheme was not to protect
against concurrent storage access by another processor (whether CPU or
channel) *during instruction execution*, but to avoid a timer update
from occurring *between* instructions, as could happen if, say,
Load/Store were used instead of MVC.

Tony H.


Re: Multi CPU interlock question

2019-01-10 Thread gah
>> The storage bus has been at least 16 bits wide as long as anyone can 
>> remenber.


> I remember the 360/30 (used one in high school), and I'm pretty sure
> it had an 8-bit bus. But I could be wrong.

The 360/30 has an 8 bit bus and 8 bit ALU.

The 360/40 has a 16 bit bus, but still 8 bit ALU.  Memory writes can be
eight or 16 bits.

The 360/20 has 8 bit memory, but can write four bit units.
Makes decimal instructions easier.  The ALU is four bits wide,
but can only add or subtract one.  I had one running at the 
Living Computer Museum a few years ago, but it isn’t running now.

The museum is working on getting a 360/30 running, but so far it isn’t.

> So two processers both fetch the same 16-bit frame. One
> updates the even half; the other into the odd half. Both store.
> Last guy wins (sort of).

With only one processor, you still have I/O to consider.

And for the 360 and 370, the interval timer.  Tradition was to read
the old value and replace it with a new value with one MVC.


Re: Multi CPU interlock question

2019-01-10 Thread Tony Harminc
On Thu, 10 Jan 2019 at 13:44, Paul Gilmartin
<0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:

> Ever since I learned of the NIL and OIL macros, I have wondered why MVI
> and STC did not have the same exposure as OI and NI:
>
> The storage bus has been at least 16 bits wide as long as anyone can remenber.

I remember the 360/30 (used one in high school), and I'm pretty sure
it had an 8-bit bus. But I could be wrong.

> So two processers both fetch the same 16-bit frame.  One
> updates the even half; the other into the odd half.  Both store.
> Last guy wins (sort of).

Yup. But in practice there existed no machines with more than one CPU
until the 360/65MP, and I don't think there ever existed a 360 or 370
with more than one processor that had anything smaller than a 64-bit
bus. More to the point, I'm not sure there was a strong definition of
storage access by multiple processors until fairly late in the S/370
POO days.

> Are NI and OI older than CS?

NI and OI are original with S/360. CS and CDS came only with DAT in
S/370. Lynn Wheeler has written extensively here on CS and its origins
and the internal battles associated with it.

>Was there then a precursor of NIL using test-and-set?

I don't believe so. (That would be OIL, wouldn't it? TS turns bits ON.)

Tony H.


Re: Multi CPU interlock question

2019-01-10 Thread Seymour J Metz
> The storage bus has been at least 16 bits wide as long as anyone can 
> remenber. 

Well, I can't remenber at all, but I can remember shorter busses. Some of us 
can remember farther back than others, and there was one subscriber to IBM-MAIN 
who was prominent in the 1950s. 


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3




Re: Multi CPU interlock question

2019-01-10 Thread Paul Gilmartin
On 2019-01-10, at 09:46:38, Peter Relson wrote: 
> 
> Many of us grew up with machines where OI and some other instructions were 
> done in multiple stages, and at just the wrong point could result in lost 
> data if CS/CDS was being used. You could not mix and match. 
>  
Ever since I learned of the NIL and OIL macros, I have wondered why MVI
and STC did not have the same exposure as OI and NI:

The storage bus has been at least 16 bits wide as long as anyone can
remenber.  So two processers both fetch the same 16-bit frame.  One
updates the even half; the other into the odd half.  Both store.
Last guy wins (sort of).

Are NI and OI older than CS?  Was there then a precursor of NIL using
test-and-set?

PDP-6 had a peculiar read-pause-write memory cycle, bypassing the restore
phase of core memory access.  This was a performance benefit and serialized
memory updating instructions.

-- gil


Re: Multi CPU interlock question

2019-01-10 Thread Tom Marchant
On Thu, 10 Jan 2019 11:46:38 -0500, Peter Relson wrote:

>If on a new enough z/OS, you can rely on the interlocked access facilities 
>being available.

Interlocked access facility 2 was first described in the -9 edition of the 
z/Architecture Principles of Operation, corresponding to the zEC12. 
z/OS 2.3 requires at least a zEC12, and is the first to have that requirement.

-- 
Tom Marchant


Re: Multi CPU interlock question

2019-01-10 Thread Tony Harminc
On Thu, 10 Jan 2019 at 01:07, Jim Mulder  wrote:
>
>   The coordination with other CPUs is in getting exclusive control of
> the storage operand cache line.  CS and ST both have to do that, so they
> would be similar in performance in that regard.

But CS and friends carry the perhaps quite high penalty of invoking a
serialization operation both before and after. Unlike CS, the result
from ST can presumably be delayed indefinitely before being actually
updated in storage.

Tony H.


Re: Multi CPU interlock question

2019-01-10 Thread Tony Harminc
On Wed, 9 Jan 2019 at 09:25, Mark Boonie  wrote:
>
> On all z/Architecture CPUs, MVC will appear fullword-concurrent provided
> both the source and target operands are fullword-aligned.

You're not wrong, but the commitments for MVC (and a few similar
instructions) are quite a bit stronger than that, and have been so for
a very long time.

Tony H.


Re: Multi CPU interlock question

2019-01-10 Thread Peter Relson
Surely the cost of CS/CDS is far less than the cost of PLO. In my opinion, 
if you know that you can use transactional execution (which would be the 
case if you're on z/OS 2.3 or later *and* you know that you are not 
running z/OS under z/VM 6.3 or earlier), then you should never use PLO. A 
TBEGINC transaction, in particular, is so much more understandable and 
easy to code. (Unlike TBEGIN, assuming you can meet the constraints, you 
don't need a fallback path). Unlike PLO which serializes only against 
other PLO's (it does not serialize against CS/CDS), a transaction 
serializes against everything, in effect.

I did not see mention of the interlocked access facilities. These are what 
makes things like "OI" work without needing to serialize via CS/CDS. 

Many of us grew up with machines where OI and some other instructions were 
done in multiple stages, and at just the wrong point could result in lost 
data if CS/CDS was being used. You could not mix and match. 

If on a new enough z/OS, you can rely on the interlocked access facilities 
being available.

Peter Relson
z/OS Core Technology Design


Re: Multi CPU interlock question

2019-01-10 Thread Charles Mills
Whose illustration? (Serious question -- don't know what you are replying to.)

If I understand you correctly there is no "protection" needed for readers. No 
matter how many readers read a word in storage, they will all retrieve the same 
value, until it changes.

Charles

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Joe Owens
Sent: Thursday, January 10, 2019 2:28 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Yes, your illustration is exactly what I was concerned about. My instinct was 
CS was just about updaters of storage, and not readers, so there must be some 
other type of protection for readers.

Thanks, Joe


Re: Multi CPU interlock question

2019-01-10 Thread Joe Owens
Yes, your illustration is exactly what I was concerned about. My instinct was 
CS was just about updaters of storage, and not readers, so there must be some 
other type of protection for readers.

Thanks, Joe


Re: Multi CPU interlock question

2019-01-09 Thread Jim Mulder
  The coordination with other CPUs is in getting exclusive control of
the storage operand cache line.  CS and ST both have to do that, so they
would be similar in performance in that regard. 

Jim Mulder z/OS Diagnosis, Design, Development, Test  IBM Corp. 
Poughkeepsie NY

> From: "Charles Mills" 
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Date: 01/10/2019 01:04 AM
> Subject: Re: Multi CPU interlock question
> Sent by: "IBM Mainframe Assembler List" 


> Avoid CS if you don't need it (and in this case you don't). CS is 
> expensive because every CPU has to sit up and take notice.


Re: Multi CPU interlock question

2019-01-09 Thread Jim Mulder
  CS/CDS/CSG/CDSG would be considerably faster than PLO. 

  CS is implemented in hardware.  PLO is implemented in millicode.
 The millicode obtains a spin lock, performs the requested operations, 
and releases the lock.

Jim Mulder z/OS Diagnosis, Design, Development, Test  IBM Corp. 
Poughkeepsie NY
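
The "obtain a spin lock, perform the operations, release the lock" shape described above can be modelled in C11. This is a conceptual sketch only, not the actual millicode: the lock variable, the function name locked_compare_swap_and_store and the choice of operation (compare, then swap and store a second value) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One lock guarding every location updated through this routine. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* If *cmp_loc equals expected, store new_val there and extra_val at
   *extra_loc, all under the lock. Returns true when the update was made. */
bool locked_compare_swap_and_store(uint64_t *cmp_loc, uint64_t expected,
                                   uint64_t new_val,
                                   uint64_t *extra_loc, uint64_t extra_val) {
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&model_lock, memory_order_acquire)) {
        /* busy-wait */
    }
    bool matched = (*cmp_loc == expected);
    if (matched) {
        *cmp_loc = new_val;
        *extra_loc = extra_val;
    }
    atomic_flag_clear_explicit(&model_lock, memory_order_release);
    return matched;
}
```

The model also shows why such a scheme only serializes against other users of the same lock: a plain store to *cmp_loc from code that never takes model_lock is not held off at all.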


> From: "Charles Mills" 
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Date: 01/10/2019 12:59 AM
> Subject: Re: Multi CPU interlock question
> Sent by: "IBM Mainframe Assembler List" 

> 
> Take your pick of answers:
> 
> 1. I don't know.
> 2. It probably depends on the model.
> 3. It probably depends on what exactly is going on with the cache,
> contention, etc.
> 4. I am going to guess the CS is cheaper than PLO because it is simpler.
> 
> Charles
> 
> 
> -Original Message-
> From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
> On Behalf Of Seymour J Metz
> Sent: Wednesday, January 9, 2019 11:52 AM
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Subject: Re: Multi CPU interlock question
> 
> How does the cost of CS/CDS compare to PLO?
> 


Re: Multi CPU interlock question

2019-01-09 Thread Charles Mills
Take your pick of answers:

1. I don't know.
2. It probably depends on the model.
3. It probably depends on what exactly is going on with the cache,
contention, etc.
4. I am going to guess the CS is cheaper than PLO because it is simpler.

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Seymour J Metz
Sent: Wednesday, January 9, 2019 11:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

How does the cost of CS/CDS compare to PLO?


Re: Multi CPU interlock question

2019-01-09 Thread Seymour J Metz
How does the cost of CS/CDS compare to PLO?


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3



Re: Multi CPU interlock question

2019-01-09 Thread Charles Mills
Others have given you good answers.

Avoid CS if you don't need it (and in this case you don't). CS is expensive 
because every CPU has to sit up and take notice.

Answering your question in more detail, if one (or more) updaters is for 
example alternately storing x'' and x'' then readers will 
always see one of those two values, never anything like x'', assuming 
fullword alignment.

One classic use of CS is with multiple updaters of a count: more than one 
updater doing L/AHI/ST. Doing it that way rather than with a CS loop will cause 
some increments to get lost, because one CPU's L may interleave with another 
CPU's update sequence. (And my example is a little out of date: more modern 
CPUs have a single "increment word in storage with block concurrency" 
instruction. CS is still valuable for older CPUs and for other applications 
besides the shared counter, such as a shared queue header.)
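
The shared-counter case can be sketched in C11, with atomic_compare_exchange_weak standing in for CS. The function names are hypothetical, and the mapping to L/AHI/ST, a CS loop, and a single interlocked update is by analogy only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Lossy with several updaters: separate load, add, store (like L/AHI/ST).
   Another CPU's increment between the load and the store is overwritten. */
void increment_racy(_Atomic uint32_t *counter) {
    uint32_t v = atomic_load(counter);
    atomic_store(counter, v + 1);
}

/* CS-style loop: the store happens only if the counter still holds the
   value we loaded; otherwise old is refreshed and we try again. */
void increment_cs_loop(_Atomic uint32_t *counter) {
    uint32_t old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old + 1)) {
        /* old now holds the current value; retry with it */
    }
}

/* Single interlocked read-modify-write; returns the new value. */
uint32_t increment_fetch_add(_Atomic uint32_t *counter) {
    return atomic_fetch_add(counter, 1) + 1;
}
```

With a single updater all three give the same result; the difference only shows up when two or more CPUs increment the same word.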

In your case I would prefer ST to MVC. For what it's worth, ST has behaved this 
way "forever"; not sure about MVC if your code ever were to run on a much older 
model.

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Joe Owens
Sent: Wednesday, January 9, 2019 8:11 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Multi CPU interlock question

Hi Everyone, thanks for the answers. I did read the POP, but obviously not hard 
enough. (It is never an easy read).

The block concurrency section explains it perfectly. The fullword is aligned so 
all should be good. The question about MVC was just out of interest.


Re: Multi CPU interlock question

2019-01-09 Thread Mark Boonie
> Does CS tell the right story, or does CS itself require alignment?

The updated storage area is required to be fullword aligned, which is why 
you could/should just skip the CS.

- mb


Re: Multi CPU interlock question

2019-01-09 Thread Joe Owens
Hi Everyone, thanks for the answers. I did read the POP, but obviously not hard 
enough. (It is never an easy read).

The block concurrency section explains it perfectly. The fullword is aligned so 
all should be good. The question about MVC was just out of interest.

Thanks.

Joe


Re: Multi CPU interlock question

2019-01-09 Thread Martin Truebner
>> Does CS tell the right story, or does CS itself require alignment?

Yes, CS itself requires alignment. From the POP:

Otherwise, a specification exception is recognized

Martin


Re: Multi CPU interlock question

2019-01-09 Thread Paul Gilmartin
On 2019-01-09, at 07:34:27, Mark Boonie wrote:

> Speaking for myself, I would consider it "proper" to align the operands 
> and skip the CS -- the architecture guarantees the behavior, so doing the 
> update with CS seems like overkill.  (I'd probably also add a comment in 
> the code for each operand pointing out the reason for the alignment 
> requirement).
> 
> If alignment can't be ensured (e.g., you're passed some random address by 
> a caller) then that's a different story.
>  
Does CS tell the right story, or does CS itself require alignment?


On 2019-01-09, at 05:06:09, Rob van der Heij wrote:
> 
> The CPU cache is the other motivation to stay away from heavy use of shared
> variables but keep things per CPU with a low-frequency distribution
> process. When you keep the per-CPU objects  far enough apart, you avoid
> frequent invalidating cache lines on the sibling CPU.
> 

Is there any hardware or software support for this?  Operands nicely cache
line separated today might be in the same line on next year's model.

-- gil


Re: Multi CPU interlock question

2019-01-09 Thread Mark Boonie
Speaking for myself, I would consider it "proper" to align the operands 
and skip the CS -- the architecture guarantees the behavior, so doing the 
update with CS seems like overkill.  (I'd probably also add a comment in 
the code for each operand pointing out the reason for the alignment 
requirement).

If alignment can't be ensured (e.g., you're passed some random address by 
a caller) then that's a different story.

- mb
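
A minimal C11 sketch of the "align the operand and skip the CS" approach for one updater and many readers follows. The names atomic_word, shared_addr, publish_addr, read_addr and is_word_aligned are hypothetical, and relaxed atomic accesses are used here as the nearest portable stand-in for a plain ST and L of an aligned fullword.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef _Atomic uint32_t atomic_word;

/* The alignment requirement is the whole point, so state it in the code. */
_Static_assert(_Alignof(atomic_word) >= 4, "shared word must be fullword aligned");

static atomic_word shared_addr;

/* Single updater: one whole-word store, no compare-and-swap needed. */
void publish_addr(uint32_t addr) {
    atomic_store_explicit(&shared_addr, addr, memory_order_relaxed);
}

/* Any reader: sees either the old or the new value, never a mixture. */
uint32_t read_addr(void) {
    return atomic_load_explicit(&shared_addr, memory_order_relaxed);
}

/* For a caller-supplied address, check alignment before relying on it. */
int is_word_aligned(const void *p) {
    return ((uintptr_t)p % 4u) == 0;
}
```

If is_word_aligned fails for an address handed in by a caller, the guarantee no longer applies and some other form of serialization is needed.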

IBM Mainframe Assembler List  wrote on 
01/09/2019 06:17:29 AM:

> the load (or store) will always do it on a fullword, BUT to do it
> proper would require doing it with a CS.


Re: Multi CPU interlock question

2019-01-09 Thread Mark Boonie
On all z/Architecture CPUs, MVC will appear fullword-concurrent provided 
both the source and target operands are fullword-aligned.

- mb

IBM Mainframe Assembler List  wrote on 
01/09/2019 06:17:29 AM:

> MVC might on some CPUs appear to do it 4 byte wise (or other multiples
> thereof) - 


Re: Multi CPU interlock question

2019-01-09 Thread Rob van der Heij
On Wed, 9 Jan 2019 at 12:17, Martin Truebner  wrote:

> Joe,
>
> Rob's answer already says everything, but let me give you
> some more details.
>
> the load (or store) will always do it on a fullword, BUT to do it
> proper would require doing it with a CS.
>

As long as the operand is properly aligned...


> MVC might on some CPUs appear to do it 4 byte wise (or other multiples
> thereof) -
>
> And again the question: why not do it proper in a code-segment
> that is fully aware of the multi-CPU environment.
>

The CPU cache is the other motivation to stay away from heavy use of shared
variables but keep things per CPU with a low-frequency distribution
process. When you keep the per-CPU objects  far enough apart, you avoid
frequent invalidating cache lines on the sibling CPU.

Rob
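
Keeping per-CPU objects far enough apart can be sketched in C11 as below. The 256-byte line size, the CPU count and all names are assumptions made for the example; the line size in particular is model dependent, so it is isolated in one constant.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ASSUMED_LINE_SIZE 256 /* assumption: cache line size in bytes */
#define NUM_CPUS 4            /* assumption: number of per-CPU slots */

/* Each counter is aligned to, and therefore padded out to, a full line,
   so two CPUs never update storage in the same cache line. */
typedef struct {
    _Alignas(ASSUMED_LINE_SIZE) _Atomic uint64_t count;
} padded_counter;

static padded_counter per_cpu[NUM_CPUS];

/* Hot path: each CPU touches only its own slot. */
void count_event(unsigned cpu) {
    assert(cpu < NUM_CPUS);
    atomic_fetch_add_explicit(&per_cpu[cpu].count, 1, memory_order_relaxed);
}

/* Low-frequency collection: sum the per-CPU slots when a total is needed. */
uint64_t total_events(void) {
    uint64_t sum = 0;
    for (unsigned i = 0; i < NUM_CPUS; i++) {
        sum += atomic_load_explicit(&per_cpu[i].count, memory_order_relaxed);
    }
    return sum;
}
```

The total is only approximate while updates are in flight, which is the price of avoiding a single heavily shared counter.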


Re: Multi CPU interlock question

2019-01-09 Thread Martin Truebner
Joe,

Rob's answer already says everything, but let me give you
some more details.

The load (or store) will always do it on a fullword, BUT to do it
properly would require doing it with a CS.

MVC might on some CPUs appear to do it 4 bytes at a time (or other
multiples thereof).

And again the question: why not do it properly in a code segment
that is fully aware of the multi-CPU environment?

Or am I totally off the track and there is in fact a LOAD AND STORE
instruction?

Martin


Re: Multi CPU interlock question

2019-01-09 Thread Rob van der Heij
On Wed, 9 Jan 2019 at 11:29, Joe Owens  wrote:

> A 4 byte address field in virtual storage has one updater and many readers
>
> If using load and store instructions, will the readers always see a
> complete (valid) address, or could a CPU see a partially updated field
> while a store is in progress on another CPU?
>
> Is the answer any different for other instructions, like MVC?

The terminology in the Principles of Operation to look for is
"block-concurrent references" (Chapter 5). That section explains MVC and
the conditions under which it appears to do a doubleword at a time.

Rob