[gem5-users] Re: Question Regarding L1 Cache Transient States handling Load Hit in Ruby MOESI CMP Directory protocol
Hello Jason,

Thanks a lot for your reply!

I'm aware that, by handling the load this way, the cache subsystem gives the CPU the view that the load happened before the store. However, in terms of the CPU commit order (the instruction order seen by the CPU), the store must have committed before the load: the store has to commit before its store miss request can reach the L1 cache and switch the cache line to the SM state.

What I'm actually wondering is whether a violation can arise from the difference between:

a. the CPU's store->load commit order, and
b. the CPU's view of the order in which these instructions access memory.

Thanks again,
Zhang Zhiyuan

-Original Messages-
From: "Jason Lowe-Power via gem5-users"
Sent Time: 2023-03-22 23:20:25 (Wednesday)
To: "The gem5 Users mailing list"
Cc: "章志元" <18300750...@fudan.edu.cn>, "Jason Lowe-Power"
Subject: [gem5-users] Re: Question Regarding L1 Cache Transient States handling Load Hit in Ruby MOESI CMP Directory protocol

> Hello,
>
> This is a great question! The short answer is I believe that the coherence protocol is correct. (Though, there could always be unexpected bugs.)
>
> The slightly longer answer: You are probably seeing that the store happens before the load in "real" time. However, in the processors' view (i.e., *logical* time), the load is actually happening before the store. As long as the processors are correctly implementing their consistency models (e.g., if they are sequentially consistent then they don't allow any reorderings between load and store instructions within each thread), then as long as it *appears* that the load completed before the store, then it's a correct implementation.
>
> To put it another way, if the thread doing the load cannot tell that the load happened after the store (in real time) then it is safe. It's something like the Lamport Clock: https://en.wikipedia.org/wiki/Lamport_timestamp
>
> We have a saying in English: "If a tree falls in a forest and no one is there to hear it, does it make a sound?" Similarly, if a thread does a store to an address, but no other thread can tell what the ordering needs to be, it's OK to reorder it :).
>
> Cheers,
> Jason

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
[gem5-users] Re: Retired instructions versus ticks
Thank you, Eliot. I think this would give me what I need.

Priyanka.

On Wed, Mar 22, 2023, 11:55 AM Eliot Moss wrote:

> My suggestion would be to:
>
> - Read the CSR at points of interest - one hopes not *too* many points to avoid being overwhelmed with output. Do this in gem5 and in your RTL.
>
> - Add code to gem5 to print the tick and the value read when the CSR is read. A DNPRINTF call would serve nicely. grep can help you find where the right code is using the register name.
>
> Would this do the trick?
>
> Best - EM
[gem5-users] Re: Retired instructions versus ticks
On 3/22/2023 12:09 PM, Priyanka Ankolekar via gem5-users wrote:

> Regarding the other part of your email:
> Let me begin by saying I am a novice to both RISCV and gem5.
> I have a RISCV RTL with a certain config. I have set up gem5 to match that configuration. I want to make sure that they are indeed equivalent so that I can run some experiments on gem5 (instead of on RTL) since that would be faster and easier. In order to establish that equivalence, I am running a simple benchmark test on both RTL and gem5. The final numbers like DMIPS/MHz etc match fairly closely. But I want to dig further to see if the retired instruction/s at a given tick, for both these setups, are also a close match.
> Hence the questions.

My suggestion would be to:

- Read the CSR at points of interest - one hopes not *too* many points to avoid being overwhelmed with output. Do this in gem5 and in your RTL.

- Add code to gem5 to print the tick and the value read when the CSR is read. A DNPRINTF call would serve nicely. grep can help you find where the right code is using the register name.

Would this do the trick?

Best - EM
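Once both runs print such samples, the comparison itself could look roughly like the Python sketch below. The log format (one "<time> <instret>" pair per line), the function names, and the ticks_per_cycle value are all assumptions made for illustration; neither gem5 nor an RTL simulator emits this format by default, and the right tick-to-cycle ratio depends on the configured clock.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (time, retired-instruction count)


def parse_samples(text: str) -> List[Sample]:
    """Parse lines of the assumed form '<time> <instret>'.

    Blank lines and lines starting with '#' are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        time_str, count_str = line.split()
        samples.append((int(time_str), int(count_str)))
    return samples


def to_cycles(samples: List[Sample], ticks_per_cycle: int) -> List[Sample]:
    """Convert simulator ticks to clock cycles (the ratio is an assumption)."""
    return [(t // ticks_per_cycle, n) for t, n in samples]


def compare(gem5_cycles: List[Sample], rtl_cycles: List[Sample]) -> List[Sample]:
    """Pair the i-th sample of each run; return (cycle delta, instret delta)."""
    return [
        (g_t - r_t, g_n - r_n)
        for (g_t, g_n), (r_t, r_n) in zip(gem5_cycles, rtl_cycles)
    ]


# Made-up sample data, only to show the shape of the comparison.
gem5_log = """
# tick instret
1000 1
5000 4
9000 9
"""
rtl_log = """
# cycle instret
1 1
5 4
10 9
"""

gem5 = to_cycles(parse_samples(gem5_log), ticks_per_cycle=1000)
rtl = parse_samples(rtl_log)
diffs = compare(gem5, rtl)
for i, (dc, dn) in enumerate(diffs):
    print(f"sample {i}: cycle delta {dc:+d}, instret delta {dn:+d}")
```

Pairing samples by index only makes sense if both runs read the CSR at the same program points, which is why sampling at a few well-defined points of interest keeps the comparison manageable.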
[gem5-users] Re: Retired instructions versus ticks
Regarding the other part of your email:

Let me begin by saying I am a novice to both RISCV and gem5.

I have a RISCV RTL with a certain config. I have set up gem5 to match that configuration. I want to make sure that they are indeed equivalent so that I can run some experiments on gem5 (instead of on RTL) since that would be faster and easier. In order to establish that equivalence, I am running a simple benchmark test on both RTL and gem5. The final numbers like DMIPS/MHz etc match fairly closely. But I want to dig further to see if the retired instruction/s at a given tick, for both these setups, are also a close match.

Hence the questions.

On Wed, Mar 22, 2023 at 8:23 AM Eliot Moss wrote:

> On 3/22/2023 11:11 AM, Priyanka Ankolekar wrote:
> > Sorry, I should have clarified. I am using the RISCV ISA in gem5.
>
> (As you could have done,) I checked the gem5 sources, and it *does* model that register, returning totalInsts as gem5 calculates that. Presumably that is the same as statistics will give you, but you could read it on the fly. Not sure if the instruction to read that is privileged, though if it is, you could (as a hack) change gem5 to allow it to be read in user mode.
>
> Cheers - EM
>
> PS: You did not respond to the other part of what I said: What is it that you are really trying to do that the previous suggestions do not satisfy?
[gem5-users] Re: Retired instructions versus ticks
On 3/22/2023 11:11 AM, Priyanka Ankolekar wrote:

> Sorry, I should have clarified. I am using the RISCV ISA in gem5.

(As you could have done,) I checked the gem5 sources, and it *does* model that register, returning totalInsts as gem5 calculates that. Presumably that is the same as statistics will give you, but you could read it on the fly. Not sure if the instruction to read that is privileged, though if it is, you could (as a hack) change gem5 to allow it to be read in user mode.

Cheers - EM

PS: You did not respond to the other part of what I said: What is it that you are really trying to do that the previous suggestions do not satisfy?
[gem5-users] Re: Question Regarding L1 Cache Transient States handling Load Hit in Ruby MOESI CMP Directory protocol
Hello,

This is a great question! The short answer is I believe that the coherence protocol is correct. (Though, there could always be unexpected bugs.)

The slightly longer answer: You are probably seeing that the store happens before the load in "real" time. However, in the processors' view (i.e., *logical* time), the load is actually happening before the store. As long as the processors are correctly implementing their consistency models (e.g., if they are sequentially consistent then they don't allow any reorderings between load and store instructions within each thread), then as long as it *appears* that the load completed before the store, then it's a correct implementation.

To put it another way, if the thread doing the load cannot tell that the load happened after the store (in real time) then it is safe. It's something like the Lamport Clock: https://en.wikipedia.org/wiki/Lamport_timestamp

We have a saying in English: "If a tree falls in a forest and no one is there to hear it, does it make a sound?" Similarly, if a thread does a store to an address, but no other thread can tell what the ordering needs to be, it's OK to reorder it :).

Cheers,
Jason

On Tue, Mar 21, 2023 at 11:50 PM 章志元 via gem5-users wrote:

> Hi all,
>
> I've been looking into the default MOESI CMP Directory Protocol, and it came to my attention that, regarding SM states in L1 Cache (Transient state during a Shared to Exclusive Upgrade due to a store miss), when a load arrives from the local core (which hits since the Cache is technically still in Shared state), the cache will return the old Shared Datablk as its load hit result. Will it cause incoherence issues in memory ordering between the core and the memory system, since the CPU commits the store first and then commit the load returning the old data, but the memory system sees the load hit finish first, and then see the GETX finish?
>
> Also I already speculate that such loads will probably not arrive at the L1 Cache controller, since it would be blocked or forwarded with newer data due to outstanding stores in the lsq or the mandatory queue. I'm just wondering if the cache protocol itself is solid in terms of request ordering.
>
> Thanks in advance!
> Zhang Zhiyuan
> 2023.3.22
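The "logical time" argument can be made concrete with a toy checker, sketched here in Python. It is not gem5 or Ruby code, and every name in it is hypothetical. Given each thread's operations in program order, together with the value each load was observed to return, it searches for some single interleaving that explains all the observations. If one exists, no thread can tell that the real-time order was different.

```python
def interleavings(threads):
    """Yield every merge of the per-thread lists that keeps program order."""
    if all(not t for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            # Take the next op of thread i, then merge whatever remains.
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail


def sc_allows(threads, initial=0):
    """True if some interleaving makes every load return its observed value.

    Each op is (kind, address, value): for a "store", value is what is
    written; for a "load", value is what the load was observed to return.
    """
    for order in interleavings(threads):
        memory = {}
        consistent = True
        for kind, addr, value in order:
            if kind == "store":
                memory[addr] = value
            elif memory.get(addr, initial) != value:
                consistent = False
                break
        if consistent:
            return True
    return False


# The store happened first in real time, but the load on the other thread
# returned the old value 0. The order "load, then store" explains it.
late_load_sees_old = [
    [("load", "X", 0)],
    [("store", "X", 1)],
]
print(sc_allows(late_load_sees_old))  # True

# Here the first thread saw the new value and then the old one again.
# No interleaving explains that, so the thread *can* tell something is wrong.
new_then_old = [
    [("load", "X", 1), ("load", "X", 0)],
    [("store", "X", 1)],
]
print(sc_allows(new_then_old))  # False
```

The checker enumerates all interleavings, so it is only usable for tiny examples like these. It also models sequential consistency only; a core with a weaker model would accept more outcomes.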
[gem5-users] Re: Retired instructions versus ticks
Sorry, I should have clarified. I am using the RISCV ISA in gem5.

On Wed, Mar 22, 2023, 5:44 AM Eliot Moss wrote:

> On 3/22/2023 8:37 AM, Priyanka Ankolekar via gem5-users wrote:
> > Thank you, Eliot.
> >
> > Is there a way to probe minstret CSR to get the retired instructions?
>
> ?? What ISA are you talking about?
>
> I doubt that gem5 would model such details of a processor architecture. Maybe you should back up a little and tell us what you're really trying to do, since neither the retired instructions stats nor a full trace seem to meet your need ...
>
> Best - Eliot Moss
[gem5-users] Re: Retired instructions versus ticks
On 3/22/2023 8:37 AM, Priyanka Ankolekar via gem5-users wrote:

> Thank you, Eliot.
>
> Is there a way to probe minstret CSR to get the retired instructions?

?? What ISA are you talking about?

I doubt that gem5 would model such details of a processor architecture. Maybe you should back up a little and tell us what you're really trying to do, since neither the retired instructions stats nor a full trace seem to meet your need ...

Best - Eliot Moss
[gem5-users] Re: Retired instructions versus ticks
Thank you, Eliot.

Is there a way to probe minstret CSR to get the retired instructions?

Thanks
Priyanka.

On Mon, Mar 20, 2023, 2:45 PM Eliot Moss wrote:

> On 3/20/2023 5:05 PM, Priyanka Ankolekar via gem5-users wrote:
> > Hi Eliot,
> > (Picking this up again after a while.) :-)
> >
> > Thank you for your detailed answer. I was able to get a lot of useful data points from these statistics.
> > Is there a way to get what instruction was retired/committed and when (tick)?
>
> That would be a full trace. For that, look into the various debug flags.
>
> Be prepared for a LOT of output!!
>
> Best - EM
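A full trace produced with a debug flag (such as Exec) is easier to handle with a small post-processing script. The Python sketch below assumes each trace line starts with the tick followed by a colon, which is the usual shape of gem5 debug output, and that each such line corresponds to one retired instruction. Both points are assumptions to check against the actual trace, since the text after the tick varies with the CPU model and gem5 version. The sample lines and function names are made up.

```python
import re

# Assumed line shape: optional spaces, a decimal tick, a colon, then the rest.
TICK_LINE = re.compile(r"^\s*(\d+):\s*(.*)$")


def parse_trace(text):
    """Return (tick, rest-of-line) for every line that starts with a tick."""
    events = []
    for line in text.splitlines():
        match = TICK_LINE.match(line)
        if match:
            events.append((int(match.group(1)), match.group(2)))
    return events


def retired_by_tick(events, tick):
    """Count trace entries at or before the given tick."""
    return sum(1 for t, _ in events if t <= tick)


# Made-up trace text, only to exercise the parser.
sample_trace = """\
   1000: system.cpu: addi a0, a0, 1
   2000: system.cpu: lw a1, 0(a0)
warn: a line that is not part of the trace
   3500: system.cpu: sw a1, 4(a0)
"""

events = parse_trace(sample_trace)
for tick in (1000, 2000, 4000):
    print(tick, retired_by_tick(events, tick))
```

Redirecting the trace to a file and filtering it this way avoids scrolling through the raw output.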
[gem5-users] Question Regarding L1 Cache Transient States handling Load Hit in Ruby MOESI CMP Directory protocol
Hi all,

I've been looking into the default MOESI CMP Directory Protocol, and it came to my attention that, regarding the SM state in the L1 Cache (the transient state during a Shared to Exclusive upgrade due to a store miss), when a load arrives from the local core (which hits since the cache is technically still in Shared state), the cache will return the old Shared Datablk as its load hit result. Will it cause incoherence issues in memory ordering between the core and the memory system, since the CPU commits the store first and then commits the load returning the old data, but the memory system sees the load hit finish first, and then sees the GETX finish?

Also, I already speculate that such loads will probably not arrive at the L1 Cache controller, since they would be blocked or forwarded with newer data due to outstanding stores in the lsq or the mandatory queue. I'm just wondering if the cache protocol itself is solid in terms of request ordering.

Thanks in advance!
Zhang Zhiyuan
2023.3.22
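The forwarding behaviour speculated about in the second paragraph can be illustrated with a toy model in Python. This is only a sketch of the general store-to-load forwarding idea under simplifying assumptions (whole-word accesses, in-order draining); the class and method names are hypothetical and it does not reflect how gem5's LSQ or Ruby's mandatory queue are actually implemented.

```python
class ToyLSQ:
    """Toy load/store queue in front of a cache (a dict of addr -> value)."""

    def __init__(self, cache):
        self.cache = cache          # stands in for the L1 data
        self.pending_stores = []    # (addr, value), oldest first

    def store(self, addr, value):
        """Commit a store: it waits in the queue until the cache accepts it."""
        self.pending_stores.append((addr, value))

    def load(self, addr):
        """Return the youngest pending store to addr, else the cached value."""
        for pending_addr, value in reversed(self.pending_stores):
            if pending_addr == addr:
                return value        # forwarded; the cache is never asked
        return self.cache[addr]

    def drain_one(self):
        """Write the oldest pending store into the cache (e.g. GETX done)."""
        addr, value = self.pending_stores.pop(0)
        self.cache[addr] = value


cache = {"X": 0, "Y": 7}
lsq = ToyLSQ(cache)

lsq.store("X", 1)                   # store committed, upgrade still pending
forwarded = lsq.load("X")           # same address: served from the queue
other_word = lsq.load("Y")          # different address: served by the cache
stale_in_cache = cache["X"]         # the cache itself still holds old data

lsq.drain_one()                     # upgrade finished, store performed
print(forwarded, other_word, stale_in_cache, cache["X"])
```

In this model a load to the stored address never observes the stale cached copy, so the cache returning old data in the transient state could only be seen by loads to other addresses.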