Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2017-01-09 Thread Nikos Nikoleris


> On Jan. 5, 2017, 4:32 p.m., Tony Gutierrez wrote:
> > Is there anything holding this up from being shipped?
> 
> Andreas Hansson wrote:
> In my view the patch needs two things:
> 
> 1) Some thought around the design. I am still hoping there is a less 
> invasive way of accommodating the functionality, possibly with some changes 
> to the existing functions of the cache.
> 
> 2) A way to test it. This should preferably include both synthetic and 
> real use-cases. The memtester and memchecker may be a good starting point for 
> the synthetic part.
> 
> It would be great if someone could dig into these issues.
> 
> Steve Reinhardt wrote:
> It's been quite a while since I've looked at this... in fact I was going 
> to say that there might be a memory leak, but looking back at the patch 
> history I see I fixed that already in May 2015 :).
> 
> It's pretty easy to come up with a simple test case, just using C++11 
> atomic ops or pthread mutexes. Unfortunately the ones I had were on my AMD 
> work system that I had to give back, and it looks like I missed them when I 
> went to copy off useful non-proprietary stuff before I left. I know I didn't 
> spend much time writing tests though, so they should be easy to recreate.
> 
> As far as being less invasive, I think my reply to Andreas's comment of 
> May 7, 2015 is still relevant: given the current structure of the cache code, 
> I think the existing solution is really not that invasive. There isn't really 
> that much code being added here; most of it is comments. Generating a 
> synthetic request to allow allocating an MSHR so that we can accumulate the 
> deferred operations that happen while the lock is held is a little strange, 
> but it allows us to reuse the bulk of the code in recvTimingResp() with just 
> a few changes.
> 
> If the cache code were heavily restructured, we could probably layer it 
> in such a way that the fake packets would not be necessary, and thus the 
> invasiveness of the locked RMW support could be reduced. However, I started 
> down this path a long time ago (shortly after his comment), and after putting 
> a fair amount of work into the restructuring, I still didn't even have the 
> code working at all, so I gave up. Thus I think the net invasiveness of 
> restructuring the cache to reduce the amount of change required specifically 
> for this feature is much much larger than the invasiveness of this patch by 
> itself. The end result would probably be better code, but it's a huge amount 
> of work that I don't see anyone signing up for right now, so I think it's 
> better to put this change in as is for now, and put "restructuring the cache" 
> on the to-do list.
> 
> Nikos Nikoleris wrote:
> I agree, the implementation of locked instructions wouldn't be trivial 
> with the current structure of the cache. It would be nice if we didn't have 
> to abuse the MSHR in that way, it makes it hard to reason about memory leaks, 
> and it can't solve the problem for atomic and functional accesses.
> 
> Wouldn't it be easier to implement this in the same way as far atomics? 
> If I understand correctly the semantics of locked instructions, it would be 
> significantly simpler if we could perform these operations with a single RMW 
> packet rather than two separate, a read and then a write. Wouldn't the 
> AtomicOpFunctor in the packet class be sufficient?
> 
> Tony would that work for you?
> 
> Steve Reinhardt wrote:
> Thanks for the input, Nikos. While it would be a lot simpler to do the 
> atomic ops in the cache using functors, x86 lets you put a LOCK prefix in 
> front of a fairly long list of instructions, and then you have to combine 
> that with all the different data sizes that each instruction supports. Plus 
> it would be a pretty big disruption to the whole CPU ISA definition to create 
> and plumb those packets through to the memory interface, and it would make 
> the locked instructions completely different from their unlocked 
> counterparts. These are all the reasons we didn't do it this way to begin 
> with, though in retrospect it's not as infeasible as perhaps it seemed up 
> front; just complicated and a little ugly. Not necessarily worse than the 
> current patch.
> 
> Actually though there is no problem with atomic and functional 
> accesses... it's easy to make them atomic, as you just issue the read and the 
> write within the same event tick/process() method (i.e., without going back 
> to the event queue). It's only timing accesses that are hard with this 
> interface.

I thought that the AtomicOpFunctor solves exactly this issue. While decoding 
the instruction you pass a function pointer which operates on the data in the 
memory system. Does the memory system need to be aware of the specifics of the 
operation? If I understand correctly the cache will only call the function 
pointer and it won't need to decode the 

Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2017-01-06 Thread Steve Reinhardt


> On Jan. 5, 2017, 8:32 a.m., Tony Gutierrez wrote:
> > Is there anything holding this up from being shipped?
> 
> Andreas Hansson wrote:
> In my view the patch needs two things:
> 
> 1) Some thought around the design. I am still hoping there is a less 
> invasive way of accommodating the functionality, possibly with some changes 
> to the existing functions of the cache.
> 
> 2) A way to test it. This should preferably include both synthetic and 
> real use-cases. The memtester and memchecker may be a good starting point for 
> the synthetic part.
> 
> It would be great if someone could dig into these issues.
> 
> Steve Reinhardt wrote:
> It's been quite a while since I've looked at this... in fact I was going 
> to say that there might be a memory leak, but looking back at the patch 
> history I see I fixed that already in May 2015 :).
> 
> It's pretty easy to come up with a simple test case, just using C++11 
> atomic ops or pthread mutexes. Unfortunately the ones I had were on my AMD 
> work system that I had to give back, and it looks like I missed them when I 
> went to copy off useful non-proprietary stuff before I left. I know I didn't 
> spend much time writing tests though, so they should be easy to recreate.
> 
> As far as being less invasive, I think my reply to Andreas's comment of 
> May 7, 2015 is still relevant: given the current structure of the cache code, 
> I think the existing solution is really not that invasive. There isn't really 
> that much code being added here; most of it is comments. Generating a 
> synthetic request to allow allocating an MSHR so that we can accumulate the 
> deferred operations that happen while the lock is held is a little strange, 
> but it allows us to reuse the bulk of the code in recvTimingResp() with just 
> a few changes.
> 
> If the cache code were heavily restructured, we could probably layer it 
> in such a way that the fake packets would not be necessary, and thus the 
> invasiveness of the locked RMW support could be reduced. However, I started 
> down this path a long time ago (shortly after his comment), and after putting 
> a fair amount of work into the restructuring, I still didn't even have the 
> code working at all, so I gave up. Thus I think the net invasiveness of 
> restructuring the cache to reduce the amount of change required specifically 
> for this feature is much much larger than the invasiveness of this patch by 
> itself. The end result would probably be better code, but it's a huge amount 
> of work that I don't see anyone signing up for right now, so I think it's 
> better to put this change in as is for now, and put "restructuring the cache" 
> on the to-do list.
> 
> Nikos Nikoleris wrote:
> I agree, the implementation of locked instructions wouldn't be trivial 
> with the current structure of the cache. It would be nice if we didn't have 
> to abuse the MSHR in that way, it makes it hard to reason about memory leaks, 
> and it can't solve the problem for atomic and functional accesses.
> 
> Wouldn't it be easier to implement this in the same way as far atomics? 
> If I understand correctly the semantics of locked instructions, it would be 
> significantly simpler if we could perform these operations with a single RMW 
> packet rather than two separate, a read and then a write. Wouldn't the 
> AtomicOpFunctor in the packet class be sufficient?
> 
> Tony would that work for you?

Thanks for the input, Nikos. While it would be a lot simpler to do the atomic 
ops in the cache using functors, x86 lets you put a LOCK prefix in front of a 
fairly long list of instructions, and then you have to combine that with all 
the different data sizes that each instruction supports. Plus it would be a 
pretty big disruption to the whole CPU ISA definition to create and plumb those 
packets through to the memory interface, and it would make the locked 
instructions completely different from their unlocked counterparts. These are 
all the reasons we didn't do it this way to begin with, though in retrospect 
it's not as infeasible as perhaps it seemed up front; just complicated and a 
little ugly. Not necessarily worse than the current patch.
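
For a sense of that breadth, here is a small illustrative sample in C++; the
instruction mappings noted in the comments are the usual compiler lowerings,
given as background rather than anything taken from this patch:

    #include <atomic>
    #include <cstdint>

    std::atomic<uint32_t> a{0};
    std::atomic<uint64_t> b{0};

    // Each call below typically lowers to a different LOCK-prefixed x86
    // instruction, and the same operations exist at 8/16/32/64-bit widths,
    // which is what makes a functor-per-operation approach a sizeable
    // ISA-side change.
    void examples()
    {
        a.fetch_add(1);                  // lock add / lock xadd
        a.fetch_and(0xffu);              // lock and (or a lock cmpxchg loop)
        a.exchange(42);                  // xchg (implicitly locked)
        uint64_t expected = 0;
        b.compare_exchange_strong(expected, 7);  // lock cmpxchg
    }

    int main() { examples(); return 0; }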

Actually though there is no problem with atomic and functional accesses... it's 
easy to make them atomic, as you just issue the read and the write within the 
same event tick/process() method (i.e., without going back to the event queue). 
It's only timing accesses that are hard with this interface.


- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review9228
---


On April 14, 2016, 10:42 p.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> 

Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2017-01-06 Thread Nikos Nikoleris


> On Jan. 5, 2017, 4:32 p.m., Tony Gutierrez wrote:
> > Is there anything holding this up from being shipped?
> 
> Andreas Hansson wrote:
> In my view the patch needs two things:
> 
> 1) Some thought around the design. I am still hoping there is a less 
> invasive way of accommodating the functionality, possibly with some changes 
> to the existing functions of the cache.
> 
> 2) A way to test it. This should preferably include both synthetic and 
> real use-cases. The memtester and memchecker may be a good starting point for 
> the synthetic part.
> 
> It would be great if someone could dig into these issues.
> 
> Steve Reinhardt wrote:
> It's been quite a while since I've looked at this... in fact I was going 
> to say that there might be a memory leak, but looking back at the patch 
> history I see I fixed that already in May 2015 :).
> 
> It's pretty easy to come up with a simple test case, just using C++11 
> atomic ops or pthread mutexes. Unfortunately the ones I had were on my AMD 
> work system that I had to give back, and it looks like I missed them when I 
> went to copy off useful non-proprietary stuff before I left. I know I didn't 
> spend much time writing tests though, so they should be easy to recreate.
> 
> As far as being less invasive, I think my reply to Andreas's comment of 
> May 7, 2015 is still relevant: given the current structure of the cache code, 
> I think the existing solution is really not that invasive. There isn't really 
> that much code being added here; most of it is comments. Generating a 
> synthetic request to allow allocating an MSHR so that we can accumulate the 
> deferred operations that happen while the lock is held is a little strange, 
> but it allows us to reuse the bulk of the code in recvTimingResp() with just 
> a few changes.
> 
> If the cache code were heavily restructured, we could probably layer it 
> in such a way that the fake packets would not be necessary, and thus the 
> invasiveness of the locked RMW support could be reduced. However, I started 
> down this path a long time ago (shortly after his comment), and after putting 
> a fair amount of work into the restructuring, I still didn't even have the 
> code working at all, so I gave up. Thus I think the net invasiveness of 
> restructuring the cache to reduce the amount of change required specifically 
> for this feature is much much larger than the invasiveness of this patch by 
> itself. The end result would probably be better code, but it's a huge amount 
> of work that I don't see anyone signing up for right now, so I think it's 
> better to put this change in as is for now, and put "restructuring the cache" 
> on the to-do list.

I agree, the implementation of locked instructions wouldn't be trivial with the 
current structure of the cache. It would be nice if we didn't have to abuse the 
MSHR in that way, it makes it hard to reason about memory leaks, and it can't 
solve the problem for atomic and functional accesses.

Wouldn't it be easier to implement this in the same way as far atomics? If I 
understand correctly the semantics of locked instructions, it would be 
significantly simpler if we could perform these operations with a single RMW 
packet rather than two separate, a read and then a write. Wouldn't the 
AtomicOpFunctor in the packet class be sufficient?
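
As a rough, self-contained sketch of that idea (the name AtomicOpFunctor
matches the class being discussed, but the interface and the AtomicOpAdd
example below are illustrative assumptions, not code taken from gem5 or from
this patch):

    #include <cstdint>
    #include <cstring>
    #include <iostream>

    // Callback carried by the request/packet; the cache applies it directly
    // to the block data instead of performing a separate read and write.
    struct AtomicOpFunctor
    {
        virtual void operator()(uint8_t *p) = 0;
        virtual ~AtomicOpFunctor() {}
    };

    // What a locked add might carry: add a constant to a 32-bit location.
    struct AtomicOpAdd : AtomicOpFunctor
    {
        uint32_t operand;
        explicit AtomicOpAdd(uint32_t v) : operand(v) {}
        void operator()(uint8_t *p) override
        {
            uint32_t val;
            std::memcpy(&val, p, sizeof(val));
            val += operand;
            std::memcpy(p, &val, sizeof(val));
        }
    };

    int main()
    {
        uint8_t line[4] = {5, 0, 0, 0};  // stand-in for cached block data
        AtomicOpAdd add(3);
        add(line);                       // the cache would invoke this once
        uint32_t result;
        std::memcpy(&result, line, sizeof(result));
        std::cout << result << "\n";     // prints 8
        return 0;
    }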

Tony would that work for you?


- Nikos


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review9228
---


On April 15, 2016, 5:42 a.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> ---
> 
> (Updated April 15, 2016, 5:42 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 11444:8a1419dbbfa6
> ---
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -
> 
>   src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>


Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2017-01-05 Thread Steve Reinhardt


> On Jan. 5, 2017, 8:32 a.m., Tony Gutierrez wrote:
> > Is there anything holding this up from being shipped?
> 
> Andreas Hansson wrote:
> In my view the patch needs two things:
> 
> 1) Some thought around the design. I am still hoping there is a less 
> invasive way of accommodating the functionality, possibly with some changes 
> to the existing functions of the cache.
> 
> 2) A way to test it. This should preferably include both synthetic and 
> real use-cases. The memtester and memchecker may be a good starting point for 
> the synthetic part.
> 
> It would be great if someone could dig into these issues.

It's been quite a while since I've looked at this... in fact I was going to say 
that there might be a memory leak, but looking back at the patch history I see 
I fixed that already in May 2015 :).

It's pretty easy to come up with a simple test case, just using C++11 atomic 
ops or pthread mutexes. Unfortunately the ones I had were on my AMD work system 
that I had to give back, and it looks like I missed them when I went to copy 
off useful non-proprietary stuff before I left. I know I didn't spend much time 
writing tests though, so they should be easy to recreate.
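
A minimal sketch of that kind of test, assuming plain C++11 atomics rather
than the lost originals (the actual tests used with this patch are not
preserved here):

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main()
    {
        constexpr int num_threads = 4;
        constexpr int iters = 100000;
        std::atomic<int> counter(0);

        std::vector<std::thread> threads;
        for (int t = 0; t < num_threads; ++t) {
            threads.emplace_back([&counter] {
                for (int i = 0; i < iters; ++i) {
                    // fetch_add on x86 compiles to a LOCK-prefixed RMW,
                    // which is the path this patch adds to the timing cache.
                    counter.fetch_add(1, std::memory_order_relaxed);
                }
            });
        }
        for (auto &th : threads)
            th.join();

        // If the locked RMWs are not atomic end to end, the count comes up short.
        std::printf("expected %d, got %d\n", num_threads * iters, counter.load());
        return 0;
    }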

As far as being less invasive, I think my reply to Andreas's comment of May 7, 
2015 is still relevant: given the current structure of the cache code, I think 
the existing solution is really not that invasive. There isn't really that much 
code being added here; most of it is comments. Generating a synthetic request 
to allow allocating an MSHR so that we can accumulate the deferred operations 
that happen while the lock is held is a little strange, but it allows us to 
reuse the bulk of the code in recvTimingResp() with just a few changes.

If the cache code were heavily restructured, we could probably layer it in such 
a way that the fake packets would not be necessary, and thus the invasiveness 
of the locked RMW support could be reduced. However, I started down this path a 
long time ago (shortly after his comment), and after putting a fair amount of 
work into the restructuring, I still didn't even have the code working at all, 
so I gave up. Thus I think the net invasiveness of restructuring the cache to 
reduce the amount of change required specifically for this feature is much much 
larger than the invasiveness of this patch by itself. The end result would 
probably be better code, but it's a huge amount of work that I don't see anyone 
signing up for right now, so I think it's better to put this change in as is 
for now, and put "restructuring the cache" on the to-do list.


- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review9228
---


On April 14, 2016, 10:42 p.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> ---
> 
> (Updated April 14, 2016, 10:42 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 11444:8a1419dbbfa6
> ---
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -
> 
>   src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2017-01-05 Thread Andreas Hansson


> On Jan. 5, 2017, 4:32 p.m., Tony Gutierrez wrote:
> > Is there anything holding this up from being shipped?

In my view the patch needs two things:

1) Some thought around the design. I am still hoping there is a less invasive 
way of accommodating the functionality, possibly with some changes to the 
existing functions of the cache.

2) A way to test it. This should preferably include both synthetic and real 
use-cases. The memtester and memchecker may be a good starting point for the 
synthetic part.

It would be great if someone could dig into these issues.


- Andreas


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review9228
---


On April 15, 2016, 5:42 a.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> ---
> 
> (Updated April 15, 2016, 5:42 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 11444:8a1419dbbfa6
> ---
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -
> 
>   src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2017-01-05 Thread Tony Gutierrez

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review9228
---


Is there anything holding this up from being shipped?

- Tony Gutierrez


On April 14, 2016, 10:42 p.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> ---
> 
> (Updated April 14, 2016, 10:42 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 11444:8a1419dbbfa6
> ---
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -
> 
>   src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2016-04-14 Thread Steve Reinhardt


> On April 13, 2016, 7:38 a.m., Bjoern A. Zeeb wrote:
> > Can you please update it?
> > It neither applies cleanly nor does it compile afterwards.

Done.


- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review8197
---


On April 14, 2016, 10:42 p.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> ---
> 
> (Updated April 14, 2016, 10:42 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 11444:8a1419dbbfa6
> ---
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -
> 
>   src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2016-04-14 Thread Steve Reinhardt

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/
---

(Updated April 14, 2016, 10:42 p.m.)


Review request for Default.


Changes
---

Updated again to work with latest head (rev 11443:df24b9af42c7)


Repository: gem5


Description (updated)
---

Changeset 11444:8a1419dbbfa6
---
mem: implement x86 locked accesses in timing-mode classic cache

Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
use a combination of clearing permission bits and leaving
an MSHR in place to prevent accesses & snoops from touching
a locked block between the read and write parts of a locked
RMW sequence.


Diffs (updated)
-

  src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
  src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
  src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
  src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 

Diff: http://reviews.gem5.org/r/2691/diff/


Testing
---


Thanks,

Steve Reinhardt



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2016-04-13 Thread Bjoern A. Zeeb

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review8197
---


Can you please update it?
It neither applies cleanly nor does it compile afterwards.

- Bjoern A. Zeeb


On Jan. 19, 2016, 3:46 a.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> ---
> 
> (Updated Jan. 19, 2016, 3:46 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 10745:9b84d1b570e3
> ---
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -
> 
>   src/mem/packet.cc d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 
>   src/mem/cache/cache.cc d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 
>   src/mem/cache/mshr.hh d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 
>   src/mem/packet.hh d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2016-01-18 Thread Steve Reinhardt

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/
---

(Updated Jan. 18, 2016, 7:46 p.m.)


Review request for Default.


Changes
---

This revision doesn't address any reviewer concerns, but it does update the old 
implementation to work with the latest tree (for those who are still 
downloading this manually).  It may also fix a bug; there was a bug in one of 
our internal versions but I'm not sure if that bug made it out onto reviewboard 
or not.


Repository: gem5


Description
---

Changeset 10745:9b84d1b570e3
---
mem: implement x86 locked accesses in timing-mode classic cache

Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
use a combination of clearing permission bits and leaving
an MSHR in place to prevent accesses & snoops from touching
a locked block between the read and write parts of a locked
RMW sequence.


Diffs (updated)
-

  src/mem/packet.cc d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 
  src/mem/cache/cache.cc d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 
  src/mem/cache/mshr.hh d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 
  src/mem/packet.hh d06e5a6b4b7f05a642c3e2bee12cfeb130dede16 

Diff: http://reviews.gem5.org/r/2691/diff/


Testing
---


Thanks,

Steve Reinhardt



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2016-01-01 Thread Andreas Hansson


> On May 7, 2015, 7:55 a.m., Andreas Hansson wrote:
> > src/mem/cache/cache_impl.hh, line 639
> > 
> >
> > Is it safe to call this here and ignore the return value?
> > 
> > It would be nice if we do not assume that the response is always 
> > accepted.
> > 
> > Perhaps schedule an event and put it in a queue rather?
> 
> Steve Reinhardt wrote:
> recvTimingResp() doesn't have a return value... responses are always 
> accepted.

Actually responses are not always accepted directly. They are always guaranteed 
to be sunk without dependencies, but there are modules that use flow control on 
responses (simply for rate regulation reasons).


- Andreas


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review6124
---


On May 6, 2015, 10:59 p.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> ---
> 
> (Updated May 6, 2015, 10:59 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 10745:9b84d1b570e3
> ---
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -
> 
>   src/mem/cache/cache_impl.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/cache/mshr.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/packet.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/packet.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-05-07 Thread Andreas Hansson

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review6124
---


I vote for the non-goto version.

Is there really no way we can achieve this without the extra packets going back 
and forth? It seems the short-circuiting we do with packets we should be able to 
do without them if we tapped in one level below the 
recvTimingReq/recvTimingResp.


src/mem/cache/cache_impl.hh (line 527)
http://reviews.gem5.org/r/2691/#comment5327

later on we test pkt->isLockedRMW() and then read or write. Could we not do 
the same here rather and avoid looking at the actual command, i.e.

pkt->isLockedRMW() && pkt->isWrite()



src/mem/cache/cache_impl.hh (line 621)
http://reviews.gem5.org/r/2691/#comment5328

status should be 0 even, or could it still have the dirty flag set?

the readable flag is essentially valid, so here we invent a new state

Readable = 0
Dirty = ?
Writeable = 0

Still the block is valid.

This scares me a bit. Partly because the binary encoding of cache states is 
not that easy to follow, and partly because we now have MOESI + some new state.



src/mem/cache/cache_impl.hh (line 638)
http://reviews.gem5.org/r/2691/#comment5329

Why is this MSHR needed? It was faked, was it not?



src/mem/cache/cache_impl.hh (line 639)
http://reviews.gem5.org/r/2691/#comment5330

Is it safe to call this here and ignore the return value?

It would be nice if we do not assume that the response is always accepted.

Perhaps schedule an event and put it in a queue rather?



src/mem/cache/cache_impl.hh (line 1296)
http://reviews.gem5.org/r/2691/#comment5331

potentially still dirty, or always came from Exclusive, i.e. Readable | 
Writeable?



src/mem/packet.hh (line 107)
http://reviews.gem5.org/r/2691/#comment5326

Do we need the actual LockedRMW... or could it just be a flag on the 
existing read and write req/resp?


- Andreas Hansson


On May 6, 2015, 10:59 p.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated May 6, 2015, 10:59 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/cache/mshr.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/packet.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/packet.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-05-07 Thread Steve Reinhardt


 On May 7, 2015, 12:55 a.m., Andreas Hansson wrote:
  I vote for the non-goto verison.
  
  Is there really no way we can achieve this without the extra packets going 
  back and forth? It seems the shortcircuiting we do with packets we should 
  be able to do without them if we tapped in one level below the 
  recvTimingReq/recvTimingResp.

Thanks for the review!  I'm not sure what you mean by one level below though.

There are lots of ways this can be done; I spent a lot of time thinking about 
how to do this before I started on it.  The key is that you have to check 
whether a block is locked when snoops or other requests come in, defer those 
snoops and other accesses in some kind of list during the RMW window, then 
process that list of deferred snoops/requests when the RMW completes.
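
As a self-contained sketch of that defer-and-replay idea (all names below are
made up for illustration; the actual patch reuses the existing MSHR target
list rather than a separate structure like this):

    #include <deque>
    #include <iostream>
    #include <string>

    struct Packet { std::string name; };

    struct LockedBlock
    {
        bool locked = false;             // true between the read and write halves
        std::deque<Packet> deferred;     // accesses/snoops held back meanwhile

        // Returns true if the access can be serviced now, false if deferred.
        bool access(const Packet &pkt)
        {
            if (locked) {
                deferred.push_back(pkt);
                return false;
            }
            return true;
        }

        // Called when the write half of the locked RMW completes.
        void rmwComplete()
        {
            locked = false;
            while (!deferred.empty()) {  // replay in arrival order
                std::cout << "replaying " << deferred.front().name << "\n";
                deferred.pop_front();
            }
        }
    };

    int main()
    {
        LockedBlock blk;
        blk.locked = true;               // read half of the locked RMW observed
        blk.access({"snoop A"});         // deferred
        blk.access({"read B"});          // deferred
        blk.rmwComplete();               // write half done: replay A, then B
        return 0;
    }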

Given that we already have this entire mechanism in place with the 
MSHRs, it seems obvious to me that we should leverage it.  Faking the packets 
is a little weird but allows us to make the most use of existing code paths and 
minimizes changes.  This is less than 60 net lines of non-comment code added to 
cache_impl.hh, which I think is pretty good for a feature that I initially 
expected would be much more complex.

I'm sure there are ways we could do this without the fake packets, at the cost 
of some significant restructuring (e.g., taking the 'while 
(mshr->hasTargets())' loop out of recvTimingResp() and making it callable from 
multiple places) and probably more special-casing in the existing code paths.  
For example, we could have a flag in MSHR::Target that indicates that the block 
is locked, and leave the packet pointer null, but then we'd probably end up 
adding null pointer checks in several places.  That was more than I really 
wanted to bite off though. I suppose I could give it a shot but it might be a 
while until I have time.


 On May 7, 2015, 12:55 a.m., Andreas Hansson wrote:
  src/mem/cache/cache_impl.hh, line 527
  http://reviews.gem5.org/r/2691/diff/3/?file=44871#file44871line527
 
  later on we test pkt->isLockedRMW() and then read or write. Could we not 
  do the same here rather and avoid looking at the actual command, i.e.
  
  pkt->isLockedRMW() && pkt->isWrite()

Sure.


 On May 7, 2015, 12:55 a.m., Andreas Hansson wrote:
  src/mem/cache/cache_impl.hh, line 621
  http://reviews.gem5.org/r/2691/diff/3/?file=44871#file44871line621
 
  status should be 0 even, or could it still have the dirty flag set?
  
  the readable flag is essentially valid, so here we invent a new state
  
  Readable = 0
  Dirty = ?
  Writeable = 0
  
  Still the block is valid.
  
  This scares me a bit. Partly because the binary encoding of cache 
  states is not that easy to follow, and partly because we now have MOESI + 
  some new state.

It could still have the dirty bit set, though practically speaking that's moot, 
since it will be set by the write part of the RMW regardless.  There are other 
bits like BlkHWPrefetched and BlkSecure that we probably don't want to mess 
with though. 

It is obviously not a MOESI state, but that's OK, because coherence protocol 
operations will never observe this state, as all snoops are also being 
deferred. Its only purpose is to make all CPU-side accesses (read or write) go 
down the miss path, where we can defer them until the RMW completes.  There is 
precedent for this; we also clear the Readable bit on valid blocks in the 
window after a write miss (while the upgrade request is outstanding); see the 
comment around line 788 of this file.
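
To make that state concrete, a toy illustration of what is being described
(the flag names mirror gem5's CacheBlk status bits, but the values and the
code here are made up for illustration, not taken from the patch):

    #include <cstdio>

    enum : unsigned {
        BlkValid    = 0x01,
        BlkWritable = 0x02,
        BlkReadable = 0x04,
        BlkDirty    = 0x08,
    };

    int main()
    {
        unsigned status = BlkValid | BlkReadable | BlkWritable;

        // Read half of the locked RMW: keep the block valid (the dirty bit, if
        // set, is left alone), but drop the permission bits so every CPU-side
        // access and snoop takes the miss path and is deferred on the MSHR.
        status &= ~(BlkReadable | BlkWritable);
        std::printf("during RMW: valid=%d readable=%d writable=%d\n",
                    !!(status & BlkValid), !!(status & BlkReadable),
                    !!(status & BlkWritable));

        // Write half completes: the block becomes readable/writable (and dirty).
        status |= BlkReadable | BlkWritable | BlkDirty;
        return 0;
    }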


 On May 7, 2015, 12:55 a.m., Andreas Hansson wrote:
  src/mem/cache/cache_impl.hh, line 638
  http://reviews.gem5.org/r/2691/diff/3/?file=44871#file44871line638
 
  Why is this MSHR needed? It was faked, was it not?

Not necessarily... if the read part of the RMW was an actual cache miss, this 
is still the non-fake MSHR that was allocated at that time.  The initial target 
of that MSHR will have a fake packet at this point, but there could be real 
deferred targets waiting behind it.


 On May 7, 2015, 12:55 a.m., Andreas Hansson wrote:
  src/mem/cache/cache_impl.hh, line 639
  http://reviews.gem5.org/r/2691/diff/3/?file=44871#file44871line639
 
  Is it safe to call this here and ignore the return value?
  
  It would be nice if we do not assume that the response is always 
  accepted.
  
  Perhaps schedule an event and put it in a queue rather?

recvTimingResp() doesn't have a return value... responses are always accepted.


 On May 7, 2015, 12:55 a.m., Andreas Hansson wrote:
  src/mem/cache/cache_impl.hh, line 1296
  http://reviews.gem5.org/r/2691/diff/3/?file=44871#file44871line1296
 
  potentially still dirty, or always came from Exclusive, i.e. Readable 
  | Writeable?

BlkDirty may have been set in handleFill()


 On May 7, 2015, 12:55 a.m., Andreas Hansson wrote:
  src/mem/packet.hh, line 107
  

Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-05-06 Thread Steve Reinhardt

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/
---

(Updated May 6, 2015, 3:59 p.m.)


Review request for Default.


Changes
---

This is an alternative version of the patch that uses a goto to narrow the 
scope of the early_exit flag and reduce indentation levels.  I'm not 
necessarily replacing the prior patch with this one; I really just want to post 
two alternatives, and this seemed easier than creating a whole new patch to 
review.  You probably want to focus just on the diff between this revision and 
the previous one.


Repository: gem5


Description
---

Changeset 10745:9b84d1b570e3
---
mem: implement x86 locked accesses in timing-mode classic cache

Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
use a combination of clearing permission bits and leaving
an MSHR in place to prevent accesses & snoops from touching
a locked block between the read and write parts of a locked
RMW sequence.


Diffs (updated)
-

  src/mem/cache/cache_impl.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
  src/mem/cache/mshr.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
  src/mem/packet.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
  src/mem/packet.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 

Diff: http://reviews.gem5.org/r/2691/diff/


Testing
---


Thanks,

Steve Reinhardt



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-05-06 Thread Steve Reinhardt

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/
---

(Updated May 6, 2015, 3:21 p.m.)


Review request for Default.


Changes
---

Updated to tip of tree.  Fixed memory leak and spelling error.

Note that I had to remove the recently added 'const' modifier from the pkt 
field of the MSHR::Target object as part of the update.  As mentioned in the 
comment, we really need to use the existing Packet object for the response and 
substitute in a dummy packet, since the former has the allocated data pointer 
for the response data.  I tried the option of creating a new packet for the 
response, but there's no clean way to move a dynamically allocated data 
pointer from an existing packet to a new one.  We could add methods to Packet 
to do this, but since it's all just to avoid removing 'const' from the pkt 
field, it seemed easier just to remove 'const'.  If anyone has a better idea on 
how to handle this, I'm listening.


Repository: gem5


Description
---

Changeset 10745:9b84d1b570e3
---
mem: implement x86 locked accesses in timing-mode classic cache

Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
use a combination of clearing permission bits and leaving
an MSHR in place to prevent accesses & snoops from touching
a locked block between the read and write parts of a locked
RMW sequence.


Diffs (updated)
-

  src/mem/cache/cache_impl.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
  src/mem/cache/mshr.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
  src/mem/packet.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
  src/mem/packet.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 

Diff: http://reviews.gem5.org/r/2691/diff/


Testing
---


Thanks,

Steve Reinhardt



Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-05-06 Thread Steve Reinhardt


 On April 8, 2015, 1:43 p.m., Stephan Diestelhorst wrote:
  src/mem/cache/cache_impl.hh, line 1286
  http://reviews.gem5.org/r/2691/diff/1/?file=44226#file44226line1286
 
  Why don't you break and drop the early_exit?

[Hmm, I had a comment here but reviewboard silently dropped it when I posted.]

Basically break doesn't work here because you still need to exit the code that 
comes after the if/else-if/else-if... sequence, but before the break.  That is, 
the set of lines starting with:
tgt_pkt->makeTimingResponse();
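
(Editor's illustration of the control-flow point, with an invented loop
structure -- the real target-processing loop in recvTimingResp is more
involved. The flag matters because the bail-out decision is made inside the
if/else-if chain, yet it also has to control the cleanup after the loop.)

    #include <vector>

    struct Target { bool lockedCase; };

    void handleTargets(std::vector<Target> &targets)
    {
        bool early_exit = false;
        for (auto &tgt : targets) {
            if (tgt.lockedCase) {
                // The decision to stop is made here, inside the chain;
                // a bare break alone would lose that fact for the code
                // below the loop.
                early_exit = true;
                break;
            } else {
                // ... other cases ...
            }
            // Common code after the chain, e.g.
            // tgt_pkt->makeTimingResponse(); ... schedule the response ...
        }
        if (!early_exit) {
            // Cleanup that must be skipped on the early exit
            // (e.g. deallocating the MSHR / handling tempBlock).
        }
        // Code that always runs (e.g. issuing the queued writebacks).
    }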


- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review6028
---


On May 6, 2015, 3:21 p.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated May 6, 2015, 3:21 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/cache/mshr.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/packet.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/packet.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-05-06 Thread Steve Reinhardt


 On April 8, 2015, 1:43 p.m., Stephan Diestelhorst wrote:
  src/mem/cache/cache_impl.hh, line 1359
  http://reviews.gem5.org/r/2691/diff/1/?file=44226#file44226line1359
 
  This should really be:
  
  if (!mshr.hasTargets())
  
  and all the stuff below is indented a level (or two) too deep, IMHO.  I 
  seriously consider goto an appropriate way to improve readability here.

Changing the test to 'if (!mshr.hasTargets())' is equivalent (I believe) but 
it's not clear to me that it's better... the only reason that condition could 
be true would be because early_exit is false, and early_exit is the more 
meaningful cause, I'd say.

I could use a goto... as I mentioned above, I seriously considered it, but 
rejected it since it wouldn't completely replace early_exit. I'd have no 
problem with doing it here, but others might, so I was doing it as an if-block 
to avoid the controversy.  I will post another version of the patch with a goto 
for comments on this specifically.


- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review6028
---


On May 6, 2015, 3:21 p.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated May 6, 2015, 3:21 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/cache/mshr.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/packet.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
   src/mem/packet.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-04-08 Thread Stephan Diestelhorst

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review6028
---


I get the idea, but I have been burned when it comes to debugging request + 
packet deep copies.


src/mem/cache/cache_impl.hh
http://reviews.gem5.org/r/2691/#comment5250

Since I just spent a few good hours hunting memory allocation bugs... where 
will these get deallocated again?  It is not obvious, so maybe chuck in a 
comment here?



src/mem/cache/cache_impl.hh
http://reviews.gem5.org/r/2691/#comment5251

Again... I'd like to see a comment on where these guys (or the originals) 
are deallocated.



src/mem/cache/cache_impl.hh
http://reviews.gem5.org/r/2691/#comment5252

Why don't you break and drop the early_exit?



src/mem/cache/cache_impl.hh
http://reviews.gem5.org/r/2691/#comment5253

This should really be:

if (!mshr.hasTargets())

and all the stuff below is indented a level (or two) too deep, IMHO.
I seriously consider goto an appropriate way to improve readability here.


- Stephan Diestelhorst


On March 14, 2015, 5:19 p.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated March 14, 2015, 5:19 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.cc 655ff3f6352d7aa4021f8840b68698b22806 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-04-07 Thread Nilay Vaish

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review6026
---



src/mem/cache/cache_impl.hh
http://reviews.gem5.org/r/2691/#comment5249

Spelling mistake.


- Nilay Vaish


On March 14, 2015, 5:19 p.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated March 14, 2015, 5:19 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.cc 655ff3f6352d7aa4021f8840b68698b22806 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-03-23 Thread Andreas Hansson

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review5961
---


Great to see this functionality added. I am not 100% convinced of the way it is 
done, but perhaps there is no neater way of doing it. In particular, the 
creation of new requests and packets is making me nervous. Who is deleting these?

Also, how does this interact with existing MSHRs? Targets and deferred targets? 
Similarly, now we fake an MSHR even if the block is present in the cache. Would 
it make sense to flush/downgrade somehow and then treat this like a miss?

- Andreas Hansson


On March 14, 2015, 5:19 p.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated March 14, 2015, 5:19 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.cc 655ff3f6352d7aa4021f8840b68698b22806 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-03-18 Thread Steve Reinhardt


 On March 17, 2015, 12:07 p.m., Andreas Hansson wrote:
  src/mem/cache/cache_impl.hh, line 1417
  http://reviews.gem5.org/r/2691/diff/1/?file=44226#file44226line1417
 
  the changed order of the write buffer allocation and dealing with the 
  writebacks is important?

Good question.  I reordered the two blocks only so I could put the tempBlock 
clause inside of 'if (!early_exit)' and leave the writebacks code outside.  
It's necessary to do the writebacks before exiting the function, since the 
writeback list is a local variable and the writebacks would otherwise be 
lost.  In contrast, 
tempBlock is part of the cache object so it will stick around, and while I 
haven't thought it through carefully, I assume there could be cases where 
you're doing a locked RMW in tempBlock, so you don't want to kick it out 
prematurely.

Looking at handleFill(), it seems unlikely (dare I say impossible) that you 
would have both writebacks and the use of tempBlock at the same time, since the 
only time you use tempBlock is if you can't use any existing cache block frame, 
and if you had bothered to do a writeback then by definition you would be 
clearing out an existing cache block frame.  So by that reasoning the order 
shouldn't matter.
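
(Editor's sketch of the ordering argument, with invented names: the writeback
list is a local variable and must be drained on every exit path, while
tempBlock is cache state and can safely sit behind the early-exit guard.)

    #include <list>

    struct CacheSketch {
        int tempBlock = 0;               // member: survives an early return

        void recvTimingRespSketch(bool early_exit)
        {
            std::list<int> writebacks;   // local: lost if not drained here
            // ... target processing may append to 'writebacks' ...

            if (!early_exit) {
                // Guarded: tempBlock is cache state, so postponing this
                // work to a later call is safe.
            }

            // Not guarded: must run on every exit path, otherwise the
            // locally queued writebacks would never be sent.
            for (int wb : writebacks) {
                (void)wb;                // ... send the writeback ...
            }
        }
    };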


- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review5943
---


On March 14, 2015, 10:19 a.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated March 14, 2015, 10:19 a.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.cc 655ff3f6352d7aa4021f8840b68698b22806 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-03-17 Thread Andreas Hansson

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review5943
---



src/mem/cache/cache_impl.hh
http://reviews.gem5.org/r/2691/#comment5194

the changed order of the write buffer allocation and dealing with the 
writebacks is important?


- Andreas Hansson


On March 14, 2015, 5:19 p.m., Steve Reinhardt wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2691/
 ---
 
 (Updated March 14, 2015, 5:19 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10745:9b84d1b570e3
 ---
 mem: implement x86 locked accesses in timing-mode classic cache
 
 Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
 use a combination of clearing permission bits and leaving
 an MSHR in place to prevent accesses & snoops from touching
 a locked block between the read and write parts of a locked
 RMW sequence.
 
 
 Diffs
 -
 
   src/mem/cache/cache_impl.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.hh 655ff3f6352d7aa4021f8840b68698b22806 
   src/mem/packet.cc 655ff3f6352d7aa4021f8840b68698b22806 
 
 Diff: http://reviews.gem5.org/r/2691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Steve Reinhardt
 




[gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

2015-03-14 Thread Steve Reinhardt

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/
---

Review request for Default.


Repository: gem5


Description
---

Changeset 10745:9b84d1b570e3
---
mem: implement x86 locked accesses in timing-mode classic cache

Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
use a combination of clearing permission bits and leaving
an MSHR in place to prevent accesses & snoops from touching
a locked block between the read and write parts of a locked
RMW sequence.


Diffs
-

  src/mem/cache/cache_impl.hh 655ff3f6352d7aa4021f8840b68698b22806 
  src/mem/packet.hh 655ff3f6352d7aa4021f8840b68698b22806 
  src/mem/packet.cc 655ff3f6352d7aa4021f8840b68698b22806 

Diff: http://reviews.gem5.org/r/2691/diff/


Testing
---


Thanks,

Steve Reinhardt
