[m5-dev] TBH/TBB ARM instructions should potentially be split into 2 micro-ops?

2011-04-26 Thread Geoffrey Blake
I've run into a buggy interaction for the ARM ISA between a TBH (or TBB)
instruction and a dependent memory operation (that gets squashed) in the O3
model leading to erroneous behavior when diffed against the Atomic model.
The TBH instruction is a table-based branch that has to index into memory to
calculate its branch destination, so it is both a branch and a memory op.
The buggy behavior is as follows:

1) Fetch a TBH, predict branch destination
2) Begin fetching from predicted PC (which happens to be correct in my buggy
run)
3) Issue a younger dependent memory op to the LSQ and send its request to the
cache ahead of the TBH, which is waiting on register operands
4) Issue TBH to LSQ to read memory for branch destination
5) Memory violation detection with younger instruction and squash for memory
ordering
--- This squash then calls squashDueToMemOrder(...), which redirects the
PC of Fetch to a stale PC value stored in the TBH dyn-inst object as it
hasn't yet calculated its true PC
6) Start fetching down wrong path
7) TBH completes, but since the branch part was predicted correctly, no
additional squash happens in checkMisprediction (which may not even be
checked due to the already outstanding squash)
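A toy C++ model of the sequence above may help pin down the failure. It is a minimal sketch under stated assumptions: ToyDynInst, memOrderRedirect and resolve are hypothetical names, not the real O3 DynInst/IEW code; checkMisprediction only borrows its name from the description; and the stale next-PC value is assumed to be the fall-through address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for an O3 dynamic instruction.
// Field names are illustrative and do not mirror the real DynInst class.
struct ToyDynInst {
    uint64_t pc;      // address of the instruction
    uint64_t predPC;  // next PC chosen by the branch predictor at fetch (step 1)
    uint64_t nextPC;  // next PC recorded in the inst; only correct once executed
    bool executed;    // has the TBH read its table entry and resolved?
};

// Step 5: the memory-order squash restarts fetch at the next PC stored in
// the instruction, whether or not that value has been computed yet.
uint64_t memOrderRedirect(const ToyDynInst &inst) {
    return inst.nextPC;
}

// Step 7: the TBH finally reads its table entry and records the real target.
void resolve(ToyDynInst &inst, uint64_t target) {
    inst.nextPC = target;
    inst.executed = true;
}

// Step 7: the usual check compares the resolved target with the prediction,
// so a correct prediction never triggers a second, corrective squash.
bool checkMisprediction(const ToyDynInst &inst) {
    assert(inst.executed);
    return inst.predPC != inst.nextPC;
}
```

For example, with a TBH at 0x1000 correctly predicted to 0x1040 and a stale next PC of 0x1004, memOrderRedirect sends fetch to 0x1004; after the TBH resolves to 0x1040, checkMisprediction still returns false, so nothing pulls fetch back onto the right path.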

I see two ways to fix this: either hack up the O3 model to handle this case
of a fused memory-op and branch instruction (recheck whether to squash when
the TBH finally resolves, for the special case where squashing dependent
memory ops causes fetch to screw up the branch), or split the instruction
into 2 micro-ops (the load and then a dependent branch). Which one do people
think would be the better option? I'm currently leaning toward micro-coding
the instruction.
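The micro-op option could look roughly like the sketch below. It follows the ARM pseudocode for TBB/TBH: the table entry is at Rn + Rm for TBB or Rn + (Rm << 1) for TBH, and the target is PC + 2 * entry, with PC reading as the instruction address + 4. tbLoadUop, tbBranchUop and the ReadMem callback are hypothetical names; this is not how the m5 ARM decoder is actually structured.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical memory interface: reads `size` bytes (1 or 2), little-endian.
using ReadMem = std::function<uint32_t(uint32_t addr, unsigned size)>;

// Micro-op 1 (hypothetical): the load half of TBB/TBH. It only computes an
// address and reads the table entry into a temporary register, so the LSQ
// and memory-order logic see an ordinary load.
uint32_t tbLoadUop(uint32_t rn, uint32_t rm, bool isHalf, const ReadMem &read) {
    uint32_t addr = isHalf ? rn + (rm << 1) : rn + rm;
    return read(addr, isHalf ? 2 : 1);
}

// Micro-op 2 (hypothetical): a register-indirect branch that depends on the
// temporary produced by micro-op 1. Target = Thumb PC (address + 4) + 2*entry.
uint32_t tbBranchUop(uint32_t instAddr, uint32_t entry) {
    uint32_t target = instAddr + 4 + 2 * entry;
    assert((target & 1) == 0);  // Thumb targets are halfword aligned
    return target;
}
```

The idea behind the split is that a memory-order squash would then be anchored on an ordinary load whose next PC is simply the following micro-op, while the branch micro-op keeps its own prediction and goes through the normal misprediction check once the loaded entry arrives.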

Thanks,
Geoff Blake
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] TBH/TBB ARM instructions should potentially be split into 2 micro-ops?

2011-04-26 Thread Korey Sewell
On #5, is it the case that the branch is always mispredicted on a
squashDueToMemOrder()? I would think so, because you haven't gotten the
branch result back yet, right?

Sounds like there are a couple of issues to tackle:
1) Where to start fetching while you wait for resolution?
--- a: Look in the BTB for a predicted address and, if there is one, use
that PC to fetch from.
- If it's not in the BTB, you can start fetching down the not-taken
path instead.
--- b: Stall fetch at that point until the branch resolves. This would be
similar to the trapPending flag that is used to keep fetch from going
down a known wrong path, I think.
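The two options for where fetch should restart can be sketched as a single decision function. ToyBTB, FetchDecision and pickRestart are hypothetical names, and a plain hash map stands in for a real BTB, which would normally be set-associative and tagged.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical BTB stand-in: branch PC -> last observed target.
using ToyBTB = std::unordered_map<uint64_t, uint64_t>;

enum class FetchAction { Redirect, Stall };

struct FetchDecision {
    FetchAction action;
    uint64_t pc;  // restart PC; only meaningful when action == Redirect
};

// Decide where fetch goes after the memory-order squash while the TBH/TBB
// target is still unknown.
//   stallInstead == true -> option (b): hold fetch until the branch resolves
//   otherwise            -> option (a): BTB target if present, else not-taken
FetchDecision pickRestart(const ToyBTB &btb, uint64_t branchPC,
                          unsigned instSize, bool stallInstead) {
    assert(instSize == 2 || instSize == 4);  // Thumb instruction sizes
    if (stallInstead)
        return {FetchAction::Stall, 0};
    auto it = btb.find(branchPC);
    if (it != btb.end())
        return {FetchAction::Redirect, it->second};
    return {FetchAction::Redirect, branchPC + instSize};  // fall-through
}
```

TBB and TBH are 32-bit Thumb-2 encodings, so in practice instSize would be 4 here.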

2) Micro-op or fix up the branch?
- I would say change the branch at that point to always mispredict
(maybe set the predictedPC to 0) to prevent using a dated prediction.
- If you do the BTB thing I suggested above, then use that as the
predictedPC.
- Then, once the branch resolves, let the normal mechanisms check against
the predictedPC; it should always squash (if you reset the predPC to 0)
or squash conditionally (if you use the BTB to update your predictedPC).
- With regard to the already outstanding squash, as long as that
outstanding squash is not the oldest squash, your most recent squash will
be used. This may cause a quirky problem with the micro-ops, depending on
how those get their sequence numbers. However, I'll leave further
elaboration on the micro-op option to Gabe/Ali since they are more in
tune with any gotchas on that end, but I think you could do this a little
more cleanly without a micro-op.
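A minimal sketch of the "fix up the branch" approach, assuming (as described above) that among several pending squashes the one from the oldest instruction, i.e. the smallest sequence number, is the one that takes effect. All types and functions here are hypothetical and are not taken from the O3 source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical branch record: only the fields needed for the check.
struct ToyBranch {
    uint64_t seqNum;
    uint64_t predPC;      // may be overwritten after a memory-order squash
    uint64_t resolvedPC;  // target computed once the table entry is read
};

// After the memory-order squash, drop the dated prediction. Passing 0 forces
// a later mispredict; passing a fresh BTB target makes the squash conditional.
void resetPrediction(ToyBranch &br, uint64_t newPred) {
    br.predPC = newPred;
}

// The normal resolution check, unchanged.
bool needsSquash(const ToyBranch &br) {
    return br.predPC != br.resolvedPC;
}

struct SquashRequest {
    uint64_t seqNum;     // squash everything younger than this instruction
    uint64_t restartPC;  // where fetch resumes
};

// With several squashes pending, the oldest one (smallest seqNum) covers the
// others, so it is the one applied.
SquashRequest pickSquash(const std::vector<SquashRequest> &reqs) {
    assert(!reqs.empty());
    SquashRequest best = reqs.front();
    for (const SquashRequest &r : reqs)
        if (r.seqNum < best.seqNum)
            best = r;
    return best;
}
```

Resetting predPC to 0 trades an extra (always taken) squash for simplicity; using the BTB target avoids that squash whenever the fresh guess turns out to be right.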

On Tue, Apr 26, 2011 at 3:12 PM, Geoffrey Blake bla...@umich.edu wrote:

-- 
- Korey