Re: [gem5-users] Micro-op Data Dependency
Yeah, I looked at them first to figure out what I had to do, but I don't think they use intermediate registers the way mine have to, or at least I didn't see it when I first looked. Anyway, your suggestion of creating a constant with the value of the register index and using it in the operand definition worked, so now the RMW instructions work with all four CPU models. Thanks for your help!

On Tue, Aug 2, 2016 at 5:49 PM, Steve Reinhardt wrote:

> I don't know that off the top of my head: the ISAs I'm familiar with are either not microcoded, or use a micro-op assembler to generate all the micro-ops (e.g., x86). Have you looked at how ARM micro-ops are constructed? That's the one ISA that I believe is mostly not microcoded but still has some microcode in it.
>
> Though come to think of it, it may be as easy as just using a constant where the other operands specify the machine code bitfield, if there's syntax that allows that.
>
> Steve
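For readers following the fix, here is a minimal sketch of the read-add-write pair once the loaded value travels through a temporary register instead of a C++ variable. It is plain C++ with a toy register array and word-addressed memory, not gem5 code; the names `State`, `TempReg`, `amoaddRead`, and `amoaddModifyWrite`, and the choice of r32 as the temporary, are assumptions for illustration. Both micro-ops read the address from Rs1, and the modify-write reads all of its sources before writing Rd, so Rd aliasing Rs2 (the concern raised in the July 29 message) still stores the correct sum.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: r0..r31 are architectural; r32 is a microcode-only
// temporary ("Rt") that no instruction encoding can name.
constexpr int NumArchIntRegs = 32;
constexpr int TempReg = NumArchIntRegs;
constexpr int NumIntRegs = NumArchIntRegs + 1;

// Toy machine state: plain integer registers and word-addressed memory.
struct State {
    std::array<int64_t, NumIntRegs> regs{};
    std::vector<int64_t> mem;
};

// Read micro-op: Rt = Mem[Rs1]. The value lands in a register, so any
// later reader of Rt has a visible dependency on this micro-op.
void amoaddRead(State &s, int rs1) {
    const int64_t addr = s.regs[rs1];
    assert(addr >= 0 && static_cast<std::size_t>(addr) < s.mem.size());
    s.regs[TempReg] = s.mem[addr];
}

// Modify-write micro-op: Rd = Rt; Mem[Rs1] = Rs2 + Rt.
// All sources are read before the destination is written.
void amoaddModifyWrite(State &s, int rd, int rs1, int rs2) {
    const int64_t addr = s.regs[rs1];
    const int64_t old = s.regs[TempReg];
    const int64_t addend = s.regs[rs2];
    assert(addr >= 0 && static_cast<std::size_t>(addr) < s.mem.size());
    s.mem[addr] = addend + old;
    s.regs[rd] = old;
}
```

With Rd and Rs2 both set to the same register, the old memory value ends up in that register and the sum is written to memory, with no state kept outside the register file.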
Re: [gem5-users] Micro-op Data Dependency
I don't know that off the top of my head: the ISAs I'm familiar with are either not microcoded, or use a micro-op assembler to generate all the micro-ops (e.g., x86). Have you looked at how ARM micro-ops are constructed? That's the one ISA that I believe is mostly not microcoded but still has some microcode in it.

Though come to think of it, it may be as easy as just using a constant where the other operands specify the machine code bitfield, if there's syntax that allows that.

Steve

On Tue, Aug 2, 2016 at 1:54 PM, Alec Roelke wrote:

> Okay, thanks. How do I tell the ISA parser that the 'Rt' operand I've created refers to the extra architectural register? Or is there some function I can call inside the instruction's code that writes directly to an architectural register? All I can see in the code GEM5 generates is "setIntRegOperand", which takes indices into _destRegIdx rather than register indices.
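Steve's idea of putting a constant where other operands name an instruction bitfield can be pictured as operand index resolution: ordinary operands extract their register index from the machine word, while the temporary operand always resolves to one fixed index. This is an illustrative C++ sketch, not gem5 ISA-parser syntax; the field positions are assumed (RISC-V-style: rd in bits 11:7, rs1 in 19:15, rs2 in 24:20), and `bits`, `Operand`, `resolve`, and `TempRegIdx` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Extract bits [hi:lo] of a 32-bit machine word (narrow fields only).
constexpr uint32_t bits(uint32_t word, int hi, int lo) {
    return (word >> lo) & ((1u << (hi - lo + 1)) - 1);
}

constexpr int TempRegIdx = 32;  // hypothetical constant for the micro-op temp

enum class Operand { Rd, Rs1, Rs2, Rt };

// Map an operand name to an architectural register index.
int resolve(Operand op, uint32_t machInst) {
    switch (op) {
      case Operand::Rd:  return static_cast<int>(bits(machInst, 11, 7));
      case Operand::Rs1: return static_cast<int>(bits(machInst, 19, 15));
      case Operand::Rs2: return static_cast<int>(bits(machInst, 24, 20));
      case Operand::Rt:  return TempRegIdx;  // constant: never decoded
    }
    return -1;  // unreachable for valid enum values
}
```

The temporary's index needs no bits in the encoding; per the last message in the thread, naming such a constant in the operand definition is what ultimately worked, though the exact syntax depends on the ISA's operand declarations.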
Re: [gem5-users] Micro-op Data Dependency
Okay, thanks. How do I tell the ISA parser that the 'Rt' operand I've created refers to the extra architectural register? Or is there some function I can call inside the instruction's code that writes directly to an architectural register? All I can see in the code GEM5 generates is "setIntRegOperand", which takes indices into _destRegIdx rather than register indices.

On Mon, Aug 1, 2016 at 10:58 AM, Steve Reinhardt wrote:

> You don't need to worry about the size of the bitfield in the instruction encoding, because the temporary register(s) will never be directly addressed by any machine instruction. You should define a new architectural register using an index that doesn't appear in any instruction (e.g., if the ISA includes r0 to r31, then the temp reg can be r32). This register will get renamed in the O3 model.
>
> Steve
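The indirection described here, where generated code writes through an operand slot rather than a register number, can be sketched with a simplified static-instruction class. The names `ToyStaticInst`, `ToyExecContext`, and `makeLoadMicroop`, and the slot-based accessors, are hypothetical stand-ins modeled on the description in the message, not gem5's actual classes or signatures. The point is that the slot-to-register mapping is fixed when the micro-op is built, so making an operand refer to the extra register only means recording that register's index in the right slot.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int NumIntRegs = 33;  // r0..r31 plus hypothetical temp r32
constexpr int TempReg = 32;

struct ToyStaticInst {
    int numSrcRegs = 0, numDestRegs = 0;
    std::array<int, 4> srcRegIdx{};   // operand slot -> architectural register
    std::array<int, 4> destRegIdx{};
};

struct ToyExecContext {
    std::array<uint64_t, NumIntRegs> intRegs{};

    // Accesses go through an operand slot, then the instruction's table.
    uint64_t readIntRegOperand(const ToyStaticInst &si, int slot) const {
        assert(slot >= 0 && slot < si.numSrcRegs);
        return intRegs[si.srcRegIdx[slot]];
    }
    void setIntRegOperand(const ToyStaticInst &si, int slot, uint64_t val) {
        assert(slot >= 0 && slot < si.numDestRegs);
        intRegs[si.destRegIdx[slot]] = val;
    }
};

// Hypothetical load micro-op: its only destination slot maps to the temp.
ToyStaticInst makeLoadMicroop(int rs1) {
    ToyStaticInst si;
    si.numSrcRegs = 1;
    si.srcRegIdx[0] = rs1;
    si.numDestRegs = 1;
    si.destRegIdx[0] = TempReg;
    return si;
}
```

Writing destination slot 0 of this micro-op updates r32, even though the instruction code itself only ever mentions slot 0.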
Re: [gem5-users] Micro-op Data Dependency
You don't need to worry about the size of the bitfield in the instruction encoding, because the temporary register(s) will never be directly addressed by any machine instruction. You should define a new architectural register using an index that doesn't appear in any instruction (e.g., if the ISA includes r0 to r31, then the temp reg can be r32). This register will get renamed in the O3 model.

Steve

On Sun, Jul 31, 2016 at 7:21 AM, Alec Roelke wrote:

> That makes sense. Would it be enough for me to just create a new IntReg operand, like this:
>
>     'Rt': ('IntReg', 'ud', None, 'IsInteger', 4)
>
> and then increase the number of integer registers? The other integer operands take a bitfield from the instruction bits, but since the ISA doesn't specify that these RMW instructions should be microcoded, there's no way to decode a temporary register from the instruction bits. Will GEM5 understand that and pick any integer register that's available?
>
> The memory address is taken from Rs1 before the load micro-op and then stored in a C++ variable for the remainder of the instruction. That was done to ensure that other intervening instructions that might get executed in the O3 model don't change Rs1 between the load and modify-write micro-ops, but if I can get the temp register to work then that might fix itself.
>
> I was only setting _srcRegIdx and _destRegIdx for disassembly reasons; since the macro-op and first micro-op don't make use of Rs2, the instruction wasn't setting _srcRegIdx[1] and the disassembly would show something like 4294967295. It then also looked like a potential fix for the minor CPU model problem I described before.
>
> No, most of the ISA is not microcoded. In fact, as I said, these RMW instructions are not specified to be microcoded by the ISA, but since they each have two memory transactions they didn't appear to work unless I split them into two micro-ops.
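A sketch of the numbering rule Steve gives: if instructions can name only r0 to r31 through a 5-bit field, any index of 32 or above can never collide with an encoded register, so the temporary only needs a slot in an enlarged integer register file. The constant names are hypothetical; in a real port they would live wherever the ISA defines its register counts.

```cpp
#include <cassert>

constexpr int RegFieldBits = 5;                     // width of rd/rs1/rs2
constexpr int NumArchIntRegs = 1 << RegFieldBits;   // r0..r31
constexpr int NumMicroIntRegs = 1;                  // microcode-only temps
constexpr int NumIntRegs = NumArchIntRegs + NumMicroIntRegs;

// First microcode temporary: one past the last encodable register.
constexpr int MicroTempReg0 = NumArchIntRegs;       // == 32

// A 5-bit field holds at most 31, so no machine instruction can
// address the temporary directly.
static_assert(MicroTempReg0 > (1 << RegFieldBits) - 1,
              "temp register must be outside the encodable range");
static_assert(MicroTempReg0 < NumIntRegs,
              "register file must be enlarged to hold the temp");

// True if an index can only be reached by microcode.
constexpr bool isMicroOnly(int idx) {
    return idx >= NumArchIntRegs && idx < NumIntRegs;
}
```

Since O3 renames architectural registers, one temporary should suffice even with several RMW instructions in flight; each instance is mapped to its own physical register.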
Re: [gem5-users] Micro-op Data Dependency
That makes sense. Would it be enough for me to just create a new IntReg operand, like this:

    'Rt': ('IntReg', 'ud', None, 'IsInteger', 4)

and then increase the number of integer registers? The other integer operands take a bitfield from the instruction bits, but since the ISA doesn't specify that these RMW instructions should be microcoded, there's no way to decode a temporary register from the instruction bits. Will GEM5 understand that and pick any integer register that's available?

The memory address is taken from Rs1 before the load micro-op and then stored in a C++ variable for the remainder of the instruction. That was done to ensure that other intervening instructions that might get executed in the O3 model don't change Rs1 between the load and modify-write micro-ops, but if I can get the temp register to work then that might fix itself.

I was only setting _srcRegIdx and _destRegIdx for disassembly reasons; since the macro-op and first micro-op don't make use of Rs2, the instruction wasn't setting _srcRegIdx[1] and the disassembly would show something like 4294967295. It then also looked like a potential fix for the minor CPU model problem I described before.

No, most of the ISA is not microcoded. In fact, as I said, these RMW instructions are not specified to be microcoded by the ISA, but since they each have two memory transactions they didn't appear to work unless I split them into two micro-ops.

On Sat, Jul 30, 2016 at 2:14 PM, Steve Reinhardt wrote:

> You shouldn't be passing values between micro-ops using C++ variables; you should pass the data in a register. (If necessary, create microcode-only temporary registers for this purpose, like x86 does.) This is microarchitectural state, so you can't hide it from the CPU model. The main problem here is that, since this "hidden" data dependency isn't visible to the CPU model, it doesn't know that the micro-ops must be executed in order. If you pass that data in a register, the pipeline model will enforce the dependency.
>
> Also, where do you set the address for the memory accesses? Again, both micro-ops should read that out of a register; it should not be passed implicitly via hidden variables.
>
> You shouldn't have to explicitly set the internal fields like _srcRegIdx and _destRegIdx; the ISA parser should do that for you.
>
> Unfortunately the ISA description system wasn't originally designed to support microcode, and that support was kind of shoehorned in after the fact, so it is a little messy. Is your whole ISA microcoded, or just a few specific instructions?
>
> Steve
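The 4294967295 in the disassembly is 2^32 - 1, which is how an all-ones (or -1) value in a 32-bit unsigned index prints. The thread doesn't say which sentinel gem5 used there, so this sketch assumes an unset index holds `uint32_t(-1)`; `ToyInst`, `InvalidRegIdx`, and `disassemble` are hypothetical names. A disassembler that prints only the first `numSrcRegs` entries never shows the unset slot, without any need to fill in register indices by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>
#include <utility>

// Assumed "unset" marker: all ones in a 32-bit unsigned index.
constexpr uint32_t InvalidRegIdx = std::numeric_limits<uint32_t>::max();

struct ToyInst {
    std::string mnemonic;
    int numSrcRegs;
    uint32_t srcRegIdx[2];

    ToyInst(std::string m, int n) : mnemonic(std::move(m)), numSrcRegs(n) {
        srcRegIdx[0] = srcRegIdx[1] = InvalidRegIdx;  // nothing set yet
    }
};

// Print only the source slots the instruction claims to use.
std::string disassemble(const ToyInst &inst) {
    std::ostringstream os;
    os << inst.mnemonic;
    for (int i = 0; i < inst.numSrcRegs; ++i)
        os << (i == 0 ? " " : ", ") << 'r' << inst.srcRegIdx[i];
    return os.str();
}
```

Claiming two sources while setting only the first reproduces the symptom: the output becomes "ld r10, r4294967295".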
Re: [gem5-users] Micro-op Data Dependency
You shouldn't be passing values between micro-ops using C++ variables; you should pass the data in a register. (If necessary, create microcode-only temporary registers for this purpose, like x86 does.) This is microarchitectural state, so you can't hide it from the CPU model. The main problem here is that, since this "hidden" data dependency isn't visible to the CPU model, it doesn't know that the micro-ops must be executed in order. If you pass that data in a register, the pipeline model will enforce the dependency.

Also, where do you set the address for the memory accesses? Again, both micro-ops should read that out of a register; it should not be passed implicitly via hidden variables.

You shouldn't have to explicitly set the internal fields like _srcRegIdx and _destRegIdx; the ISA parser should do that for you.

Unfortunately the ISA description system wasn't originally designed to support microcode, and that support was kind of shoehorned in after the fact, so it is a little messy. Is your whole ISA microcoded, or just a few specific instructions?

Steve

On Fri, Jul 29, 2016 at 7:37 PM, Alec Roelke wrote:

> Sure, I can show some code snippets. First, here is the code for the read micro-op for an atomic read-add-write:
>
>     temp = Mem_sd;
>
> And the modify-write micro-op:
>
>     Rd_sd = temp;
>     Mem_sd = Rs2_sd + temp;
>
> The memory address comes from Rs1. The variable "temp" is a temporary location shared between the read and modify-write micro-ops (the address from Rs1 is shared similarly to ensure it's the same when the instructions are issued).
>
> In the constructor for the macro-op, I've included some code that explicitly sets the src and dest register indices so that they are displayed properly in execution traces:
>
>     _numSrcRegs = 2;
>     _srcRegIdx[0] = RS1;
>     _srcRegIdx[1] = RS2;
>     _numDestRegs = 1;
>     _destRegIdx[0] = RD;
>
> So far, this works for the O3 model. But in the minor model, it tries to execute the modify-write micro-op before the read micro-op is executed. The address is never loaded from Rs1, and so a segmentation fault often occurs. To try to fix it, I added this code to the constructors of each of the two micro-ops:
>
>     _numSrcRegs = _p->_numSrcRegs;
>     for (int i = 0; i < _numSrcRegs; i++)
>         _srcRegIdx[i] = _p->_srcRegIdx[i];
>     _numDestRegs = _p->_numDestRegs;
>     for (int i = 0; i < _numDestRegs; i++)
>         _destRegIdx[i] = _p->_destRegIdx[i];
>
> _p is a pointer to the "parent" macro-op. With this code, it works with the minor model, but the final calculated value in the modify-write micro-op never gets written at the end of the instruction in the O3 model.
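To illustrate why a register dependency is enforced while a C++ variable is not, here is a toy issue check built on a simple scoreboard with one pending-write bit per register. This is a deliberate simplification (neither the minor nor the O3 model works exactly like this), and `MicroOp`, `Scoreboard`, and the r32 temporary are hypothetical. A micro-op may issue only when no in-flight micro-op still has to write one of its sources; if the value travels through a hidden variable, the modify-write lists no such source, so nothing holds it back.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

constexpr int NumIntRegs = 33;  // r0..r31 plus hypothetical temp r32
constexpr int TempReg = 32;

struct MicroOp {
    std::vector<int> srcs;   // registers this micro-op reads
    std::vector<int> dests;  // registers this micro-op writes
};

struct Scoreboard {
    std::bitset<NumIntRegs> pendingWrite;

    // Ready only if no in-flight producer still owes a source value.
    bool canIssue(const MicroOp &op) const {
        for (int r : op.srcs)
            if (pendingWrite.test(r))
                return false;
        return true;
    }
    void issue(const MicroOp &op) {
        for (int r : op.dests) pendingWrite.set(r);
    }
    void complete(const MicroOp &op) {
        for (int r : op.dests) pendingWrite.reset(r);
    }
};
```

Once the read micro-op has issued, the register-based modify-write must wait for it, while the hidden-variable version appears ready immediately, which matches the out-of-order execution reported for the minor model.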
Re: [gem5-users] Micro-op Data Dependency
Sure, I can show some code snippets. First, here is the code for the read micro-op for an atomic read-add-write:

    temp = Mem_sd;

And the modify-write micro-op:

    Rd_sd = temp;
    Mem_sd = Rs2_sd + temp;

The memory address comes from Rs1. The variable "temp" is a temporary location shared between the read and modify-write micro-ops (the address from Rs1 is shared similarly to ensure it's the same when the instructions are issued).

In the constructor for the macro-op, I've included some code that explicitly sets the src and dest register indices so that they are displayed properly in execution traces:

    _numSrcRegs = 2;
    _srcRegIdx[0] = RS1;
    _srcRegIdx[1] = RS2;
    _numDestRegs = 1;
    _destRegIdx[0] = RD;

So far, this works for the O3 model. But in the minor model, it tries to execute the modify-write micro-op before the read micro-op is executed. The address is never loaded from Rs1, and so a segmentation fault often occurs. To try to fix it, I added this code to the constructors of each of the two micro-ops:

    _numSrcRegs = _p->_numSrcRegs;
    for (int i = 0; i < _numSrcRegs; i++)
        _srcRegIdx[i] = _p->_srcRegIdx[i];
    _numDestRegs = _p->_numDestRegs;
    for (int i = 0; i < _numDestRegs; i++)
        _destRegIdx[i] = _p->_destRegIdx[i];

_p is a pointer to the "parent" macro-op. With this code, it works with the minor model, but the final calculated value in the modify-write micro-op never gets written at the end of the instruction in the O3 model.

On Fri, Jul 29, 2016 at 2:50 PM, Steve Reinhardt wrote:

> I'm still confused about the problems you're having. Stores should never be executed speculatively in O3, even without the non-speculative flag. Also, assuming the store micro-op reads a register that is written by the load micro-op, then that true data dependence through the intermediate register should enforce an ordering. Whether that destination register is also a source or not should be irrelevant, particularly in O3 where all the registers get renamed anyway.
>
> Perhaps if you show some snippets of your actual code it will be clearer to me what's going on.
>
> Steve
>
> On Fri, Jul 29, 2016 at 9:33 AM, Alec Roelke wrote:
>
>> Yes, that sums up my issues. I haven't gotten to tackling the second one yet; I'm still working on the first. Thanks for the patch link, though; that should help a lot when I get to it.
>>
>> To be more specific, I can get it to work with either the minor CPU model or the O3 model, but not both at the same time. To get it to work with the O3 model, I added the "IsNonSpeculative" flag to the modify-write micro-op, which I assumed would prevent the O3 model from speculating on its execution (which I also had to do with regular store instructions to ensure that registers containing addresses would have the proper values when the instruction executed). This works, but when I use it in the minor CPU model, it issues the modify-write micro-op before the read micro-op executes, meaning it hasn't loaded the memory address from the register file yet and causes a segmentation fault.
>>
>> I assume this is caused by the fact that the code for the read operation doesn't reference any register, as the instruction writes the value that was read from memory to a dest register before modifying it and writing it back. Because the dest register can be the same as a source register, I have to pass the memory value from the read micro-op to the modify-write micro-op without writing it to a register to avoid potentially polluting the data written back.
>>
>> My fix was to explicitly set the source and dest registers of both micro-ops to what was decoded by the macro-op so GEM5 can infer dependencies, but then when I try it using the O3 model, the modify-write portion does not appear to actually write back to memory.
>>
>> On Fri, Jul 29, 2016 at 12:00 PM, wrote:
>>
>>> There are really two issues here, I think:
>>>
>>> 1. Managing the ordering of the two micro-ops in the pipeline, which seems to be the issue you're facing.
>>> 2. Providing atomicity when you have multiple cores.
>>>
>>> I'm surprised you're having problems with #1, because that's the easy part. I'd assume that you'd have a direct data dependency between the micro-ops (the load would write a register that the store reads, for the load to pass data to the store), which should enforce ordering. In addition, since they're both accessing the same memory location, there shouldn't be any reordering of the memory operations either.
>>>
>>> Providing atomicity in the memory system is the harder part. The x86 atomic RMW memory ops are implemented by setting LOCKED_RMW on both the load and store operations (see http://grok.gem5.org/source/xref/gem5/src/mem/request.hh#145, as well as src/arch/x86/isa/microops/ldstop.isa). This works with
Re: [gem5-users] Micro-op Data Dependency
I'm still confused about the problems you're having. Stores should never be executed speculatively in O3, even without the non-speculative flag. Also, assuming the store micro-op reads a register that is written by the load micro-op, then that true data dependence through the intermediate register should enforce an ordering. Whether that destination register is also a source or not should be irrelevant, particularly in O3 where all the registers get renamed anyway.

Perhaps if you show some snippets of your actual code it will be clearer to me what's going on.

Steve

On Fri, Jul 29, 2016 at 9:33 AM, Alec Roelke wrote:
Re: [gem5-users] Micro-op Data Dependency
Yes, that sums up my issues. I haven't gotten to tackling the second one yet; I'm still working on the first. Thanks for the patch link, though, that should help a lot when I get to it.

To be more specific, I can get it to work with either the minor CPU model or the O3 model, but not both at the same time. To get it to work with the O3 model, I added the "IsNonSpeculative" flag to the modify-write micro-op, which I assumed would prevent the O3 model from speculating on its execution (which I also had to do with regular store instructions to ensure that registers containing addresses would have the proper values when the instruction executed). This works, but when I use it in the minor CPU model, it issues the modify-write micro-op before the read micro-op executes, meaning it hasn't loaded the memory address from the register file yet and causes a segmentation fault.

I assume this is caused by the fact that the code for the read operation doesn't reference any register, as the instruction writes the value that was read from memory to a dest register before modifying it and writing it back. Because the dest register can be the same as a source register, I have to pass the memory value from the read micro-op to the modify-write micro-op without writing it to a register to avoid potentially polluting the data written back.

My fix was to explicitly set the source and dest registers of both micro-ops to what was decoded by the macro-op so GEM5 can infer dependencies, but then when I try it using the O3 model, the modify-write portion does not appear to actually write back to memory.

On Fri, Jul 29, 2016 at 12:00 PM, wrote:

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
Re: [gem5-users] Micro-op Data Dependency
There are really two issues here, I think:

1. Managing the ordering of the two micro-ops in the pipeline, which seems to be the issue you're facing.
2. Providing atomicity when you have multiple cores.

I'm surprised you're having problems with #1, because that's the easy part. I'd assume that you'd have a direct data dependency between the micro-ops (the load would write a register that the store reads, for the load to pass data to the store) which should enforce ordering. In addition, since they're both accessing the same memory location, there shouldn't be any reordering of the memory operations either.

Providing atomicity in the memory system is the harder part. The x86 atomic RMW memory ops are implemented by setting LOCKED_RMW on both the load and store operations (see http://grok.gem5.org/source/xref/gem5/src/mem/request.hh#145, as well as src/arch/x86/isa/microops/ldstop.isa). This works with AtomicSimpleCPU and with Ruby, but there is no support for enforcing this atomicity in the classic cache in timing mode. I have a patch that provides this but you have to apply it manually: http://reviews.gem5.org/r/2691.

Steve

On Wed, Jul 27, 2016 at 9:10 AM, Alec Roelke wrote:
[gem5-users] Micro-op Data Dependency
Hello,

I'm trying to add an ISA to gem5 which has several atomic read-modify-write instructions. Currently I have them implemented as pairs of micro-ops which read data in the first operation and then modify-write in the second. This works for the simple CPU model, but runs into trouble for the minor and O3 models, which want to execute the modify-write half before the load half is complete. I tried forcing both parts of the instruction to have the same src and dest register indices, but that causes other problems with the O3 model.

Is there a way to indicate that there is a data dependency between the two micro-ops in the instruction? Or, better yet, is there a way I could somehow have two memory accesses in one instruction without having to split it into micro-ops?

Thanks,
Alec Roelke