On Mon, Oct 26, 2009 at 5:45 PM, Petter Urkedal <[email protected]> wrote:
> On 2009-10-26, Timothy Normand Miller wrote:
>> Also, no flags. I'm very much a fan of the MIPS approach because it
>> doesn't create any of these extra inter-instruction dependencies
>> through a narrow straw.
>
> Yes. I think I read too much out of the Google doc.
>
>> > 3. Finally on this chain of thought, f2u and f2i will be equivalent
>> > except for overflow detection. Again we can use the carry flag for the
>> > unsigned case and overflow for the signed case.
>>
>> Without any flags, we might need the extra instructions. Or we could
>> support only f2i and let software do a branch on the sign bit for
>> clamping.
>
> Ah, I thought we'd do modulo 2^32 truncation (or whatever it's called),
> but now that you mention it, clamping the result is much more useful for
> graphics, and in that case I think it's best to keep the current spec.
>
>> > 4. I like the idea of adding minimum and maximum functions in the
>> > instruction set; if we need them, that is. They should not require much
>> > logic. But in this case, note that there is a difference between signed
>> > and unsigned. Do we want both? On the other hand it only takes 2 to 3
>> > cycles to compute any of these if my idea of the instruction set is
>> > correct.
>>
>> We CAN implement this with branches, which are a bit cheaper because
>> we don't have a delay slot. We'll sort of automatically have both
>> signed and unsigned variants of branch instructions in some cases.
>
> I'm hoping we can support a fairly complete set of branch conditions.
> If we have space in the instruction word, conditional write-back would
> reduce min and max to two instructions, but that seems questionable at
> the moment. Still, I'm not against min/max primitives if they're common.

We can either have a write-back flag, or we can make r0 the bit bucket.

>
>> Will we need special float branch instructions? We may need special
>> ge/le instructions as usual since we can't recover those from the
>> difference, but for floats and ints, the sign bits are in the same
>> place and zero is always zero.
>
> You are not thinking of two-operand compare-and-branch instructions, but
> rather test instructions which store 0 or 1? I outlined an option of
> using bits 32 and 33 of the BRAM below before I realised what you meant.

Well, actually, that's an excellent idea. The sign bit is only one
bit, so we don't need anything special for that. When doing
write-back we can check to see if the value is zero and set a flag
bit. We could also store whether or not the computation resulted in
an overflow. This would be like processor flags, but one set for each
register. (And since we get 4 bits free, why not use them!)

The main problem with the overflow bit is that it would get lost on a
context switch. Right now, we can't do context switches, so it
doesn't matter. But what if we change our minds in the future?
Having a zero flag is a good idea since it's easy to recompute every
time.

I'm not bothered by having a few extra compare instructions. Most
will just rely on subtract, but some signed ones will require special
compares, which we know about from the MIPS instruction set.

Here are some ideas for "summary flags" that make use of the extra
bits in the register word:

- zero
- fp infinity
- fp NaN

I'm not sure what else. We have sign for free. We can live without
overflow. What else might we want to know about quickly? Note that
when doing integer math, we'd compute the fp flags anyhow; they'd just
be meaningless.

>
>> > 5. To complete the set of shift instructions we need the arithmetic and
>> > logic shift distinction (aka signed and unsigned shifts).
>>
>> There are just three. Andre wants to use the multiplier as a way to
>> do shifts.
>> I like that idea. However, the problem I see is that
>> while it's easy to do left shifts, right shifts are another matter.
>> If we have a 64-bit product, then it's just 32-shift and use the upper
>> 32 bits. But if we have only a 48-bit product, we can't do that.
>
> I'm not convinced about using the multiplier. Will we support constants
> in the instruction word? I thought we would not, and then these shifts
> would turn into two instructions.

I haven't thought enough about packing bits into instructions. With
256 regs, we need three 8-bit values for the typical case. We could
also have a 16-bit immediate load. If we have enough opcodes, we
could include some that have 8-bit immediates. But I'm not afraid of
living without them. Immediates will be commonly used, so they could
speed up some code. But I'm also not afraid of using extra
instructions. We don't know where the bottlenecks will be. If
they're memory, then extra instructions don't matter.

> Also, these are constant shifts only.
> In HQ we did variable shifts and even treated negative and big exponents
> (RHSs) correctly. I don't think it cost us that much hardware.

Why do they have to be constant shifts? We can use a decoder to turn
a binary value into a one-hot and make that the multiplier. The
decoder plus the multiplier will be smaller than a barrel shifter,
especially considering that the multiplier has multiple uses.

>> > 6. There may be more flags than we need. For integer division by
>> > zero we can use the overflow, since that's the only way a division can
>> > be out of range. For floating point division by zero, infinity seems
>> > like a natural choice, but I recall there was some discussion on the
>> > list some time ago about special requirement for the Inf and NaN
>> > semantics for rendering so I'm not sure whether we need to differentiate
>> > Inf and zero division for float.
>>
>> Yeah. I hate flags. Let's see if we can do without them.
>
> Okay, no flags.
>
>> > 7.
>> > Assuming we've agreed to not run threads in dependent groups, do we
>> > care about the loop unit? Maybe a decrement-and-jump-if-nonzero would
>> > do?
>>
>> MAYBE. But without it, it's only an overhead of two instructions.
>> With the special instruction, it's one instruction. And with loop
>> hardware, maybe we can do it in zero. But those one or two
>> instructions only matter much if the loop is REALLY short. And if it
>> is, we can sometimes unroll.
>
> Okay, I can see this may be of interest, since we also want to save
> program space.

Well, we do. Ultimately. I'm all in favor of reserving opcodes that
we don't implement right away. At some point, we can do an analysis
to determine which opcodes will have the greatest performance impact
with the least additional area and add them to the design one or a few
at a time.

>> > 8. We are going with conventional flag-based branch instructions,
>> > right? How many bits of address do we need in the branch instructions?
>> > Do we need computed branches?
>>
>> (a) Not if we can avoid it. Mind you, in this case, it's not so evil,
>> but I still don't like it. Convince me otherwise by showing me how it
>> makes both the hardware simpler and the software faster.
>
> No, I agree. We should have space for at least one operand in the
> instruction word for branches. The flag-less ISA naturally supports the
> negative and zero flags, but lacks the equivalent of carry and overflow
> flags for integer. Floating point has Inf and NaN, so if we can
> condition on these, that'll pass for overflow. Integer is more tricky.
> We could utilise the extra width of the BRAM register, and put the two
> flags in bits 32 and 33.

MIPS is able to do all the various comparisons with a handful of extra
instructions.

>> (b) Ah, bits in address. Good question. It's fairly long. We need
>> to support lengthy shader programs. Note that we no longer have a
>> local program file; instead, we have a proper icache.
>
> As mentioned above, we could differentiate long condition-less jumps and
> short relative branches if we need to.

Sure.

>> (c) Someone else will have to tell us about computed branches.
>
> (Just quoting this as a reminder to the readers.)

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
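
Illustration of the shift-by-multiply idea from the thread above. This
is only a rough Python sketch, not code from the project: the function
names are made up here, the three shifts are taken to be left, logical
right and arithmetic right, and a full 64-bit product is assumed. The
48-bit product case raised in the thread is not covered. A shift amount
of zero is special-cased for the right shifts because the matching
multiplier, 2^32, would not fit in a 32-bit operand.

```python
MASK32 = 0xFFFFFFFF

def one_hot(n):
    """Decoder: turn a 5-bit binary value into a one-hot 32-bit word."""
    assert 0 <= n < 32
    return 1 << n

def shl_via_mul(x, n):
    """Left shift: low 32 bits of x * 2**n."""
    return ((x & MASK32) * one_hot(n)) & MASK32

def lshr_via_mul(x, n):
    """Logical right shift: upper 32 bits of the unsigned product
    x * 2**(32 - n)."""
    x &= MASK32
    if n == 0:
        return x
    return ((x * one_hot(32 - n)) >> 32) & MASK32

def asr_via_mul(x, n):
    """Arithmetic right shift: same as above, but x is treated as a
    signed operand so the upper half of the product carries the sign."""
    x &= MASK32
    sx = x - (1 << 32) if x & 0x80000000 else x
    if n == 0:
        return x
    return ((sx * one_hot(32 - n)) >> 32) & MASK32
```

For example, lshr_via_mul(0x80000000, 31) gives 1, while
asr_via_mul(0x80000000, 31) gives 0xFFFFFFFF.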
