On Mon, Mar 18, 2013 at 10:22:42PM -0400, Andre Pouliot wrote:
> 2013/3/18 Troy Benjegerdes <[email protected]>:
> >> > R0-R7: global, shared, constant, writeable only by host CPU
> >> > R8-R15: global, shared, scratchpad, writes are broadcast to all others
> >> > R16-RXX: regular, thread-context registers
> >> >
> >> > The compelling advantage over a memory scratchpad is that even though you
> >> > can 'hide' latency, *its still there*, you've just hidden the problem.
> >> >
> >>
> >> The only difference is energy, although lower energy is a solid argument.
> >> We want maximum throughput per unit area and maximum throughput per watt.
> >> (And incidentally, we often assume power and area are linearly related,
> >> for back-of-the-envelope calculations.)
> >>
> >>
> >> >
> >> > There's no latency or pipeline hazards on the writes, and register
> >> > latency on the reads. It would be excessively convenient to do a really
> >> > clean 'barrier()' implementation by writing to the broadcast/scratchpad
> >> > register and knowing that you will not see the result of the write
> >> > until it has been broadcast and visible to every other compute element.
> >> >
> >>
> >> I've investigated barriers before. See my Booster and VRSync papers.
> >> They're a pain all-around, and I'd rather we found ways to avoid them. I
> >> can see an argument for them in HPC workloads, but for graphics workloads,
> >> I think we should find another solution.
> >
> >
> > Personally, I think the solution is to include the voltage regulator on the
> > chip and tell it to turn on the juice a few cycles ahead of when all the
> > cores wake up.
> >
> > If the voltage regulator has lookahead into the barrier/broadcast sync logic
> > you should be able to know everyone is going to wake up (or is likely to
> > wake up), and boost the voltage ahead of, or even simultaneously to the
> > power spike.
> >
> > Given how often I see GPUs mentioned in the HPC context, designing only for
> > graphics workloads sounds like a bad idea.
>
> Voltage regulator on a cpu sorry to tell you is simply a bad idea. The
> process isn't the same, a regulator produce a lot of heat or the most
> efficient one need large passive component that can't be integrate in
> an effective manner.
>
> Also enabling a look ahead to power up a regulator require some
> serious look-up in the sync logic. The workload speed and the
> regulator speed of reaction are totally different. Regulator we are
> talking tens of millisecond logic we are talking nanosecond cycle
> time. The time for the regulator modulation based on workload would be
> totally out of sync.

Intel seems to think this is a good idea:
http://www.xbitlabs.com/news/cpu/display/20121226225930_Intel_s_Haswell_to_Feature_Secrete_Weapon_Integrated_Voltage_Regulator.html

Now have a look at http://powergoldconsultant.com/photogallery.html and it
won't be long before the power transistors are etched on the backside of the
CPU silicon, or flip-chip bonded. The I-R drop at 25-100 amps in bond wires
creates more heat than a high-frequency, 99% efficient switchmode converter
would.

-->COPYRIGHT/IP NOTICE/Submarine patent defense:
The following text describing a power regulation algorithm is Copyright 2013
Troy Benjegerdes and a derivative work of the patent-pending q3ube IP.
Available under AGPLv3 terms.

(Sorry to be pedantic, I need to make it quite clear this is open/libre
hardware/algorithm/software.)

The 'look-ahead' logic can be as simple as a 'boost' input to the regulator's
analog feedback section, carrying a voltage that corresponds to some function
of the utilization of reads and/or writes to the global shared register set.

Or maybe, in patent-claim-ish terms:

1) a method for controlling on-chip voltage regulators to react to step
   changes in chip power consumption
2) the method of claim 1 in which an input to the regulation feedback loop
   is a function of utilization of a chip-level synchronization network
3) a high-performance computing system using the method of claims 1 and 2
   to manage voltage margin across an entire large-scale computing cluster
   of many compute elements

-->END IP NOTICE

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
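
As an illustration of the 'boost' input described in the IP notice above,
here is a minimal sketch of a feedback step whose setpoint is raised by a
function of sync-network utilization. It is only a toy discrete-time model,
not the q3ube design: the class name, the gains, the linear boost function,
and the 0..1 utilization scale are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BoostedRegulator:
    """Toy model of a regulator feedback step with a feed-forward boost.

    Every constant here is hypothetical and chosen only for illustration.
    """
    v_target: float = 1.0    # nominal core voltage, in volts (assumed)
    k_feedback: float = 0.5  # proportional gain on the voltage error (assumed)
    k_boost: float = 0.05    # extra setpoint volts at full utilization (assumed)

    def boost(self, sync_utilization: float) -> float:
        """Map sync-network utilization (clamped to 0..1) to a boost voltage.

        A linear map is the simplest choice; the original text only says
        'some function' of register-set utilization.
        """
        u = min(max(sync_utilization, 0.0), 1.0)
        return self.k_boost * u

    def step(self, v_measured: float, sync_utilization: float) -> float:
        """Return the correction for one control step.

        The boost raises the setpoint before the load step arrives, so the
        loop starts pushing voltage up while the measured value is still
        at nominal.
        """
        setpoint = self.v_target + self.boost(sync_utilization)
        return self.k_feedback * (setpoint - v_measured)


reg = BoostedRegulator()
idle = reg.step(v_measured=1.0, sync_utilization=0.0)
busy = reg.step(v_measured=1.0, sync_utilization=1.0)
print(idle, busy)
```

With the voltage sitting at nominal, an idle sync network asks for no
correction, while a busy one asks for a positive correction even though no
droop has been measured yet. That is the whole point of the look-ahead input.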
