Re: [Open-graphics] Another architecture debate: Write-back flag vs. bit bucket

Troy Benjegerdes Mon, 18 Mar 2013 21:34:10 -0700

On Mon, Mar 18, 2013 at 10:22:42PM -0400, Andre Pouliot wrote:
> 2013/3/18 Troy Benjegerdes <[email protected]>:
> >> > R0-R7: global, shared, constant, writeable only by host CPU
> >> > R8-R15: global, shared, scratchpad, writes are broadcast to all others
> >> > R16-RXX: regular, thread-context registers
> >> >
> >> > The compelling advantage over a memory scratchpad is that even though you
> >> > can 'hide' latency, *its still there*, you've just hidden the problem.
> >> >
> >>
> >> The only difference is energy, although lower energy is a solid argument.
> >>  We want maximum throughput per unit area and maximum throughput per watt.
> >>  (And incidentally, we often assume power and area are linearly related,
> >> for back-of-the-envelope calculations.)
> >>
> >>
> >> >
> >> > There's no latency or pipeline hazards on the writes, and register
> >> > latency on the reads. It would be excessively convenient to do a really
> >> > clean 'barrier()' implementation by writing to the broadcast/scratchpad
> >> > register and knowing that you will not see the result of the write until
> >> > it has been broadcast and visible to every other compute element.
> >> >
> >>
> >> I've investigated barriers before.  See my Booster and VRSync papers.
> >>  They're a pain all-around, and I'd rather we found ways to avoid them.  I
> >> can see an argument for them in HPC workloads, but for graphics workloads,
> >> I think we should find another solution.
> >
> >
> > Personally, I think the solution is to include the voltage regulator on the
> > chip and tell it to turn on the juice a few cycles ahead of when all the
> > cores wake up.
> >
> > If the voltage regulator has lookahead into the barrier/broadcast sync logic
> > you should be able to know everyone is going to wake up (or is likely to
> > wake up), and boost the voltage ahead of, or even simultaneously to the
> > power spike.
> >
> > Given how often I see GPUs mentioned in the HPC context, designing only for
> > graphics workloads sounds like a bad idea.
> 
> Sorry to tell you, but a voltage regulator on a CPU is simply a bad idea.
> The process isn't the same: a regulator produces a lot of heat, or the most
> efficient ones need large passive components that can't be integrated in
> an effective manner.
> 
> Also, enabling look-ahead to power up a regulator requires some
> serious look-up in the sync logic. The workload speed and the
> regulator's speed of reaction are totally different. For the regulator we
> are talking tens of milliseconds; for logic we are talking nanosecond cycle
> times. The regulator modulation based on workload would be
> totally out of sync.


Intel seems to think this is a good idea:

http://www.xbitlabs.com/news/cpu/display/20121226225930_Intel_s_Haswell_to_Feature_Secrete_Weapon_Integrated_Voltage_Regulator.html

Now have a look at http://powergoldconsultant.com/photogallery.html and
it won't be long before the power transistors are etched on the backside
of the CPU silicon, or flip-chip bonded.

The I-R drop at 25-100 amps in bond wires creates more heat than a
high-frequency, 99%-efficient switchmode converter would.
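
To put rough numbers on that comparison, here is a back-of-the-envelope sketch
in Python. The 1.0 V core voltage and the 1 milliohm effective delivery-path
resistance are assumptions picked purely for illustration, not measured values,
and the function names are made up for this example. The outcome depends
heavily on the resistance you assume.

```python
# Back-of-the-envelope comparison; all numeric inputs are illustrative assumptions.

def resistive_loss_w(current_a, resistance_ohm):
    """Heat dissipated in a resistive path: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


def converter_loss_w(output_power_w, efficiency):
    """Heat dissipated by a converter delivering output_power_w at a given efficiency."""
    return output_power_w * (1.0 / efficiency - 1.0)


if __name__ == "__main__":
    CORE_VOLTAGE_V = 1.0          # assumed core voltage
    PATH_RESISTANCE_OHM = 0.001   # assumed effective bond-wire resistance (1 milliohm)
    EFFICIENCY = 0.99             # the 99% figure quoted in the text

    for current_a in (25.0, 100.0):
        wire = resistive_loss_w(current_a, PATH_RESISTANCE_OHM)
        conv = converter_loss_w(CORE_VOLTAGE_V * current_a, EFFICIENCY)
        print(f"{current_a:5.0f} A: wire loss {wire:.2f} W, converter loss {conv:.2f} W")
```

At a fixed voltage the resistive loss grows with the square of the current,
while the converter loss grows only linearly, so under these assumptions the
gap widens toward the 100 A end of the range.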

-->COPYRIGHT/IP NOTICE/Submarine patent defense: The following text describing a
power regulation algorithm is Copyright 2013 Troy Benjegerdes and a derivative
work of the patent-pending q3ube IP. Available under AGPLv3 terms. (Sorry to
be pedantic; I need to make it quite clear this is open/libre
hardware/algorithm/software.)

The 'look-ahead' logic can be as simple as a 'boost' input to the regulator's
analog feedback section, carrying a voltage that corresponds to some function
of the utilization of reads and/or writes to the global shared register set.

Or maybe, in patent-claim-ish terms:

1) a method for controlling on-chip voltage regulators to react to step changes
in chip power consumption

2) the method of claim 1 in which an input to the regulation feedback loop is
a function of utilization of a chip-level synchronization network

3) a high-performance computing system using the method of claims 1 and 2 to
manage voltage margin across an entire large-scale computing cluster of many
compute elements

-->END IP NOTICE
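
As a rough illustration of the 'boost' input described above, here is a minimal
Python model of a feed-forward term derived from sync-network utilization. This
is a sketch under stated assumptions, not the q3ube design: the linear mapping,
the 50 mV ceiling, the proportional gain, and all function and parameter names
are hypothetical. The text only says "some function" of utilization.

```python
# Toy model of a feed-forward 'boost' term for a voltage regulator.
# Every name and constant here is a hypothetical choice for illustration.

def boost_mv(sync_reads, sync_writes, window_cycles, max_boost_mv=50.0):
    """Map shared-register traffic in a window of cycles to a boost in millivolts.

    Utilization is (reads + writes) / window_cycles, clamped to [0, 1].
    A linear mapping is assumed here; any monotonic function would do.
    """
    if window_cycles <= 0:
        raise ValueError("window_cycles must be positive")
    utilization = min(1.0, (sync_reads + sync_writes) / window_cycles)
    return max_boost_mv * utilization


def regulator_target_mv(nominal_mv, measured_mv, boost, gain=0.5):
    """Proportional feedback on the voltage error, plus the feed-forward boost."""
    error = nominal_mv - measured_mv
    return nominal_mv + gain * error + boost


if __name__ == "__main__":
    # 10 reads and 15 writes to the shared registers in a 100-cycle window.
    b = boost_mv(sync_reads=10, sync_writes=15, window_cycles=100)
    print(regulator_target_mv(nominal_mv=1000.0, measured_mv=990.0, boost=b))
```

The point of the split is that the feedback term only reacts after the voltage
has already drooped, whereas the boost term can rise as soon as barrier or
broadcast traffic picks up, before the cores wake and the load step arrives.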
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
