2013/3/18 Troy Benjegerdes <[email protected]>:
>> > R0-R7: global, shared, constant, writeable only by host CPU
>> > R8-R15: global, shared, scratchpad, writes are broadcast to all others
>> > R16-RXX: regular, thread-context registers
>> >
>> > The compelling advantage over a memory scratchpad is that even though you
>> > can 'hide' latency, *it's still there*, you've just hidden the problem.
>> >
>>
>> The only difference is energy, although lower energy is a solid argument.
>>  We want maximum throughput per unit area and maximum throughput per watt.
>>  (And incidentally, we often assume power and area are linearly related,
>> for back-of-the-envelope calculations.)
>>
>>
>> >
>> > There's no latency or pipeline hazards on the writes, and register latency
>> > on the reads. It would be excessively convenient to do a really clean
>> > 'barrier()' implementation by writing to the broadcast/scratchpad register
>> > and knowing that you will not see the result of the write until it has been
>> > broadcast and visible to every other compute element.
>> >
>>
>> I've investigated barriers before.  See my Booster and VRSync papers.
>>  They're a pain all-around, and I'd rather we found ways to avoid them.  I
>> can see an argument for them in HPC workloads, but for graphics workloads,
>> I think we should find another solution.
>
>
> Personally, I think the solution is to include the voltage regulator on the
> chip and tell it to turn on the juice a few cycles ahead of when all the
> cores wake up.
>
> If the voltage regulator has lookahead into the barrier/broadcast sync logic
> you should be able to know everyone is going to wake up (or is likely to wake
> up), and boost the voltage ahead of, or even simultaneously to the power
> spike.
>
> Given how often I see GPUs mentioned in the HPC context, designing only for
> graphics workloads sounds like a bad idea.

Sorry to say, but putting the voltage regulator on the CPU die is simply
a bad idea. The process isn't the same, and a regulator either produces
a lot of heat or, for the most efficient ones, needs large passive
components that can't be integrated effectively.

Also, letting the regulator power up ahead of time requires some serious
lookahead in the sync logic. The workload and the regulator react on
totally different timescales: for the regulator we are talking tens of
milliseconds, for the logic we are talking nanosecond cycle times. Any
regulator modulation based on workload would be totally out of sync.
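
To put rough numbers on that mismatch, here is a back-of-the-envelope
sketch in Python. The 10 ms regulator response and the 1 GHz clock are
assumed figures, picked only to match "tens of milliseconds" and
"nanosecond cycle times"; they are not measurements of any real part.

```python
# Assumed figures, for illustration only.
REGULATOR_RESPONSE_S = 10e-3   # assumed: 10 ms regulator response time
CLOCK_HZ = 1e9                 # assumed: 1 GHz logic clock (1 ns cycle)


def cycles_during_response(response_s, clock_hz):
    """Return how many clock cycles elapse while the regulator reacts."""
    return round(response_s * clock_hz)


if __name__ == "__main__":
    n = cycles_during_response(REGULATOR_RESPONSE_S, CLOCK_HZ)
    print(f"{n:,} cycles pass before the regulator has reacted")
```

With those assumptions, millions of cycles go by before the regulator
has reacted, so a lookahead of "a few cycles" in the barrier logic would
not help.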

What can currently be done is voltage and frequency scaling based on
_overall_ processor utilization.
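
As a minimal sketch of what I mean, assuming a made-up table of
operating points (the frequencies and voltages below are hypothetical,
not taken from any datasheet), a utilization-based governor could look
roughly like this:

```python
# Hypothetical operating points: (frequency in MHz, core voltage in V),
# sorted from slowest to fastest.
OPERATING_POINTS = [(200, 0.80), (400, 0.90), (800, 1.00), (1000, 1.10)]


def pick_operating_point(utilization, points=OPERATING_POINTS):
    """Pick the slowest point whose frequency covers the demanded load.

    utilization is the overall busy fraction (0.0 to 1.0), measured at the
    top frequency over a window much longer than the regulator's response.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    max_freq = points[-1][0]
    demand_mhz = utilization * max_freq
    for freq, volt in points:
        if freq >= demand_mhz:
            return freq, volt
    return points[-1]
```

The decision is made on averaged utilization rather than on individual
barrier or sync events, so the slow regulator only has to follow slow
changes in load.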
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
