On Mon, Mar 18, 2013 at 10:01 PM, Troy Benjegerdes <[email protected]> wrote:

> > > R0-R7: global, shared, constant, writeable only by host CPU
> > > R8-R15: global, shared, scratchpad, writes are broadcast to all others
> > > R16-RXX: regular, thread-context registers
> > >
> > > The compelling advantage over a memory scratchpad is that even though
> > > you can 'hide' latency, *it's still there*; you've just hidden the
> > > problem.
> > >
> >
> > The only difference is energy, although lower energy is a solid
> > argument.  We want maximum throughput per unit area and maximum
> > throughput per watt.  (And incidentally, we often assume power and area
> > are linearly related, for back-of-the-envelope calculations.)
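To pin down what's being proposed, here's a minimal C model of the three
register classes.  The core count, register count, and function names are
placeholders of mine, not anything from an actual design:

/* Single-cycle behavioral model of the proposed register classes.
 * NUM_CORES and NUM_REGS are illustrative, not real parameters. */
#include <stdint.h>

#define NUM_CORES 16
#define NUM_REGS  64

typedef struct {
    uint32_t constant[8];                   /* R0-R7: host-CPU writes only */
    uint32_t scratch[8];                    /* R8-R15: one shared copy; a
                                               write is broadcast to all  */
    uint32_t ctx[NUM_CORES][NUM_REGS - 16]; /* R16+: per-thread context */
} regfile_t;

/* Host-side write into the constant bank (R0-R7). */
static void host_write(regfile_t *rf, unsigned r, uint32_t v) {
    if (r < 8)
        rf->constant[r] = v;
}

/* A core's write: constants are read-only from here, R8-R15 hit the single
 * shared copy (the "broadcast"), everything else lands in its own context. */
static void core_write(regfile_t *rf, unsigned core, unsigned r, uint32_t v) {
    if (r < 8)
        return;
    else if (r < 16)
        rf->scratch[r - 8] = v;
    else
        rf->ctx[core][r - 16] = v;
}

/* Reads: every core sees the same R0-R15, private R16 and up. */
static uint32_t core_read(const regfile_t *rf, unsigned core, unsigned r) {
    if (r < 8)
        return rf->constant[r];
    else if (r < 16)
        return rf->scratch[r - 8];
    else
        return rf->ctx[core][r - 16];
}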
> >
> >
> > >
> > > There are no latency or pipeline hazards on the writes, and only
> > > register latency on the reads. It would be extremely convenient to do a
> > > really clean 'barrier()' implementation by writing to the
> > > broadcast/scratchpad register and knowing that you will not see the
> > > result of the write until it has been broadcast and is visible to every
> > > other compute element.
> > >
> >
> > I've investigated barriers before.  See my Booster and VRSync papers.
> > They're a pain all-around, and I'd rather we found ways to avoid them.
> > I can see an argument for them in HPC workloads, but for graphics
> > workloads, I think we should find another solution.
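To make the barrier idea concrete, here's a sketch using C11 atomics to
stand in for the broadcast register.  NUM_CORES and the names are my own
placeholders, and on real hardware the "my write isn't locally visible
until everyone has it" guarantee would come from the broadcast network, not
from atomics:

/* Sketch of a barrier built on broadcast register R8, emulated with a
 * shared atomic counter.  One-shot only; a reusable barrier would need
 * sense reversal or a generation counter on top of this. */
#include <stdatomic.h>

#define NUM_CORES 16

static atomic_uint bcast_r8;  /* stand-in for broadcast register R8 */

void barrier(void) {
    /* Arrive: the increment is broadcast to every core. */
    atomic_fetch_add(&bcast_r8, 1);

    /* Depart: spin until all arrivals have been broadcast back to us.
     * On real hardware this could be a low-power wait state. */
    while (atomic_load(&bcast_r8) < NUM_CORES)
        ;
}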
>
>
> Personally, I think the solution is to include the voltage regulator on the
> chip and tell it to turn on the juice a few cycles ahead of when all the
> cores wake up.
>

Until recently, on-die regulators have been horrid: less than 50%
conversion efficiency.  I've heard that a researcher at Rochester (IIRC)
recently had some breakthroughs in that area, but I haven't had time to
check them out.
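To put a number on "horrid": at 50% conversion efficiency, every watt the
core burns costs another watt of regulator heat on the same die.  A
back-of-the-envelope in C, with an arbitrary 50 W core as the example:

/* Wall-side cost of a 50%-efficient on-die regulator.  The 50 W core
 * power is an assumed example, not a measured figure. */
#include <stdio.h>

int main(void) {
    double p_core = 50.0;           /* watts delivered to the logic */
    double eff    = 0.50;           /* regulator conversion efficiency */
    double p_in   = p_core / eff;   /* power drawn from the external rail */
    double p_heat = p_in - p_core;  /* dissipated in the regulator itself */
    printf("input %.0f W, regulator heat %.0f W\n", p_in, p_heat);
    return 0;
}

That prints "input 100 W, regulator heat 50 W": the regulator alone
dissipates as much as the entire core.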


>
> If the voltage regulator has lookahead into the barrier/broadcast sync
> logic, you should be able to know everyone is going to wake up (or is
> likely to wake up), and boost the voltage ahead of, or even
> simultaneously with, the power spike.
>

I think that even on-die regulators can't adjust voltage that quickly.
 It's something we tinkered with in simulation before opting for dual-rail
instead.
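A rough feel for the timing problem, with made-up but plausible numbers
(the slew rate and step size here are assumptions, not figures from our
simulations):

/* How many cycles does a 200 mV boost take at a given regulator slew
 * rate?  All constants are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double f_clk = 1e9;             /* 1 GHz core clock */
    double dv    = 0.20;            /* 200 mV boost before the spike */
    double slew  = 20e-3 / 1e-6;    /* 20 mV/us, expressed in V/s */
    double t     = dv / slew;       /* seconds to finish the step */
    printf("step takes %.1f us = %.0f cycles\n", t * 1e6, t * f_clk);
    return 0;
}

At those numbers the step takes 10 us, which is 10,000 cycles at 1 GHz;
lookahead of "a few cycles" doesn't come close to covering it.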


>
> Given how often I see GPUs mentioned in the HPC context, designing only for
> graphics workloads sounds like a bad idea.
>

Sure, but we're not trying to do a kitchen sink here either (yet!).  The
OGP's goal is an open source GRAPHICS CARD.  My research goal is to provide
people with tools they can use to experiment with GPUs.  An obvious benefit
of this is that someone should be able to take what we've done and modify
it for different contexts.  But we need to take this one step at a time.
Say researcher X wonders what would happen if nVidia processors (which
outperform ours by orders of magnitude) supported feature Y.  He can take
our design (which is open and can be hacked) and try it both ways; it's not
the same as modifying the nVidia hardware, but it's highly informative.

We're not being overly restrictive.  We're just trying to make relatively
narrow milestones that we can meet so that we get working products sooner,
even if they're more limited.  In the long run, we won't get your super HPC
evolved-from-GPU chip any sooner regardless of which path we take.

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project