Just to clarify a point: although L2s are often shared, L1s are (almost)
by definition private.  Caches are already a performance bottleneck, and
trying to share L1s makes that even worse.  The only times I've seen this
be worthwhile are when the L1 and CPU run at different voltages.  For
instance, at near-threshold voltage, the minimum-energy point differs
between random logic and SRAM, making the SRAM about 3 times faster than
the CPU, at which point it's worth sharing the L1 cache.
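To make that tradeoff concrete, here's a toy back-of-the-envelope sketch.  The clock numbers and the function name are purely illustrative assumptions, not measurements; the only claim taken from the above is the ~3x speed ratio between SRAM and logic at near-threshold voltage:

```python
# Hypothetical numbers: near-threshold logic clock vs. the SRAM's own
# minimum-energy clock (~3x faster, per the ratio described above).
CPU_CLOCK_MHZ = 100
SRAM_CLOCK_MHZ = 300

# How many L1 accesses the shared SRAM can absorb per CPU cycle.
ports_per_cpu_cycle = SRAM_CLOCK_MHZ // CPU_CLOCK_MHZ  # = 3

def cores_served_without_stall(n_cores):
    """Each core issues at most one L1 access per CPU cycle; a shared,
    time-multiplexed L1 adds no wait states as long as the core count
    doesn't exceed the SRAM's accesses-per-CPU-cycle budget."""
    return n_cores <= ports_per_cpu_cycle

assert cores_served_without_stall(3)       # 3 cores share one L1 freely
assert not cores_served_without_stall(4)   # a 4th core would stall
```

At matched voltages the ratio collapses to 1, which is why sharing an L1 is normally a loss.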

As for GPUs, they can't make much use of an L1D, because temporal locality
for pixel data is almost non-existent.  However, they definitely need
L1Is, private to each streaming multiprocessor.

So, what I envision giving you, for your embedded design, is a single
streaming multiprocessor that can run 64 threads (4 wide, 16 deep) for
fragments.  All of those will share a single L1I, but that's completely
separate from the CPU's L1 caches.  And then, optionally, a duplicate
module for vertices.  (And since the architecture is open, that division
is merely a loose recommendation.)  These will have to interface with your
memory system, and if that's virtually addressed, that's just great.  For
your memory system, I'm expecting to be able to queue up read requests,
where each read address comes along with a tag that identifies the
requestor, and the read data then comes back in another queue with the
identifying tags.  Other analogous solutions can be implemented, but the
main concern is that since there are many requestors, we need a way to
identify where the read data has to go back to.
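The tagged-queue interface described above can be sketched in software as two queues carrying (tag, payload) pairs.  This is a toy behavioral model, not HDL; the class and field names are invented for illustration, and a real controller could return data out of order, which is exactly why the tag has to travel with the data:

```python
from collections import deque

class TaggedMemory:
    """Toy model of the proposed interface: read requests carry a
    requestor tag, and responses return on a second queue with the
    same tag so data can be routed back to the right SP or CPU."""

    def __init__(self, backing):
        self.backing = backing   # address -> data
        self.req_q = deque()     # outstanding (tag, addr) requests
        self.resp_q = deque()    # completed (tag, data) responses

    def request(self, tag, addr):
        self.req_q.append((tag, addr))

    def cycle(self):
        # Service one request per cycle (in order here, for simplicity).
        if self.req_q:
            tag, addr = self.req_q.popleft()
            self.resp_q.append((tag, self.backing[addr]))

mem = TaggedMemory({0x10: 0xAA, 0x20: 0xBB})
mem.request(tag=7, addr=0x20)   # e.g. fragment SM, lane 7
mem.request(tag=3, addr=0x10)   # e.g. vertex SM, lane 3
mem.cycle()
mem.cycle()
assert list(mem.resp_q) == [(7, 0xBB), (3, 0xAA)]
```

In hardware this corresponds to carrying an ID field alongside the address on the request channel and alongside the data on the response channel.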


On Sun, Nov 11, 2012 at 8:17 PM, Troy Benjegerdes <[email protected]> wrote:

> On Thu, Nov 08, 2012 at 04:25:11PM -0500, Timothy Normand Miller wrote:
> > On Thu, Nov 8, 2012 at 3:43 PM, Troy Benjegerdes <[email protected]>
> wrote:
> >
> > > > > 1. Do you need dedicated RAM, and in that case, which size and
> > > bandwidth?
> > > > >
> > > >
> > > > Actually, I'd prefer to have unified RAM and moreover, I'd prefer to
> have
> > > > virtual addressing for the GPU.
> > >
> > > <kernel developer hat>
> > > If you have userspace applications pass the GPU virtual addresses
> through
> > > something like what Infiniband uses, you save us ALL a lot of debugging
> > > headache, and make it fast
> > > </kernel dev>
> > >
> > > From a hardware point of view, we should have a single TLB/cache/VM
> > > subsystem, and save duplicating the same functionality in two places.
> > >
> >
> > Virtual memory in GPUs is an active research project.  Although existing
> > GPUs support address translation, none support demand paging, because
> until
> > now, no GPU has supported precise exceptions, and doing a GPU context
> > switch is a challenge.  I'm not going to try to solve this either, so
> we'll
> > have to include the standard requirement that all pages touched by the
> GPU
> > must be paged in ahead of time and pinned.
> >
> > As for Infiniband, I don't know its relation to virtual addresses.  Also,
> > I'm not sure that the GPU and CPU can necessarily share the TLB systems.
> >  Those are generally integrated into L1 caching systems.  Virtually
> > indexed, physically tagged (VIPT) is popular in CPUs for latency reasons,
> > while in the GPU, with all SPs sharing the same address space, we'd
> likely
> > do the virtual to physical translation before or after the last-level
> > cache.  Put another way, GPU memory systems are quite different and have
> > different optimization goals.
>
> For an initial opencores asic system, I think it would be significantly
> easier to have a single unified TLB and L1/L2/etc/cache, with some sort
> of pre-pinned hugepage support, and if the GPU ever triggers a TLB miss,
> then the appropriate response is probably segfault the process that 'owns'
> that virtual address.
>
> This is a somewhat old (linux kernel dev) oriented article on RDMA, which
> I think is worthwhile: http://lwn.net/Articles/133649/
>
> There's also some rather interesting stuff here about GPUDirect for
> clusters
> https://developer.nvidia.com/content/trenches-gtc-cuda-5-and-beyond
>
> which amounts to one GPU doing RDMA directly to another GPU on a different
> node. This is way out of scope for the first opencores ASIC, but if we can
> get to one that includes InfiniBand, or decide to include iWARP (RDMA over
> Ethernet)
>
> http://www.intel.com/content/www/us/en/network-adapters/high-performance-computing.html
>
> then things could get rather interesting fast.
>
> >
> >
> > >
> > > >
> > > >
> > > > > 2. How many gates is the current design?
> > >
> > > [snip]
> > >
> > > > > 3. Is it in a state that could be targeting an ASIC right now, or
> do
> > > > > you need more functionality and verification?
> > > > >
> > >
> > > What about OGD-1? Does svn://svn.opengraphics.org/ogp have working
> HDL?
> > >
> >
> > Yes, but no GPU.  Absolutely everything but a rendering engine is in
> there.
> >
> >
> > > I'm also curious about project VGA: http://wacco.mveas.com/
> >
> >
> > I don't think this had rendering either, or if it did, it was simple 2D.
> >
> >
> > >
> > >
> > > There are a lot of embedded applications where a basic framebuffer with
> > > HDMI/Displayport/LCD driver output would be rather quite fantastic.
> > >
> >
> > Sure.  But you don't need a GPU for this.  If all you want is a dumb
> > framebuffer and software rendering, we already have plenty of logic you
> can
> > adapt.
> >
> >
> > >
> > > >
> > > > > 4. We are not sure yet if we will be targeting an ASIC with gigabit
> > > > > transceivers. Would that be a requirement?
> > > > >
> > > >
> > > > I think that even if you decide that we're not what you're after, I
> think
> > > > your project would be an excellent design target.  How would you feel
> > > about
> > > > a design that was appropriate for embedded devices?  Look at the
> PowerVR
> > > > designs used in mobile devices; only like 4 shader engines.
> > > >
> > > > The only gigabit transceivers I can think of would be for if you
> were to
> > > > incorporate DVI encoding directly into your ASIC.
> > >
> > > I'm the one running around ranting about Gigabit transceivers, but for
> > > network, either Ethernet, or
> https://bitbucket.org/dahozer/infiniband-fpga
> > >
> > > There are plenty of PHY chips that do gig-E and DVI, so that might be
> > > a quicker short-term, but long-term I'd really like to do the
> mixed-signal
> > > analog design and SerDes design for GHz transceivers as a full
> open-source
> > > project if I can find the tools.
> > >
> > > I think we need to start with *something* that's raspberry-pi like, and
> > > work on delivering new silicon revs once a month, and by the time Tim
> gets
> > > done with OGD2 HDL simulation, we're going to end up with a pretty
> smoking
> > > fast platform.
> > >
> > > Has anyone here ever used http://www.mosis.com/ ? Any idea what a run
> of
> > > say 100 or 1000 devices costs? I don't think it really matters how many
> > > we make, or if they even work the first time, so long as we can keep
> the
> > > cost down, and keep the pipeline full with 1 new rev a month.
> > >
> >
> > I'm also going to see if I can get the NSF or ES2 or SRC or some
> > organization to fund fabrication, but for research purposes, so we won't
> > get many chips out of it.  But we would get masks that we could reuse,
> and
> > for a better process tech than mosis is likely to deliver (although that
> > depends on what foundries actually get used and how your chips are
> packaged
> > with others' designs, etc.).
>
> Most of the hardware I use on a day to day basis is a rather old process
> tech, and frankly, I'd rather have an older, slower process if it meant
> we had open-source masks for a full SoC, even if it's only an 800 MHz clock
> rate with a single shader.
>



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
