On Thu, Nov 08, 2012 at 04:25:11PM -0500, Timothy Normand Miller wrote:
> On Thu, Nov 8, 2012 at 3:43 PM, Troy Benjegerdes <[email protected]> wrote:
> 
> > > > 1. Do you need dedicated RAM, and in that case, which size and
> > bandwidth?
> > > >
> > >
> > > Actually, I'd prefer to have unified RAM and moreover, I'd prefer to have
> > > virtual addressing for the GPU.
> >
> > <kernel developer hat>
> > If you have userspace applications pass the GPU virtual addresses through
> > something like what InfiniBand uses, you save us ALL a lot of debugging
> > headache, and make it fast.
> > </kernel dev>
> >
> > From a hardware point of view, we should have a single TLB/cache/VM
> > subsystem, and save duplicating the same functionality in two places.
> >
> 
> Virtual memory in GPUs is an active research project.  Although existing
> GPUs support address translation, none support demand paging, because until
> now, no GPU has supported precise exceptions, and doing a GPU context
> switch is a challenge.  I'm not going to try to solve this either, so we'll
> have to include the standard requirement that all pages touched by the GPU
> must be paged in ahead of time and pinned.
> 
> As for InfiniBand, I don't know its relation to virtual addresses.  Also,
> I'm not sure that the GPU and CPU can necessarily share the TLB systems.
>  Those are generally integrated into L1 caching systems.  Virtually
> indexed, physically tagged (VIPT) is popular in CPUs for latency reasons,
> while in the GPU, with all SPs sharing the same address space, we'd likely
> do the virtual to physical translation before or after the last-level
> cache.  Put another way, GPU memory systems are quite different and have
> different optimization goals.
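
As background on the VIPT point above: a cache can be virtually indexed
and physically tagged without aliasing problems only when the set-index
bits fall entirely within the page offset, i.e. when cache_size /
associativity is no larger than the page size, so the index is known
before the TLB finishes translating. A minimal sketch of that check
(the function name is mine, not from any existing codebase):

```python
def vipt_indexable(cache_bytes, ways, page_bytes=4096):
    """True if a VIPT cache of this geometry can be indexed purely from
    the page-offset bits, so the cache lookup can start in parallel
    with TLB translation.  The set index stays inside the page offset
    when bytes-per-way (cache_size / associativity) does not exceed
    the page size."""
    return cache_bytes // ways <= page_bytes

# A 32 KiB 8-way L1 fits: 32768 / 8 = 4096 bytes per way, one page.
print(vipt_indexable(32 * 1024, 8))   # True
# A 64 KiB 4-way L1 does not: 16384 bytes per way spans four pages.
print(vipt_indexable(64 * 1024, 4))   # False
```

This is why L1 geometry on CPUs tends to be constrained (bigger L1s go
more associative), while a GPU translating before or after a shared
last-level cache has no such constraint.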

For an initial OpenCores ASIC, I think it would be significantly
easier to have a single unified TLB and cache hierarchy (L1/L2/etc.),
with some sort of pre-pinned hugepage support; if the GPU ever
triggers a TLB miss, the appropriate response is probably to segfault
the process that 'owns' that virtual address.
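
To make the pre-pinning requirement concrete, here's a minimal
userspace sketch of how a process would pin a buffer before handing
its virtual address to a device.  It goes through the real mmap(2) and
mlock(2) interfaces (via Python's ctypes, since libc doesn't expose
mlock to Python directly); the `pin_buffer` helper name is mine, not
an existing API:

```python
import ctypes
import ctypes.util
import mmap

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def pin_buffer(length):
    """Allocate a page-aligned anonymous mapping and pin it in RAM.

    mlock(2) guarantees the pages stay resident, so a device that has
    been handed this buffer's virtual address never takes a page
    fault.  A real GPU buffer would be hugepage-backed (MAP_HUGETLB)
    to cut TLB pressure; plain pages keep this sketch portable.
    """
    buf = mmap.mmap(-1, length)  # anonymous, read/write, page-aligned
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    if libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(length)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "mlock failed (check RLIMIT_MEMLOCK)")
    return buf

pinned = pin_buffer(4096)          # one page, inside the default limit
pinned[0:4] = b"\xde\xad\xbe\xef"  # CPU writes; device could now DMA
```

This is essentially what the InfiniBand memory-registration path does
on the kernel side (get_user_pages and friends), which is why reusing
that machinery saves debugging headache.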

This is a somewhat old, Linux-kernel-developer-oriented article on
RDMA which I think is worthwhile: http://lwn.net/Articles/133649/

There's also some rather interesting stuff here about GPUDirect for clusters
https://developer.nvidia.com/content/trenches-gtc-cuda-5-and-beyond

which amounts to one GPU doing RDMA directly to another GPU on a
different node. This is way out of scope for the first OpenCores ASIC,
but if we can get to one that includes InfiniBand, or decide to
include iWARP (RDMA over Ethernet),
http://www.intel.com/content/www/us/en/network-adapters/high-performance-computing.html

then things could get rather interesting fast.

> 
> 
> >
> > >
> > >
> > > > 2. How many gates is the current design?
> >
> > [snip]
> >
> > > > 3. Is it in a state that could be targeting an ASIC right now, or do
> > > > you need more functionality and verification?
> > > >
> >
> > What about OGD-1? Does svn://svn.opengraphics.org/ogp have working HDL?
> >
> 
> Yes, but no GPU.  Absolutely everything but a rendering engine is in there.
> 
> 
> > I'm also curious about project VGA: http://wacco.mveas.com/
> 
> 
> I don't think this had rendering either, or if it did, it was simple 2D.
> 
> 
> >
> >
> > There are a lot of embedded applications where a basic framebuffer with
> > HDMI/DisplayPort/LCD driver output would be quite fantastic.
> >
> 
> Sure.  But you don't need a GPU for this.  If all you want is a dumb
> framebuffer and software rendering, we already have plenty of logic you can
> adapt.
> 
> 
> >
> > >
> > > > 4. We are not sure yet if we will be targeting an ASIC with gigabit
> > > > transceivers. Would that be a requirement?
> > > >
> > >
> > > Even if you decide that we're not what you're after, I think your
> > > project would be an excellent design target.  How would you feel
> > > about a design appropriate for embedded devices?  Look at the
> > > PowerVR designs used in mobile devices; only like 4 shader engines.
> > >
> > > The only gigabit transceivers I can think of would be for if you were to
> > > incorporate DVI encoding directly into your ASIC.
> >
> > I'm the one running around ranting about gigabit transceivers, but for
> > network, either Ethernet, or https://bitbucket.org/dahozer/infiniband-fpga
> >
> > There are plenty of PHY chips that do gig-E and DVI, so that might be
> > a quicker short-term path, but long-term I'd really like to do the
> > mixed-signal analog design and serdes design for GHz transceivers as
> > a full open-source project if I can find the tools.
> >
> > I think we need to start with *something* that's Raspberry Pi-like, and
> > work on delivering new silicon revs once a month, and by the time Tim gets
> > done with OGD2 HDL simulation, we're going to end up with a pretty smoking
> > fast platform.
> >
> > Has anyone here ever used http://www.mosis.com/ ? Any idea what a run of
> > say 100 or 1000 devices costs? I don't think it really matters how many
> > we make, or if they even work the first time, so long as we can keep the
> > cost down, and keep the pipeline full with 1 new rev a month.
> >
> 
> I'm also going to see if I can get the NSF or ES2 or SRC or some
> organization to fund fabrication, but for research purposes, so we won't
> get many chips out of it.  But we would get masks that we could reuse, and
> for a better process tech than mosis is likely to deliver (although that
> depends on what foundries actually get used and how your chips are packaged
> with others' designs, etc.).

Most of the hardware I use on a day-to-day basis is built on a rather
old process tech, and frankly, I'd rather have an older, slower
process if it meant we had open-source masks for a full SoC, even if
it's only an 800 MHz clock rate with a single shader.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)