Karl Rupp <r...@iue.tuwien.ac.at> writes:
> Even though high core counts look promising in terms of FLOPs/Watt, the 
> tricky part is to actually use them efficiently... :-)
> I don't think we will see 64K cores by that time, though. Their initial 
> roadmap listed 1k cores for 2014 already, which is (imho) way too 
> aggressive...

Yeah, having done some more digging, I agree. And their memory model
seems alien to the OpenCL abstraction, with each "processing element"
having "private memory", but no directly shared memory across elements,
except for the global memory -- and ignoring their OpenCL extensions to
do memory access and copying across "private" memory segments. It also
seems that the processing elements do not have to compute in lock-step
with one another, so there is no concept of a "wavefront": this is
probably the reason for the different memory model.

Though alien, the memory model is interesting. The memory throughput
within the processor has a theoretical maximum of 64 billion bytes/sec
(at 1GHz), but each core is only connected to the mesh by
nearest-neighbour connections. The main limitation seems to be the
limited amount of 'private memory' immediately accessible to each
core. I don't know how well that's mitigated by the mesh network.

As a poor student, I think I'll wait for the 64-core version to be more
widely available, and concentrate any investment for now on a new AMD
FM2+ APU, when they're around. I don't know if OpenCL 2.0 has a memory
model more appropriate for the Epiphany coprocessor -- but it is
interesting to have each processing element unconstrained to wavefronts.

> The academic program gives away 1 board to academic partners 
> (distributed equally) for every 100 boards sold. I actually prefer to 
> spend the ~100 USD rather than spending time on writing a partnership 
> proposal just to get a first device for experimentation. If our tests 
> succeed, we may still try to enter the program in order to build up a 
> small cluster over time.

I think a cluster of Parallella boards will be most interesting,
especially connected by their proprietary expansion connector. The host
runtime will have to be pretty intelligent to distribute load across
that efficiently. I suspect such a cluster would be best suited to a fairly
heterogeneous load, both task- and data-parallel. It'll be interesting
also to see the OpenCL abstraction mapped to such hardware.

> Haven't heard of it before, no. There are too many language attempts out 
> there, so I don't even try to keep track of all of them...

Quite. I was particularly excited about that one, though...


Anyway, back to ViennaCL for now..!

Toby


_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
