Karl Rupp <r...@iue.tuwien.ac.at> writes:

> Even though high core counts look promising in terms of FLOPs/Watt, the
> tricky part is to actually use them efficiently... :-)

> I don't think we will see 64K cores by that time, though. Their initial
> roadmap listed 1k cores for 2014 already, which is (imho) way too
> aggressive...
Yeah, having done some more digging, I agree.

Their memory model also seems alien to the OpenCL abstraction: each "processing element" has its own "private memory", but there is no directly shared memory across elements except for the global memory -- ignoring their OpenCL extensions for accessing and copying across "private" memory segments. It also seems that the processing elements do not have to compute in lock-step with one another, so there is no concept of a "wavefront"; that is probably the reason for the different memory model.

Though alien, the memory model is interesting. Memory throughput within the processor has a theoretical maximum of 64 billion bytes/sec at 1 GHz (i.e. 64 bytes per clock cycle), but each core is only connected to the mesh by nearest-neighbour links. The main limitation seems to be the small amount of 'private memory' immediately accessible to each core, and I don't know how well the mesh network mitigates that (a rough kernel sketch illustrating that style of programming is at the end of this mail).

As a poor student, I think I'll wait for the 64-core version to become more widely available, and concentrate any investment for now on a new AMD FM2+ APU once they're around. I don't know whether OpenCL 2.0 has a memory model better suited to the Epiphany coprocessor -- but it is interesting to have each processing element unconstrained by wavefronts.

> The academic program gives away 1 board to academic partners
> (distributed equally) for every 100 boards sold. I actually prefer to
> spend the ~100 USD rather than spending time on writing a partnership
> proposal just to get a first device for experimentation. If our tests
> succeed, we may still try to enter the program in order to build up a
> small cluster over time.

I think a cluster of Parallella boards would be most interesting, especially connected via their proprietary expansion connector. The host runtime would have to be pretty intelligent to distribute load across that efficiently. I suspect such a cluster would be best at a fairly heterogeneous load, both task- and data-parallel. It will also be interesting to see how the OpenCL abstraction maps onto such hardware.

> Haven't heard of it before, no. There are too many language attempts out
> there, so I don't even try to keep track of all of them...

Quite. I was particularly excited about that one, though... Anyway, back to ViennaCL for now...!

Toby
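
P.S. To make the "private memory per element, shared global memory" point concrete, here is a rough, untested sketch of what a kernel written against that model might look like. It is plain OpenCL C and my own invention -- not anything from the Epiphany SDK or from ViennaCL -- and the kernel name and tile size are arbitrary placeholders; on real hardware the tile would have to fit into a core's limited private memory.

/* Each work-item stages a small tile of the input into its own
 * __private scratch buffer, operates on it there, and writes the result
 * back to __global memory.  No __local memory and no barriers are used,
 * matching the "private per element, global shared" model described
 * above.  TILE is a placeholder value, not a tuned figure. */
#define TILE 32

__kernel void scale_tiles(__global const float *in,
                          __global float *out,
                          const float alpha,
                          const unsigned int n)
{
    __private float buf[TILE];              /* per-work-item scratch      */
    const size_t base = get_global_id(0) * (size_t)TILE;

    for (unsigned int i = 0; i < TILE; ++i) /* stage: global -> private   */
        if (base + i < n)
            buf[i] = in[base + i];

    for (unsigned int i = 0; i < TILE; ++i) /* compute in private memory  */
        buf[i] *= alpha;

    for (unsigned int i = 0; i < TILE; ++i) /* write back to global       */
        if (base + i < n)
            out[base + i] = buf[i];
}

The open question for Epiphany would be whether those element-by-element staging loops could be replaced by the cross-"private"-memory copy extensions mentioned above, which is presumably where the mesh bandwidth would come into play.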