On Mon, Apr 20, 2009 at 11:47, Vijay Saraswat <[email protected]> wrote:
> Thanks for your message.
>
> The core functionality that X10 provides -- places, asynchronous execution
> and a framework to control asynchrony -- can be used to support
> programmer-controlled caches, as in the Cell.
>
> Last year, in a project internal to IBM, we ran an X10 Monte Carlo-style
> program on a Cell blade and showed efficiency comparable to C. The
> programmer used finish, async, etc. to move data back and forth between
> memory and CPU. The biggest issue we faced was the need for the programmer
> to use SIMD operations to extract performance from the SPUs.
How did you solve this? Is support for arithmetic operations on arrays
(compiling down to SIMD operations) planned in X10?
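To make the question concrete, here is roughly the kind of code I have in
mind -- a hypothetical sketch, where the Rail type, the signature and the
loop form are my guesses rather than checked X10 syntax:

    // Hypothetical elementwise kernel: could the compiler lower such a
    // loop (or a future array-expression form) to SPU SIMD instructions?
    def axpy(n: Int, a: Double, x: Rail[Double], y: Rail[Double]) {
        for (var i: Int = 0; i < n; i++)
            y(i) = a * x(i) + y(i);
    }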
> This year we are aiming to have X10 programs (such as Black-Scholes and
> some machine learning kernels) running on NVidia GPUs with performance
> comparable to native CUDA.
>
> We share with you the vision of using essentially the same source program to
> deliver good performance on Cell blades, GPUs, multicores and clusters. This
> is the big potential for X10 and the Asynchronous Partitioned Global Address
> Space (APGAS) programming model.
>
> Remote asyncs are essentially messages -- messages with a code pointer
> specifying what code must be executed on the remote side when the message
> arrives (i.e., active messages).
Right, but language constructs influence the way people program. Since X10
chose to make cross-place communication explicit (as opposed to Chapel), I
feel that explicit messages would be even clearer... How did you settle on
the async(place) { } syntax? Did you consider alternatives?
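For what it's worth, this is how I read "remote asyncs are active
messages" -- just a sketch, where process, consume, data and p are
made-up names and the details may well be off:

    val home = here;             // the originating place
    finish async (p) {           // ship code plus captured data to p
        val r = process(data);   // runs at p: an active-message handler
        async (home) {           // the "reply" is simply another async,
            consume(r);          // also tracked by the enclosing finish
        }
    }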
> If one were to implement the join calculus on top of X10, it would be best
> to restrict attention to reductions that reference only elements in a
> single place. This is the underlying philosophy in X10 -- synchronous
> actions should be confined to a single place as far as possible. Thus
> atomic operations are permitted to access only mutable locations in the
> current place. (We have designed multi-place atomics, over a statically
> determined, bounded set of places, but have not yet implemented them.)
I find the join calculus an elegant way to coordinate concurrent
activities, but X10 seems to have that covered with clocks already. I
guess I'd need to write something substantial in X10 to know which I
like best. Is this part of the design still in flux? Have you considered
including phasers from Habanero?
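For concreteness, this is roughly how I picture clock-based phasing -- a
sketch only; Clock.make(), the clocked qualifier and the step*() calls
are my assumptions about the current API:

    val c = Clock.make();     // the creating activity is registered on c
    async clocked(c) { stepA1(); next; stepA2(); }  // both activities
    async clocked(c) { stepB1(); next; stepB2(); }  // meet at 'next'
    next;  // the parent advances too, and deregisters when it terminates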
Thanks a lot,
Olivier
> Olivier Pernet wrote:
>>
>> Hi all,
>>
>> I've been doing some thinking about where processor architectures are
>> headed, and about the programming models they will require.
>> It seems clear to me that neither shared mutable memory nor cache
>> coherency can scale to hundreds of cores. Hence future processors will
>> have to look very similar to today's GPUs, or to some hybrid design
>> like the Cell. Intel is going in the same direction with Larrabee,
>> although I believe its use of cache coherency won't survive for very
>> long, probably not beyond the 100-core mark.
>>
>> It seems that future architectures will need to include both:
>> - memory partitioned across cores
>> - programmer-controlled caches
>>
>> X10 is ideally equipped for the former thanks to PGAS. How about the
>> latter?
>>
>> Is there any ongoing work on a Cell or GPU runtime for X10? I think a
>> truly future-proof language, at this point, should deliver good
>> performance for the same program on all of Cell, GPUs, single-chip
>> multicore CPUs, and clusters.
>>
>> As an aside, has explicit message passing been considered for
>> communication between places in X10? I suppose futures are an
>> equivalent primitive, but I'm partial to message passing as in the
>> actor model, and to the join calculus (although I don't know whether
>> it can be implemented efficiently across a cluster of machines).
>>
>> Feel free to point at my mistakes or redirect the conversation to
>> another list if appropriate.
>>
>> Cheers,
>> --
>> Olivier Pernet
>>
>> We are the knights who say
>> echo '16i[q]sa[ln0=aln100%Pln100/snlbx]sbA0D4D465452snlbxq'|dc