Recently I gave a talk about how Apache Mahout is moving to support GPUs at the lowest layer. This means that Mahout's generalized linear algebra is GPU accelerated no matter which algorithm is built on top of it. Spark MLlib, by contrast, will have to wait for specialized work to get GPU support (IBM is working on something), and even that won’t put the other MLlib algos on GPUs.
Today CCO (the core algorithm of the PIO Template for the UR), the Mahout versions of ALS and SSVD, and anything else easily expressed in linalg are already getting the benefit. The more ad-hoc approaches taken in the past by libs like MLlib will take very large efforts to bring into GPU land. And what happens when the rumored ML-cores become available with even higher throughput? If anyone is interested in these developments, some are mentioned here: http://actionml.com/blog/talk_at_gtc
