Re: DeepLearning and Spark ?

2015-01-10 Thread Jaonary Rabarisoa
Can someone explain the difference between a parameter server and
Spark?

There's already an issue on this topic
https://issues.apache.org/jira/browse/SPARK-4590


Another example of DL in Spark, essentially based on Downpour SGD:
http://deepdist.com
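
To make the parameter server idea concrete, here is a minimal sketch of the
asynchronous pull/push pattern that Downpour SGD relies on. It is plain Python
with threads standing in for workers, not Spark code, and it leaves out most of
what the Google paper describes (sharded servers, model replicas, Adagrad). The
names ParameterServer, run_worker and train_async, the synthetic data and the
learning rate are all assumptions made for illustration.

```python
import threading


class ParameterServer:
    """Hypothetical in-process stand-in for a parameter server."""

    def __init__(self, dim, lr):
        self._weights = [0.0] * dim
        self._lr = lr
        self._lock = threading.Lock()

    def pull(self):
        # Workers fetch a copy of the current parameters whenever they like.
        with self._lock:
            return list(self._weights)

    def push(self, grad):
        # Gradients are applied as they arrive; nobody waits for other workers.
        with self._lock:
            for i, g in enumerate(grad):
                self._weights[i] -= self._lr * g


def run_worker(server, shard, steps):
    for step in range(steps):
        weights = server.pull()  # may already be stale when we push below
        x, y = shard[step % len(shard)]
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        server.push([2.0 * err * xi for xi in x])  # squared-error gradient


def train_async(num_workers=4, steps=500):
    # Synthetic, noise-free data for y = 3*x0 - 2*x1 (an assumption).
    samples = []
    for i in range(140):
        x = ((i % 5) / 5.0, (i % 7) / 7.0)
        samples.append((x, 3.0 * x[0] - 2.0 * x[1]))

    server = ParameterServer(dim=2, lr=0.1)
    threads = [
        threading.Thread(
            target=run_worker, args=(server, samples[k::num_workers], steps)
        )
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.pull()


weights = train_async()
```

The point of the sketch is that the shared state is mutable and updated
without any barrier between workers, which is what a Spark job built only
from RDD transformations does not give you directly.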




Re: DeepLearning and Spark ?

2015-01-09 Thread Peng Cheng
Not if broadcast can only be used between stages. To enable this you have
to at least make broadcast asynchronous & non-blocking.
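
To make the limitation concrete, here is a minimal sketch of the synchronous
pattern you get when parameters can only be re-broadcast between stages. It is
plain Python that imitates the broadcast / map / reduce steps, not the Spark
API; partial_gradient, train_synchronous, the synthetic data and the learning
rate are assumptions made for illustration.

```python
def partial_gradient(weights, partition):
    """Squared-error gradient summed over one partition (one 'worker')."""
    grad = [0.0] * len(weights)
    for x, y in partition:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi
    return grad


def train_synchronous(partitions, dim, lr=0.5, iterations=300):
    weights = [0.0] * dim
    n = sum(len(p) for p in partitions)
    for _ in range(iterations):
        # "Broadcast": every worker sees the same read-only snapshot.
        snapshot = tuple(weights)
        # "Map": one gradient per partition, all computed from that snapshot.
        grads = [partial_gradient(snapshot, p) for p in partitions]
        # "Reduce": the update happens only after every partition is done,
        # so the slowest worker sets the pace of each iteration.
        total = [sum(g[i] for g in grads) for i in range(dim)]
        weights = [w - lr * t / n for w, t in zip(weights, total)]
    return weights


# Synthetic, noise-free data for y = 3*x0 - 2*x1 (an assumption).
points = [((i % 5) / 5.0, (i % 7) / 7.0) for i in range(140)]
samples = [(x, 3.0 * x[0] - 2.0 * x[1]) for x in points]
partitions = [samples[k::4] for k in range(4)]
weights = train_synchronous(partitions, dim=2)
```

Each pass through the loop is one barrier: no worker can see new parameters
until all gradients have been collected, which is the blocking behaviour an
asynchronous, non-blocking broadcast would have to remove.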



Re: DeepLearning and Spark ?

2015-01-09 Thread Krishna Sankar
I am also looking at this domain. We could potentially use the broadcast
capability in Spark to distribute the parameters. I haven't thought it
through yet.
Cheers




Re: DeepLearning and Spark ?

2015-01-09 Thread Andrei
Does it make sense to use Spark's actor system (e.g. via
SparkContext.env.actorSystem) to create a parameter server?



Re: DeepLearning and Spark ?

2015-01-09 Thread Peng Cheng
You are not the first :) and probably not the fifth to ask this question.
A parameter server is not included in the Spark framework, and I've seen all
kinds of hacks to improvise one: REST API, HDFS, Tachyon, etc.
I'm not sure whether an 'official' benchmark & implementation will be
released soon.



Re: DeepLearning and Spark ?

2015-01-09 Thread Marco Shaw
Pretty vague on details:

http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A227199




DeepLearning and Spark ?

2015-01-09 Thread Jaonary Rabarisoa
Hi all,

Deep learning algorithms are popular and achieve state-of-the-art
performance on several real-world machine learning problems. Currently
there is no DL implementation in Spark, and I wonder whether there is any
ongoing work on this topic.

We can do DL in Spark with Sparkling Water and H2O, but this adds an
additional software stack.

Deeplearning4j seems to implement a distributed version of many popular DL
algorithms. Porting DL4j to Spark could be interesting.

Google describes an implementation of large-scale DL in this paper:
http://research.google.com/archive/large_deep_networks_nips2012.html. It is
based on model parallelism and data parallelism.

So, I'm trying to imagine what a good design for DL algorithms in Spark
would be. Spark already has RDDs (for data parallelism). Can GraphX be used
for model parallelism (as DNNs are generally designed as DAGs)? And what
about using GPUs for local parallelism (a mechanism to push partitions into
GPU memory)?


What do you think about this?


Cheers,

Jao
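
As a small illustration of the model-parallelism half of the question, here is
a sketch in which the output units of a single dense layer are split across
workers, each computing only its own slice. It is plain Python with no Spark or
GraphX involved; dense_slice, model_parallel_forward and the toy weights are
assumptions made for illustration.

```python
def dense_slice(weight_rows, bias_slice, inputs):
    """Compute the output units owned by one worker.

    Each row of weight_rows is the weight vector of one output unit.
    """
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weight_rows, bias_slice)
    ]


def model_parallel_forward(weights, biases, inputs, num_workers):
    """Split the layer by output unit and concatenate the partial results."""
    chunk = (len(weights) + num_workers - 1) // num_workers
    outputs = []
    for start in range(0, len(weights), chunk):
        # In a real system each slice would live on a different machine,
        # and only `inputs` and the slice outputs would cross the network.
        outputs.extend(
            dense_slice(
                weights[start:start + chunk],
                biases[start:start + chunk],
                inputs,
            )
        )
    return outputs


# Toy layer: 3 inputs, 4 output units (values are arbitrary).
weights = [
    [1.0, 0.0, 2.0],
    [0.5, -1.0, 0.0],
    [0.0, 3.0, 1.0],
    [2.0, 2.0, -2.0],
]
biases = [0.0, 1.0, -1.0, 0.5]
inputs = [1.0, 2.0, 3.0]

full = dense_slice(weights, biases, inputs)
split = model_parallel_forward(weights, biases, inputs, num_workers=2)
```

The split result matches the unpartitioned layer, but every worker needs the
full input vector, so the communication cost between layers is what a design
on top of Spark would have to manage.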