On 20.10.2011 19:45, Ted Dunning wrote:
> I think that giraph has a lot to offer here as well.

+1 on that.
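And re: Josh's point below about using Spark "BSP-style" and just shooting out slices of an array to the workers: here is a rough sketch of the kind of loop I picture for his GA experiment. It's purely illustrative; it only assumes SparkContext, parallelize, map and collect from the plain spark.* Scala API of that era, and the fitness/mutate functions, population size, and slice count are made-up placeholders, not anything from his actual code.

import spark.SparkContext   // package was plain "spark" in the pre-Apache releases

object BspStyleGA {

  // Placeholder fitness function -- a real GA would plug its own in here.
  def fitness(candidate: Array[Double]): Double = candidate.sum

  // Placeholder mutation step, also just for illustration.
  def mutate(candidate: Array[Double]): Array[Double] =
    candidate.map(_ + (scala.util.Random.nextDouble - 0.5) * 0.1)

  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "bsp-style-ga")

    // The population lives as a plain array on the driver.
    var population: Array[Array[Double]] =
      Array.fill(1000)(Array.fill(10)(scala.util.Random.nextDouble))

    for (generation <- 1 to 20) {
      // "Superstep": ship slices of the array out to the workers...
      val scored = sc.parallelize(population, 8)
                     .map(c => (fitness(c), c))
                     .collect()                 // ...and sync back at the driver

      // Sequential step between supersteps: select the top half, then mutate.
      val survivors = scored.sortBy(-_._1).take(population.length / 2).map(_._2)
      population = survivors ++ survivors.map(mutate)

      println("generation " + generation + " best fitness: " + scored.map(_._1).max)
    }
  }
}

The collect() at the end of each generation acts as the synchronization barrier, i.e. the synchronous-iteration case from the book he mentions; the asynchronous variants it discusses would need something other than a driver-side loop.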
> Sent from my iPhone
>
> On Oct 20, 2011, at 8:30, Josh Patterson <[email protected]> wrote:
>
>> I've run some tests with Spark in general; it's a pretty interesting setup.
>>
>> I think the most interesting aspect (relevant to what you are asking
>> about) is that Matei already has Spark running on top of MRv2:
>>
>> https://github.com/mesos/spark-yarn
>>
>> (you don't have to run Mesos, but the YARN code needs to be able to see
>> the jar in order to do its scheduling)
>>
>> I've been playing around with writing a genetic algorithm in
>> Scala/Spark to run on MRv2, and in the process got introduced to the
>> book:
>>
>> "Parallel Iterative Algorithms: From Sequential to Grid Computing"
>>
>> which talks about strategies for parallelizing highly iterative
>> algorithms and the inherent issues involved (sync/async iterations,
>> sync/async communications, etc.). Since you can use Spark as a
>> "BSP-style" framework (ignoring the RDDs if you like) and just shoot
>> out slices of an array of items to be processed (relatively fast
>> compared to MR), it has some interesting properties/tradeoffs to take a
>> look at.
>>
>> Toward the end of my ATL HUG talk I mentioned the possibility of how
>> MRv2 could be used with other frameworks, like Spark, that are better
>> suited to other algorithms (in this case, highly iterative ones):
>>
>> http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
>>
>> I think it would be interesting to have Mahout sitting on top of MRv2,
>> like Ted is referring to, and then have each algorithm matched to a
>> framework on YARN and a workflow that mixed and matched these
>> combinations.
>>
>> Lots of possibilities here.
>>
>> JP
>>
>>
>> On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <[email protected]> wrote:
>>> Spark is very cool but very incompatible with Hadoop code. Many Mahout
>>> algorithms would run much faster on Spark, but you will have to do the
>>> porting yourself.
>>>
>>> Let us know how it turns out!
>>>
>>> 2011/10/19 WangRamon <[email protected]>
>>>
>>>> Hi all,
>>>>
>>>> I was told today that Spark is a much better platform for cluster
>>>> computing than Hadoop, at least for recommendation computation. I'm
>>>> still very new to this area, so if anyone has done any investigation
>>>> of Spark, could you please share your thoughts here? Thank you very much.
>>>>
>>>> Thanks,
>>>> Ramon
>>>
>>
>> --
>> Twitter: @jpatanooga
>> Solution Architect @ Cloudera
>> hadoop: http://www.cloudera.com
