Absolutely, I'd agree on that. From what I can tell it's the best "Pregel"-style clone going; it's heading towards MRv2 and seems to have some decent momentum behind it.
On Thu, Oct 20, 2011 at 1:48 PM, Sebastian Schelter <[email protected]> wrote:
> On 20.10.2011 19:45, Ted Dunning wrote:
>> I think that giraph has a lot to offer here as well.
>
> +1 on that.
>
>> Sent from my iPhone
>>
>> On Oct 20, 2011, at 8:30, Josh Patterson <[email protected]> wrote:
>>
>>> I've run some tests with Spark in general; it's a pretty interesting setup.
>>>
>>> I think the most interesting aspect (relevant to what you are asking
>>> about) is that Matei already has Spark running on top of MRv2:
>>>
>>> https://github.com/mesos/spark-yarn
>>>
>>> (You don't have to run Mesos, but the YARN code needs to be able to see
>>> the jar in order to do its scheduling work.)
>>>
>>> I've been playing around with writing a genetic algorithm in
>>> Scala/Spark to run on MRv2, and in the process got introduced to the
>>> book "Parallel Iterative Algorithms: From Sequential to Grid Computing",
>>> which talks about strategies for parallelizing highly iterative
>>> algorithms and the inherent issues involved (sync/async iterations,
>>> sync/async communications, etc.). Since you can use Spark as a
>>> "BSP-style" framework (ignoring the RDDs if you like) and just shoot
>>> out slices of an array of items to be processed (relatively fast
>>> compared to MR), it has some interesting properties/tradeoffs to take a
>>> look at.
>>>
>>> Toward the end of my ATL HUG talk I mentioned the possibility of
>>> using MRv2 with other frameworks, like Spark, that are better suited
>>> to other kinds of algorithms (in this case, highly iterative ones):
>>>
>>> http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
>>>
>>> I think it would be interesting to have Mahout sitting on top of MRv2,
>>> like Ted is referring to, and then have each algorithm matched to a
>>> framework on YARN and a workflow that mixed and matched these
>>> combinations.
>>>
>>> Lots of possibilities here.
>>>
>>> JP
>>>
>>> On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <[email protected]> wrote:
>>>> Spark is very cool but very incompatible with Hadoop code. Many Mahout
>>>> algorithms would run much faster on Spark, but you will have to do the
>>>> porting yourself.
>>>>
>>>> Let us know how it turns out!
>>>>
>>>> 2011/10/19 WangRamon <[email protected]>
>>>>
>>>>> Hi All,
>>>>>
>>>>> I was told today that Spark is a much better platform for cluster
>>>>> computing, better than Hadoop, at least for recommendation computation.
>>>>> I'm still very new to this area, so if anyone has done some investigation
>>>>> on Spark, could you please share your thoughts here? Thank you very much.
>>>>>
>>>>> Thanks,
>>>>> Ramon
>>>
>>> --
>>> Twitter: @jpatanooga
>>> Solution Architect @ Cloudera
>>> hadoop: http://www.cloudera.com
>

--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
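P.S. For anyone curious what the "BSP-style" use of Spark that JP describes above might look like, here is a rough sketch: parallelize slices of an array across the workers for each "superstep", map over them in parallel, then collect() back to the driver as the sync point. This is just an illustration, not code from this thread; it uses the current SparkConf/SparkContext API, and the local master, slice count, and evaluate() step are made-up placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object BspStyleSketch {
  // Placeholder per-item work; a real GA would score/mutate candidates here.
  def evaluate(x: Double): Double = x * 0.99

  def main(args: Array[String]): Unit = {
    // Local master just for illustration; on a cluster this would point at YARN or Mesos.
    val sc = new SparkContext(new SparkConf().setAppName("bsp-sketch").setMaster("local[*]"))

    // A population of candidate solutions, stood in for by plain doubles here.
    var population: Array[Double] = Array.fill(1000)(Random.nextDouble())

    for (step <- 1 to 10) {
      // One "superstep": ship slices of the array to the workers, process them in
      // parallel, then collect() back to the driver; that collect() is the barrier.
      population = sc.parallelize(population, 8).map(evaluate).collect()
    }

    sc.stop()
  }
}

The only per-iteration overhead beyond the map itself is the parallelize/collect round trip, which is part of why JP notes this is relatively fast compared to chaining MR jobs for highly iterative algorithms.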
