Not directly an answer -- but if anything, you can use Spark in local mode
-- that's how our unit tests are written. Use something like `local[8]` as
the master to enable multiple asynchronous workers.
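For illustration, a minimal sketch of spinning up Spark in local mode (the app name and the toy job are made up for the example; only the `local[8]` master string comes from the advice above):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalModeSketch {
  def main(args: Array[String]): Unit = {
    // "local[8]" runs Spark in-process with 8 worker threads --
    // no cluster, no deployment, same API as a clustered job.
    val conf = new SparkConf()
      .setMaster("local[8]")
      .setAppName("local-mode-sketch") // hypothetical name
    val sc = new SparkContext(conf)

    // Trivial job just to exercise the local context.
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop()
  }
}
```

The same code runs unchanged against a real cluster by swapping the master URL, which is what makes local mode convenient for unit tests.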

There will be overhead on the order of <= 0.5 s compared to totally
Spark-less execution, but if that is reasonably small compared to the rest
of the job (i.e. if your case is not really a micro-matrix case) then it
should not matter much.

-d


On Fri, Aug 1, 2014 at 2:53 AM, Frank Scholten <[email protected]>
wrote:

> Hi all,
>
> I noticed the development of the Spark co-occurrence implementation in MAHOUT-1464 and I
> wondered if I could get similar results but with less scalability when I
> use MultithreadedBatchItemSimilarities with LLRSimilarity.
>
> I want to use a co-occurrence recommender on a smallish dataset of a few
> GBs that does not warrant the use of a Spark cluster. Is the Spark
> implementation mostly a more scalable version or is it an improved
> implementation that gives different or better results?
>
> Cheers,
>
> Frank
>
