Not directly an answer -- but if anything, you can use Spark in local mode -- that's how our unit tests are written. Use something like `local[8]` for the master URL to enable multiple concurrent worker threads.
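A minimal sketch of what that looks like (assuming the standard Spark Scala API; the app name is just a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Run Spark entirely in-process: "local[8]" uses 8 worker threads,
// "local[*]" would use one thread per available core.
val conf = new SparkConf()
  .setMaster("local[8]")
  .setAppName("local-mode-example")
val sc = new SparkContext(conf)

// Jobs submitted through sc now execute in local threads,
// no cluster required.
val sum = sc.parallelize(1 to 100).reduce(_ + _)

sc.stop()
```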
There will be overhead in the area of <= 0.5 s compared to totally Spark-less execution, but if that is reasonably small compared to the rest of the job (i.e. if your case is not really a micro-matrix case) then it should not matter much.

-d

On Fri, Aug 1, 2014 at 2:53 AM, Frank Scholten <[email protected]> wrote:
> Hi all,
>
> I noticed the development of the Spark co-occurrence implementation of
> MAHOUT-1464 and I wondered if I could get similar results but with less
> scalability when I use MultithreadedBatchItemSimilarities with
> LLRSimilarity.
>
> I want to use a co-occurrence recommender on a smallish dataset of a few
> GBs that does not warrant the use of a Spark cluster. Is the Spark
> implementation mostly a more scalable version or is it an improved
> implementation that gives different or better results?
>
> Cheers,
>
> Frank
