I believe writing in the DSL is simple enough, especially if you have some familiarity with Scala on top of R (or, in my case, R on top of Scala perhaps :). I've implemented a couple dozen or so custom algorithms that use distributed Samsara algebra to at least some degree, and I can reliably attest that none of them ever exceeded 100 lines or so. The DSL also significantly reduced the time I spend writing algebra on top of Spark and some other backends I use in proprietary settings. I am now mostly working on non-algebraic improvements because writing the algebra is easy.
The most difficult part, however, at least for me (and as you will see as you go along with the book), was not the peculiarities of the R-like bindings but the algorithm reformulations. Traditional "in-memory" algorithms do not work on shared-nothing backends: even though you could program them, they simply will not perform. The main reasons some traditional algorithms do not work at scale are that they either require random memory access or (more often) are simply super-linear with respect to input size. So as one scales infrastructure at linear cost, one still gets a smaller-than-expected performance increment per unit of input (and at some point, none at all). Hence some mathematically, or should I say statistically, motivated tricks are usually still required. As the book describes, linearly or sub-linearly scalable sketches, random projections, dimensionality reductions, etc. are required to alleviate the scalability issues of super-linear algorithms.
To your question: I have had a couple of people do some pieces of various projects with Samsara before, but they had me as a coworker. I am personally not aware of any outside developers beyond the people already on the project at Apache and my co-workers, although in all honesty I feel this has more to do with the maturity and modest marketing of the public version of Samsara than with the difficulty of adoption.
-d
On Sun, Mar 26, 2017 at 9:15 AM, Gustavo Frederico <gustavo.freder...@thinkwrap.com> wrote:
> I read Lyubimov's and Palumbo's book on Mahout Samsara up to chapter 4
> (Distributed Algebra). I have some familiarity with R, I did study
> linear algebra and calculus in undergrad. In my master's I studied
> statistical pattern recognition and researched a number of ML
> algorithms in my thesis - spending more time on SVMs. This is to ask:
> what is the learning curve of Samsara? How complicated is to work with
> distributed algebra to create an algorithm? Can someone share an
> example of how long she/he took to go from algorithm conception to
> implementation?
>
> Thanks
>
> Gustavo