Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-13 Thread Gokhan Capan
Awesome. So we are going to implement certain required DistributedOperations, in a separate trait similar to, but other than the DistributedEngine. I'll think about this a little more, and propose an initial implementation that hopefully we can agree on. Best, Gokhan On Thu, Nov 13, 2014 at 1:

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Dmitriy Lyubimov
On Wed, Nov 12, 2014 at 1:44 PM, Dmitriy Lyubimov wrote: > > > On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan wrote: > >> My only concern is to add certain loss minimization tools for people to >> write machine learning algorithms. >> >> mapBlock as you suggested can work equally, but I happened

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Suneel Marthi
Yep, it is part of OnlineSummarizer. > On Nov 12, 2014, at 2:23 PM, Ted Dunning wrote: > >> On Wed, Nov 12, 2014 at 2:08 PM, Gokhan Capan wrote: >> >> Can we easily integrate t-digest for descriptives once we have block >> aggregates? This might count one more reason. > > P

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Ted Dunning
On Wed, Nov 12, 2014 at 2:08 PM, Gokhan Capan wrote: > Can we easily integrate t-digest for descriptives once we have block > aggregates? This might count one more reason. > Presumably. T-digest is already in Mahout as part of the OnlineSummarizer.

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Dmitriy Lyubimov
On Wed, Nov 12, 2014 at 2:04 PM, Ted Dunning wrote: > On Wed, Nov 12, 2014 at 9:53 AM, Dmitriy Lyubimov > wrote: > > > once we start mapping aggregate, there's no reason not to > > map other engine specific capabilities, which are vast. At this point > > dilemma is, no matter what we do we are l

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Gokhan Capan
Ted, Can we easily integrate t-digest for descriptives once we have block aggregates? This might count one more reason. Gokhan On Thu, Nov 13, 2014 at 12:04 AM, Ted Dunning wrote: > On Wed, Nov 12, 2014 at 9:53 AM, Dmitriy Lyubimov > wrote: > > > once we start mapping aggregate, there's no re

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Ted Dunning
On Wed, Nov 12, 2014 at 9:53 AM, Dmitriy Lyubimov wrote: > once we start mapping aggregate, there's no reason not to > map other engine specific capabilities, which are vast. At this point > dilemma is, no matter what we do we are losing coherency: if we map it all, > then other engines will have

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Dmitriy Lyubimov
On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan wrote: > My only concern is to add certain loss minimization tools for people to > write machine learning algorithms. > > mapBlock as you suggested can work equally, but I happened to have > implemented the aggregate op while thinking. > > Apart from

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Gokhan Capan
My only concern is to add certain loss minimization tools for people to write machine learning algorithms. mapBlock as you suggested can work equally, but I happened to have implemented the aggregate op while thinking. Apart from this SGD implementation, blockify-a-matrix-and-run-an-operation-in-

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Dmitriy Lyubimov
Yes, I usually follow #2 too. The thing is, pretty often an algorithm can define its own set of strategies the backend needs to support (like this distributedEngine strategy) and keep a lot of logic still common across all strategies. But then if all-reduce aggregate operation is incredibly common amo

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Dmitriy Lyubimov
I promise to make a review of this by next Monday. I looked briefly and had some suggestions, I think it might be ok. My only concern is what I have already said -- once we start mapping aggregate, there's no reason not to map other engine specific capabilities, which are vast. At this point dilemm

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-12 Thread Pat Ferrel
So you are following #2, which is good. #1 seems a bit like a hack. For a long time to come we will have to add things to the DSL if it is to be kept engine independent. Yours looks pretty general and simple. Are you familiar with the existing Mahout aggregate methods? They show up in the SGDH

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-11 Thread Gokhan Capan
So the alternatives are: 1- mapBlock to a matrix in which all rows but the first are empty, then aggregate 2- depend on a backend. 1 is obviously OK. I don't like the idea of depending on a backend since SGD is a generic loss minimization, on which other algorithms will possibly depend. In this con
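Alternative 1 can be illustrated with a small sketch. This is plain Python over nested lists, not the Mahout DSL: the names map_block_first_row, block_col_sums and col_sums are hypothetical, and the per-block function (column sums) is only a stand-in for whatever per-block result the algorithm needs. The point is that the mapped block keeps the row count of its input, carries the result in its first row, and leaves every other row at zero, so an ordinary column-sum aggregate recovers the combined result.

```python
from typing import Callable, List

Vector = List[float]
Block = List[Vector]  # a dense block as a list of rows


def map_block_first_row(block: Block, f: Callable[[Block], Vector]) -> Block:
    """Return a block with as many rows as the input: f(block) in row 0, zeros elsewhere."""
    result = f(block)
    out = [[0.0] * len(result) for _ in block]
    out[0] = list(result)
    return out


def block_col_sums(block: Block) -> Vector:
    """Stand-in per-block function (assumption): column sums of one block."""
    return [sum(col) for col in zip(*block)]


def col_sums(blocks: List[Block]) -> Vector:
    """Aggregate step: column sums over all rows of all blocks."""
    rows = [row for block in blocks for row in block]
    return [sum(col) for col in zip(*rows)]


blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
mapped = [map_block_first_row(b, block_col_sums) for b in blocks]
total = col_sums(mapped)
```

The zero rows are pure padding, which is the cost of keeping the block-to-block signature.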

RE: SGD Implementation and Questions for mapBlock like functionality

2014-11-11 Thread Andrew Palumbo
culate the bVector at each iteration) to the spark module and like Pat said use the Spark operations? > Subject: Re: SGD Implementation and Questions for mapBlock like functionality > From: p...@occamsmachete.com > Date: Tue, 11 Nov 2014 09:54:52 -0800 > To: dev@mahout.apache.org >

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-11 Thread Pat Ferrel
Still not sure what you need but if mapBlock and broadcast vals aren’t enough you’ll have to look at Spark’s available operations like join, reduce, etc. As well as the Spark accumulators. None of these have been made generic enough for the DSL yet AFAIK. I use accumulators in Spark specific cod

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-10 Thread Gokhan Capan
Well, in that specific case, I will accumulate on the client side; the collection of intermediate parameters is not that big (numBlocks x X.ncol). What I need is just mapping (keys, block) to a vector (currently, a mapBlock has to map the block to the new block). From a general perspective, you ar
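A rough sketch of the shape being asked for here, in plain Python rather than the Mahout DSL: each (keys, block) pair is mapped to one parameter vector, and the numBlocks vectors are combined on the client. Everything named below (sgd_on_block, map_blocks_to_vectors, average) is hypothetical. The squared-loss update, the fixed learning rate, the convention that the last column holds the target, and the use of a plain average as the client-side combination are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# (row keys, rows); by assumption the last column of each row is the target
Block = Tuple[List[int], List[Vector]]


def sgd_on_block(block: Block, lr: float = 0.05, epochs: int = 200) -> Vector:
    """Run plain SGD for squared loss on one block and return its weight vector."""
    _keys, rows = block
    w = [0.0] * (len(rows[0]) - 1)
    for _ in range(epochs):
        for row in rows:
            x, y = row[:-1], row[-1]
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def map_blocks_to_vectors(blocks: List[Block],
                          f: Callable[[Block], Vector]) -> List[Vector]:
    """Map every (keys, block) pair to a single vector instead of a new block."""
    return [f(b) for b in blocks]


def average(vectors: List[Vector]) -> Vector:
    """Client-side combination: element-wise mean of the per-block vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy data with y = 2 * x, split into two blocks.
blocks = [
    ([0, 1], [[1.0, 2.0], [2.0, 4.0]]),
    ([2, 3], [[3.0, 6.0], [4.0, 8.0]]),
]
partials = map_blocks_to_vectors(blocks, sgd_on_block)  # numBlocks vectors
w_avg = average(partials)
```

Only numBlocks vectors of length X.ncol reach the client, which matches the size estimate in the message.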

Re: SGD Implementation and Questions for mapBlock like functionality

2014-11-10 Thread Pat Ferrel
Do you need a reduce or could you use an accumulator? Neither is really supported in the DSL but clearly these are required for certain algos. Broadcast vals are supported but are read-only. On Nov 8, 2014, at 12:42 PM, Gokhan Capan wrote: Hi, Based on Zinkevich et al.'s Parallelized Stochast