Awesome.
So we are going to implement certain required DistributedOperations in a
separate trait similar to, but distinct from, DistributedEngine.
I'll think about this a little more, and propose an initial implementation
that hopefully we can agree on.
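For concreteness, a minimal sketch of the shape such a trait could take (the
trait name comes from this thread, but the method name and signature below are
illustrative assumptions, not the actual proposal):

import org.apache.mahout.math.Matrix
import org.apache.mahout.math.drm.DrmLike

// Hedged sketch only: a small trait of engine-backed operations, kept
// separate from DistributedEngine itself. Signature is an assumption.
trait DistributedOperations {
  /** All-reduce over the blocks of a DRM: fold each (keys, block) pair
    * to a value, then combine the per-block values into one result. */
  def aggregateBlocks[K, A](drm: DrmLike[K])
                           (fold: (Array[K], Matrix) => A)
                           (combine: (A, A) => A): A
}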
Best,
Gokhan
On Wed, Nov 12, 2014 at 1:44 PM, Dmitriy Lyubimov wrote:
>
>
> On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan wrote:
>
>> My only concern is to add certain loss minimization tools for people to
>> write machine learning algorithms.
>>
>> mapBlock as you suggested can work equally, but I happened to have
>> implemented the aggregate op while thinking.
Yep, it is part of OnlineSummarizer.
Sent from my iPhone
> On Nov 12, 2014, at 2:23 PM, Ted Dunning wrote:
>
>> On Wed, Nov 12, 2014 at 2:08 PM, Gokhan Capan wrote:
>>
>> Can we easily integrate t-digest for descriptives once we have block
>> aggregates? This might count one more reason.
>
> Presumably. T-digest is already in Mahout as part of the OnlineSummarizer.
On Wed, Nov 12, 2014 at 2:08 PM, Gokhan Capan wrote:
> Can we easily integrate t-digest for descriptives once we have block
> aggregates? This might count one more reason.
>
Presumably.
T-digest is already in Mahout as part of the OnlineSummarizer.
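For reference, a minimal sketch of feeding one block's values into that
summarizer (the helper name is hypothetical; the OnlineSummarizer API is
the existing one in mahout-math):

import org.apache.mahout.math.Matrix
import org.apache.mahout.math.stats.OnlineSummarizer

// Pour one block's values into an OnlineSummarizer, whose quantiles are
// t-digest backed. Combining summaries across blocks would still need
// digest merging, which t-digest itself supports.
def summarizeBlock(block: Matrix): OnlineSummarizer = {
  val summarizer = new OnlineSummarizer()
  for (r <- 0 until block.rowSize(); c <- 0 until block.columnSize())
    summarizer.add(block.get(r, c))
  summarizer
}
// e.g.: summarizeBlock(block).getMedian(), or getQuartile(3) for Q3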
On Wed, Nov 12, 2014 at 2:04 PM, Ted Dunning wrote:
> On Wed, Nov 12, 2014 at 9:53 AM, Dmitriy Lyubimov
> wrote:
>
> > once we start mapping aggregate, there's no reason not to
> > map other engine specific capabilities, which are vast. At this point
> > dilemma is, no matter what we do we are losing coherency: if we map it
> > all, then other engines will have …
Ted,
Can we easily integrate t-digest for descriptives once we have block
aggregates? This might count one more reason.
Gokhan
On Thu, Nov 13, 2014 at 12:04 AM, Ted Dunning wrote:
> On Wed, Nov 12, 2014 at 9:53 AM, Dmitriy Lyubimov
> wrote:
>
> > once we start mapping aggregate, there's no reason not to map other
> > engine specific capabilities, which are vast. At this point dilemma is,
> > no matter what we do we are losing coherency: if we map it all, then
> > other engines will have …
On Wed, Nov 12, 2014 at 9:53 AM, Dmitriy Lyubimov wrote:
> once we start mapping aggregate, there's no reason not to
> map other engine specific capabilities, which are vast. At this point
> dilemma is, no matter what we do we are losing coherency: if we map it all,
> then other engines will have …
On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan wrote:
> My only concern is to add certain loss minimization tools for people to
> write machine learning algorithms.
>
> mapBlock as you suggested can work equally, but I happened to have
> implemented the aggregate op while thinking.
>
> Apart from this SGD implementation, …
My only concern is to add certain loss minimization tools for people to
write machine learning algorithms.
mapBlock as you suggested can work equally, but I happened to have
implemented the aggregate op while thinking.
Apart from this SGD implementation,
blockify-a-matrix-and-run-an-operation-in-…
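For illustration only, a hedged sketch of one such loss-minimization building
block: a full-batch least-squares gradient step done with nothing beyond
mapBlock, a broadcast, and colSums(). The function name and the data layout
(label in the last column) are assumptions, not the actual patch:

import scala.reflect.ClassTag
import org.apache.mahout.math._
import scalabindings._
import RLikeOps._
import drm._
import RLikeDrmOps._

// Assumed layout: features in columns 0..n-1, label in the last column.
def gradientStep[K: ClassTag](drmData: DrmLike[K], w: Vector, eta: Double)
                             (implicit ctx: DistributedContext): Vector = {
  val n = drmData.ncol - 1
  val wBc = drmBroadcast(w)
  val gradBlocks = drmData.mapBlock(ncol = n) { case (keys, block) =>
    val g = new DenseVector(n)
    for (r <- 0 until block.nrow) {
      val row = block(r, ::)
      val x = row(0 until n)
      g += x * ((wBc.value dot x) - row(n)) // squared-loss gradient term
    }
    val out = new DenseMatrix(block.nrow, n) // all-zero block
    out(0, ::) := g                          // per-block gradient in row 0
    keys -> out
  }
  w - gradBlocks.colSums() * (eta / drmData.nrow) // averaged gradient step
}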
Yes, I usually follow #2 too.
The thing is, pretty often an algorithm can define its own set of strategies
that the backend needs to support (like this distributedEngine strategy) while
still keeping a lot of logic common across all strategies. But then if the
all-reduce aggregate operation is incredibly common among …
I promise to make a review of this by next Monday. I looked briefly and had
some suggestions; I think it might be OK. My only concern is what I have
already said: once we start mapping aggregate, there's no reason not to
map other engine specific capabilities, which are vast. At this point
dilemma is, no matter what we do we are losing coherency: if we map it all,
then other engines will have …
So you are following #2, which is good. #1 seems a bit like a hack. For a long
time to come we will have to add things to the DSL if it is to be kept
engine-independent. Yours looks pretty general and simple.
Are you familiar with the existing Mahout aggregate methods? They show up in
the SGDH…
So the alternatives are:
1- mapBlock to a matrix in which all rows but the first are empty, then
aggregate
2- depend on a backend
#1 is obviously OK; see the sketch after this message.
I don't like the idea of depending on a backend, since SGD is a generic
loss-minimization routine on which other algorithms will possibly depend.
In this con…
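A minimal sketch of alternative #1 under the existing DSL (the payload here is
just each block's column sums, so the helper is trivially equivalent to
colSums(); the point is that any per-block vector could ride in row 0):

import scala.reflect.ClassTag
import org.apache.mahout.math._
import scalabindings._
import RLikeOps._
import drm._
import RLikeDrmOps._

// Write each block's vector into row 0 of an otherwise-empty block,
// then let the existing colSums() perform the cross-block aggregate.
def sumOfBlockVectors[K: ClassTag](drmX: DrmLike[K]): Vector =
  drmX.mapBlock(ncol = drmX.ncol) { case (keys, block) =>
    val out = new DenseMatrix(block.nrow, block.ncol)
    out(0, ::) := block.colSums()    // this block's contribution
    keys -> out
  }.colSums()                        // sums the row-0 vectors across blocks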
…calculate the bVector at each
iteration) to the Spark module and, like Pat said, use the Spark operations?
> Subject: Re: SGD Implementation and Questions for mapBlock like functionality
> From: p...@occamsmachete.com
> Date: Tue, 11 Nov 2014 09:54:52 -0800
> To: dev@mahout.apache.org
>
Still not sure what you need, but if mapBlock and broadcast vals aren’t enough
you’ll have to look at Spark’s available operations like join, reduce, etc., as
well as the Spark accumulators. None of these has been made generic enough for
the DSL yet, AFAIK. I use accumulators in Spark-specific code…
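For reference, a tiny Spark-specific sketch of the accumulator pattern
mentioned here (Spark 1.x-era API, matching the period of this thread; this
sits outside the engine-agnostic DSL, and the object name is made up):

import org.apache.spark.{SparkConf, SparkContext}

// Driver-side accumulator summed across partitions, outside the DSL.
object AccumulatorSketch extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("acc-sketch").setMaster("local[2]"))
  val acc = sc.accumulator(0.0)                   // created on the driver
  sc.parallelize(1 to 100).foreach(i => acc += i.toDouble) // tasks only add
  println(s"sum = ${acc.value}")                  // value readable on driver
  sc.stop()
}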
Well, in that specific case I will accumulate on the client side; the
collection of the intermediate parameters is not that big (numBlocks x
X.ncol). What I need is just a mapping from (keys, block) to a vector
(currently, mapBlock has to map the block to a new block).
From a general perspective, you are …
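A hedged sketch of a wrapper giving exactly that "(keys, block) to vector"
shape on top of today's mapBlock (name and contract are hypothetical: f must
return a vector of drmX.ncol entries, and the per-block results are summed):

import scala.reflect.ClassTag
import org.apache.mahout.math._
import scalabindings._
import RLikeOps._
import drm._
import RLikeDrmOps._

// Row-0 trick packaged as the requested (keys, block) => Vector mapping.
def mapBlocksToVector[K: ClassTag](drmX: DrmLike[K])
                                  (f: (Array[K], Matrix) => Vector): Vector =
  drmX.mapBlock(ncol = drmX.ncol) { case (keys, block) =>
    val out = new DenseMatrix(block.nrow, block.ncol)
    out(0, ::) := f(keys, block)   // park this block's vector in row 0
    keys -> out
  }.colSums()                      // combine per-block vectors by summing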
Do you need a reduce, or could you use an accumulator? Neither is really
supported in the DSL, but clearly these are required for certain algos.
Broadcast vals are supported but are read-only.
On Nov 8, 2014, at 12:42 PM, Gokhan Capan wrote:
Hi,
Based on Zinkevich et al.'s Parallelized Stochastic Gradient Descent, …
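Since the original message is cut off here, a hedged sketch of the
Zinkevich-style scheme being discussed: run SGD locally on each block, then
average the per-block weight vectors. The function name, fixed step size, and
data layout (label in the last column) are all assumptions, not Gokhan's
actual implementation:

import scala.reflect.ClassTag
import org.apache.mahout.math._
import scalabindings._
import RLikeOps._
import drm._
import RLikeDrmOps._

// One pass of parameter-averaged SGD: every block runs local SGD from the
// same broadcast start w0; row 0 of each mapped block carries the local
// weights plus a trailing 1 so colSums() can count the blocks afterwards.
// Assumed layout: features in columns 0..n-1, label in the last column.
def parallelSgd[K: ClassTag](drmData: DrmLike[K], w0: Vector, eta: Double)
                            (implicit ctx: DistributedContext): Vector = {
  val n = drmData.ncol - 1
  val w0Bc = drmBroadcast(w0)
  val mapped = drmData.mapBlock(ncol = n + 1) { case (keys, block) =>
    val w = w0Bc.value.cloned
    for (r <- 0 until block.nrow) {
      val row = block(r, ::)
      val x = row(0 until n)
      w -= x * (eta * ((w dot x) - row(n))) // squared-loss SGD step
    }
    val out = new DenseMatrix(block.nrow, n + 1)
    val row0 = out(0, ::)
    row0(0 until n) := w
    row0(n) = 1.0                            // per-block counter
    keys -> out
  }
  val sums = mapped.colSums()
  sums(0 until n) / sums(n)                  // average the local weights
}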