Re: Sharing a vector between mappers

2010-10-21 Thread Ted Dunning
OK. This makes sense given that A is relatively small. On Thu, Oct 21, 2010 at 8:40 AM, Alexander Hans wrote: > > One point that I don't quite see is how you accumulate partial sums of > > inv(X' * X). It seems that the inverse makes this difficult. > > This is orthogonal to the map-reduce arc...
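
For what it's worth, the usual way around the inverse is to never accumulate inv(X'WX) at all: each mapper only adds up the un-inverted sums X'WX and X'Wy, which combine by plain addition, and the single inverse (or better, a linear solve) happens once on the merged result. A minimal sketch in plain Java, with names that are illustrative rather than taken from the actual patch:

// Sketch: accumulate the normal-equation sums for weighted least squares
// without ever summing inverses. Each mapper adds its rows into A = X'WX and
// b = X'Wy; the single solve happens after all partial sums are combined.
public class NormalEquationAccumulator {
    private final int d;            // number of features
    private final double[][] a;     // running X'WX, d x d
    private final double[] b;       // running X'Wy, length d

    public NormalEquationAccumulator(int d) {
        this.d = d;
        this.a = new double[d][d];
        this.b = new double[d];
    }

    /** Add one training row x with target y and kernel weight w. */
    public void add(double[] x, double y, double w) {
        for (int i = 0; i < d; i++) {
            b[i] += w * y * x[i];
            for (int j = 0; j < d; j++) {
                a[i][j] += w * x[i] * x[j];   // contribution w * x x'
            }
        }
    }

    /** Merge another partial accumulator (e.g. from another mapper). */
    public void merge(NormalEquationAccumulator other) {
        for (int i = 0; i < d; i++) {
            b[i] += other.b[i];
            for (int j = 0; j < d; j++) {
                a[i][j] += other.a[i][j];
            }
        }
    }
    // The reducer would solve A * beta = b exactly once on the merged sums,
    // e.g. with a Cholesky or QR solver; no inverses are ever accumulated.
}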

Re: Sharing a vector between mappers

2010-10-21 Thread Alexander Hans
> So what you have is essentially a fancy nearest neighbor approach which > requires that you scan all of your input unless you > have a clever way of discarding those rows of X which have very small > weights. Yes, LWLR is basically a kernel-smoothed nearest neighbor approach that makes a first-...
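
A small sketch of the kernel smoothing being described; the Gaussian kernel and the bandwidth tau are assumptions here, since the thread does not fix a particular kernel:

// Rows x_i far from the query x_q get weights close to zero, which is why
// LWLR behaves like a smoothed nearest neighbor method.
public final class GaussianKernel {
    private GaussianKernel() {}

    /** w_i = exp(-||x_i - x_q||^2 / (2 * tau^2)) */
    public static double weight(double[] xi, double[] xq, double tau) {
        double sq = 0.0;
        for (int k = 0; k < xi.length; k++) {
            double diff = xi[k] - xq[k];
            sq += diff * diff;
        }
        return Math.exp(-sq / (2.0 * tau * tau));
    }
}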

Re: Sharing a vector between mappers

2010-10-21 Thread Ted Dunning
Nicely put. So what you have is essentially a fancy nearest neighbor approach which requires that you scan all of your input unless you have a clever way of discarding those rows of X which have very small weights. It still makes lots of sense to amortize the cost of this scan across multiple x_q ...
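
A sketch of what amortizing the scan could look like: one pass over the rows of X accumulates X'WX and X'Wy for several query points at once, and rows whose weight is negligible for a given x_q are skipped. The cutoff EPSILON, bandwidth TAU, and names are illustrative only:

public class MultiQueryScan {
    static final double EPSILON = 1e-9;   // assumed weight cutoff
    static final double TAU = 1.0;        // assumed kernel bandwidth

    static double weight(double[] xi, double[] xq) {
        double sq = 0.0;
        for (int k = 0; k < xi.length; k++) {
            double diff = xi[k] - xq[k];
            sq += diff * diff;
        }
        return Math.exp(-sq / (2.0 * TAU * TAU));
    }

    /** One pass over the rows of X; accumulates X'WX and X'Wy for every query. */
    public static void scan(double[][] rows, double[] targets, double[][] queries,
                            double[][][] a, double[][] b) {
        int d = queries[0].length;
        for (int r = 0; r < rows.length; r++) {
            for (int q = 0; q < queries.length; q++) {
                double w = weight(rows[r], queries[q]);
                if (w < EPSILON) {
                    continue;              // this row contributes nothing useful to x_q
                }
                for (int i = 0; i < d; i++) {
                    b[q][i] += w * targets[r] * rows[r][i];
                    for (int j = 0; j < d; j++) {
                        a[q][i][j] += w * rows[r][i] * rows[r][j];
                    }
                }
            }
        }
    }
}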

Re: Sharing a vector between mappers

2010-10-21 Thread Alexander Hans
> The issue with this kind of program is that there is usually a part of the > data that is bounded in size > (the model) and a part of the data that is unbounded in size (the input > data). The bounded portion is > usually what is stored in distributed cache, even if not all mappers read > all o...
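
A sketch of the pattern Ted describes, with the bounded query/model file shipped through Hadoop's DistributedCache (the API available at the time) and parsed once in setup(); the cache path and the one-CSV-vector-per-line format are assumptions:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class QueryVectorCacheMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final List<double[]> queryVectors = new ArrayList<double[]>();

  // Driver side (illustrative path):
  //   DistributedCache.addCacheFile(new URI("/user/ahans/lwlr/queries.csv"), conf);

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
          v[i] = Double.parseDouble(parts[i]);
        }
        queryVectors.add(v);   // the bounded "model" side of the join
      }
    } finally {
      in.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // value is one row of the unbounded training data X; weight it against
    // every cached query vector and emit partial sums (omitted here).
  }
}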

Re: Sharing a vector between mappers

2010-10-21 Thread Ted Dunning
On Thu, Oct 21, 2010 at 12:28 AM, Alexander Hans wrote: > But now that I read your reply it becomes clear > that the better solution for determining predictions for more than one > prediction input vector would indeed be reading those vectors from the > distributed cache or HDFS directly and thus ...
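
For the second option mentioned here, a sketch of reading the prediction input vectors straight from HDFS; the helper name and the configuration property holding the path are made up for illustration:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class HdfsQueryVectorReader {
  private HdfsQueryVectorReader() {}

  /** Reads one comma-separated vector per line from an HDFS file. */
  public static List<double[]> read(Configuration conf, String hdfsPath) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream in = fs.open(new Path(hdfsPath));
    List<double[]> vectors = new ArrayList<double[]>();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] parts = line.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
          v[i] = Double.parseDouble(parts[i]);
        }
        vectors.add(v);
      }
    } finally {
      reader.close();
    }
    return vectors;
  }
  // In a mapper's setup():
  //   read(context.getConfiguration(), conf.get("lwlr.query.path"))
  // where "lwlr.query.path" is an assumed property name.
}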

Re: Sharing a vector between mappers

2010-10-21 Thread Alexander Hans
Hi Ted, > Passing small amounts of data via configuration is reasonable to do, but it > isn't clear that this is a good idea for you. Do you really only want > to pass around a single input vector for an entire map-reduce invocation? > Map-reduce takes a looong time to get started. Yeah, that's ...

Re: Sharing a vector between mappers

2010-10-20 Thread Ted Dunning
Passing small amounts of data via configuration is reasonable to do, but it isn't clear that this is a good idea for you. Do you really only want to pass around a single input vector for an entire map-reduce invocation? Map-reduce takes a looong time to get started. If you might possibly want to ...
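
For completeness, a sketch of the configuration-passing option: serialize one query vector into the job Configuration on the driver side and parse it back in the mapper's setup(). The property name "lwlr.query.vector" and the CSV encoding are assumptions:

import org.apache.hadoop.conf.Configuration;

public final class QueryVectorConfig {
  private QueryVectorConfig() {}

  static final String KEY = "lwlr.query.vector";   // assumed property name

  /** Driver side: serialize one query vector into the job configuration. */
  public static void put(Configuration conf, double[] xq) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < xq.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(xq[i]);
    }
    conf.set(KEY, sb.toString());
  }

  /** Mapper side (typically in setup()): parse the vector back out. */
  public static double[] get(Configuration conf) {
    String[] parts = conf.get(KEY).split(",");
    double[] xq = new double[parts.length];
    for (int i = 0; i < parts.length; i++) {
      xq[i] = Double.parseDouble(parts[i]);
    }
    return xq;
  }
}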

Sharing a vector between mappers

2010-10-20 Thread Alexander Hans
Hi, I've finally got some work done on the LWLR implementation. It's already functional when used with fixed weights of 1, i.e., linear regression. In that case each mapper gets a vector from the training data and calculates the A matrix (X'*W*X, with W being a diagonal matrix containing the weights ...
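
A rough sketch of the job shape described here for the fixed-weight case (W = I, i.e. ordinary linear regression): each mapper emits its row's contribution to A = X'WX and to X'Wy, and a reducer (usable as a combiner) sums the contributions. This is illustrative only, not the actual LWLR patch; the input format, comma-separated features with the target in the last column, is assumed:

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class LwlrSketch {

  public static class ContributionMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split(",");
      int d = parts.length - 1;                    // last column is the target y
      double[] x = new double[d];
      for (int i = 0; i < d; i++) {
        x[i] = Double.parseDouble(parts[i]);
      }
      double y = Double.parseDouble(parts[d]);
      double w = 1.0;                              // fixed weight; later a kernel of x_q

      for (int i = 0; i < d; i++) {
        context.write(new Text("b:" + i), new DoubleWritable(w * y * x[i]));
        for (int j = 0; j < d; j++) {
          context.write(new Text("A:" + i + ":" + j),
                        new DoubleWritable(w * x[i] * x[j]));
        }
      }
    }
  }

  /** Also usable as a combiner, since the per-entry sums are associative. */
  public static class SumReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      double sum = 0.0;
      for (DoubleWritable v : values) {
        sum += v.get();
      }
      context.write(key, new DoubleWritable(sum));
      // Once every A[i][j] and b[i] is summed, beta is obtained from
      // A * beta = b in a final, single-machine solve.
    }
  }
}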