On Mon, Apr 25, 2011 at 12:04 PM, Dmitriy Lyubimov <[email protected]> wrote:

> I don't think stuff like pre-clustering, dimensionality reduction
> should be included. Just the summarization, hashing trick and common
> strategies for parsing non-quantitative inputs included in the book.
>

So you prefer the limited function option.


> ...
> But if there's pre-clustering and/or dimensionality reduction (PCA
> like stuff), that would be a pipeline, not just input processing? I
> don't think about input processing as being a pipelined processing.
>

It isn't usually a pipeline as in map-reduce.  Yes, it is a set of pure
functions applied to the input variables to produce the actual predictor
variables.  Yes, these functions can be composed.
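
To make that concrete, here is a rough sketch in Python (not Mahout code) of
pure functions composed to turn raw input variables into predictor variables.
Everything named here (compose, log_scale, clip, hash_bucket, the encoders
table, and the "price" and "country" fields) is made up for illustration, and
CRC32 is just a convenient stand-in for whatever hash a real encoder would use.

```python
import math
import zlib
from functools import reduce


def compose(*funcs):
    """Return a function that applies funcs left to right."""
    def composed(value):
        return reduce(lambda acc, f: f(acc), funcs, value)
    return composed


def log_scale(x):
    """Compress a non-negative numeric input with log(1 + x)."""
    return math.log1p(x)


def clip(lo, hi):
    """Build a function that clamps its argument to [lo, hi]."""
    def _clip(x):
        return max(lo, min(hi, x))
    return _clip


def hash_bucket(num_buckets):
    """Hashing trick for a categorical value: string -> bucket index."""
    def _bucket(token):
        return zlib.crc32(token.encode("utf-8")) % num_buckets
    return _bucket


# Hypothetical per-variable encoders, each one a composition of pure functions.
encoders = {
    "price": compose(float, log_scale, clip(0.0, 10.0)),
    "country": compose(str.lower, hash_bucket(16)),
}


def encode(record):
    """Apply each encoder to its raw input variable."""
    return {name: f(record[name]) for name, f in encoders.items()}


if __name__ == "__main__":
    print(encode({"price": "3", "country": "US"}))
```

Nothing here is staged or scheduled the way a map-reduce pipeline is. Each
encoder is an ordinary function of one input variable, and adding a new
transformation means adding one more function to the composition.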

If you are trying to do what Grant says (provide Mahout-as-a-service), then
you need to provide some mechanism for users to add these functions.
