On Mon, Apr 25, 2011 at 12:04 PM, Dmitriy Lyubimov <[email protected]> wrote:

> I don't think stuff like pre-clustering, dimensionality reduction
> should be included. Just the summarization, hashing trick and common
> strategies for parsing non-quantitative inputs included in the book.

So you prefer the limited function option.

> ...
> But if there's pre-clustering and/or dimensionality reduction (PCA
> like stuff), that would be a pipeline, not just input processing? I
> don't think about input processing as being a pipelined processing.

It isn't usually a pipeline as in map-reduce. Yes, it is a set of pure
functions applied to the input variables to produce the actual predictor
variables. Yes, these functions can be composed.

If you are trying to do what Grant says (provide Mahout-as-a-service),
then you need to provide some mechanism for adding these things.
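
To make the "composable pure functions" point concrete, here is a rough sketch in Python. It is not Mahout code: every name in it (`compose`, `tokenize`, `hash_tokens`, `log_scale`, `REGISTRY`, `build_encoder`) is made up for illustration, and the registry is only one guess at what "some mechanism for adding these things" might look like.

```python
import math
import zlib
from functools import reduce


def compose(*fns):
    """Compose functions left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, fn: fn(acc), fns, x)


def tokenize(text):
    """Parse a raw text input into lowercase word tokens."""
    return text.lower().split()


def hash_tokens(num_buckets):
    """Return a pure function that applies the hashing trick to tokens."""
    def encode(tokens):
        vec = [0.0] * num_buckets
        for tok in tokens:
            # crc32 is used because it is stable across runs, unlike hash().
            bucket = zlib.crc32(tok.encode("utf-8")) % num_buckets
            vec[bucket] += 1.0
        return vec
    return encode


def log_scale(vec):
    """Dampen raw counts with log(1 + x)."""
    return [math.log1p(v) for v in vec]


# Hypothetical registry: a service could let callers add named functions
# here and then assemble an encoder from a list of names.
REGISTRY = {
    "tokenize": tokenize,
    "hash8": hash_tokens(8),
    "log": log_scale,
}


def build_encoder(step_names):
    """Look up each named step and compose them into one function."""
    return compose(*(REGISTRY[name] for name in step_names))


encoder = build_encoder(["tokenize", "hash8", "log"])
predictors = encoder("Mahout mahout as a service")
```

Each step maps its input to an output with no shared state, so the composed `encoder` is itself a pure function from a raw input to predictor variables. Nothing here needs a map-reduce style pipeline.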
