I see. I guess you mean nested preprocessors vs. pipelined jobs.
There are some efforts, e.g. Rapid Miner, that allow doing more than just input normalization in a formal model -- although I did not play with it enough. But they do *something*; perhaps it could be a source of inspiration. Is Rapid Miner's modelling closer to what you mean?

On Mon, Apr 25, 2011 at 12:14 PM, Ted Dunning <[email protected]> wrote:

> On Mon, Apr 25, 2011 at 12:04 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> I don't think stuff like pre-clustering, dimensionality reduction
>> should be included. Just the summarization, hashing trick and common
>> strategies for parsing non-quantitative inputs included in the book.
>
> So you prefer the limited function option.
>
>> ...
>> But if there's pre-clustering and/or dimensionality reduction (PCA
>> like stuff), that would be a pipeline, not just input processing? I
>> don't think about input processing as being a pipelined processing.
>
> It isn't usually a pipeline as in map-reduce. Yes, it is a set of pure
> functions applied to the input variables to produce the actual predictor
> variables. Yes, these functions can be composed.
>
> If you are trying to do what Grant says (provide Mahout-as-a-service) then
> you need to provide some mechanism for adding these things.
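
To make the "composed pure functions" point concrete, here is a minimal sketch in plain Java. It is not Mahout's actual API; all class and method names (Preprocessor, Compose, Standardize, hashEncode) are hypothetical. It shows per-record input processing as function composition rather than a map-reduce pipeline, plus a hashing-trick encoding for a non-quantitative input:

import java.util.Arrays;

// Hypothetical sketch, not Mahout API.
// A pure function from one feature vector to another.
interface Preprocessor {
    double[] apply(double[] input);
}

// Composition: apply f, then g. The "pipeline" is just function composition.
class Compose implements Preprocessor {
    private final Preprocessor f, g;
    Compose(Preprocessor f, Preprocessor g) { this.f = f; this.g = g; }
    public double[] apply(double[] x) { return g.apply(f.apply(x)); }
}

// One stage: mean/stdev normalization using pre-summarized statistics,
// so applying it to a single record stays a pure function.
class Standardize implements Preprocessor {
    private final double[] mean, stdev;
    Standardize(double[] mean, double[] stdev) { this.mean = mean; this.stdev = stdev; }
    public double[] apply(double[] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - mean[i]) / stdev[i];
        return out;
    }
}

public class PreprocessDemo {
    // Hashing trick for a non-quantitative input: bucket a raw string into d slots.
    static double[] hashEncode(String raw, int d) {
        double[] v = new double[d];
        v[Math.floorMod(raw.hashCode(), d)] += 1.0;
        return v;
    }

    public static void main(String[] args) {
        Preprocessor std = new Standardize(new double[] {1.0, 2.0}, new double[] {0.5, 4.0});
        Preprocessor clip = new Preprocessor() { // another pure stage: clip to [-3, 3]
            public double[] apply(double[] x) {
                double[] out = new double[x.length];
                for (int i = 0; i < x.length; i++) out[i] = Math.max(-3.0, Math.min(3.0, x[i]));
                return out;
            }
        };
        Preprocessor pipeline = new Compose(std, clip); // composed, not map-reduce
        System.out.println(Arrays.toString(pipeline.apply(new double[] {2.0, 10.0})));
        System.out.println(Arrays.toString(hashEncode("color=red", 8)));
    }
}

Note the summarization step (computing means and variances) runs once up front and is baked into the Standardize stage; after that, applying the composed function to any single record is pure, which is why this isn't a pipeline in the map-reduce sense.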
