I would also fit these only on training data. There are probably some corner cases where letting these ancillary transforms see test data results in a target leak, though I can't really think of a good example.
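To make the fit/transform split concrete, here is a rough plain-Python sketch of what an IDF-style estimator does. It is not Spark code: `fit_idf` and `transform` are hypothetical helper names, and the toy documents are made up for illustration. The weight formula log((m + 1) / (df + 1)) is the smoothed IDF described in the Spark MLlib docs. Giving terms never seen during fitting a weight of 0 is a simplification of this sketch.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Learn one IDF weight per term from the training documents only."""
    m = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log((m + 1) / (df + 1)) for term, df in doc_freq.items()}


def transform(docs, idf):
    """Apply already-fitted weights; nothing is learned from `docs` here."""
    features = []
    for doc in docs:
        tf = Counter(doc)
        # Sketch assumption: terms absent from the fitted model get weight 0.
        features.append({t: tf[t] * idf.get(t, 0.0) for t in tf})
    return features


# Toy data, invented for this example.
train = [["spark", "ml"], ["spark", "idf"], ["spark", "pipeline"]]
test = [["spark", "idf", "idf"], ["unseen"]]

idf_model = fit_idf(train)                     # "fit": training data only
train_features = transform(train, idf_model)   # "transform": any dataset
test_features = transform(test, idf_model)
```

The "model" is just the table of document-frequency weights. Fitting decides those weights; transforming only looks them up, so the test set never influences them.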
More to the point, you're probably fitting these as part of a pipeline, and that pipeline as a whole is fed only training data during model building.

On Wed, Nov 2, 2016 at 6:05 PM Nirav Patel <npa...@xactlycorp.com> wrote:

> It is very clear that for ML algorithms (classification, regression) the
> Estimator fits only on training data, but it is not as clear for other
> estimators such as IDF.
> IDF is a feature transformation model, but having both an IDF estimator and
> a transformer makes it a little confusing what exactly it does when fitting
> on one dataset versus transforming another.