Hi all, I'm fairly new to scikit-learn, but have been using a predictive model for a while now that would benefit from scikit-learn's estimator API. However, I could use some advice on how best to implement this.
Briefly, the model is a combination of dimension reduction and nearest neighbors, but the dimension reduction step (canonical correspondence analysis, CCA) relies on two matrices to create the synthetic feature scores for the candidates in the nearest neighbor step. The two matrices are a "species" matrix (spp) and an "environmental" matrix (env), which are used to create orthogonal CCA axes that are linear combinations of the environmental features.

In reading through the documentation on creating new estimators, it seems that every estimator should provide a fit(X, y) method. Somehow I need my X parameter to carry both the spp and env matrices together.

I got a lot of good inspiration from this post on Stack Overflow:

https://stackoverflow.com/questions/45966500/use-sklearn-gridsearchcv-on-custom-class-whose-fit-method-takes-3-arguments

and can mostly understand how the OP implemented this, basically by creating a DataHandler class that packs the two matrices together, such that the call to fit would look like:

    estimator.fit(DataHandler(spp, env), y)

I'm wondering whether this is the best way to handle the design, or whether I'm not fully understanding how I could use a Pipeline to accomplish the same goal.

Thanks for any guidance. Boilerplate sample code would be most appreciated!

matt
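
P.S. To make the question concrete, here is a rough sketch of the DataHandler idea as I currently understand it. Everything in it is an assumption for illustration: the class names DataHandler and OrdinationKNN are made up, the toy data are random, and the "ordination" step is only a placeholder that standardizes env and ignores spp entirely, so it is not a real CCA. I have also not checked whether a container like this is enough for GridSearchCV to split the data.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.neighbors import KNeighborsRegressor


class DataHandler:
    """Hypothetical container that keeps the spp and env matrices row-aligned."""

    def __init__(self, spp, env):
        self.spp = np.asarray(spp, dtype=float)
        self.env = np.asarray(env, dtype=float)
        if self.spp.shape[0] != self.env.shape[0]:
            raise ValueError("spp and env must have the same number of rows")

    def __len__(self):
        return self.spp.shape[0]

    def __getitem__(self, idx):
        # Selecting rows returns a new handler, so both matrices stay in sync.
        if isinstance(idx, (int, np.integer)):
            idx = [idx]
        return DataHandler(self.spp[idx], self.env[idx])


class OrdinationKNN(BaseEstimator):
    """Hypothetical estimator: placeholder ordination followed by k-NN."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def _scores(self, env):
        # PLACEHOLDER: standardized env columns stand in for the CCA axis scores.
        return (env - self.env_mean_) / self.env_scale_

    def fit(self, X, y):
        # A real implementation would use both X.spp and X.env here.
        self.env_mean_ = X.env.mean(axis=0)
        scale = X.env.std(axis=0)
        self.env_scale_ = np.where(scale == 0, 1.0, scale)
        self.knn_ = KNeighborsRegressor(n_neighbors=self.n_neighbors)
        self.knn_.fit(self._scores(X.env), y)
        return self

    def predict(self, X):
        return self.knn_.predict(self._scores(X.env))


# Toy data, purely for illustration.
rng = np.random.default_rng(0)
spp = rng.poisson(2.0, size=(30, 6))
env = rng.normal(size=(30, 3))
y = rng.normal(size=30)

data = DataHandler(spp, env)
model = OrdinationKNN(n_neighbors=3).fit(data[:20], y[:20])
pred = model.predict(data[20:])
```

The open design question is whether fit and predict should both receive the packed object like this, or whether there is a cleaner way to pass the second matrix.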