[scikit-learn] custom estimator with more than two arguments to fit()

2020-07-31 Thread Gregory, Matthew
Hi all,

I'm fairly new to scikit-learn, but have been using a predictive model for a 
while now that would benefit from scikit-learn's estimator API.  However, I 
could use some advice on how best to implement this.

Briefly, the model is a combination of dimension reduction and nearest 
neighbors, and the dimension reduction step (canonical correspondence analysis, 
CCA) relies on two matrices to create the synthetic feature scores for the 
candidates in the nearest neighbor step.  The two matrices are a "species" 
matrix (spp) and an "environmental" matrix (env), which are used together to 
create orthogonal CCA axes that are linear combinations of the environmental features.

In reading through the documentation on creating new estimators, it seems that 
every estimator should provide a fit(X, y) method.  Somehow I need my X 
parameter to carry both the spp and env matrices.  I got a lot of good 
inspiration from this post on Stack Overflow:

  
https://stackoverflow.com/questions/45966500/use-sklearn-gridsearchcv-on-custom-class-whose-fit-method-takes-3-arguments

and can mostly understand how the OP implemented this, basically by creating a 
DataHandler class that packs together the two matrices, such that the call to 
fit would look like:

  estimator.fit(DataHandler(spp, env), y)
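
Something like the following rough sketch is what I have in mind (the class 
and attribute names are just placeholders I'm imagining, not working code 
from my project or from scikit-learn):

  import numpy as np
  from sklearn.base import BaseEstimator

  class DataHandler:
      """Pack the two matrices into one object so fit(X, y) keeps its
      standard two-argument signature."""
      def __init__(self, spp, env):
          self.spp = np.asarray(spp)
          self.env = np.asarray(env)

  class CCANearestNeighbors(BaseEstimator):
      def __init__(self, n_components=2, n_neighbors=5):
          self.n_components = n_components
          self.n_neighbors = n_neighbors

      def fit(self, X, y=None):
          # X is expected to be a DataHandler; unpack both matrices here
          spp, env = X.spp, X.env
          # ... run the CCA ordination on (spp, env) and store the
          # candidate scores for the nearest neighbor step ...
          return self

so that the call becomes CCANearestNeighbors().fit(DataHandler(spp, env), y).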

I'm wondering if this is the best way to handle the design or if I'm not fully 
understanding how I could use a Pipeline to accomplish the same goal.  Thanks 
for any guidance - boilerplate sample code would be most appreciated!

matt

  


Re: [scikit-learn] custom estimator with more than two arguments to fit()

2020-07-31 Thread Nicolas Hug

Hi Matt,

We do have CCA and other PLS-related transformers / regressors in 
scikit-learn. They are able to do dimensionality reduction on both X and 
Y (which I believe correspond to spp and env), so you might want to have 
a look at these. However, they're unfortunately not fully compatible with 
the rest of the ecosystem: for example, our Pipeline objects assume that 
only X can be transformed, not Y.
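
For illustration, here is a minimal sketch of how these cross-decomposition 
estimators are called (synthetic data just to show the shapes; which of your 
matrices plays X and which plays Y will depend on your model):

  import numpy as np
  from sklearn.cross_decomposition import CCA

  rng = np.random.RandomState(0)
  env = rng.rand(50, 6)    # e.g. the "environmental" matrix as X
  spp = rng.rand(50, 10)   # e.g. the "species" matrix as Y

  cca = CCA(n_components=2)
  cca.fit(env, spp)                # both matrices are used to learn the axes
  env_scores = cca.transform(env)  # by default only X is transformed
  print(env_scores.shape)          # (50, 2)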


Nicolas



Re: [scikit-learn] custom estimator with more than two arguments to fit()

2020-07-31 Thread Gregory, Matthew
Hi Nicolas,

Nicolas Hug wrote:
> We do have CCA and other PLS-related transformers / regressors in
> scikit-learn. They are able to do dimensionality reduction on both
> X and Y (which I believe correspond to spp and env), so you might
> want to have a look at these. However, they're not fully
> compatible with the whole ecosystem unfortunately: for example our
> Pipeline objects assume that only X can be transformed, not Y.

Just to clarify, I'm only seeing canonical *correlation* analysis and not 
canonical *correspondence* analysis (ter Braak) in scikit-learn?

https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html

But your point is taken - I can use this for inspiration because it takes both 
X and Y matrices.  If I'm understanding correctly, though, there is no way to 
couple this with a subsequent NearestNeighbors step in a Pipeline?  I only need 
the transformed scores coming out of CCA to feed into the NearestNeighbors 
step.  Sorry if I'm not understanding this correctly.
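
To make the question concrete, this is the kind of thing I'm picturing - a 
thin wrapper (the names are placeholders, not anything that exists) whose 
transform returns only the X scores, so it could sit in front of 
NearestNeighbors in a Pipeline - though I don't know if this is sensible:

  import numpy as np
  from sklearn.base import BaseEstimator, TransformerMixin
  from sklearn.cross_decomposition import CCA
  from sklearn.neighbors import NearestNeighbors
  from sklearn.pipeline import Pipeline

  class CCAScores(BaseEstimator, TransformerMixin):
      """Wrap CCA so that only the X scores flow to the next step."""
      def __init__(self, n_components=2):
          self.n_components = n_components

      def fit(self, X, y=None):
          self.cca_ = CCA(n_components=self.n_components).fit(X, y)
          return self

      def transform(self, X):
          return self.cca_.transform(X)

  rng = np.random.RandomState(0)
  env = rng.rand(50, 6)
  spp = rng.rand(50, 10)

  pipe = Pipeline([
      ("cca", CCAScores(n_components=2)),
      ("nn", NearestNeighbors(n_neighbors=5)),
  ])
  pipe.fit(env, spp)  # spp is passed as y to each step's fit

  # query neighbors of the candidates in the reduced space
  scores = pipe[:-1].transform(env)
  dist, idx = pipe.named_steps["nn"].kneighbors(scores)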

matt
