Hi there,

The way to do what you describe in scikit-learn would be via the 
ColumnTransformer
https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html
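
As a minimal sketch of that approach (the data, the column positions and the 
step name "scale_table" are made up for illustration; I am assuming the table 
columns come first in X and the model-derived vectors after them):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
table = rng.uniform(0, 1000, size=(5, 2))  # hypothetical db columns
vectors = rng.normal(size=(5, 3))          # hypothetical vectors from another model
X = np.hstack([table, vectors])

# Scale only columns 0 and 1; every other column is passed through untouched.
ct = ColumnTransformer(
    [("scale_table", MinMaxScaler(), [0, 1])],
    remainder="passthrough",
)
X_scaled = ct.fit_transform(X)
```

The output holds the transformed columns first, followed by the passed-through 
ones. With a pandas dataframe, the column list can hold column names instead of 
integer positions.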

Note, however, that scikit-learn is mostly designed for multivariate 
statistics, and thus its transformers do not tend to single out individual 
columns.

Some of us are working on a related package, skrub (https://skrub-data.org), 
which is more focused on heterogeneous dataframes. It does not currently 
have something that would help you much, but we are actively brainstorming a 
variety of APIs for flexible transformations of dataframes, including an easy 
way to do what you want. The challenge is to address the variety of cases.

Hope this helps,

Gaël

On Wed, Jan 22, 2025 at 03:23:39PM -0800, Bill Ross wrote:
> Hi,

> I have a mixture of table data and intermediate vectors from another model, 
> which don't seem to scale productively. The fact that MinMaxScaler seems to 
> do all features in X makes me wonder if/how people train with such mixed data.

> The easy approaches seem to be either scale the db data and then combine with 
> the vectors, or just scale the db columns in place 'by hand'.

> Otherwise, I might consider adding a column-list option to the API.

> I suspect I'm just missing something important, since I wandered in following 
> this purely-tabular example, which seemed good before adding ML-derived 
> vectors:

> https://www.kaggle.com/code/carlmcbrideellis/tabular-classification-with-neural-networks-keras

> Any advice or more-appropriate example to follow would be great.

> Thanks,

> Bill
-- 
    Gael Varoquaux
    Research Director, INRIA
    http://gael-varoquaux.info      
https://bsky.app/profile/gaelvaroquaux.bsky.social