SMOTENC internally one-hot encodes the categorical features, generates the new samples, and finally decodes the categorical features back to their original values. So you need to do something like:
    import numpy as np
    from imblearn.over_sampling import SMOTENC
    from imblearn.pipeline import make_pipeline, Pipeline
    from sklearn.pipeline import FeatureUnion
    from sklearn.preprocessing import StandardScaler

    # X (DataFrame), MultiColumn (column selector) and rg (classifier)
    # come from your own code.
    num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
    cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
    # SMOTENC expects the positions (or a boolean mask) of the categorical columns
    cat_positions1 = np.r_[94, 96, 98, 99, 123:160]
    print(len(num_indices1))
    print(len(cat_indices1))

    pipeline = Pipeline(steps=[
        ('feature_processing', FeatureUnion(transformer_list=[
            # categorical features
            ('categorical', MultiColumn(cat_indices1)),
            # numeric features
            ('numeric', Pipeline(steps=[
                ('select', MultiColumn(num_indices1)),
                ('scale', StandardScaler()),
            ])),
        ])),
        ('clf', rg),
    ])

    # SMOTENC resamples X before it reaches the feature processing and the classifier
    pipeline_with_resampling = make_pipeline(
        SMOTENC(categorical_features=cat_positions1), pipeline)

On Sun, 20 Jan 2019 at 18:05, S Hamidizade <hamidizad...@gmail.com> wrote:
> Dear Scikit-learners
> Hi.
>
> I would greatly appreciate if you could let me know how to use SMOTENC. I
> wrote:
>
> num_indices1 = list(X.iloc[:,np.r_[0:94,95,97,100:123]].columns.values)
> cat_indices1 = list(X.iloc[:,np.r_[94,96,98,99,123:160]].columns.values)
> print(len(num_indices1))
> print(len(cat_indices1))
>
> pipeline=Pipeline(steps= [
>     # Categorical features
>     ('feature_processing', FeatureUnion(transformer_list = [
>         ('categorical', MultiColumn(cat_indices1)),
>
>         #numeric
>         ('numeric', Pipeline(steps = [
>             ('select', MultiColumn(num_indices1)),
>             ('scale', StandardScaler())
>         ]))
>     ])),
>     ('clf', rg)
> ]
> )
>
> Therefore, as it is indicated I have 5 categorical features. Really,
> indices 123 to 160 are related to one categorical feature with 37 possible
> values which is converted into 37 columns using get_dummies.
> Sorry, I think SMOTENC should be inserted before the classifier ('clf',
> reg) but I don't know how to define "categorical_features" in SMOTENC.
> Besides, could you please let me know where to use imblearn.pipeline?
>
> Thanks in advance.
> Best regards,
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
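For reference, the encode/generate/decode behaviour of SMOTENC described above can be illustrated with a minimal NumPy sketch that produces one synthetic sample. It assumes the minority-class rows are already selected and that categorical columns hold integer codes. It follows the SMOTE-NC description from the original SMOTE paper rather than imbalanced-learn's actual code: continuous features are interpolated between a sample and one of its k nearest neighbours, each categorical mismatch adds the squared median of the continuous features' standard deviations to the distance, and each categorical feature takes the most frequent value among the k neighbours. The one-hot encoding used inside SMOTENC is one way of computing that mismatch penalty; the library's exact scaling may differ. The names smotenc_like_sample and cat_mask_160 are hypothetical; the mask simply mirrors the np.r_ positions from the question, and a boolean mask (or the equivalent indices) is the form categorical_features accepts.

```python
import numpy as np


def smotenc_like_sample(X_min, cat_mask, k=3, rng=None):
    """Generate one synthetic minority sample in the SMOTE-NC style (hypothetical sketch).

    X_min    : (n, d) float array of minority-class samples; categorical
               columns hold integer codes (assumption).
    cat_mask : (d,) bool array, True for categorical columns.
    k        : number of nearest neighbours (assumed k < n).
    """
    rng = np.random.default_rng(rng)
    num = ~cat_mask

    # Penalty for one categorical mismatch: median of the standard deviations
    # of the continuous features within the minority class.
    med = np.median(np.std(X_min[:, num], axis=0))

    # Pick a random minority sample as the base.
    i = rng.integers(len(X_min))
    base = X_min[i]

    # Squared distance: Euclidean part on continuous features plus med**2
    # for every categorical feature that differs from the base sample.
    d_num = ((X_min[:, num] - base[num]) ** 2).sum(axis=1)
    d_cat = (X_min[:, cat_mask] != base[cat_mask]).sum(axis=1) * med ** 2
    dist = d_num + d_cat
    dist[i] = np.inf  # a sample is not its own neighbour
    nn = np.argsort(dist)[:k]

    new = base.copy()
    # Continuous features: interpolate towards one randomly chosen neighbour.
    j = rng.choice(nn)
    gap = rng.random()
    new[num] = base[num] + gap * (X_min[j, num] - base[num])

    # Categorical features: most frequent value among the k neighbours.
    for c in np.flatnonzero(cat_mask):
        values, counts = np.unique(X_min[nn, c], return_counts=True)
        new[c] = values[np.argmax(counts)]
    return new


# Hypothetical boolean mask for the 160-column layout in the question:
# True at the same positions that could be passed to categorical_features.
cat_mask_160 = np.zeros(160, dtype=bool)
cat_mask_160[np.r_[94, 96, 98, 99, 123:160]] = True
```

Because each categorical column is voted on separately, the 37 get_dummies columns at positions 123:160 would be handled as 37 independent binary features, so a synthetic sample could end up with zero or several dummies set. Passing the original 37-valued column and letting SMOTENC encode it keeps exactly one category per synthetic sample.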