Re: [scikit-learn] Imblearn: SMOTENC
Thanks. The code is provided here:
https://github.com/scikit-learn-contrib/imbalanced-learn/issues/537

Best regards,

On Thu, Jan 24, 2019 at 7:15 PM Guillaume Lemaître wrote:
> You should open a ticket on imbalanced-learn GitHub issue. This is easier
> to post a reproducible example and for us to test it.
> From the error message, I can understand that you have 161 features and
> require a feature above the index 160.

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Re: [scikit-learn] Imblearn: SMOTENC
You should open a ticket on the imbalanced-learn GitHub issue tracker. It is easier to post a reproducible example there and for us to test it.
From the error message, I understand that you have 161 features and are requesting a feature at an index above 160.

On Thu, 24 Jan 2019 at 16:19, S Hamidizade wrote:
> Thanks. Unfortunately, now the error is:
> ValueError: Some of the categorical indices are out of range. Indices
> should be between 0 and 160.
> Best regards,

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
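The index rule in the error message can be illustrated with a minimal plain-Python sketch. The helper name `out_of_range_indices` is hypothetical and not part of imbalanced-learn; it only mirrors the rule quoted above, namely that with 161 features the valid column positions run from 0 to 160.

```python
def out_of_range_indices(categorical_features, n_features):
    """Return the entries that are not valid column positions.

    Hypothetical helper, for illustration only: a valid position is a
    plain integer between 0 and n_features - 1 inclusive.
    """
    bad = []
    for idx in categorical_features:
        is_position = isinstance(idx, int) and not isinstance(idx, bool)
        if not is_position or not 0 <= idx < n_features:
            bad.append(idx)
    return bad


n_features = 161  # valid positions are 0..160, as in the error message
print(out_of_range_indices([94, 96, 98, 99, 160], n_features))  # []
print(out_of_range_indices([94, 161, "x95"], n_features))       # [161, 'x95']
```

In this sketch a column name such as 'x95' is flagged just like a too-large integer, because it is not a position at all.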
Re: [scikit-learn] Imblearn: SMOTENC
Thanks. Unfortunately, now the error is:

ValueError: Some of the categorical indices are out of range. Indices should be between 0 and 160.

Best regards,

On Sun, Jan 20, 2019 at 8:31 PM S Hamidizade wrote:
> I would greatly appreciate if you could let me know how to use SMOTENC.
Re: [scikit-learn] Imblearn: SMOTENC
As stated in the documentation, categorical_features takes the indices of the categorical columns, not the names of the columns. This is similar to the one-hot encoder API.

Sent from my phone - sorry for being brief and for any misspellings.
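A minimal sketch of turning column names into integer positions, in plain Python. The column order below is a made-up toy example (only the names 'x95', 'x121_1' and 'x121_2' appear in this thread); with a pandas DataFrame, `X.columns.get_indexer(cat_names)` is a common way to get the same mapping.

```python
# Column order as it would appear in the feature matrix (toy example).
columns = ["x1", "x2", "x95", "x3", "x121_1", "x121_2"]

# Names of the categorical columns, as one might have collected them.
cat_names = ["x95", "x121_1", "x121_2"]

# Map each name to its integer position in the column order.
position = {name: i for i, name in enumerate(columns)}
cat_positions = [position[name] for name in cat_names]

print(cat_positions)  # [2, 4, 5]
```

The resulting list of positions, rather than the list of names, is the kind of value the reply above says categorical_features expects.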
Re: [scikit-learn] Imblearn: SMOTENC
Dear Mr. Lemaitre

Thanks a lot for sharing your time and knowledge. Unfortunately, it throws the following error:

Traceback (most recent call last):
119
  File "D:/mifs-master_2/MU/learning-from-imbalanced-classes-master/learning-from-imbalanced-classes-master/continuous/Final Logit/SMOTENC/logit-final - Copy.py", line 419, in
41
    pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 594, in make_pipeline
    return Pipeline(_name_estimators(steps), memory=memory)
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 119, in __init__
    self._validate_steps()
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 167, in _validate_steps
    " '%s' (type %s) doesn't" % (t, type(t)))
TypeError: All intermediate steps should be transformers and implement fit and transform. 'SMOTENC(categorical_features=['x95', 'x97', 'x99', 'x100', 'x121_1', 'x121_2', 'x121_3', 'x121_4', 'x121_5', 'x121_6', 'x121_7', 'x121_8', 'x121_9', 'x121_10', 'x121_11', 'x121_12', 'x121_13', 'x121_14', 'x121_15', 'x121_16', 'x121_17', 'x121_18', 'x121_19', 'x121_20', 'x121_21', 'x121_22', 'x121_23', 'x121_24', 'x121_25', 'x121_26', 'x121_27', 'x121_28', 'x121_29', 'x121_30', 'x121_31', 'x121_32', 'x121_33', 'x121_34', 'x121_35', 'x121_36', 'x121_37'], k_neighbors=5, n_jobs=1, random_state=None, sampling_strategy='auto')' (type ) doesn't

Thanks in advance.

Best regards,

On Mon, Jan 21, 2019 at 2:26 PM Guillaume Lemaître wrote:
> SMOTENC will internally one hot encode the features, generate new
> features, and finally decode.
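A likely reading of the traceback above: every frame after the script itself is in sklearn\pipeline.py, so the make_pipeline that ran was scikit-learn's, not the one from imblearn.pipeline suggested in the earlier reply. scikit-learn's pipeline requires intermediate steps to implement fit and transform, while a resampler exposes fit_resample instead. The sketch below uses toy classes and a simplified stand-in for that check; it is not scikit-learn's actual code, and all names in it are hypothetical.

```python
class ToyTransformer:
    """Has fit and transform, like a scikit-learn transformer."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class ToySampler:
    """Has fit and fit_resample but no transform, like a resampler."""

    def fit(self, X, y):
        return self

    def fit_resample(self, X, y):
        return X, y


def looks_like_transformer(step):
    # Simplified stand-in for the requirement stated in the error message:
    # an intermediate step must implement both fit and transform.
    return hasattr(step, "fit") and hasattr(step, "transform")


print(looks_like_transformer(ToyTransformer()))  # True
print(looks_like_transformer(ToySampler()))      # False
```

Under that reading, importing make_pipeline from imblearn.pipeline, as in the quoted reply, would be the thing to try, since that pipeline is the one designed to accept resamplers as steps.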
Re: [scikit-learn] Imblearn: SMOTENC
SMOTENC will internally one-hot encode the features, generate new features, and finally decode. So you need to do something like:

    import numpy as np
    from imblearn.over_sampling import SMOTENC
    from imblearn.pipeline import make_pipeline, Pipeline
    from sklearn.pipeline import FeatureUnion
    from sklearn.preprocessing import StandardScaler

    # X, MultiColumn and rg come from the original poster's script (not shown).
    num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
    cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
    print(len(num_indices1))
    print(len(cat_indices1))

    pipeline = Pipeline(steps=[
        # Categorical features
        ('feature_processing', FeatureUnion(transformer_list=[
            ('categorical', MultiColumn(cat_indices1)),
            # Numeric features
            ('numeric', Pipeline(steps=[
                ('select', MultiColumn(num_indices1)),
                ('scale', StandardScaler())
            ]))
        ])),
        ('clf', rg)
    ])

    pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)

On Sun, 20 Jan 2019 at 18:05, S Hamidizade wrote:
> I would greatly appreciate if you could let me know how to use SMOTENC.
> Besides, could you please let me know where to use imblearn.pipeline?

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
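Since SMOTENC does the one-hot encoding itself, one possible reading of the reply above (an assumption, not something confirmed in this thread) is that the 37 get_dummies columns would be folded back into a single categorical column before resampling. The plain-Python sketch below shows that folding on a shortened toy list; the helper `collapse_one_hot` and the sample rows are hypothetical.

```python
def collapse_one_hot(row, dummy_names):
    """Return the name of the single active dummy column in `row`.

    `row` maps column name -> 0/1. Raises ValueError unless exactly one
    of the dummy columns is set to 1.
    """
    active = [name for name in dummy_names if row[name] == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active column, got {active}")
    return active[0]


dummy_names = ["x121_1", "x121_2", "x121_3"]  # shortened toy list
rows = [
    {"x121_1": 0, "x121_2": 1, "x121_3": 0},
    {"x121_1": 1, "x121_2": 0, "x121_3": 0},
]
labels = [collapse_one_hot(r, dummy_names) for r in rows]
print(labels)  # ['x121_2', 'x121_1']
```

The folded column would then count as one categorical feature at a single position, instead of 37 separate positions.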
[scikit-learn] Imblearn: SMOTENC
Dear Scikit-learners

Hi.

I would greatly appreciate it if you could let me know how to use SMOTENC. I wrote:

    num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
    cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
    print(len(num_indices1))
    print(len(cat_indices1))

    pipeline = Pipeline(steps=[
        # Categorical features
        ('feature_processing', FeatureUnion(transformer_list=[
            ('categorical', MultiColumn(cat_indices1)),
            # Numeric features
            ('numeric', Pipeline(steps=[
                ('select', MultiColumn(num_indices1)),
                ('scale', StandardScaler())
            ]))
        ])),
        ('clf', rg)
    ])

As indicated, I have 5 categorical features. In fact, indices 123 to 160 relate to a single categorical feature with 37 possible values, which was converted into 37 columns using get_dummies.
I think SMOTENC should be inserted before the classifier ('clf', rg), but I don't know how to define "categorical_features" in SMOTENC.
Besides, could you please let me know where to use imblearn.pipeline?

Thanks in advance.

Best regards,
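The column selections in the post above can be checked with a short plain-Python sketch that mirrors the two np.r_ expressions using range (slice ends are exclusive in both). The variable names are illustrative only.

```python
# Plain-Python equivalent of the two np.r_ selections in the post.
num_positions = list(range(0, 94)) + [95, 97] + list(range(100, 123))
cat_positions = [94, 96, 98, 99] + list(range(123, 160))

print(len(num_positions))  # 119
print(len(cat_positions))  # 41
print(max(cat_positions))  # 159

# Positions covered by either list.
covered = set(num_positions) | set(cat_positions)
print(len(covered))        # 160
```

The 41 categorical positions are the 4 single columns plus the 37-column block, which matches the 5 categorical features described. The slice 123:160 stops at position 159, so if the matrix has 161 columns, as the error message elsewhere in this thread indicates, position 160 belongs to neither list.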