Re: [scikit-learn] Imblearn: SMOTENC

2019-01-26 Thread S Hamidizade
Thanks. The code is provided here:
https://github.com/scikit-learn-contrib/imbalanced-learn/issues/537

Best regards,

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Imblearn: SMOTENC

2019-01-24 Thread Guillaume Lemaître
You should open an issue on the imbalanced-learn GitHub tracker. It is easier
to post a reproducible example there and for us to test it.
From the error message, I understand that you have 161 features and
are requesting a feature at an index above 160.
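
The index arithmetic can be checked with a short sketch. It assumes n_features = 161, as the error message implies, and reuses the two np.r_ expressions from the original post; the helper out_of_range is a hypothetical name introduced only for illustration. Under these assumptions the posted positions all stay within range, so the offending index presumably comes from code that is not shown in this thread.

```python
import numpy as np

# Positional indices exactly as written in the original post.
num_pos = np.r_[0:94, 95, 97, 100:123]
cat_pos = np.r_[94, 96, 98, 99, 123:160]


def out_of_range(indices, n_features):
    """Return the indices that fall outside [0, n_features - 1]."""
    idx = np.asarray(indices)
    return idx[(idx < 0) | (idx >= n_features)].tolist()


n_features = 161  # assumption: taken from "between 0 and 160" in the error

print(len(num_pos), len(cat_pos))           # how many columns each list selects
print(int(cat_pos.max()))                   # largest categorical position
print(out_of_range(cat_pos, n_features))    # empty list means all are valid
print(out_of_range([94, 161], n_features))  # 161 is one past the last column
```

Running a check like this on the list actually passed to SMOTENC shows which entries trigger the ValueError.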





-- 
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/


Re: [scikit-learn] Imblearn: SMOTENC

2019-01-24 Thread S Hamidizade
Thanks. Unfortunately, now the error is:
ValueError: Some of the categorical indices are out of range. Indices
should be between 0 and 160.
Best regards,



Re: [scikit-learn] Imblearn: SMOTENC

2019-01-23 Thread Guillaume Lemaître
As stated in the documentation, categorical_features takes the indices of the
categorical columns, not the column names. This is similar to the one-hot
encoder API.
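
A minimal sketch of turning column names into positional indices, assuming X is a pandas DataFrame. The small frame and its column names below are hypothetical stand-ins, not the data from this thread.

```python
import pandas as pd

# Hypothetical stand-in for X; only the column order matters here.
X = pd.DataFrame({
    "x1": [0.1, 0.2],
    "x95": ["a", "b"],
    "x2": [1.0, 2.0],
    "x121_1": [0, 1],
})

cat_names = ["x95", "x121_1"]

# Index.get_indexer maps labels to integer positions (-1 if a label is missing).
cat_positions = X.columns.get_indexer(cat_names).tolist()
assert -1 not in cat_positions, "some categorical column names were not found"

print(cat_positions)
# These integers, not the names, are what would be passed as
# categorical_features, following the message above.
```

When the positions are already known, as with the np.r_ expression in the original post, they can be passed directly without converting to names and back.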

Sent from my phone - sorry to be brief and potential misspell.




Re: [scikit-learn] Imblearn: SMOTENC

2019-01-23 Thread S Hamidizade
Dear Mr. Lemaitre

Thanks a lot for sharing your time and knowledge. Unfortunately, it throws
the following error:

119
41
Traceback (most recent call last):
  File "D:/mifs-master_2/MU/learning-from-imbalanced-classes-master/learning-from-imbalanced-classes-master/continuous/Final Logit/SMOTENC/logit-final - Copy.py", line 419, in
    pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 594, in make_pipeline
    return Pipeline(_name_estimators(steps), memory=memory)
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 119, in __init__
    self._validate_steps()
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 167, in _validate_steps
    " '%s' (type %s) doesn't" % (t, type(t)))
TypeError: All intermediate steps should be transformers and implement fit
and transform. 'SMOTENC(categorical_features=['x95', 'x97', 'x99', 'x100',
'x121_1', 'x121_2', 'x121_3', 'x121_4', 'x121_5', 'x121_6', 'x121_7',
'x121_8', 'x121_9', 'x121_10', 'x121_11', 'x121_12', 'x121_13', 'x121_14',
'x121_15', 'x121_16', 'x121_17', 'x121_18', 'x121_19', 'x121_20',
'x121_21', 'x121_22', 'x121_23', 'x121_24', 'x121_25', 'x121_26',
'x121_27', 'x121_28', 'x121_29', 'x121_30', 'x121_31', 'x121_32',
'x121_33', 'x121_34', 'x121_35', 'x121_36', 'x121_37'],
k_neighbors=5, n_jobs=1, random_state=None, sampling_strategy='auto')'
(type ) doesn't

Thanks in advance.
Best regards,



Re: [scikit-learn] Imblearn: SMOTENC

2019-01-21 Thread Guillaume Lemaître
SMOTENC will internally one-hot encode the categorical features, generate new
samples, and finally decode them.
So you need to do something like:


import numpy as np
from imblearn.over_sampling import SMOTENC
from imblearn.pipeline import make_pipeline, Pipeline
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# X, MultiColumn and rg are the objects defined in your own script.
num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
print(len(num_indices1))
print(len(cat_indices1))

pipeline = Pipeline(steps=[
    ('feature_processing', FeatureUnion(transformer_list=[
        # Categorical features
        ('categorical', MultiColumn(cat_indices1)),
        # Numeric features
        ('numeric', Pipeline(steps=[
            ('select', MultiColumn(num_indices1)),
            ('scale', StandardScaler()),
        ])),
    ])),
    ('clf', rg),
])

pipeline_with_resampling = make_pipeline(
    SMOTENC(categorical_features=cat_indices1), pipeline)






-- 
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/


[scikit-learn] Imblearn: SMOTENC

2019-01-20 Thread S Hamidizade
Dear Scikit-learners
Hi.

I would greatly appreciate it if you could let me know how to use SMOTENC.  I
wrote:

num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
print(len(num_indices1))
print(len(cat_indices1))

pipeline = Pipeline(steps=[
    ('feature_processing', FeatureUnion(transformer_list=[
        # Categorical features
        ('categorical', MultiColumn(cat_indices1)),
        # Numeric features
        ('numeric', Pipeline(steps=[
            ('select', MultiColumn(num_indices1)),
            ('scale', StandardScaler())
        ]))
    ])),
    ('clf', rg)
])

So, as indicated, I have 5 categorical features. In fact, indices 123 to 159
(the slice 123:160) belong to a single categorical feature with 37 possible
values, which was converted into 37 columns using get_dummies.
I think SMOTENC should be inserted before the classifier ('clf', rg), but I
don't know how to define "categorical_features" in SMOTENC.
Also, could you please let me know where to use imblearn.pipeline?

Thanks in advance.
Best regards,