date:20190121

Re: [scikit-learn] Any clustering algo to cluster multiple timing series data?

2019-01-21 Thread lampahome

How about scaling the data first with MinMaxScaler and then clustering?

My thinking is that scaling maps the values into the 0~1 range, so the
absolute magnitude of each series can be ignored.

After scaling, the data show the relative increase/decrease between points.

Then clustering them by Euclidean distance should work?
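
A minimal sketch of the idea above, under stated assumptions: the toy series are made up, each row is treated as one time series, and minmax_scale with axis=1 is used so that every series is scaled to 0~1 on its own (MinMaxScaler itself scales per column, not per row). KMeans is only one possible Euclidean-distance clusterer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import minmax_scale

# Toy data (made up for illustration): each row is one time series.
t = np.linspace(0.0, 1.0, 20)
series = np.vstack([
    1.0 * t,                    # rising, small magnitude
    100.0 * t + 50.0,           # rising, large magnitude
    5.0 * (1.0 - t),            # falling, small magnitude
    300.0 * (1.0 - t) + 10.0,   # falling, large magnitude
])

# Scale every row independently into [0, 1] so only the shape remains.
scaled = minmax_scale(series, axis=1)

# Cluster the scaled series; KMeans uses Euclidean distance.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(labels)
```

With this scaling, series that rise or fall in the same way end up close together regardless of their original magnitude.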
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

Re: [scikit-learn] Imblearn: SMOTENC

2019-01-21 Thread Guillaume Lemaître

SMOTENC will internally one-hot encode the categorical features, generate
new samples, and finally decode them.
So you need to do something like:


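import numpy as np
from imblearn.over_sampling import SMOTENC
from imblearn.pipeline import make_pipeline, Pipeline
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# MultiColumn (column selector) and rg (estimator) come from the original poster's code.
num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
print(len(num_indices1))
print(len(cat_indices1))

pipeline = Pipeline(steps=[
    ('feature_processing', FeatureUnion(transformer_list=[
        # Categorical features
        ('categorical', MultiColumn(cat_indices1)),
        # Numeric features
        ('numeric', Pipeline(steps=[
            ('select', MultiColumn(num_indices1)),
            ('scale', StandardScaler()),
        ])),
    ])),
    ('clf', rg),
])

# Resample first, then run the preprocessing and classifier pipeline.
pipeline_with_resampling = make_pipeline(
    SMOTENC(categorical_features=cat_indices1), pipeline)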




On Sun, 20 Jan 2019 at 18:05, S Hamidizade wrote:

> Dear Scikit-learners
> Hi.
>
> I would greatly appreciate if you could let me know how to use SMOTENC.  I
> wrote:
>
> num_indices1 = list(X.iloc[:,np.r_[0:94,95,97,100:123]].columns.values)
> cat_indices1 = list(X.iloc[:,np.r_[94,96,98,99,123:160]].columns.values)
> print(len(num_indices1))
> print(len(cat_indices1))
>
> pipeline=Pipeline(steps= [
> # Categorical features
> ('feature_processing', FeatureUnion(transformer_list = [
> ('categorical', MultiColumn(cat_indices1)),
>
> #numeric
> ('numeric', Pipeline(steps = [
> ('select', MultiColumn(num_indices1)),
> ('scale', StandardScaler())
> ]))
> ])),
> ('clf', rg)
> ]
> )
>
> Therefore, as indicated, I have 5 categorical features. In fact,
> indices 123 to 160 belong to a single categorical feature with 37 possible
> values, which was converted into 37 columns using get_dummies.
> I think SMOTENC should be inserted before the classifier ('clf',
> rg), but I don't know how to define "categorical_features" in SMOTENC.
> Besides, could you please let me know where to use imblearn.pipeline?
>
> Thanks in advance.
> Best regards,


-- 
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
