[scikit-learn] New Transformer to Support Multiple Column Pipelines & One Hot Encoding

2018-02-20 Thread Dale Jacques
Hello all, Long-time lurker, first-time emailer. I have two small contributions I would like to propose to the email list. I was working on a project this weekend that used both categorical and numerical columns to predict a final output. I needed to save my transformations to make future p

Re: [scikit-learn] New Transformer to Support Multiple Column Pipelines & One Hot Encoding

2018-02-20 Thread Joris Van den Bossche
Hi Dale, The two issues you mention are indeed known bottlenecks in sklearn's API, and we are working on solving them: 1) ColumnTransformer, to apply different transformers to different columns: https://github.com/scikit-learn/scikit-learn/pull/9012/ 2) As you men
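
For readers following the thread, here is a minimal sketch of what applying different transformers to different columns looks like. It assumes the ColumnTransformer interface as later released in `sklearn.compose`, which may differ from the state of the pull request linked above; the toy data, the step names "num" and "cat", and the choice of StandardScaler and OneHotEncoder are illustrative assumptions, not taken from the thread.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column 0 is numerical, column 1 is categorical.
X = np.array(
    [[1.0, "red"], [2.0, "blue"], [3.0, "red"], [4.0, "green"]],
    dtype=object,
)

# Scale the numerical column and one-hot encode the categorical column.
# handle_unknown="ignore" encodes unseen categories as all zeros at
# transform time instead of raising an error.
preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), [0]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Fit once, then reuse the fitted object on future data.
X_train = preprocess.fit_transform(X)

# A new sample with a category ("purple") that was not seen during fit.
X_new = preprocess.transform(np.array([[2.5, "purple"]], dtype=object))
```

Because the fitted `preprocess` object stores the scaling statistics and the category list, it can be saved and reapplied to later samples, which is the use case described in the original message.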

Re: [scikit-learn] KMeans cluster

2018-02-20 Thread Shiheng Duan
Yes, but what is used to decide the optimal output? I saw in the documentation that it is the best output in terms of inertia. What does that mean? Thanks. On Wed, Feb 14, 2018 at 7:46 PM, Joel Nothman wrote: > you can repeatedly use n_init=1? > > ___ > scikit-

Re: [scikit-learn] KMeans cluster

2018-02-20 Thread Sebastian Raschka
Inertia simply means the sum of the squared distances from sample points to their cluster centroid. The smaller the inertia, the closer the cluster members are to their cluster centroid (that's also what KMeans optimizes when choosing centroids). In this context, the elbow method may be helpful
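
The definition of inertia above can be sketched in a few lines of plain Python. This is an illustration of the quantity only, not scikit-learn's implementation; the helper `inertia`, the sample points, and the two candidate clusterings are all hypothetical.

```python
def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Four hypothetical 2-D points forming two tight pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Candidate A: each centroid sits in the middle of one tight pair.
good_inertia = inertia(points, [(0.0, 1.0), (10.0, 1.0)], [0, 0, 1, 1])

# Candidate B: each centroid sits between two distant points.
poor_inertia = inertia(points, [(5.0, 0.0), (5.0, 2.0)], [0, 1, 0, 1])

# Comparing several runs and keeping the one with the lowest inertia.
candidates = {"A": good_inertia, "B": poor_inertia}
best_name = min(candidates, key=candidates.get)
print(good_inertia, poor_inertia, best_name)  # 4.0 100.0 A
```

Candidate A has the lower inertia, so it would be the run kept when several initializations are compared "in terms of inertia".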