Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-10 Thread Mathieu Blondel
Hi Sheila, The purpose of my comment was to help you fix your experimental setup, not improve accuracy. In fact, scaling the entire data before splitting is expected to work better, but this is cheating. In your case, you can just give up scaling your data, since the "natural" scale works better.

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-10 Thread Sheila the angel
I have changed the code, but I still don't see much difference. The non-scaled data-set gives higher accuracy than the scaled one. Should I apply dimension selection first? And what are the easy methods to start with? Thanks -- Sheila On 8 July 2014 17:02, Mathieu Blondel wrote: > > > > On Tue, Jul 8, 20

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Mathieu Blondel
On Tue, Jul 8, 2014 at 11:27 PM, Sheila the angel wrote: > First I scaled the complete data-set and then split it into test and > train data. > You should not pre-process the data before splitting it. Just ask yourself how you would use your model in practice. In a real-world setting, you woul
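
A minimal sketch of the split-first workflow recommended above, assuming a synthetic dataset and a k-nearest-neighbours classifier (neither comes from the thread; the original data and model are not shown). The scaler is fit on the training split only and then applied to the test split. A Pipeline does the same bookkeeping automatically. In 2014 `train_test_split` lived in `sklearn.cross_validation`; current releases use `sklearn.model_selection`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the thread's real data is unknown).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# 1. Split first, so the test set plays the role of unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 2. Learn the scaling parameters (mean, std) from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse training statistics

# 3. Fit and evaluate the classifier (KNN is an illustrative choice).
clf = KNeighborsClassifier().fit(X_train_scaled, y_train)
acc_manual = clf.score(X_test_scaled, y_test)

# Equivalent, less error-prone form: the pipeline fits the scaler on
# whatever data is passed to fit() and only transforms at predict time.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
pipe.fit(X_train, y_train)
acc_pipe = pipe.score(X_test, y_test)

print("manual:", acc_manual, "pipeline:", acc_pipe)
```

Passing the pipeline to a cross-validation helper keeps the scaler inside each fold, which avoids leaking test-fold statistics into training.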

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Lars Buitinck
2014-07-08 16:27 GMT+02:00 Sheila the angel : > First I scaled the complete data-set and then split it into test and train > data. Not the cleanest option, but that should work.

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Sheila the angel
First I scaled the complete data-set and then split it into test and train data. On 8 July 2014 16:13, Lars Buitinck wrote: > 2014-07-08 16:00 GMT+02:00 Michael Eickenberg < > michael.eickenb...@gmail.com>: > > That totally depends on your data. Here it looks like you are scaling > down a >

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Lars Buitinck
2014-07-08 16:00 GMT+02:00 Michael Eickenberg : > That totally depends on your data. Here it looks like you are scaling down a > feature that captures a lot of the variation you are looking for, thus > making it less important with respect to the other features in the euclidean > distance. You coul

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Michael Eickenberg
That totally depends on your data. Here it looks like you are scaling down a feature that captures a lot of the variation you are looking for, thus making it less important with respect to the other features in the euclidean distance. You could try selecting important features beforehand. But they
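
To illustrate the point about Euclidean distance, here is a small sketch on made-up data (two features with very different spreads; the numbers are assumptions, not taken from the thread). It measures how much of the total squared pairwise distance comes from the first feature, before and after standardization.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical data: feature 0 has a large spread, feature 1 a small one.
X = np.column_stack([
    rng.normal(0.0, 100.0, 200),
    rng.normal(0.0, 1.0, 200),
])

# Standardize each feature to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)


def share_of_first_feature(data):
    """Fraction of the summed squared pairwise distance due to feature 0."""
    diff = data[:, None, :] - data[None, :, :]  # all pairwise differences
    sq = diff ** 2
    return sq[..., 0].sum() / sq.sum()


raw_share = share_of_first_feature(X)
scaled_share = share_of_first_feature(X_scaled)

print("raw share of feature 0:   ", raw_share)
print("scaled share of feature 0:", scaled_share)
```

On the raw data the large-spread feature accounts for almost all of the distance; after standardization both features contribute equally. If that large-spread feature happens to carry most of the class signal, scaling reduces its influence, which is one way scaled data can score lower with a distance-based model.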