Hi Sheila,
The purpose of my comment was to help you fix your experimental setup, not
improve accuracy.
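In fact, scaling the entire data set before splitting is expected to give
better scores, but this is cheating: the scaling statistics are then computed
partly from the test samples, so information from the test set leaks into
training.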
In your case, you can just give up scaling your data, since the "natural"
scale works better.
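
For reference, a minimal sketch of "split first, then scale", where the scaler is fitted on the training split only. The thread does not say which estimator or data set is in use, so the iris data and the k-nearest-neighbours classifier below are stand-in assumptions. The import path is the one used by current scikit-learn releases; in 2014-era releases train_test_split lived in sklearn.cross_validation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data set (assumption: the real data is not shown in the thread).
X, y = load_iris(return_X_y=True)

# Split BEFORE any pre-processing, so the test set stays unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The pipeline fits the scaler on X_train only, then reuses the same
# training-set mean and standard deviation to transform X_test.
scaled_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_model.fit(X_train, y_train)

# Baseline on the "natural" scale, for comparison.
unscaled_model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

scaled_acc = scaled_model.score(X_test, y_test)
unscaled_acc = unscaled_model.score(X_test, y_test)
print("scaled:   %.3f" % scaled_acc)
print("unscaled: %.3f" % unscaled_acc)
```

If the unscaled score is still higher with this setup, that is a legitimate result, and skipping the scaling step is a reasonable choice.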
I have changed the code, still I don't see much difference.
The non-scaled data-set gives higher accuracy than the scaled one.
Should I apply dimension selection first?
And what are the easy methods to start with?
Thanks
--
Sheila
On 8 July 2014 17:02, Mathieu Blondel wrote:
On Tue, Jul 8, 2014 at 11:27 PM, Sheila the angel
wrote:
> First I scaled the complete data-set and then split it into test and
> train data.
>
You should not pre-process the data before splitting it. Just ask yourself
how you would use your model in practice. In a real-world setting, you
woul
2014-07-08 16:27 GMT+02:00 Sheila the angel :
> First I scaled the complete data-set and then split it into test and train
> data.
Not the cleanest option, but that should work.
First I scaled the complete data-set and then split it into test and
train data.
On 8 July 2014 16:13, Lars Buitinck wrote:
2014-07-08 16:00 GMT+02:00 Michael Eickenberg :
> That totally depends on your data. Here it looks like you are scaling down a
> feature that captures a lot of the variation you are looking for, thus
> making it less important with respect to the other features in the Euclidean
> distance. You coul
That totally depends on your data. Here it looks like you are scaling down
a feature that captures a lot of the variation you are looking for, thus
making it less important with respect to the other features in the
Euclidean distance. You could try selecting important features beforehand.
But they
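
To illustrate the point above about a feature losing weight in the Euclidean distance once it is scaled down, here is a toy sketch in plain Python. The four rows are made-up numbers chosen only for illustration (not data from this thread), and the helper names are hypothetical. Feature 0 has a wide range and dominates the raw distance; after standardization each feature is divided by its own standard deviation, so feature 0 contributes far less.

```python
import math
import statistics

# Hypothetical toy data: feature 0 has a wide range, feature 1 a narrow one.
rows = [(10.0, 0.1), (20.0, 0.3), (30.0, 0.2), (40.0, 0.4)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(data):
    """Subtract each column's mean and divide by its population std."""
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in data
    ]


def feature0_share(a, b):
    """Fraction of the squared distance that comes from feature 0."""
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    return squared[0] / sum(squared)


scaled_rows = standardize(rows)

raw_share = feature0_share(rows[0], rows[1])
scaled_share = feature0_share(scaled_rows[0], scaled_rows[1])

print("raw distance:   ", euclidean(rows[0], rows[1]))
print("scaled distance:", euclidean(scaled_rows[0], scaled_rows[1]))
print("feature 0 share, raw:   ", raw_share)
print("feature 0 share, scaled:", scaled_share)
```

If the wide-range feature is the one that carries the class information, a distance-based model can lose accuracy after scaling, which would match the behaviour reported in this thread.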