My problem is basically solved now. It was mainly noisy data introduced when 
the original dataset was transformed into numeric values. The model performs 
better when the categorical data is grouped first than when a function such as 
pd.factorize() is simply applied, which may create a large list of unique values. 
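
For anyone finding this thread later, here is a minimal sketch of what grouping 
before encoding could look like. The helper group_rare_categories, the min_count 
threshold, the "OTHER" label and the toy city data are all made up for 
illustration; they are not from my actual dataset, and the right grouping rule 
will depend on the data.

```python
import pandas as pd


def group_rare_categories(series, min_count=2, other_label="OTHER"):
    """Replace categories seen fewer than min_count times by one shared label.

    Hypothetical helper for illustration only; min_count and other_label
    are assumptions, not values taken from the original dataset.
    """
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep frequent values as they are, collapse the rare ones.
    return series.where(~series.isin(rare), other_label)


# Toy column with two frequent and three rare categories.
cities = pd.Series(["London", "London", "Paris", "Paris", "Oslo", "Lima", "Riga"])

# Plain factorize: one integer code per distinct value.
codes_raw, uniques_raw = pd.factorize(cities)

# Group first, then factorize: far fewer distinct codes.
grouped = group_rare_categories(cities, min_count=2)
codes_grouped, uniques_grouped = pd.factorize(grouped)

print(len(uniques_raw), len(uniques_grouped))
```

The integer codes are still arbitrary labels, so a one-hot encoding of the 
grouped column may suit models that would otherwise read an order into them.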

Thanks for all your help.

Sincerely


----- Original Message -----
From: Olivier Grisel <olivier.gri...@ensta.org>
To: ChungHung Liu <chliu52...@yahoo.co.uk>; scikit-learn-general 
<scikit-learn-general@lists.sourceforge.net>
Cc: 
Sent: Wednesday, 18 September 2013, 4:43
Subject: Re: [Scikit-learn-general] Imbalanced dataset

You might want to try to cascade a high precision linear classifier
(by tuning the intercept_ attribute based on the PR-curve) to trim
most of the majority class with a second stage classifier as described
in this paper by Google: http://research.google.com/pubs/pub37195.html

I have never tried it myself yet but it sounds interesting to try and
should be doable by using sklearn models as building blocks.
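
A rough, untested-in-practice sketch of such a cascade, under stated assumptions:
the synthetic data from make_classification, the 0.95 recall target, and the
choice of LogisticRegression and RandomForestClassifier for the two stages are
all illustrative and are not taken from the paper. Instead of editing
intercept_ directly, the sketch compares decision_function scores with a
threshold read off the PR curve, which amounts to the same shift of the
decision boundary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem (about 5% positives); purely illustrative.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Stage 1: cheap linear model used only to discard obvious negatives.
stage1 = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = stage1.decision_function(X_train)

# Pick the highest threshold that still keeps the target share of positives.
# recall[:-1] lines up with thresholds; the last PR point has no threshold.
precision, recall, thresholds = precision_recall_curve(y_train, scores)
target_recall = 0.95  # assumed value, tune on held-out data
threshold = thresholds[recall[:-1] >= target_recall].max()

# Stage 2: a second classifier trained only on what stage 1 lets through.
keep_train = scores >= threshold
stage2 = RandomForestClassifier(n_estimators=100, random_state=0)
stage2.fit(X_train[keep_train], y_train[keep_train])


def cascade_predict(X_new):
    """Predict 0 for samples rejected by stage 1, else ask stage 2."""
    pred = np.zeros(X_new.shape[0], dtype=int)
    keep = stage1.decision_function(X_new) >= threshold
    if keep.any():
        pred[keep] = stage2.predict(X_new[keep])
    return pred


y_pred = cascade_predict(X_test)
print("kept after stage 1: %.1f%% of training samples"
      % (100.0 * keep_train.mean()))
```

Choosing the threshold on the training scores is optimistic; a separate
validation split would give a more honest estimate of how many minority
samples the first stage drops.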

