Re: [R] Bias in sample - Logistic Regression

Wensui Liu Wed, 01 Oct 2008 20:10:29 -0700

Hi, Shiva,

The idea of reject inference is very simple. Let's assume a credit card
environment. There are 100 applicants, of whom 50 are approved and
booked. We can therefore observe adverse behavior, such as default and
delinquency, only for the 50 booked accounts. Let's further assume that,
of the 50 booked cards, 5 go bad (default / delinquency). The natural
approach is to build a model on the booked accounts to "cherry pick" the
bad guys and then apply that same model to all applicants.


However, we can only observe the behavior of the booked applicants, which
is 50, and not of all applicants, which is 100. The booked accounts were
selected by the approval decision, so they are not a random sample of the
applicants, and the model result looks better than it really is. This is
the so-called 'sample bias'. The same thing can happen in healthcare or
direct marketing as well.
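
To make the bias concrete, here is a minimal simulation sketch in Python
(standard library only). Everything in it is an assumption for
illustration: the latent risk score, the logistic default probability,
the noise in the lender's view of risk, and the rule of approving the
lower-scoring half. The applicant count is scaled up from 100 so that the
rates are stable.

```python
import math
import random


def simulate(n_applicants=100_000, seed=0):
    """Compare the bad rate of all applicants with that of booked accounts.

    All distributions and constants here are hypothetical.
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        risk = rng.gauss(0.0, 1.0)  # latent risk of the applicant
        # True probability of going bad rises with latent risk.
        p_bad = 1.0 / (1.0 + math.exp(-(risk - 2.0)))
        bad = rng.random() < p_bad
        # The lender only sees a noisy version of the latent risk.
        score = risk + rng.gauss(0.0, 0.5)
        applicants.append((score, bad))

    # Approve (book) the half of applicants with the lowest observed score.
    applicants.sort(key=lambda a: a[0])
    booked = applicants[: n_applicants // 2]

    rate_all = sum(bad for _, bad in applicants) / n_applicants
    rate_booked = sum(bad for _, bad in booked) / len(booked)
    return rate_all, rate_booked


if __name__ == "__main__":
    rate_all, rate_booked = simulate()
    print(f"bad rate, all applicants : {rate_all:.3f}")
    print(f"bad rate, booked only    : {rate_booked:.3f}")
```

In practice the outcome of the rejected half is never observed, so only
the second number is available to the modeler. The simulation can print
the first one only because it generates the outcome for everybody.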

Luckily, many people have done excellent work on this problem. Please
read some of Heckman's work. Greene at NYU has a paper in this area as
well. I believe there is also an implementation in R. If you use SAS
(widely used in industry), take a look at PROC QLIM.
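
As a small pointer into that literature, the correction term in
Heckman's two-step estimator is the inverse Mills ratio,
lambda(a) = phi(a) / Phi(a), which is the mean of a standard normal error
given that the observation was selected (u > -a). The sketch below, in
Python with the standard library only, checks that formula against a
simulation. The function names and the sample size are my own choices for
illustration, and the classic two-step form is for a continuous outcome;
a binary outcome such as default needs a different likelihood.

```python
import math
import random


def norm_pdf(a):
    """Standard normal density phi(a)."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def norm_cdf(a):
    """Standard normal distribution function Phi(a)."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def inverse_mills(a):
    """Inverse Mills ratio phi(a) / Phi(a), i.e. E[u | u > -a] for u ~ N(0, 1)."""
    return norm_pdf(a) / norm_cdf(a)


def simulated_selected_mean(a, n=200_000, seed=0):
    """Average of standard normal draws that pass the selection rule u > -a."""
    rng = random.Random(seed)
    kept = [u for u in (rng.gauss(0.0, 1.0) for _ in range(n)) if u > -a]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    a = 0.5  # hypothetical value of the selection index
    print(f"formula    : {inverse_mills(a):.4f}")
    print(f"simulation : {simulated_selected_mean(a):.4f}")
```

The error term has mean zero over all applicants but a positive mean
among the selected ones, which is exactly what a model fitted only on the
selected sample ignores.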

HTH.

-- 
===============================
WenSui Liu
Acquisition Risk, Chase
Email : [EMAIL PROTECTED]
Blog   : statcompute.spaces.live.com
===============================


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
