Hi All, I have a problem with weighted logistic regression. I have a number of SNPs and a case/control scenario, but not all genotypes are as "guaranteed" as others, so I am using weights to down-weight individuals whose genotype has been heavily "inferred".
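To be explicit about what I mean by weighting, here is a sketch of the objective I have in mind. It assumes (my assumption, not something I have confirmed glm does) that each individual's log-likelihood contribution is simply multiplied by its weight. The symbols are just my notation for this post: y_i is the case/control status, x_i the SNP genotypes (with intercept), w_i the weight, and \beta the coefficients.

    \ell_w(\beta) = \sum_{i=1}^{n} w_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right],
    \qquad p_i = \frac{1}{1 + \exp(-x_i^\top \beta)}

With w_i = 1 for every individual this reduces to the ordinary logistic log-likelihood.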
My data is quite big, but here is a dummy example:

    > status <- c(1,1,1,0,0)
    > SNPs <- matrix( c(1,0,1,0,0,0,0,1,0,1,0,1,0,1,1), ncol =3)
    > weight <- c(0.2, 0.1, 1, 0.8, 0.7)
    > glm(status ~ SNPs, weights = weight, family = binomial)

    Call:  glm(formula = status ~ SNPs, family = binomial, weights = weight)

    Coefficients:
    (Intercept)        SNPs1        SNPs2        SNPs3
         -2.079       42.282      -18.964           NA

    Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
    Null Deviance:      3.867
    Residual Deviance: 0.6279        AIC: 6.236
    Warning messages:
    1: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
    2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,

NB I do not get warning (2) with my real data, so I'll disregard it here.

Warning (1) looks suspiciously like the result of multiplying my case/control status by the weights... what exactly is glm doing with the weight vector?

In any case, how should I go about weighting my individuals in a logistic regression?

Regards,

Federico Calboli

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602
Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com