One solution is to simulate the population by repeating each row "weight" number of times. This is inefficient, and for a large sample survey it may create a very large dataset. But some of the graphs and summary functions may then turn out to your liking, depending upon how those functions are written.
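A minimal sketch of the row-repeating idea, assuming the weights are whole-number frequencies. The data frame `dat` and its columns `x` and `w` are made up for illustration:

```r
# Hypothetical toy data: values x with integer frequency weights w
dat <- data.frame(x = c(10, 20, 30, 40), w = c(1, 4, 2, 2))

# Repeat each row index w times, then subset to build the expanded data
idx <- rep(seq_len(nrow(dat)), times = dat$w)
expanded <- dat[idx, , drop = FALSE]

nrow(expanded)       # 9, the sum of the weights
median(dat$x)        # 25, ignores the weights
median(expanded$x)   # 20, reflects the weights
```

Any function without a weight argument, such as hist() or summary(), can then be run on `expanded`. Note that rep() truncates fractional counts, so non-integer weights would have to be rescaled and rounded first, which is only an approximation.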
Anupam.

Rick Bischoff wrote the following on 8/30/2006 7:57 PM:
> The data sets I am working with all have a weight variable--e.g.,
> each row doesn't mean 1 observation.
>
> With that in mind, nearly all of the graphs and summary statistics
> are incorrect for my data, because they don't take into account the
> weight.
>
> ****
> For example "median" is incorrect, as the quantiles aren't calculated
> with weights:
>
> sum( weights[X < median(X)] ) / sum(weights)
>
> This should be 0.5... of course it's not.
> ****
>
> Unfortunately, it seems that most (all?) of R's graphics and summary
> statistic functions don't take a weight or frequency argument.
> (Fortunately the models do...)
>
> Am I completely missing how to do this? One way would be to
> replicate each row proportional to the weight (e.g. if the weight was
> 4, we would add 3 additional copies) but this will get prohibitive pretty
> quickly as the dataset grows.
>
>
> Thanks in advance!
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.