Dear Anupam,

At 12:07 AM 9/14/2003 -0400, [EMAIL PROTECTED] wrote:
Hi John,
Thanks for the suggestion.  What would one consider a large range?

If the largest case weight corresponds to a probability of inclusion of 1, then the probability of inclusion for other cases is weight/max.weight, and the expected number of cases in the plot is n*average.weight/max.weight. You don't want this number to be too small. In your case, the ratio of the average to maximum weight is 0.013; assuming about 200,000 valid observations, you'd have about 2600 points in the plot, which seems reasonable (but see below).
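As a rough sketch of that calculation and of the sampling itself (the data here are simulated stand-ins, and x and y are hypothetical names for the two variables to be plotted; only finalwt comes from your message):

```r
# Worked arithmetic from the summary above: n * mean weight / max weight
round(200000 * 872.8 / 67150)                 # about 2600 points

set.seed(1)                                   # reproducible illustration
n <- 200000
# Simulated, right-skewed weights and two variables (assumptions, not the real data)
finalwt <- rlnorm(n, meanlog = 6, sdlog = 1.2)
x <- rnorm(n)
y <- x + rnorm(n)

p_incl <- finalwt / max(finalwt)              # largest weight gets probability 1
expected_n <- n * mean(finalwt) / max(finalwt)  # equals sum(p_incl)

keep <- runif(n) < p_incl                     # independent inclusion draws
plot(x[keep], y[keep], pch = 16, cex = 0.4)
```

Each case is included independently, so the number of points actually plotted will vary around expected_n from one draw to the next.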


> summary(finalwt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.8   192.1   462.7   872.8  1018.0 67150.0
The sample is large: about 250,000.
How large a sample should one draw from the sample? There are also plenty of
missing values.

Since you can't plot the missing values, the effective n is the number of valid cases. Making a scatterplot with a very large number of points is almost surely going to be uninformative (irrespective of the issue of weighting), and I'd consider an alternative, such as a bivariate nonparametric density estimate, possibly showing outlying points individually. (My second suggestion should produce a similar result.)
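One way to sketch a weighted bivariate density estimate in base R is shown here. This is only an illustration under stated assumptions: weighted_kde2d is a hypothetical helper (not a function from any package), the kernel is a product of Gaussians, the bandwidths come from bw.nrd() and so ignore the weights, and the data are simulated.

```r
# Weighted bivariate Gaussian kernel density estimate on a regular grid.
# Hypothetical helper: x, y are complete cases, w their case weights.
weighted_kde2d <- function(x, y, w, n_grid = 50,
                           hx = bw.nrd(x), hy = bw.nrd(y)) {
  w  <- w / sum(w)                          # normalise weights to sum to 1
  gx <- seq(min(x), max(x), length.out = n_grid)
  gy <- seq(min(y), max(y), length.out = n_grid)
  # n x n_grid matrices of kernel values in each dimension
  kx <- dnorm(outer(x, gx, "-"), sd = hx)
  ky <- dnorm(outer(y, gy, "-"), sd = hy)
  # z[j, k] = sum over i of w[i] * kx[i, j] * ky[i, k]
  z <- crossprod(kx * w, ky)
  list(x = gx, y = gy, z = z)
}

set.seed(2)
n <- 2000
x <- rnorm(n)
y <- x + rnorm(n)
w <- rlnorm(n, meanlog = 6, sdlog = 1.2)    # simulated weights (assumption)

ok   <- complete.cases(x, y, w)             # drop missing values first
dens <- weighted_kde2d(x[ok], y[ok], w[ok])
contour(dens$x, dens$y, dens$z, xlab = "x", ylab = "y")
```

With a couple of hundred thousand cases the two kernel matrices get large, so a coarser grid or processing the data in blocks may be needed.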


If you want to sample, I'd proceed by trial and error, adjusting both the sample size and point size. For example, you can decrease the sample size by scaling the weights down. A rule for how many points to include would be hard to come by, because a reasonable answer depends upon the configuration of the data, but I suspect that several tens of thousands of points would generally be too many.
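The trial-and-error approach might look something like this. It is a sketch only: sample_by_weight and its scale argument are hypothetical, the data are simulated, and multiplying the inclusion probabilities by scale is one way of "scaling the weights down" while keeping the maximum weight as the reference.

```r
set.seed(3)
n <- 200000
w <- rlnorm(n, meanlog = 6, sdlog = 1.2)    # simulated weights (assumption)
x <- rnorm(n)
y <- x + rnorm(n)

# Return indices of sampled cases; inclusion probability is scale * w / max(w),
# so scale = 1 gives the largest weight probability 1 and smaller values thin the plot.
sample_by_weight <- function(w, scale = 1) {
  stopifnot(scale > 0, scale <= 1)
  p <- scale * w / max(w)
  which(runif(length(w)) < p)
}

# Try a few sample sizes; the point size (cex) can be adjusted in the same way
for (s in c(1, 0.5, 0.25)) {
  idx <- sample_by_weight(w, scale = s)
  plot(x[idx], y[idx], pch = 16, cex = 0.3,
       main = sprintf("scale = %.2f, %d points", s, length(idx)))
}
```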

Perhaps someone else will have better ideas.

Regards,
John


In a message dated 9/12/03 10:48:06 PM Pacific Daylight Time,
[EMAIL PROTECTED] writes:

> Dear Anupam,
>
> I may be wrong, but I don't think that there's any standard method to use
> in plotting with case weights. I can think of two approaches, however: (1)
> If you have a large sample, and if the range of the weights isn't too
> large, you could sample your observations with probability of inclusion in
> the plot proportional to the case weights. (2) You could plot the points
> with "size" proportional to the square root of the case weights (i.e., area
> proportional to the weights).
>
> I hope that this helps,
> John
>
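For approach (2) in the message quoted above, a minimal sketch with simulated data (x, y, w, and max_cex are hypothetical names, and max_cex is an arbitrary choice for the size of the largest symbol):

```r
set.seed(4)
n <- 500
w <- rlnorm(n, meanlog = 6, sdlog = 1.2)    # simulated case weights (assumption)
x <- rnorm(n)
y <- x + rnorm(n)

max_cex <- 3                                # size of the symbol for the largest weight
cex_w <- max_cex * sqrt(w / max(w))         # linear size ~ sqrt(weight), so area ~ weight
plot(x, y, cex = cex_w, pch = 1)
```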


______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
