Why do you care what distribution your data comes from?

That is a serious question: the more we know about your actual question or
goal, the more we can help.  It is a common mistake for people who know
enough statistics to be dangerous to focus on the distribution of the data
rather than on the question of interest.

Many of the traditional statistical tests/models based on the assumption of
normality are still useful when the data do not follow a normal distribution,
as long as the sample size is large enough (by the central limit theorem the
sample mean is then approximately normal).  If the sample is too small for
that, there are often alternative tests/methods that don't rely on a specific
known distribution, such as rank-based or permutation tests.  A simple
transformation may also get you close enough.
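
As a rough sketch of those three options side by side (the simulated
lognormal data, the group labels, and the sample size are all made up for
illustration, not taken from your problem):

```r
# Simulated, clearly skewed data in two groups (illustrative assumption only)
set.seed(1)                      # arbitrary seed, for reproducibility
n <- 200
dat <- data.frame(
  group = factor(rep(c("a", "b"), each = n)),
  y     = c(rlnorm(n, meanlog = 0), rlnorm(n, meanlog = 0.5))
)

# 1. Normal-theory test on the raw data; relies on the large sample size
t_raw <- t.test(y ~ group, data = dat)

# 2. Rank-based alternative that assumes no particular distribution
w_rank <- wilcox.test(y ~ group, data = dat)

# 3. Same t-test after a simple log transformation
t_log <- t.test(log(y) ~ group, data = dat)

# Collect the three p-values for comparison
pvals <- c(t_raw = t_raw$p.value,
           wilcoxon = w_rank$p.value,
           t_log = t_log$p.value)
print(signif(pvals, 3))
```

Note that the three tests answer slightly different questions (difference in
means, shift in location, ratio of geometric means), which is one more reason
to start from the question of interest.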

David mentioned that in some cases it is the distribution of the errors, not
of the original data, that matters.  Some people mistakenly think that the
explanatory variables in a regression need to be normal as well, but that is
not required.
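
A small simulated illustration of that point (the variable names, the
coefficients, and the exponential predictor are assumptions chosen only for
this sketch):

```r
# A strongly skewed explanatory variable with normal errors (simulated)
set.seed(2)                              # arbitrary seed
x <- rexp(300, rate = 1)                 # far from normal
y <- 2 + 3 * x + rnorm(300, sd = 1)      # errors are normal by construction

fit <- lm(y ~ x)
res <- residuals(fit)

# Compare the two normal QQ plots side by side
op <- par(mfrow = c(1, 2))
qqnorm(x,   main = "Explanatory variable"); qqline(x)
qqnorm(res, main = "Residuals");            qqline(res)
par(op)
```

The QQ plot of x bends away from the line, while the residuals, which are
what the usual regression inference depends on, should stay close to it.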

For finding transformations that get you closer to normal, look at the boxcox 
function in the MASS package and possibly the vis.boxcox and vis.boxcoxu 
functions in the TeachingDemos package (and the paper referenced there).
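
A minimal sketch of using boxcox, assuming a positive response with no
predictors (the simulated y and the lambda grid are illustrative choices):

```r
library(MASS)

# Simulated positive, right-skewed response (illustrative assumption)
set.seed(3)                              # arbitrary seed
y <- rlnorm(200, meanlog = 1, sdlog = 0.6)

# Profile log-likelihood of the Box-Cox parameter over a grid of lambdas;
# plotit = FALSE returns the grid (x) and the log-likelihood values (y)
bc <- boxcox(y ~ 1, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Grid value with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# Apply the transformation; lambda = 0 corresponds to the log
y_trans <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat
```

In practice people usually round lambda to a nearby interpretable value
(0 for log, 0.5 for square root) rather than using the estimate as is.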

If you really want to know the distribution of the data, you should start with
the science, not with the data and examples.  Random chance can make data from
one distribution look like they come from a different but similar one.  Start
with the nature of the problem (without looking at the data): Will the values
be discrete or continuous?  Is there a lower/upper limit on the possible
values?  Is it likely to be skewed (have extreme values in one direction)?
What distributions are commonly used in this area?  And so on.  Answering
those questions can narrow down the candidates.
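
To see how easily chance blurs similar distributions, here is a rough
simulation sketch.  The gamma and lognormal pair, the sample size of 50, and
the number of repetitions are all assumptions picked for illustration:

```r
# Gamma and lognormal distributions matched on mean and variance
set.seed(4)                              # arbitrary seed
n     <- 50                              # a typical modest sample size
shape <- 4
rate  <- 1
m <- shape / rate                        # gamma mean
v <- shape / rate^2                      # gamma variance
sdlog   <- sqrt(log(1 + v / m^2))        # lognormal with the same variance
meanlog <- log(m) - sdlog^2 / 2          # ... and the same mean

# Repeatedly draw one sample from each and compare them with a
# two-sample Kolmogorov-Smirnov test
pvals <- replicate(500, {
  g <- rgamma(n, shape = shape, rate = rate)
  l <- rlnorm(n, meanlog = meanlog, sdlog = sdlog)
  ks.test(g, l)$p.value
})

# Proportion of repetitions where the two samples could not be told apart
prop_not_rejected <- mean(pvals > 0.05)
print(prop_not_rejected)

# One pair of samples on a QQ plot, for a visual impression
qqplot(rgamma(n, shape = shape, rate = rate),
       rlnorm(n, meanlog = meanlog, sdlog = sdlog),
       xlab = "gamma sample", ylab = "lognormal sample")
abline(0, 1)
```

If most repetitions fail to separate the two, the data alone will not tell
you which family you have, and the subject-matter questions above have to do
that job.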

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Jason Rupert
> Sent: Friday, February 13, 2009 6:12 AM
> To: Gabor Grothendieck
> Cc: R-help@r-project.org
> Subject: Re: [R] Website, book, paper, etc. that shows example plots of
> distributions?
> 
> Thank you very much.  Thank you again regarding the suggestion below.
> I will give that a shot and I guess I've got my work counted out for
> me.  I counted 45 different distributions.
> 
> Is the best way to get a QQPlot of each, to run through producing a
> data set for each distribution and then using the qqplot function to
> get a QQplot of the distribution and then compare it with my data
> distribution?
> 
> As you can tell I am not a trained statistician, so any guidance or
> suggested further reading is greatly appreciated.
> 
> I guess I am pretty sure my data is not a normal distribution due to
> doing some of the empirical "Goodness of Fit" tests and comparing the
> QQplot of my data against the QQPlot of a normal distribution with the
> same number of points.  I guess the next step is to figure out which
> distribution my data most closely matches.
> 
> Also, I guess I could also fool around and take the log, sqrt, etc. of
> my data and see if it will then more closely resemble a normal
> distribution.
> 
> Thank you again for assisting this novice data analyst who is trying to
> gain a better understanding of the techniques using this powerful
> software package.
> 
> 
> 
> 
> --- On Fri, 2/13/09, Gabor Grothendieck <ggrothendi...@gmail.com>
> wrote:
> From: Gabor Grothendieck <ggrothendi...@gmail.com>
> Subject: Re: [R] Website, book, paper, etc. that shows example plots of
> distributions?
> To: jasonkrup...@yahoo.com
> Cc: R-help@r-project.org
> Date: Friday, February 13, 2009, 5:43 AM
> 
> You can readily create a dynamic display for using qqplot and similar
> functions in conjunction with either the playwith or TeachingDemos
> packages.
> 
> For example, to investigate the effect of the shape parameter in the
> skew normal distribution on its qqplot relative to the normal
> distribution:
> 
>    library(playwith)
>    library(sn)
>    playwith(qqnorm(rsn(100, shape = shape)),
>        parameters = list(shape = seq(-3, 3, .1)))
> 
> Now move the slider located at the bottom of the window that
> appears and watch the plot change in response to changing
> the shape value.
> 
> You can find more distributions here:
> http://cran.r-project.org/web/views/Distributions.html
> 
> On Thu, Feb 12, 2009 at 1:04 PM, Jason Rupert <jasonkrup...@yahoo.com>
> wrote:
> > By any chance is anyone aware of a website, book, paper, etc. or
> > combinations of those sources that show plots of different
> > distributions?
> >
> > After reading a pretty good whitepaper I became aware of the benefit
> > of doing Q-Q plots and histograms to help assess a distribution.
> > The whitepaper is called:
> > "Univariate Analysis and Normality Test Using SAS, Stata, and
> > SPSS*", (c) 2002-2008 The Trustees of Indiana University, Hun Myoung
> > Park
> >
> > Unfortunately the white paper does not provide an extensive amount of
> > example distributions plotted using Q-Q plots and histograms, so I am
> > curious if there is a "portfolio"-type website or other whitepaper
> > that shows examples of various types of distributions.
> >
> > It would be helpful to see a bunch of Q-Q plots and their associated
> > histograms to get an idea of how the distribution looks in comparison
> > against the Gaussian.
> >
> > I think seeing the plot really helps.
> >
> > Thank you for any insights.
> >
> >
> >
> >        [[alternative HTML version deleted]]
> >
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >

