Yes, linear regression is a good place to start.  But I would consider
robust regression as well, since it answers a different question
from ordinary least squares regression (for some robust methods, a
question about the median rather than the mean), and that question may
be more appropriate for skewed data.
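To make the "different question" concrete, here is a minimal sketch in Python with NumPy, used as a stand-in for R. It compares a least squares fit, which targets the conditional mean, with a least absolute deviations (median regression) fit on data that have right-skewed errors. The LAD fit is approximated here by iteratively reweighted least squares. That is an illustrative choice, not the method of any particular R package. In R, median regression is available through, for example, `rq()` in the quantreg package. The simulated data and the function names `ols_fit` and `lad_fit` are hypothetical.

```python
import numpy as np

def ols_fit(x, y):
    """Least squares: estimates the conditional MEAN of y given x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, slope]

def lad_fit(x, y, n_iter=200, eps=1e-6):
    """Least absolute deviations (median regression), approximated by
    iteratively reweighted least squares with weights 1/|residual|."""
    X = np.column_stack([np.ones_like(x), x])
    coef = ols_fit(x, y)  # start from the least squares solution
    for _ in range(n_iter):
        resid = y - X @ coef
        w = 1.0 / np.maximum(np.abs(resid), eps)  # eps avoids division by zero
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0, 10, n)
# Hypothetical right-skewed errors: Exponential(1) has mean 1 but median log(2)
y = 2.0 + 0.5 * x + rng.exponential(1.0, n)

print("OLS (conditional mean):  ", ols_fit(x, y))
print("LAD (conditional median):", lad_fit(x, y))
```

Both slopes should land near 0.5. The intercepts should differ by roughly the gap between the mean and the median of the error distribution (about 1 - log 2). Each fit is correct, but they answer different questions.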

On Wed, Oct 23, 2019 at 2:52 PM <rain1...@aim.com> wrote:
>
> Hi Greg and others,
>
> Thank you for these explanations and clarifications; they are much
> appreciated!
>
> Indeed, some of my datasets are distinctly skewed. Simple scatter plots
> do show at least some linearity between my x and y variables (albeit
> weak, given how scattered the data points are), but would this be
> sufficient to try simple linear regression? Also, if the data is overly
> skewed, would transforming it (for example, logarithmically) justify the
> use of simple linear regression and/or correlation, if the transformed
> data is only mildly skewed? I have large sample sizes for all of my
> datasets, and the variables are continuous.
>
> That would pretty much cover all of my questions concerning this!
>
> Thank you, once again, for your time!
>
> -----Original Message-----
> From: Greg Snow <538...@gmail.com>
> To: rain1290 <rain1...@aim.com>
> Cc: r-sig-geo <r-sig-geo@r-project.org>
> Sent: Wed, Oct 23, 2019 3:49 pm
> Subject: Re: [R-sig-Geo] Alternate statistical test to linear regression?
>
> First, please expunge the "(N>30)" concept from your mind.  This is an
> oversimplified rule of thumb used in introductory statistics courses
> (I am guilty of doing this in intro stat as well, but I try to
> emphasize to my students that it is only a rule of thumb for that
> class and the truth is more complex once you are in the real world, so
> consult with a statistician).  There is nothing magical about a sample
> size of 30; I have seen cases where n=6 is large enough for the CLT
> and cases where n=10,000 was not big enough.
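One way to see why no single cutoff works uses a standard result: the mean of n iid draws has skewness γ/√n, where γ is the skewness of a single observation. So how fast the sampling distribution becomes symmetric depends on how skewed the data are to begin with. The sketch below is in Python, as a stand-in for R. It tabulates this for an exponential distribution (γ = 2) and a lognormal distribution with log-scale SD 2. Both distributions are arbitrary illustrative choices.

```python
import math

def lognormal_skewness(sigma):
    """Skewness of a lognormal distribution with log-scale SD sigma."""
    e = math.exp(sigma ** 2)
    return (e + 2.0) * math.sqrt(e - 1.0)

def skewness_of_mean(gamma, n):
    """Skewness of the mean of n iid draws whose common skewness is gamma."""
    return gamma / math.sqrt(n)

cases = [("Exponential", 2.0), ("Lognormal(sigma=2)", lognormal_skewness(2.0))]
for label, gamma in cases:
    for n in (6, 30, 10_000):
        print(f"{label:20s} n={n:>6}: skewness of mean = {skewness_of_mean(gamma, n):.2f}")
```

In the lognormal case the sample mean is still clearly skewed at n = 10,000 (skewness of about 4). The exponential case is already much closer to symmetric at n = 30.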
>
> If the data is not overly skewed and your sample size is large, then
> you can just use regression as is and the inference will be
> approximately correct (with a really good approximation).  But with
> skewed data we often prefer the median to the mean.  Least squares
> regression is equivalent to fitting a mean, while some of the robust
> regression options are equivalent to fitting a median, so they may be
> preferable on that count.
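The mean/median equivalence is easiest to see in the simplest regression, one with an intercept only. The constant that minimizes the sum of squared errors is the mean. The constant that minimizes the sum of absolute errors is the median. Below is a minimal Python sketch that uses a brute-force grid search for transparency rather than efficiency. The data values are made up for illustration.

```python
import statistics

def squared_loss(c, ys):
    """Least squares criterion for fitting a constant c."""
    return sum((y - c) ** 2 for y in ys)

def absolute_loss(c, ys):
    """Least absolute deviations criterion for fitting a constant c."""
    return sum(abs(y - c) for y in ys)

def best_constant(loss, ys, steps=100_000):
    """Grid search over [min(ys), max(ys)] for the constant minimizing loss."""
    lo, hi = min(ys), max(ys)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda c: loss(c, ys))

ys = [1, 2, 2, 3, 4, 5, 40]  # hypothetical right-skewed sample

print("squared loss minimizer: ", best_constant(squared_loss, ys), "mean:", statistics.mean(ys))
print("absolute loss minimizer:", best_constant(absolute_loss, ys), "median:", statistics.median(ys))
```

The single large value pulls the least squares constant well above most of the data. The absolute-loss constant stays at the median.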
>
> Note that Pearson's correlation does not test linearity, it assumes
> linearity (and bivariate normality).  Most issues with regression will
> be the same for the correlation.
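A small illustration of why Pearson's r measures only linear association. When x is symmetric about zero and y = x² exactly, the relationship is perfect, yet r is exactly 0. For a monotone but curved relationship (y = eˣ), r falls short of 1. Spearman's rank correlation, shown here for comparison, equals 1. This is a minimal Python sketch with hand-rolled functions. The ranking helper assumes no ties, which holds for the data it is applied to.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n; assumes no ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [-3, -2, -1, 0, 1, 2, 3]
y_quad = [a ** 2 for a in x]      # exact but non-monotone relationship
y_exp = [math.exp(a) for a in x]  # exact, monotone, but curved

print("Pearson, y = x^2:", pearson(x, y_quad))
print("Pearson, y = exp(x):", pearson(x, y_exp), " Spearman:", spearman(x, y_exp))
```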
>
> On Wed, Oct 23, 2019 at 11:25 AM <rain1...@aim.com> wrote:
> >
> > Hi Greg and others,
> >
> > Thank you for your very informative response! I made a mistake in
> > my initial message: I was actually testing the y variable, not the x.
> > I will also look into those packages on CRAN. But even if there is
> > some skewness in y, since my sample size is much larger than 30
> > (N>30), might it be safe to apply a linear regression analysis,
> > provided we can assume linearity?
> >
> > A useful alternative would be to use correlation coefficients to test the 
> > degree of association between the x and y variables; specifically, the 
> > Pearson correlation coefficient, since both x and y variables are 
> > quantitative. Does that make sense?
> >
> > Thanks again,
> >
> >
> > -----Original Message-----
> > From: Greg Snow <538...@gmail.com>
> > To: rain1290 <rain1...@aim.com>
> > Cc: r-sig-geo <r-sig-geo@r-project.org>
> > Sent: Wed, Oct 23, 2019 1:00 pm
> > Subject: Re: [R-sig-Geo] Alternate statistical test to linear regression?
> >
> > Note that the normality assumptions are about the residuals (or about
> > y conditional on x), not about the x variable(s) or all of y
> > (non-conditional).  If x is highly skewed and the residuals are normal
> > then diagnostics just on y will also show skewness (if there is a
> > relationship between x and y).
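The point about diagnostics on y can be checked by simulation. Here x is strongly skewed and the errors are exactly normal. The marginal distribution of y then inherits the skewness of x, while the residuals from the fitted line do not. This is a minimal sketch in Python with NumPy, standing in for an R session (`lm()` followed by `qqnorm(resid(fit))` is the R analogue of the residual check). The simulated model is hypothetical.

```python
import numpy as np

def skewness(v):
    """Sample skewness: third central moment divided by SD cubed."""
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

rng = np.random.default_rng(42)
n = 10_000
x = rng.exponential(1.0, n)                  # strongly right-skewed predictor
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)  # linear model with normal errors

slope, intercept = np.polyfit(x, y, 1)       # least squares line; highest power first
resid = y - (intercept + slope * x)

print("skewness of y:        ", skewness(y))
print("skewness of residuals:", skewness(resid))
```

A histogram or normal quantile plot of y alone would suggest a problem. The residuals, which are what the assumption is actually about, look normal.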
> >
> > Also, the normality assumptions matter for the tests and confidence
> > intervals; the least squares fit itself is legitimate (but possibly not
> > the most interesting fit) whether the residuals are normal or not.  The
> > Central Limit Theorem also applies in regression, so if the residuals
> > are non-normal, but you have a large sample size then the tests and
> > intervals will still be approximately correct (with the quality of the
> > approximation depending on the degree of non-normality and sample
> > size).
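A quick way to check the Central Limit Theorem claim is a coverage simulation. Generate data many times with skewed, non-normal errors, compute the usual normal-theory 95% interval for the slope each time, and count how often it contains the true slope. The sketch below is in Python with NumPy, as a stand-in for R. The sample size, error distribution, and number of replications are arbitrary assumptions. It uses 1.96 in place of the exact t quantile, which makes a negligible difference at n = 200.

```python
import numpy as np

def slope_ci_covers(x, y, true_slope, z=1.96):
    """Fit y = a + b*x by least squares and report whether the usual
    normal-theory 95% interval for b contains true_slope."""
    n = len(x)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    b = np.sum(xc * (y - y.mean())) / sxx
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)
    return abs(b - true_slope) <= z * se

rng = np.random.default_rng(0)
n, reps, true_slope = 200, 2000, 0.5
hits = 0
for _ in range(reps):
    x = rng.uniform(0, 10, n)
    err = rng.exponential(1.0, n) - 1.0  # skewed errors, centred to mean zero
    y = 1.0 + true_slope * x + err
    hits += int(slope_ci_covers(x, y, true_slope))

print("empirical coverage:", hits / reps)
```

With strongly skewed errors and a smaller n, rerunning with different settings shows how the quality of the approximation depends on both.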
> >
> > There are many alternative tools.  There is a task view on CRAN for
> > Robust Statistical Methods that summarizes many packages and
> > tools for robust regression (and other things as well) that do not
> > depend on the normality assumptions.
> >
> >
> > On Wed, Oct 23, 2019 at 9:21 AM rain1290--- via R-sig-Geo
> > <r-sig-geo@r-project.org> wrote:
> > >
> > > Greetings,
> > > I am testing to see if linear relationships exist between my x and y
> > > variables. I ran various diagnostics in R to check the normality of
> > > the x variable data, using qqnorm, qqline and histograms that show the
> > > distribution of the data. If the data appears normally distributed
> > > in either the normal quantile plots or the histograms (i.e. a bell
> > > curve-shaped distribution), I assume normality and apply the linear
> > > regression model, using "lm". However, in some cases, my distributions do
> > > not satisfy the normality criteria, and so I feel that using the linear
> > > regression model in those cases would not be appropriate. For that
> > > reason, could you suggest an alternative to the linear
> > > regression model in R? Maybe a non-parametric counterpart to it?
> > > Thank you, and any help would be greatly appreciated!
> > >
> > > _______________________________________________
> > > R-sig-Geo mailing list
> > > R-sig-Geo@r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > 538...@gmail.com
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
