Yes, linear regression is a good place to start. But I would also consider robust regression methods, since they answer a different question from ordinary least squares regression, and that question may be more appropriate for skewed data.
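To make the "different question" concrete, here is a minimal sketch in Python (standard library only) rather than R. It uses an intercept-only model on made-up lognormal data, so the sample, the seed and the grid are all assumptions chosen for illustration. The idea is that the constant minimising squared error lands on the mean, while the constant minimising absolute error lands on the median, and on skewed data those two answers differ.

```python
import random
import statistics

random.seed(0)

# Hypothetical right-skewed sample (lognormal); it stands in for skewed y values.
y = [random.lognormvariate(0.0, 1.0) for _ in range(2001)]


def squared_loss(c):
    """Sum of squared deviations from the constant c (least squares criterion)."""
    return sum((v - c) ** 2 for v in y)


def absolute_loss(c):
    """Sum of absolute deviations from the constant c (least absolute deviations criterion)."""
    return sum(abs(v - c) for v in y)


# Search a grid of candidate constants from 0.00 to 5.00 in steps of 0.01.
grid = [i / 100 for i in range(501)]
best_sq = min(grid, key=squared_loss)
best_abs = min(grid, key=absolute_loss)

mean_y = statistics.fmean(y)
median_y = statistics.median(y)

print(f"mean   = {mean_y:.3f}, least squares minimiser             = {best_sq:.2f}")
print(f"median = {median_y:.3f}, least absolute deviations minimiser = {best_abs:.2f}")
```

The same contrast carries over when a slope is added: least squares estimates a conditional mean, and median-type robust fits estimate a conditional median.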
On Wed, Oct 23, 2019 at 2:52 PM <rain1...@aim.com> wrote:
>
> Hi Greg and others,
>
> Thank you for these explanations and clarifications, as they are much
> appreciated!
>
> Indeed, I do have some datasets that exhibit some distinct skewness. Simple
> scatter plots do show at least some linearity between my x and y variables
> (albeit weak, given the scattered nature of data points), but could this be
> sufficient to try simple linear regression? Also, if the data is overly
> skewed, could transforming it (such as logarithmically) justify the use of
> simple linear regression and/or correlation, if it causes the data to become
> mildly skewed in distribution? I have large sample sizes for all of my
> datasets, and the variables are continuous.
>
> That would pretty much cover all of my questions concerning this!
>
> Thank you, once again, for your time!
>
> -----Original Message-----
> From: Greg Snow <538...@gmail.com>
> To: rain1290 <rain1...@aim.com>
> Cc: r-sig-geo <r-sig-geo@r-project.org>
> Sent: Wed, Oct 23, 2019 3:49 pm
> Subject: Re: [R-sig-Geo] Alternate statistical test to linear regression?
>
> First, please expunge the "(N>30)" concept from your mind. This is an
> oversimplified rule of thumb used in introductory statistics courses
> (I am guilty of doing this in intro stat as well, but I try to
> emphasize to my students that it is only a rule of thumb for that
> class and the truth is more complex once you are in the real world, so
> consult with a statistician). There is nothing magical about a sample
> size of 30, I have seen cases where n=6 is large enough for the CLT
> and cases where n=10,000 was not big enough.
>
> If the data is not overly skewed and your sample size is large then
> you can just use regression as is and the inference will be
> approximately correct (with a really good approximation).
> But with skewness we often prefer the median over the mean and least squares
> regression is equivalent to fitting a mean, some of the robust
> regression options are equivalent to fitting a median, so they may be
> preferable on that count.
>
> Note that Pearson's correlation does not test linearity, it assumes
> linearity (and bivariate normality). Most issues with regression will
> be the same for the correlation.
>
> On Wed, Oct 23, 2019 at 11:25 AM <rain1...@aim.com> wrote:
> >
> > Hi Greg and others,
> >
> > Thank you for your very informative response! I actually made a mistake in
> > my initial message, in that I was actually testing for the y variable, not
> > the x. I will also look into those packages on CRAN, but even if there is
> > some skewness on the y, because my sample size is much larger than 30
> > (N>30), it might be safe to apply a linear regression analysis, if we can
> > assume linearity?
> >
> > A useful alternative would be to use correlation coefficients to test the
> > degree of association between the x and y variables; specifically, the
> > Pearson correlation coefficient, since both x and y variables are
> > quantitative. Does that make sense?
> >
> > Thanks again,
> >
> >
> > -----Original Message-----
> > From: Greg Snow <538...@gmail.com>
> > To: rain1290 <rain1...@aim.com>
> > Cc: r-sig-geo <r-sig-geo@r-project.org>
> > Sent: Wed, Oct 23, 2019 1:00 pm
> > Subject: Re: [R-sig-Geo] Alternate statistical test to linear regression?
> >
> > Note that the normality assumptions are about the residuals (or about
> > y conditional on x), not on the x variable(s) or all of y
> > (non-conditional). If x is highly skewed and the residuals are normal
> > then diagnostics just on y will also show skewness (if there is a
> > relationship between x and y).
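The quoted point about residuals versus y can be checked with a small simulation. This is only a sketch in Python (standard library only) rather than R, and everything in it is an assumption made for illustration: the exponential x, the true line y = 2 + 3x, the normal noise, and the seed. It fits the least squares line by its closed form and compares the skewness of y on its own with the skewness of the residuals.

```python
import random
import statistics

random.seed(1)
n = 5000

# Hypothetical data: a right-skewed predictor and normally distributed errors.
x = [random.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]


def skewness(values):
    """Sample skewness: the mean of the cubed standardised deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Closed-form simple least squares fit of y on x.
mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print(f"skewness of y alone:   {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

In this setup a histogram or normal quantile plot of y alone would look clearly skewed, even though the errors were generated as normal. That is why the diagnostics belong on the residuals rather than on x or on y by itself.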
> >
> > Also, the normality assumptions are about the tests and confidence
> > intervals, the least squares fit is legitimate (but possibly not the
> > most interesting fit) whether the residuals are normal or not. The
> > Central Limit Theorem also applies in regression, so if the residuals
> > are non-normal, but you have a large sample size then the tests and
> > intervals will still be approximately correct (with the quality of the
> > approximation depending on the degree of non-normality and sample
> > size).
> >
> > There are many alternative tools. There is a task view on CRAN for
> > Robust Statistical Methods that gives summaries of many packages and
> > tools for robust regression (and other things as well) which does not
> > depend on the normality assumptions.
> >
> >
> > On Wed, Oct 23, 2019 at 9:21 AM rain1290--- via R-sig-Geo
> > <r-sig-geo@r-project.org> wrote:
> > >
> > > Greetings,
> > > I am testing to see if linear relationships exist between my x and y
> > > variables. I conducted various diagnoses in R to test for normality of
> > > the x variable data by using qqnorm, qqline and histograms that show the
> > > distribution of the data. If the data is shown to be normally distributed
> > > in either normal quantile plots or in the histograms (i.e. a bell
> > > curve-shaped distribution), I would assume normality and apply the linear
> > > regression model, using "lm". However, in some cases, my distributions do
> > > not satisfy the normality criteria, and so I feel that using the linear
> > > regression model, in those cases, would not be appropriate. For that
> > > reason, would you be able to suggest an alternate test to the linear
> > > regression model in R? Maybe a non-parametric counterpart to it?
> > > Thank you, and any help would be greatly appreciated!
--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo