Re: [R] A question on Statistics regarding regression
This is probably better for Cross Validated [https://stats.stackexchange.com]. Surprisingly, I can't quickly find an answered question on this topic.

My "tl;dr" answer would be: "inflated" relative to what? Having an unbalanced sample certainly decreases the *power* of an analysis, but there's nothing 'incorrect' (AFAICS) with the estimated SEs, and no reason to try to fix them.

https://stats.stackexchange.com/questions/23108/unbalanced-design-effect
https://stats.stackexchange.com/questions/347050/unbalanced-sample-in-dummy-variable-for-ols-linear-regression

On 8/24/24 14:15, Jeff Newmiller via R-help wrote:
> [...]
> On August 24, 2024 11:05:12 AM PDT, Christofer Bogaso wrote:
>> [...]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
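A minimal simulation sketch of the point above, and of the "based on simulation" request in the original question. Every setting is an illustrative assumption: n = 1000 with a fixed 950/50 split of a 0/1 dummy, a true effect of 1, and i.i.d. N(0, 1) errors. If the SEs reported by `lm()` were "incorrect", their average would disagree with the actual spread of the estimates across replicates.

```r
# Simulation sketch: is the lm() standard error for a rare dummy "wrong"?
# Illustrative assumptions: n = 1000, 5% in the rare group, true effect 1,
# i.i.d. N(0, 1) errors, design held fixed across replicates.
set.seed(2024)
n <- 1000
d <- rep(c(0, 1), times = c(950, 50))   # 95% / 5% split, fixed design
n_rep <- 2000

sim_once <- function() {
  y <- 2 + 1 * d + rnorm(n)             # true intercept 2, true effect 1
  fit <- lm(y ~ d)
  coef(summary(fit))["d", c("Estimate", "Std. Error")]
}

# One row per replicate, columns "Estimate" and "Std. Error"
res <- t(replicate(n_rep, sim_once()))

empirical_sd     <- sd(res[, "Estimate"])      # actual sampling variability
mean_reported_se <- mean(res[, "Std. Error"])  # what lm() reports, on average
theoretical_se   <- sqrt(1 / 950 + 1 / 50)     # sigma * sqrt(1/n0 + 1/n1), sigma = 1

round(c(empirical_sd = empirical_sd,
        mean_reported_se = mean_reported_se,
        theoretical_se = theoretical_se), 4)
```

Under these assumptions all three numbers should be close to 0.145. A 500/500 split of the same 1000 observations would give sqrt(2/500), about 0.063: the rare category makes the SE larger (lower power), but the reported SE still describes the estimator's variability. Because the design is held fixed, the sketch conditions on the observed split, which is how OLS standard errors are usually interpreted.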
Re: [R] A question on Statistics regarding regression
You say you asked elsewhere, but so many hits come up when I just search for "unbalanced sample size" that your justification for not following the posting guide does not seem honest. I also recall that various discussions of statistical power address this in basic statistics.

On August 24, 2024 11:05:12 AM PDT, Christofer Bogaso wrote:
> [...]

--
Sent from my phone. Please excuse my brevity.
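Jeff's pointer to statistical power can be made concrete. A minimal sketch, assuming a two-sided, equal-variance two-sample t-test (equivalent to regressing on a single 0/1 dummy), a total of 200 observations and a true difference of half a standard deviation; these numbers are illustrative, not from the thread. `power_two_sample()` is a hypothetical helper; for equal group sizes it should agree with base R's `power.t.test(..., strict = TRUE)`.

```r
# Power of a two-sided, two-sample t-test (equal variances) for arbitrary
# group sizes, via the noncentral t distribution (base R only).
# delta = true difference in means, sd = common within-group SD.
power_two_sample <- function(n1, n2, delta, sd = 1, alpha = 0.05) {
  df  <- n1 + n2 - 2
  ncp <- delta / (sd * sqrt(1 / n1 + 1 / n2))  # noncentrality parameter
  q   <- qt(1 - alpha / 2, df)                 # two-sided critical value
  # P(reject H0) = P(T > q) + P(T < -q) under the noncentral t
  pt(q, df, ncp = ncp, lower.tail = FALSE) + pt(-q, df, ncp = ncp)
}

# Same total n = 200 and same effect size, different splits
balanced   <- power_two_sample(100, 100, delta = 0.5)
unbalanced <- power_two_sample(190, 10,  delta = 0.5)
round(c(balanced = balanced, unbalanced = unbalanced), 3)

# Cross-check of the balanced case against base R
power.t.test(n = 100, delta = 0.5, strict = TRUE)$power
```

In this setting the 190/10 split has far lower power than the 100/100 split, even though the SE that goes into the test is computed correctly in both cases.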
[R] A question on Statistics regarding regression
Hi,

I have asked this question elsewhere but failed to get any response, so I am hoping to get some insight from the experts and statisticians here.

Let's say we are fitting a regression equation where one explanatory variable is categorical with 2 categories. However, in the sample one category has 95% of the values and the other has just 5%. That is, the categories are highly unbalanced.

Typically the SE of the estimate may be inflated for such a highly unbalanced categorical explanatory variable.

Such an unbalanced case may arise from 2 scenarios:
1) there is a flaw in the sample, or it is just by chance that the second category has only 5% of the values in the sample; or
2) in the population itself, the second category occurs very rarely, and this is reflected in the sample.

My question is how the SE would be impacted in the above 2 cases. Will the impact be the same, i.e. would we get an incorrect estimate of the SE in both cases? If yes, is there any way to prove this analytically, or maybe by simulation?

My apologies, as this question is not directly R related. However, I just wanted to get some insight on the above statistical problem from some of the great statisticians in this forum.

Thanks for your time.
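One way to see the analytic side, under textbook assumptions that are not stated in the question (a single 0/1 dummy and no other covariates, independent errors with constant variance, inference conditional on the observed dummy): the OLS slope is then just the difference in group means, so its variance has a closed form.

```latex
% Assumed model: y_i = \beta_0 + \beta_1 d_i + \varepsilon_i, d_i \in \{0,1\},
% errors independent with constant variance \sigma^2.
% n_1 = number of d_i = 1, n_0 = n - n_1, p = n_1 / n.
\begin{aligned}
\hat\beta_1 &= \bar y_1 - \bar y_0, \\
\operatorname{Var}(\hat\beta_1 \mid d)
  &= \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_0}\right)
   = \frac{\sigma^2}{n\,p\,(1-p)}, \\
\frac{\operatorname{SE}_p}{\operatorname{SE}_{1/2}}
  &= \sqrt{\frac{1/4}{p\,(1-p)}},
  \qquad p = 0.05:\ \sqrt{\frac{0.25}{0.0475}} \approx 2.29 .
\end{aligned}
```

Here SE_p is the SE with a fraction p in the rare category and SE_{1/2} that of a balanced design of the same size n. Under these assumptions the formula depends on the data only through n_0, n_1 and σ², so it is the same whether the 5% share arose by chance (scenario 1) or mirrors the population (scenario 2): the SE is about 2.3 times that of a balanced design, a loss of precision rather than an error in the SE.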
Re: [R] A question on Statistics
I am referring to the posting guide at https://www.r-project.org/posting-guide.html

I am imagining a distribution whose mean is zero but which has a few large, infrequent observations on the positive side.

On Sun, Jul 1, 2018 at 8:29 PM Bert Gunter wrote:
> [...]
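Bert's question about what "centered at 0" means can be made concrete with a hypothetical distribution of the kind described above: a right-skewed lognormal shifted so that its mean is exactly zero. The shifted lognormal is an illustrative assumption, not something proposed in the thread; its support is bounded below, so it only illustrates the skewness point.

```r
# Hypothetical example of a right-skewed distribution with mean exactly 0:
# X = L - exp(1/2), where L ~ lognormal(meanlog = 0, sdlog = 1), E[L] = exp(1/2).
# (Its support is (-exp(1/2), Inf), so it only illustrates the skewness point.)
shift <- exp(0.5)

# Mean, checked by numerical integration of l * f(l) over (0, Inf)
mean_x <- integrate(function(l) l * dlnorm(l), lower = 0, upper = Inf)$value - shift

# Median: the lognormal(0, 1) median is exp(0) = 1
median_x <- qlnorm(0.5) - shift

# Mode: the lognormal mode is exp(meanlog - sdlog^2) = exp(-1)
mode_x <- exp(-1) - shift

round(c(mean = mean_x, median = median_x, mode = mode_x), 3)

# A large simulated sample tells the same story
set.seed(1)
x <- rlnorm(1e6) - shift
round(c(sample_mean = mean(x), sample_median = median(x)), 3)
```

So a distribution "centered at 0" in the sense of mean zero can have its median near -0.65 and its mode near -1.28; any answer about a central range depends on which center is meant.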
Re: [R] A question on Statistics
From the posting guide:

"*R-help* is intended to be comprehensible to people who want to use R to solve problems but who are not necessarily interested in or knowledgeable about programming."

This says to me that R-help is for general questions about R programming, not statistics, though I grant you that the intersection is nonempty. Nevertheless, purely statistical issues should be posted elsewhere, and your query appears to be such.

However, I'll just note: what does "centered at 0" mean for an asymmetric distribution? I think you may need to reconsider Jeff's advice.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sun, Jul 1, 2018 at 5:53 AM, Christofer Bogaso <bogaso.christo...@gmail.com> wrote:
> [...]
Re: [R] A question on Statistics
Hi,

I could certainly post on StackExchange; however, I don't think the R-help posting guide discourages asking a question about statistics, at least formally.

I could clarify further if my question is not elaborate enough. And many apologies if it is very trivial; however, I am still looking for a second opinion on my question.

Answer to Jeff's pointer: yes, my distribution is assumed to be centered at 0.

Thanks,

On Sun, Jul 1, 2018 at 8:04 AM Hasan Diwan wrote:
> [...]
Re: [R] A question on Statistics
Christofer,

On Sat, 30 Jun 2018 at 12:54, Jeff Newmiller wrote:
> You should use Stack Exchange for questions about statistics.

Specifically, https://stats.stackexchange.com/ -- H
--
OpenPGP: https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1
If you wish to request my time, please do so using bit.ly/hd1AppointmentRequest.
Sent from my mobile device
Re: [R] A question on Statistics
You should use Stack Exchange for questions about statistics. You should also think a bit before you post, regardless of where. You are the one who described this as a highly asymmetric distribution, and you didn't say anything about it being centered at zero. You already answered your own question, to the extent that it can be answered.

On Sun, 1 Jul 2018, Christofer Bogaso wrote:
> [...]

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
[R] A question on Statistics
Hi,

I have a quick question on statistical distributions, as follows, and I hope the statisticians here can give me some insightful feedback.

Say I have a large sample from a highly asymmetric distribution ranging from -Inf to +Inf. Now I wish to calculate X1 and X2 from the sample such that the middle 70% of the probability lies between them.

One approach (with x being my sample):
    quantile(x, probs = c(0.15, 0.85))

Another approach:
    quantile(abs(x), probs = 0.85)
In this case X1 and X2 would be the negative and positive of the above result.

My question is: in all scenarios, are the above two approaches equivalent? If not, which is the better approach for finding such a range?

Thanks,
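A simulation sketch comparing the two approaches. The distribution is an illustrative assumption (a standard normal plus an independent Exponential(rate = 0.5), which is right-skewed on (-Inf, Inf) with mean 2), chosen because nothing in the question says it is centered at zero. One detail worth flagging: for the symmetric interval to hold 70%, the cutoff must be the 0.70 quantile of abs(x); the 0.85 quantile, as written above, gives an interval holding about 85%.

```r
# Compare two ways of finding a range holding the "middle 70%" of a sample.
# Illustrative assumption: x = N(0, 1) + Exponential(rate = 0.5), a
# right-skewed distribution on (-Inf, Inf) whose mean is 2, not 0.
set.seed(42)
n <- 1e5
x <- rnorm(n) + rexp(n, rate = 0.5)

# Share of the sample falling inside [lim[1], lim[2]]
coverage <- function(x, lim) mean(x >= lim[1] & x <= lim[2])

# Approach 1: equal-tailed interval from the 15% and 85% quantiles
eq_tail <- quantile(x, probs = c(0.15, 0.85), names = FALSE)

# Approach 2: interval symmetric about 0, c(-q, q). For 70% coverage q must
# be the 0.70 quantile of |x|; the 0.85 quantile would cover about 85%.
q   <- quantile(abs(x), probs = 0.70, names = FALSE)
sym <- c(-q, q)

out <- rbind(equal_tail = c(eq_tail, coverage(x, eq_tail)),
             symmetric  = c(sym,     coverage(x, sym)))
colnames(out) <- c("lower", "upper", "coverage")
round(out, 3)

# For a distribution symmetric about 0 the two approaches nearly coincide
z <- rnorm(n)
round(c(q85_of_z     = quantile(z, 0.85, names = FALSE),
        q70_of_abs_z = quantile(abs(z), 0.70, names = FALSE)), 3)
```

Both intervals hold about 70% of the sample, but they are different intervals: the equal-tailed one leaves 15% in each tail, while the symmetric one is centered at 0 wherever the mass actually lies. They roughly coincide only when the distribution is symmetric about 0, so which one is "better" depends on whether 0 is a meaningful center for the problem.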
Re: [R] A question on Statistics
Maithula:

On Sun, Dec 26, 2010 at 11:09 AM, Maithula Chandrashekhar wrote:
> [...]
> In many Statistics literature I find following statement: "restrictions in
> different coefficients matrices have to be imposed to ensure uniqueness of
> the parametrization". Can somebody tell me what is the meaning of Uniqueness
> in the parametrization? Does it mean that, two different coefficient
> matrices may give exactly the same result, and therefore coefficient matrix
> is not unique?

Yes. See the section on "contrast matrices" in Venables and Ripley's "Modern Applied Statistics with S" (MASS) for a concise but, I think, illuminating explanation. (It's in the chapter on linear models/regression.)

-- Bert

> [...]

--
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
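A small illustration of the answer above, using base R's `contr.treatment` and `contr.sum` contrasts. The one-way model, group means and data are made up for illustration. With an intercept plus one dummy per level the design matrix is rank-deficient, so many coefficient vectors give the same fit; imposing a constraint (which is what choosing a contrast matrix does) picks out one of them.

```r
# Why constraints are needed for a unique parametrization: a one-way model
# y = mu + alpha_g + error with a 3-level factor. Toy data (hypothetical).
set.seed(7)
g  <- factor(rep(c("a", "b", "c"), each = 4))
mu <- c(10, 12, 15)                    # hypothetical group means
y  <- mu[as.integer(g)] + rnorm(12)

# Unconstrained design: intercept plus one dummy per level
X_full <- cbind(1, model.matrix(~ g - 1))
ncol(X_full)          # 4 parameters ...
qr(X_full)$rank       # ... but rank 3: the columns are linearly dependent
coef(lm(y ~ X_full - 1))  # lm() reports NA for the aliased coefficient

# Two different constraints (contrast codings) give different coefficients ...
fit_trt <- lm(y ~ g, contrasts = list(g = "contr.treatment"))  # alpha_a = 0
fit_sum <- lm(y ~ g, contrasts = list(g = "contr.sum"))        # sum(alpha) = 0
coef(fit_trt)
coef(fit_sum)

# ... but exactly the same fitted values
all.equal(fitted(fit_trt), fitted(fit_sum))
```

The fitted values, and hence the model itself, are unchanged; only the meaning of the individual coefficients depends on the constraint chosen.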
[R] A question on Statistics
I do not have a pure statistics background, so please forgive me if this question (which is not R related either) is too trivial.

In much of the statistics literature I find the following statement: "restrictions in different coefficients matrices have to be imposed to ensure uniqueness of the parametrization". Can somebody tell me what uniqueness of the parametrization means? Does it mean that two different coefficient matrices may give exactly the same result, and therefore the coefficient matrix is not unique?

I find that many members (perhaps all) of this forum are real masters of statistics. Therefore I hope somebody can explain the intuition behind this to me.

Thanks,