Re: [R] A question on Statistics regarding regression

2024-08-24 Thread Ben Bolker
  This is probably better for Cross Validated 
[https://stats.stackexchange.com]. Surprisingly, I can't quickly find an 
answered question on this topic. My "tl;dr" answer would be: "inflated" 
relative to what? Having an unbalanced sample certainly decreases the 
*power* of an analysis, but there's nothing 'incorrect' (AFAICS) with 
the estimated SEs, and no reason to try to fix them.
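
One way to check this by simulation is sketched below in R. The group sizes, effect size, and the helper name sim_se are illustrative assumptions, not anything taken from the original question. For a two-level dummy with fixed group sizes n0 and n1, the textbook SE of its coefficient is sigma * sqrt(1/n0 + 1/n1); the sketch compares that value with the SE reported by lm() and with the empirical spread of the estimates.

```r
set.seed(1)

# Hypothetical helper: repeatedly simulate y = 1 + beta * g + noise for a
# two-level dummy g with fixed group sizes n0 and n1, and refit with lm().
sim_se <- function(n0, n1, reps = 2000, sigma = 1, beta = 0.5) {
  g <- rep(0:1, times = c(n0, n1))
  res <- replicate(reps, {
    y <- 1 + beta * g + rnorm(n0 + n1, sd = sigma)
    coef(summary(lm(y ~ g)))["g", c("Estimate", "Std. Error")]
  })
  c(empirical_sd     = sd(res["Estimate", ]),    # actual spread of the estimates
    mean_reported_se = mean(res["Std. Error", ]), # what lm() reports on average
    theory           = sigma * sqrt(1 / n0 + 1 / n1))
}

balanced   <- sim_se(200, 200)  # 50% / 50% split (assumed sizes)
unbalanced <- sim_se(380, 20)   # 95% / 5% split (assumed sizes)
print(round(rbind(balanced, unbalanced), 3))
```

If the three columns agree within each row, the SE for the unbalanced design is larger but not wrong: it tracks the real sampling variability. Because the formula depends only on the realized group sizes, the same should hold whether the 95/5 split arose by chance or mirrors the population, provided the model is correct and selection into the sample does not depend on the response.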


https://stats.stackexchange.com/questions/23108/unbalanced-design-effect

https://stats.stackexchange.com/questions/347050/unbalanced-sample-in-dummy-variable-for-ols-linear-regression

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] A question on Statistics regarding regression

2024-08-24 Thread Jeff Newmiller via R-help
You say you asked elsewhere, but so many hits come up when I search for
"unbalanced sample size" that your justification for not following the posting
guide does not seem honest.

I also recall that various discussions of statistical power address this in 
basic statistics.
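
The power point can be illustrated with a small simulation in R. This is only a sketch: the group sizes, the effect size of 0.3, and the helper name power_sim are assumptions chosen for illustration. It estimates how often a two-group comparison rejects at the 5% level for a balanced and a 95/5 split with the same total sample size.

```r
set.seed(2)

# Hypothetical helper: share of simulated data sets in which a pooled-variance
# t-test (equivalent to the lm() t-test for a single dummy) rejects at 5%.
power_sim <- function(n0, n1, beta = 0.3, sigma = 1, reps = 2000) {
  p <- replicate(reps, {
    y0 <- rnorm(n0, mean = 0, sd = sigma)     # reference category
    y1 <- rnorm(n1, mean = beta, sd = sigma)  # category shifted by beta
    t.test(y1, y0, var.equal = TRUE)$p.value
  })
  mean(p < 0.05)
}

power_balanced   <- power_sim(200, 200)  # 50% / 50%
power_unbalanced <- power_sim(380, 20)   # 95% / 5%
print(c(balanced = power_balanced, unbalanced = power_unbalanced))
```

With the same total of 400 observations, the unbalanced split should show clearly lower power, which is the cost of imbalance rather than an error in the SE.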


-- 
Sent from my phone. Please excuse my brevity.



[R] A question on Statistics regarding regression

2024-08-24 Thread Christofer Bogaso
Hi,

I have asked this question elsewhere but failed to get any
response, so I am hoping to get some insight from the experts and
statisticians here.

Let's say we are fitting a regression equation where one explanatory
variable is categorical with 2 categories. In the sample, however, one
category has 95% of the values and the other has just 5%; that is, the
categories are highly unbalanced.

Typically the SE of the estimate may be inflated for such a highly
unbalanced categorical explanatory variable.

Such an unbalanced case may arise in 2 scenarios: 1) there is a flaw in
the sample, or it is just by chance that the second category has only 5%
of the values in the sample, or 2) in the population itself the second
category has a very small number of occurrences, which is reflected in
the sample.

My question is: how would the SE be impacted in the above 2 cases? Will
the impact be the same, i.e. would we get an incorrect estimate of the SE
in both cases? If yes, is there any way to show this analytically or
through simulation?

My apologies, as this question is not directly R related. However, I
just wanted to get some insight on the above problem in statistics
from some of the great statisticians in this forum.

Thanks for your time.



Re: [R] A question on Statistics

2018-07-01 Thread Christofer Bogaso
I am going by the posting guide at https://www.r-project.org/posting-guide.html

I am imagining a distribution whose mean is zero but which has a few large,
infrequent observations on the positive side.



Re: [R] A question on Statistics

2018-07-01 Thread Bert Gunter
From the posting guide:

"*R-help* is intended to be comprehensible to people who want to use R to
solve problems but who are not necessarily interested in or knowledgeable
about programming."

This says to me that R-help is for general questions about R programming,
not statistics, though I grant you that the intersection is nonempty.
Nevertheless, purely statistical issues should be posted elsewhere, and
your query appears to be such.

However, I'll just note: what does "centered at 0" mean for an asymmetric
distribution? I think you may need to reconsider Jeff's advice.
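
A small R sketch of why "centered at 0" is ambiguous for a skewed distribution. The shifted exponential used here is purely an assumed example, not the poster's data: its mean is 0 by construction, yet its median is not.

```r
set.seed(42)

# Assumed example: Exp(1) shifted left by 1, so the population mean is 0
# while the distribution stays strongly right-skewed.
x <- rexp(1e5) - 1

centre <- c(mean = mean(x), median = median(x))
print(round(centre, 3))
```

For this distribution the population median is log(2) - 1, which is below zero, so "centered at 0" is true for the mean and false for the median.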


Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: [R] A question on Statistics

2018-07-01 Thread Christofer Bogaso
Hi,

I could post on StackExchange for sure; however, I don't think the R-help
posting guide discourages asking a question about statistics, at least formally.

I could clarify further if my question is not detailed enough. Many
apologies if it is very trivial; I am still looking for a second
opinion on my question.

In answer to Jeff's pointer: yes, my distribution is assumed to be centered at
0.

Thanks,



Re: [R] A question on Statistics

2018-06-30 Thread Hasan Diwan
Christofer,
On Sat, 30 Jun 2018 at 12:54, Jeff Newmiller  wrote:
>
> You should use Stack Exchange for questions about statistics.

Specifically, https://stats.stackexchange.com/ -- H



Re: [R] A question on Statistics

2018-06-30 Thread Jeff Newmiller

You should use Stack Exchange for questions about statistics.

You should also think a bit before you post, regardless of where. You are 
the one who described this as a highly asymmetric distribution, and didn't 
say anything about it being centered at zero. You already answered your 
own question, to the extent that it can be answered.
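
For anyone who wants to see the difference numerically, here is a hedged R sketch comparing the two approaches from the question. The skewed sample (a shifted exponential) and the helper name coverage are assumptions for illustration only. Note that the 0.85 quantile of abs(x) bounds about 85% of the sample, not 70%; a symmetric interval holding 70% would use the 0.70 quantile of abs(x), and even then the two tails need not be equal.

```r
set.seed(123)
x <- rexp(1e5) - 1  # assumed skewed sample with mean close to 0

# Approach 1: equal-tailed interval, 15% cut off on each side
a1 <- quantile(x, probs = c(0.15, 0.85), names = FALSE)

# Approach 2 as posted: +/- the 0.85 quantile of |x|
q85 <- quantile(abs(x), probs = 0.85, names = FALSE)

# Symmetric interval that actually holds 70% of the sample
q70 <- quantile(abs(x), probs = 0.70, names = FALSE)

# Hypothetical helper: share of the sample inside [lo, hi]
coverage <- function(lo, hi) mean(x >= lo & x <= hi)

cov_equal_tail <- coverage(a1[1], a1[2])
cov_abs85      <- coverage(-q85, q85)
cov_abs70      <- coverage(-q70, q70)

print(rbind(equal_tail = c(lo = a1[1], hi = a1[2], coverage = cov_equal_tail),
            abs_085    = c(lo = -q85,  hi = q85,   coverage = cov_abs85),
            abs_070    = c(lo = -q70,  hi = q70,   coverage = cov_abs70)))
```

The intervals differ in both endpoints and coverage, so the two approaches are not equivalent for an asymmetric distribution. If "middle 70%" means 15% in each tail, only the first approach delivers that by construction.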








[R] A question on Statistics

2018-06-30 Thread Christofer Bogaso
Hi,

I have a quick question on statistical distributions, and I am hoping the
statisticians here can give me some insightful feedback.

Say I have a large sample from a highly asymmetric distribution ranging
from -Inf to +Inf. I wish to calculate the sample values X1 and X2 between
which the middle 70% of the probability lies.

One approach:
x <- my_sample   # my sample
quantile(x, probs = 0.15) and quantile(x, probs = 0.85)

Another approach:
quantile(abs(x), probs = 0.85)
In this case X1 and X2 would be +/- the above result.

My question is: are the above two approaches equivalent in all scenarios? If
not, which is the better approach for finding such a range?

Thanks,



Re: [R] A question on Statistics

2010-12-26 Thread Bert Gunter
Maithula:

On Sun, Dec 26, 2010 at 11:09 AM, Maithula Chandrashekhar
 wrote:
> I do not have a pure statistics background, so please forgive me if
> this question (which is not R related either) is too trivial.
>
> In much of the statistics literature I find the following statement:
> "restrictions in different coefficients matrices have to be imposed to
> ensure uniqueness of the parametrization". Can somebody tell me what
> uniqueness of the parametrization means? Does it mean that two different
> coefficient matrices may give exactly the same result, and therefore the
> coefficient matrix is not unique?
-- yes.

See the section on "contrast matrices" in Venables and Ripley's
"Modern Applied Statistics with S" (MASS) for a concise but, I think,
illuminating explanation. (It's in the chapter on linear
models/regression).
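
A minimal R sketch of the idea, using a one-way layout with made-up data (the factor g, the means, and the coefficient vectors b1 and b2 are all assumptions for illustration). With an intercept plus one indicator per level, the design matrix is rank deficient, so different coefficient vectors give identical fitted values; a contrast restriction picks one of them.

```r
set.seed(7)
g <- factor(rep(c("a", "b", "c"), each = 4))
y <- c(1, 2, 4)[g] + rnorm(length(g), sd = 0.3)

# Over-parameterized design: intercept plus an indicator for every level
# (4 columns, but only rank 3).
X <- cbind(1, model.matrix(~ g - 1))
rank_X <- qr(X)$rank

# Two different coefficient vectors that produce the same predictions
b1 <- c(0, 1, 2, 4)
b2 <- c(1, 0, 1, 3)
same_pred <- isTRUE(all.equal(drop(X %*% b1), drop(X %*% b2)))

# Two standard restrictions (contrast codings), each giving a unique fit
fit_trt <- lm(y ~ g, contrasts = list(g = "contr.treatment"))
fit_sum <- lm(y ~ g, contrasts = list(g = "contr.sum"))
same_fit <- isTRUE(all.equal(fitted(fit_trt), fitted(fit_sum)))

print(rbind(treatment = coef(fit_trt), sum = unname(coef(fit_sum))))
print(c(rank = rank_X, same_pred = same_pred, same_fit = same_fit))
```

The coefficients differ between the two codings while the fitted values agree, which is the sense in which the parametrization is not unique until a restriction is imposed.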

-- Bert




-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml



[R] A question on Statistics

2010-12-26 Thread Maithula Chandrashekhar
I do not have a pure statistics background, so please forgive me if
this question (which is not R related either) is too trivial.

In much of the statistics literature I find the following statement:
"restrictions in different coefficients matrices have to be imposed to
ensure uniqueness of the parametrization". Can somebody tell me what
uniqueness of the parametrization means? Does it mean that two different
coefficient matrices may give exactly the same result, and therefore the
coefficient matrix is not unique?

I find there are many members (perhaps all) of this forum who are real
masters of statistics, so I hope somebody can clarify the intuition
behind this.

Thanks,
