Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-09-21 Thread Eike Rathke
Hi Leonard,

Just returned from vacation, hence the silence.

On Friday, 2009-09-11 00:43:09 +0200, Leonard Mada wrote:

> I might be too late,

Yes, too late; UI string changes aren't allowed anymore.

> but the following small correction
> sounds slightly better:
> 
> Calculates the probability of observing a z-statistic
> greater than the one computed based on *the* sample.
> 
> > Calculates the probability of observing a z-statistic greater than the one
> > computed based on a sample.

Yes, slightly better. I may change it on the fly for OOo3.3 if
I remember.

  Eike

-- 
 OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
 SunSign   0x87F8D412 : 2F58 5236 DB02 F335 8304  7D6C 65C9 F9B5 87F8 D412
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
 Please don't send personal mail to the e...@sun.com account, which I use for
 mailing lists only and don't read from outside Sun. Use er...@sun.com Thanks.




[sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-09-10 Thread Leonard Mada
Hello Eike,

I might be too late, but the following small correction
sounds slightly better:

Calculates the probability of observing a z-statistic
greater than the one computed based on *the* sample.

 Original Message 
> Date: 3 Sep 2009 12:34:44 -
> From: e...@openoffice.org
> To: disco...@openoffice.org
> Subject: [Issue 90759] ZTEST not same as Excel

> To comment on the following update, log in, then open the issue:
> http://www.openoffice.org/issues/show_bug.cgi?id=90759
> 
> 
> 
> 
> 
> --- Additional comments from e...@openoffice.org Thu Sep  3 12:34:43
> + 2009 ---
> I used this one now:
> 
> Calculates the probability of observing a z-statistic greater than the one
> computed based on a sample.
> 
> revision 275752
> sc/source/ui/src/scfuncs.src
> 
> 
> -
> Please do not reply to this automatically generated notification from
> Issue Tracker. Please log onto the website and enter your comments.
> http://qa.openoffice.org/issue_handling/project_issues.html#notification

-- 
Download now for free: Internet Explorer 8 and Mozilla Firefox 3 -
safer, faster and easier! http://portal.gmx.net/de/go/chbrowser

-
To unsubscribe, e-mail: dev-unsubscr...@sc.openoffice.org
For additional commands, e-mail: dev-h...@sc.openoffice.org



Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-08-26 Thread Leonard Mada
Dear Eike, dear Regina,

I will try to explain the rationale behind the z-test.

Unfortunately, the quirks in how spreadsheet software computes it
make it not that easy to describe.

The assumptions of the z-test:
- you have a random sample with a mean Xs

- there is a population that follows a Gaussian distribution
  with mean muP and variance sigma^2

- the question is whether this sample was drawn from this population

- the statistical hypotheses are:

H0: Xs  = muP
Ha: Xs != muP (2-tailed)

The one-tailed version of Ha is either
Xs < muP or
Xs > muP (only one of these).

So, basically, what we are testing is whether
the sample was taken from a population with mean muP.

The z-test will first compute the z-statistic, and then
infer the probability under H0 from this z-statistic.
[There is a direct correspondence between z and the probability.]
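A minimal sketch of that correspondence in Python, assuming sigma is known: the z-statistic standardizes the sample mean Xs against muP, and the standard normal CDF (built here from math.erf; the helper names z_statistic and std_normal_cdf, and the numbers, are hypothetical) turns each z into a probability.

```python
import math

def z_statistic(sample_mean: float, mu_p: float, sigma: float, n: int) -> float:
    """Standardize the sample mean Xs against the population mean muP."""
    return (sample_mean - mu_p) / (sigma / math.sqrt(n))

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical numbers: a sample of 25 with mean 103, drawn from a
# population with muP = 100 and sigma = 10.
z = z_statistic(103.0, 100.0, 10.0, 25)
print(z)                  # 1.5
print(std_normal_cdf(z))  # about 0.9332
```

Each z maps to exactly one probability, which is why the test can work with z alone.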

So, the 2-tailed version looks like:
- if the computed z is more extreme than a critical z0, then
  we have to reject H0
- else, we cannot reject H0

More extreme means:
 either z < -|z0| or z > |z0|, where |...| is the absolute value.
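Assuming a critical value z0 has already been chosen (1.96 is the conventional two-tailed value for a 5% significance level), the decision rule above might be sketched in Python like this; the function name is hypothetical:

```python
def two_tailed_decision(z: float, z0: float) -> str:
    """Reject H0 when z is more extreme than the critical value z0."""
    # "More extreme" means z < -|z0| or z > |z0|.
    if z < -abs(z0) or z > abs(z0):
        return "reject H0"
    return "cannot reject H0"

print(two_tailed_decision(1.5, 1.96))   # cannot reject H0
print(two_tailed_decision(-2.3, 1.96))  # reject H0
```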

We compare two z values only when we interpret
the z-statistic.

Otherwise, we do not compare two values.

The z-test simply gives us the probability, under the null
hypothesis, to observe a z-statistic as extreme or more extreme
than that calculated, or, as written on MathWorks:
"The p-value is the probability, under the null hypothesis,
 of observing a value as extreme or more extreme of the
 z test statistic..." (slightly reworded)

Or, in yet other words:
 we obtain the probability to observe in a random sample from
 the given study population a z statistic as extreme as
 that calculated. [This is the meaning of the p-value.]
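A minimal sketch of that two-sided p-value, assuming a standard normal reference distribution under H0 (the function name is hypothetical; math.erfc is Python's complementary error function):

```python
import math

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) under H0: as extreme or more extreme, in either tail.

    Uses the identity 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

print(round(two_sided_p_value(1.5), 4))   # 0.1336
print(round(two_sided_p_value(-1.5), 4))  # 0.1336, the two tails are symmetric
```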

This sounds good, and is easily understandable.

The problem with the z-test implementation in spreadsheets
(I infer the implementation details from previous posts;
 I did not test it specifically) is that a different
 probability is computed, namely:

 the probability of observing a z > computed z for this sample.

Statistically, this is the "one-sided" "greater" alternative.
But this is not easy to explain to someone who does not
know statistics.

So basically, we compute the probability, under the null
hypothesis, to observe a z-statistic greater than the one
computed.
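For contrast, a short sketch (hypothetical helper names) of the one-sided "greater" probability next to the two-sided p-value for the same z:

```python
import math

def p_greater(z: float) -> float:
    """P(Z > z) under H0: the one-sided "greater" alternative (1 - Phi(z))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_two_sided(z: float) -> float:
    """P(|Z| >= |z|) under H0, shown for comparison."""
    return math.erfc(abs(z) / math.sqrt(2.0))

for z in (1.5, -1.5):
    print(z, round(p_greater(z), 4), round(p_two_sided(z), 4))
# 1.5 0.0668 0.1336
# -1.5 0.9332 0.1336
```

When z is positive the one-sided value is exactly half the two-sided one; when z is negative it is close to 1, which is why the wording of the description matters.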


"Under the null hypothesis" means to observe such a value
by chance alone (aka randomly).

The shortest definition that still makes some sense is:
 The probability of a z-statistic greater than the one computed.
 [where computed is based on the sample]

I hope this sounds like proper English, but unfortunately I am
not a native speaker either. I would welcome input from any
native English speaker.

Sincerely,

Leonard Mada

A last note:
I understand what was meant in the previous definition
with a second sample, but I found that explanation very
confusing, because we never take a 2nd sample. Also, the
"2" samples are never compared.
[A 2nd sample would also cause a lot of trouble because
 of 2 means and 2 distinct variances. The actual reasoning
 refers to H0 and goes like this:
 We draw a hypothetical random sample, and compute
 the probability of getting a z-statistic as extreme or more
 extreme than that observed with our real sample. It is
 the probability of drawing such a sample, not of comparing
 2 samples.]
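The reasoning about a hypothetical random sample can be made concrete with a small Monte Carlo sketch in standard-library Python (the function name, parameters, and numbers are illustrative assumptions): draw many hypothetical samples from the H0 population, compute each one's z-statistic, and count how often it is at least as large as the observed z.

```python
import math
import random

def simulate_p_greater(z_obs: float, mu_p: float, sigma: float, n: int,
                       trials: int = 20_000, seed: int = 1) -> float:
    """Estimate P(Z >= z_obs) under H0 by drawing hypothetical samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One hypothetical sample of size n from the H0 population N(muP, sigma^2).
        mean = sum(rng.gauss(mu_p, sigma) for _ in range(n)) / n
        z = (mean - mu_p) / (sigma / math.sqrt(n))
        if z >= z_obs:
            hits += 1
    return hits / trials

# Only z_obs comes from the real sample; the simulated samples are never
# compared with it directly, only their z-statistics with z_obs.
estimate = simulate_p_greater(1.5, 100.0, 10.0, 10)
exact = 0.5 * math.erfc(1.5 / math.sqrt(2.0))
print(estimate, exact)  # the estimate should be close to exact (about 0.0668)
```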


Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-08-26 Thread Eike Rathke
Hi Regina,

On Wednesday, 2009-08-26 18:02:44 +0200, Regina Henschel wrote:

>> "calculates the probability of observing a value as large
>>  or larger for the z-statistic"
>
> There is a comparison, "observing a value larger", but it does not
> say what it is compared to. There must be something like "observing
> a value larger than ...".
>
> I think "as large as..." can be dropped; it makes no difference for a
> continuous distribution, and the text becomes shorter.
>
> Is "for the z-statistic" an attribute of "a value"? That is how I
> understand it. Is it typical English word order to put it at the end?
>
> In German I would say "Berechnet die Wahrscheinlichkeit einen Wert der
> Gauß-Statistik zu beobachten, der größer ist als der Wert der
> Gauß-Statistik der Stichprobe." But I'm not sure this is what Leonard
> wants to say. ("Z-Statistik" does not exist in German.)

Translating that I'd get, hopefully correct:

"Calculates the probability of observing a value of the z-statistic
larger than the value of the sample's z-statistic."

Is that what we want to say?

> Describing the function using 'z-statistic' is indeed better than using  
> a description with 'mean', because of the function name ZTEST.

I agree.

  Eike



Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-08-26 Thread Regina Henschel

Hi Eike, hi Leonard,

Eike Rathke wrote:
> Hi Leonard,
>
> On Tuesday, 2009-08-25 15:45:42 +0200, Leonard Mada wrote:
>
>> I would therefore stick with the one-sample definition,
>> and adapt only the text to correspond to what the function
>> actually computes.
>>
>> "The p-value is the probability, under the null hypothesis,
>>  of observing a value as extreme or more extreme of the
>>  z-statistic"
>
> Probably too long, taking localizations into account.
>
>> or shortened:
>>
>> "calculates the probability of observing a value as extreme
>>  or more extreme of the z-statistic"
>>
>> and (possibly) correcting for the wrong implementation:
>>
>> "calculates the probability of observing a value as large
>>  or larger for the z-statistic"
>
> As I'm not a native speaker, the difference isn't clear to me.

"Extreme" can mean very small or very large. But our ZTEST only
calculates the "larger" case.

> To have this changed we need a decision real soon now. So far then I'd
> go for
>
> "calculates the probability of observing a value as large
>  or larger for the z-statistic"
>
> Any objections? Adding that to i90759 to have it documented.

There is a comparison, "observing a value larger", but it does not
say what it is compared to. There must be something like "observing
a value larger than ...".

I think "as large as..." can be dropped; it makes no difference for a
continuous distribution, and the text becomes shorter.

Is "for the z-statistic" an attribute of "a value"? That is how I
understand it. Is it typical English word order to put it at the end?

In German I would say "Berechnet die Wahrscheinlichkeit einen Wert der
Gauß-Statistik zu beobachten, der größer ist als der Wert der
Gauß-Statistik der Stichprobe." [Roughly: "Calculates the probability
of observing a value of the Gaussian statistic that is greater than the
value of the Gaussian statistic of the sample."] But I'm not sure this
is what Leonard wants to say. ("Z-Statistik" does not exist in German.)

Describing the function using 'z-statistic' is indeed better than using
a description with 'mean', because of the function name ZTEST.


kind regards
Regina




Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-08-26 Thread Eike Rathke
Hi Leonard,

On Tuesday, 2009-08-25 15:45:42 +0200, Leonard Mada wrote:

> I would therefore stick with the one-sample definition,
> and adapt only the text to correspond to what the function
> actually computes.
> 
> "The p-value is the probability, under the null hypothesis,
>  of observing a value as extreme or more extreme of the
>  z-statistic"

Probably too long, taking localizations into account.

> or shortened:
> 
> "calculates the probability of observing a value as extreme
>  or more extreme of the z-statistic"
> 
> and (possibly) correcting for the wrong implementation:
> 
> "calculates the probability of observing a value as large
>  or larger for the z-statistic"

As I'm not a native speaker, the difference isn't clear to me.

To have this changed we need a decision real soon now. So far then I'd
go for

"calculates the probability of observing a value as large
 or larger for the z-statistic"

Any objections? Adding that to i90759 to have it documented.

  Eike



Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-08-25 Thread Leonard Mada
Dear Regina,

I find the mention of two samples very confusing (the same is true
of the Wikipedia article). This is especially problematic because
two samples would yield two different standard deviations, while
there is only one sigma if you work with a population.

A far better description is provided on the MathWorks site.
http://www.mathworks.com/access/helpdesk/help/toolbox/stats/index.html?/access/helpdesk/help/toolbox/stats/ztest.html&http://www.google.ro/url?sa=t&source=web&ct=res&cd=3&url=http%3A%2F%2Fwww.mathworks.com%2Faccess%2Fhelpdesk%2Fhelp%2Ftoolbox%2Fstats%2Fztest.html&ei=qt-TSvKEKoPe-QbviqyxBg&rct=j&q=ztest&usg=AFQjCNEynERcFmdEW0pD-B-nZSBuh7zYPw

"The p-value is the probability, under the null hypothesis,
 of observing a value as extreme or more extreme of the test
 statistic z = (x̄ - μ)/(σ/√n), where x̄ is the sample mean, μ = m is the
 hypothesized population mean, σ is the population standard
 deviation, and n is the sample size."

This is the classical description of the z-statistic.

As I have mentioned on a number of occasions, ztest is not
implemented directly in R (it should be avoided in any serious
statistical analysis, and has no place there).


I would therefore stick with the one-sample definition,
and adapt only the text to correspond to what the function
actually computes.

"The p-value is the probability, under the null hypothesis,
 of observing a value as extreme or more extreme of the
 z-statistic"

or shortened:

"calculates the probability of observing a value as extreme
 or more extreme of the z-statistic"

and (possibly) correcting for the wrong implementation:

"calculates the probability of observing a value as large
 or larger for the z-statistic"

and, if space is really such a huge concern, then compacting
"as large or larger" will result in:

"calculates the probability of observing a value larger
 than the z-statistic"

Sincerely,

Leonard Mada






Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-08-25 Thread Regina Henschel

Hi Leonard,

Leonard Mada wrote:
> Dear Calc team,
>
> the following wording is ambiguous and it may be wrong altogether:
>
>> function description:
>> calculates the probability of a *sample* mean greater than the mean of
>> the given *sample*.
> [EMPHASIS ADDED]


A long phrase for the application help would be:

For a given random sample of size n, drawn from a normally distributed 
population with a known mean µ and standard deviation sigma, ZTEST 
calculates the probability that another sample of the same size would 
have a mean greater than the mean m of the given sample.


ZTEST calculates 1-NORMSDIST(z) where z = (m-µ)/(sigma/sqrt(n)).
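For illustration, a minimal Python sketch of that formula (not Calc's actual source; the function name ztest and its handling of an omitted sigma are assumptions, the latter modelled on the parameter description proposed in issue 90759):

```python
import math
import statistics
from typing import Optional, Sequence

def ztest(sample: Sequence[float], mu: float, sigma: Optional[float] = None) -> float:
    """Sketch of ZTEST: 1 - NORMSDIST(z), with z = (m - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    m = statistics.fmean(sample)  # mean m of the given sample
    if sigma is None:
        # Assumption: when sigma is omitted, use the sample standard
        # deviation (divisor n - 1), as the proposed parameter text describes.
        sigma = statistics.stdev(sample)
    z = (m - mu) / (sigma / math.sqrt(n))
    # 1 - NORMSDIST(z) written as 0.5 * erfc(z / sqrt(2)), which avoids
    # losing precision when NORMSDIST(z) is very close to 1.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

print(round(ztest([103.0] * 25, 100.0, 10.0), 4))  # z = 1.5, prints 0.0668
print(round(ztest([98.0, 100.0, 102.0, 104.0, 106.0], 100.0), 4))  # sigma from sample, prints 0.0786
```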


The function ZTEST is not itself a Z-test, but it calculates a value
that you can use to perform a Z-test.
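To illustrate that point, a hedged Python sketch of how the value returned by ZTEST could be turned into an actual Z-test decision (the function z_test_decision, its alternative names, and the default alpha = 0.05 are illustrative assumptions):

```python
def z_test_decision(ztest_value: float, alpha: float = 0.05,
                    alternative: str = "two-sided") -> bool:
    """Return True if H0 is rejected at level alpha.

    ztest_value is what ZTEST returns, i.e. P(Z > z) = 1 - NORMSDIST(z).
    """
    if alternative == "greater":
        p = ztest_value
    elif alternative == "less":
        p = 1.0 - ztest_value
    elif alternative == "two-sided":
        # Double the smaller tail.
        p = 2.0 * min(ztest_value, 1.0 - ztest_value)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return p <= alpha

print(z_test_decision(0.0668))                         # False (p = 0.1336)
print(z_test_decision(0.0668, alternative="greater"))  # False (p = 0.0668)
print(z_test_decision(0.01, alternative="greater"))    # True
```

The two-sided branch corresponds to the spreadsheet expression 2*MIN(x;1-x), where x is the value returned by ZTEST.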


Do you know a better phrase that *does not exceed two lines* for the 
function wizard? We can explain the function in detail on the Wiki, 
where you will already find the formula and nice diagrams. The problem is 
to get a very short description without giving useless phrases like the 
ones for TTest, where you find "Calculates the T test" in the function wizard 
and "Returns the probability associated with a Student's t-Test." in the 
application help.


kind regards
Regina






[sc-dev] Re: [Issue 90759] ZTEST not same as Excel

2009-08-25 Thread Leonard Mada
Dear Calc team,

the following wording is ambiguous and it may be wrong altogether:

> function description:
> calculates the probability of a *sample* mean greater than the mean of
> the given *sample*.
[EMPHASIS ADDED]

So, does it compute the probabilities based on 2 samples?
I doubt it.

Let me go back to basics. Let's say we have a population (P) with
a population mean Mu(P), and a sample X with a sample mean mu(X).

The statistical hypotheses are:
H0: Mu(P) == mu(X)
Ha: Mu(P) either '<' or '>' mu(X) [the one tailed version]

Depending on the tail used, it will be '<' or '>'.
The 2-tailed version is Mu(P) '!=' mu(X).

I haven't followed the discussion recently, so I am unable
to tell exactly what is computed.

But I very much suspect that it compares mu(X) against
a population mean Mu(P).

Also, the phrase does not specify which statistic is used.
It might be obvious that the z-statistic is used, but I would
rather specify it explicitly. There are a lot of different statistics
out there.

Sincerely,

Leonard Mada


 Original Message 
> Date: 25 Aug 2009 07:20:41 -
> From: drk...@openoffice.org
> To: disco...@openoffice.org
> Subject: [Issue 90759] ZTEST not same as Excel

> To comment on the following update, log in, then open the issue:
> http://www.openoffice.org/issues/show_bug.cgi?id=90759
> 
> 
> 
> 
> 
> --- Additional comments from drk...@openoffice.org Tue Aug 25 07:20:40
> + 2009 ---
> Our proposal for the function wizard:-
> 
> function description:
> calculates the probability of a sample mean greater than the mean of
> the given sample.
> 
> first parameter
> The given sample, drawn from a normally distributed population
> 
> second parameter
> The known mean of the population
> 
> third parameter
> The known standard deviation of the population. If omitted, the standard
> deviation of the given sample is used. 
> 
> 
