Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Hi Leonard,

I just returned from vacation, hence the silence.

On Friday, 2009-09-11 00:43:09 +0200, Leonard Mada wrote:

> I might be too late,

Yes, too late; UI string changes aren't allowed anymore.

> but the following small correction
> sounds slightly better:
>
> Calculates the probability of observing a z-statistic
> greater than the one computed based on *the* sample.
>
> > Calculates the probability of observing a z-statistic greater than the one
> > computed based on a sample.

Yes, slightly better. I may change it on the fly for OOo3.3 if I remember.

Eike

--
OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
SunSign 0x87F8D412 : 2F58 5236 DB02 F335 8304 7D6C 65C9 F9B5 87F8 D412
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
Please don't send personal mail to the e...@sun.com account, which I use for
mailing lists only and don't read from outside Sun. Use er...@sun.com Thanks.
[sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Hello Eike,

I might be too late, but the following small correction sounds slightly better:

Calculates the probability of observing a z-statistic greater than the one computed based on *the* sample.

-------- Original Message --------
> Date: 3 Sep 2009 12:34:44 -
> From: e...@openoffice.org
> To: disco...@openoffice.org
> Subject: [Issue 90759] ZTEST not same as Excel
>
> To comment on the following update, log in, then open the issue:
> http://www.openoffice.org/issues/show_bug.cgi?id=90759
>
> --- Additional comments from e...@openoffice.org Thu Sep 3 12:34:43 + 2009 ---
> I used this one now:
>
> Calculates the probability of observing a z-statistic greater than the one
> computed based on a sample.
>
> revision 275752
> sc/source/ui/src/scfuncs.src
>
> Please do not reply to this automatically generated notification from
> Issue Tracker. Please log onto the website and enter your comments.
> http://qa.openoffice.org/issue_handling/project_issues.html#notification

--
To unsubscribe, e-mail: dev-unsubscr...@sc.openoffice.org
For additional commands, e-mail: dev-h...@sc.openoffice.org
Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Dear Eike, dear Regina,

I will try to explain the rationale behind the z-test. Unfortunately, the quirks of its computation in spreadsheet software make it not that easy to describe.

The assumptions of the z-test:
- you have a random sample with a mean Xs
- there is a population that follows a Gaussian distribution with mean muP and variance sigma^2
- the question is whether this sample was drawn from this population
- the statistical hypotheses are:
  H0: Xs = muP
  Ha: Xs != muP (2-tailed)
  In the one-tailed version, Ha is either Xs < muP or Xs > muP (only one of these).

So, basically, what we are testing is whether the sample could have been taken from a population with mean muP. The z-test first computes the z-statistic and then derives the probability under H0 (the p-value) from this z-statistic. [There is a direct correspondence between z and the probability.]

So, the 2-tailed version looks like this:
- if the computed z is more extreme than a critical z0, then we have to reject H0
- otherwise, we do not reject H0

"More extreme" means: either z < -|z0| or z > |z0|, where |...| is the absolute value.

We compare two z values only when we interpret the z-statistic. Otherwise, we do not compare two values. The z-test simply gives us the probability, under the null hypothesis, of observing a z-statistic as extreme as or more extreme than the one calculated, or, as written on MathWorks: "The p-value is the probability, under the null hypothesis, of observing a value as extreme or more extreme of the z test statistic..." (slightly reworded)

In still other words: we obtain the probability of observing, in a random sample from the given study population, a z-statistic as extreme as the one calculated. [This is the meaning of the p-value.]

This sounds good and is easily understandable.
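The 2-tailed decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not spreadsheet code: the significance level alpha and the helper name two_tailed_decision are assumptions introduced here, and the standard normal distribution comes from the standard library's statistics.NormalDist. It computes the critical value z0 and the two-tailed p-value, so you can see that "z is more extreme than z0" and "p-value < alpha" lead to the same decision.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sigma 1


def two_tailed_decision(z, alpha=0.05):
    """Return (z0, p_value, reject_h0) for an observed z-statistic.

    z0 is the critical value with P(|Z| > z0) = alpha under H0.
    alpha is an assumed significance level (hypothetical default 0.05).
    """
    z0 = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Two-tailed p-value: probability of a z at least as extreme, in either tail.
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    # "More extreme" means z < -|z0| or z > |z0|.
    reject_h0 = z < -z0 or z > z0
    return z0, p_value, reject_h0


if __name__ == "__main__":
    for z in (2.5, -1.0):  # made-up observed z values
        z0, p, reject = two_tailed_decision(z)
        print(f"z={z:+.2f} z0={z0:.3f} p={p:.4f} reject H0: {reject}")
```

With alpha = 0.05 the critical value is about 1.96, so z = 2.5 leads to rejecting H0 while z = -1.0 does not.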
The problem with the z-test implementation in spreadsheets (I infer the implementation details from previous posts; I did not test it specifically) is that a different probability is computed, namely the probability of observing a z greater than the z computed for this sample. Statistically, this is the one-sided "greater" alternative. But this is not as easy to explain if you do not understand statistics.

So basically, we compute the probability, under the null hypothesis, of observing a z-statistic greater than the one computed. "Under the null hypothesis" means observing such a value by chance alone (i.e. randomly).

The shortest definition that still makes some sense is:

The probability of a z-statistic greater than the one computed. [where "computed" means computed from the sample]

I hope this sounds like proper English; unfortunately, I am not a native speaker either. I would have welcomed some input from a native English speaker.

Sincerely,
Leonard Mada

A last note: I understand what was meant in the previous definition with a second sample, but I found that explanation very confusing, because we never take a 2nd sample. Also, the "2" samples are never compared. [A 2nd sample would also cause a lot of trouble because of 2 means and 2 distinct variances. The actual reasoning refers to H0 and goes like this: we draw a hypothetical random sample and compute the probability of getting a z-statistic as extreme as or more extreme than the one observed with our real sample. It is the probability of drawing such a sample, not of comparing 2 samples.]
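The "hypothetical random sample" reasoning above can be checked numerically. The Python sketch below is an illustration under stated assumptions, not how any spreadsheet computes ZTEST: it draws many hypothetical samples from the H0 population N(mu, sigma^2), computes each sample's z-statistic, and counts how often it exceeds an observed value. That fraction should approach the one-sided "greater" probability P(Z > z). The helper names, population parameters, trial count, and seed are all made up for the example.

```python
import math
import random
from statistics import NormalDist, fmean


def p_greater(z):
    """One-sided 'greater' probability P(Z > z) for a standard normal Z."""
    return 1 - NormalDist().cdf(z)


def simulate_p_greater(z_obs, mu, sigma, n, trials=20000, seed=1):
    """Monte Carlo estimate of P(Z > z_obs) under H0.

    Each trial draws one hypothetical sample of size n from the H0
    population N(mu, sigma^2) and computes its z-statistic.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / math.sqrt(n))
        if z > z_obs:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    z_obs = 1.0  # made-up observed z-statistic
    print("analytic :", p_greater(z_obs))
    print("simulated:", simulate_p_greater(z_obs, mu=5.0, sigma=2.0, n=10))
```

Only one sample is ever "real" here; the simulated samples stand in for the null hypothesis, which matches the point that no second sample is actually compared.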
-------- Original Message --------
> Date: Wed, 26 Aug 2009 22:35:29 +0200
> From: Eike Rathke
> To: dev@sc.openoffice.org
> Subject: Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Hi Regina,

On Wednesday, 2009-08-26 18:02:44 +0200, Regina Henschel wrote:

>> "calculates the probability of observing a value as large
>> or larger for the z-statistic"
>
> There is a comparison "observing a value larger" but it does not
> contain, to what it is compared. There must be something like "observing
> a value larger than ...".
>
> I think "as large as..." can be dropped, it makes no difference for a
> continuous distribution and the text becomes shorter.
>
> Is "for the z-statistic" an attribute to "a value"? I understand it so.
> Is it a typical sentence order in English to put it at the end?
>
> In German I would say "Berechnet die Wahrscheinlichkeit einen Wert der
> Gauß-Statistik zu beobachten, der größer ist als der Wert der
> Gauß-Statistik der Stichprobe." But I'm not sure, Leonardo wants to say
> this. ("Z-Statistik" does not exist in German.)

Translating that, I'd get (hopefully correctly):

"Calculates the probability of observing a value of the z-statistic larger than the value of the sample's z-statistic."

Is that what we want to say?

> Describing the function using 'z-statistic' is indeed better than using
> a description with 'mean', because of the function name ZTEST.

I agree.

Eike
Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Hi Eike, hi Leonard,

Eike Rathke wrote:
> Hi Leonard,
>
> On Tuesday, 2009-08-25 15:45:42 +0200, Leonard Mada wrote:
>> I would therefore stick with the one sample definition,
>> and adapt only the text to correspond to what actually
>> the function computes.
>>
>> "The p-value is the probability, under the null hypothesis,
>> of observing a value as extreme or more extreme of the
>> z-statistic"
>
> Probably too long, taking localizations into account.
>
>> or shortened:
>>
>> "calculates the probability of observing a value as extreme
>> or more extreme of the z-statistic"
>>
>> and (possibly) correcting for the wrong implementation:
>>
>> "calculates the probability of observing a value as large
>> or larger for the z-statistic"
>
> Not being a native speaker the difference isn't clear to me.

"Extreme" can mean very small or very large. But our ZTEST only calculates the "larger" case.

> To have this changed we need a decision real soon now. So far then I'd
> go for
>
> "calculates the probability of observing a value as large
> or larger for the z-statistic"
>
> Any objections? Adding that to i90759 to have it documented.

The phrase contains a comparison, "observing a value larger", but does not say what the value is compared to. There must be something like "observing a value larger than ...".

I think "as large as..." can be dropped: it makes no difference for a continuous distribution, and the text becomes shorter.

Is "for the z-statistic" an attribute of "a value"? That is how I understand it. Is it typical English word order to put it at the end?

In German I would say "Berechnet die Wahrscheinlichkeit einen Wert der Gauß-Statistik zu beobachten, der größer ist als der Wert der Gauß-Statistik der Stichprobe." ("Calculates the probability of observing a value of the Gauss statistic that is greater than the value of the Gauss statistic of the sample.") But I'm not sure that is what Leonard wants to say. ("Z-Statistik" does not exist in German.)

Describing the function using 'z-statistic' is indeed better than using a description with 'mean', because of the function name ZTEST.
kind regards
Regina
Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Hi Leonard,

On Tuesday, 2009-08-25 15:45:42 +0200, Leonard Mada wrote:

> I would therefore stick with the one sample definition,
> and adapt only the text to correspond to what actually
> the function computes.
>
> "The p-value is the probability, under the null hypothesis,
> of observing a value as extreme or more extreme of the
> z-statistic"

Probably too long, taking localizations into account.

> or shortened:
>
> "calculates the probability of observing a value as extreme
> or more extreme of the z-statistic"
>
> and (possibly) correcting for the wrong implementation:
>
> "calculates the probability of observing a value as large
> or larger for the z-statistic"

As I'm not a native speaker, the difference isn't clear to me.

To have this changed we need a decision real soon now. So far, then, I'd go for

"calculates the probability of observing a value as large or larger for the z-statistic"

Any objections? I'm adding that to i90759 to have it documented.

Eike
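Because the difference between "as extreme or more extreme" and "as large or larger" is the crux here, a small Python sketch may help. It assumes a known population standard deviation sigma, as in the MathWorks-style definition, uses made-up data, and the helper names are hypothetical. The first phrasing corresponds to the two-sided p-value P(|Z| >= |z|); the second to the one-sided "greater" probability P(Z >= z), which is the case the "wrong implementation" remark refers to.

```python
import math
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal distribution


def z_statistic(sample, mu, sigma):
    """z = (x_bar - mu) / (sigma / sqrt(n)) with sigma the known population SD."""
    return (fmean(sample) - mu) / (sigma / math.sqrt(len(sample)))


def p_as_extreme(z):
    """'As extreme or more extreme': two-sided P(|Z| >= |z|)."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_as_large(z):
    """'As large or larger': one-sided P(Z >= z), the 'greater' case."""
    return 1 - STD_NORMAL.cdf(z)


if __name__ == "__main__":
    data = [5, 7, 6, 8, 9]  # made-up sample, mean 7
    for mu in (5, 9):  # hypothesized population means on either side of 7
        z = z_statistic(data, mu, sigma=2)
        print(f"mu={mu}: z={z:+.3f}  extreme={p_as_extreme(z):.4f}  larger={p_as_large(z):.4f}")
```

For mu = 5 and mu = 9 the z-statistics have opposite signs, so the two-sided value is the same for both, while the one-sided value moves from about 0.013 to about 0.987. For a continuous distribution P(Z >= z) equals P(Z > z), so dropping "as large as" changes nothing numerically.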
Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Dear Regina,

I find the mention of 2 samples very confusing (this is also true for the Wikipedia article). This is especially problematic because 2 samples will generate 2 different standard deviations, while there is only one sigma if you work with a population.

A far better description is provided on the MathWorks site:
http://www.mathworks.com/access/helpdesk/help/toolbox/stats/index.html?/access/helpdesk/help/toolbox/stats/ztest.html&http://www.google.ro/url?sa=t&source=web&ct=res&cd=3&url=http%3A%2F%2Fwww.mathworks.com%2Faccess%2Fhelpdesk%2Fhelp%2Ftoolbox%2Fstats%2Fztest.html&ei=qt-TSvKEKoPe-QbviqyxBg&rct=j&q=ztest&usg=AFQjCNEynERcFmdEW0pD-B-nZSBuh7zYPw

"The p-value is the probability, under the null hypothesis, of observing a value as extreme or more extreme of the test statistic z = (x̄ - μ)/(σ/√n), where x̄ is the sample mean, μ = m is the hypothesized population mean, σ is the population standard deviation, and n is the sample size."

This is the classical description of the z-statistic. As I have mentioned on a number of occasions, ztest is not implemented directly in R (it should be avoided in any serious statistical analysis and has no place there).

I would therefore stick with the one-sample definition and adapt only the text to correspond to what the function actually computes:

"The p-value is the probability, under the null hypothesis, of observing a value as extreme or more extreme of the z-statistic"

or, shortened:

"calculates the probability of observing a value as extreme or more extreme of the z-statistic"

and (possibly) correcting for the wrong implementation:

"calculates the probability of observing a value as large or larger for the z-statistic"

and, if space is really such a huge concern, compacting "as large or larger" results in:

"calculates the probability of observing a value larger than the z-statistic"

Sincerely,
Leonard Mada

-------- Original Message --------
> Date: Tue, 25 Aug 2009 14:23:33 +0200
> From: Regina Henschel
> To: dev@sc.openoffice.org
> Subject: Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Re: [sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Hi Leonard,

Leonard Mada wrote:
> Dear Calc team,
>
> the following wording is ambiguous and it may be wrong altogether:
>
>> function description:
>> calculates the probability of a *sample* mean greater than the mean of
>> the given *sample*.
> [EMPHASIS ADDED]

A long phrase for the application help would be:

For a given random sample of size n, drawn from a normally distributed population with a known mean µ and standard deviation sigma, ZTEST calculates the probability that another sample of the same size would have a mean greater than the mean m of the given sample.

ZTEST calculates 1-NORMSDIST(z) where z = (m-µ)/(sigma/sqrt(n)).

The function ZTEST is not a z-test itself, but it calculates a value that you can use to perform a z-test.

Do you know a better phrase that *does not exceed two lines* for the function wizard? We can explain the function in detail on the Wiki, where you can already find the formula and nice diagrams. The problem is to get a very short description without resorting to useless phrases like those for TTest, where you find "Calculates the T test" in the function wizard and "Returns the probability associated with a Student's t-Test." in the application help.

kind regards
Regina
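The relation "ZTEST calculates 1-NORMSDIST(z) where z = (m-µ)/(sigma/sqrt(n))" can be sketched in Python as below. This is not the Calc source code: ztest_like is a hypothetical name, NORMSDIST is taken to be the standard normal cumulative distribution function (statistics.NormalDist().cdf), and the fallback when sigma is omitted follows the issue 90759 proposal for the third parameter ("If omitted, the standard deviation of the given sample is used"), assuming the sample standard deviation with an n-1 denominator (statistics.stdev).

```python
import math
from statistics import NormalDist, fmean, stdev


def ztest_like(sample, mu, sigma=None):
    """Sketch of 1 - NORMSDIST(z) with z = (m - mu) / (sigma / sqrt(n)).

    Hypothetical helper for illustration, not the Calc implementation.
    """
    n = len(sample)
    if sigma is None:
        if n < 2:
            raise ValueError("need at least two values to estimate sigma")
        # Assumption: sample standard deviation with an n-1 denominator.
        sigma = stdev(sample)
    m = fmean(sample)  # mean of the given sample
    z = (m - mu) / (sigma / math.sqrt(n))
    # NORMSDIST(z) is the standard normal CDF; the returned value is its complement.
    return 1 - NormalDist().cdf(z)


if __name__ == "__main__":
    data = [5, 7, 6, 8, 9]  # made-up sample, mean 7
    print(ztest_like(data, mu=5, sigma=2))  # known population sigma
    print(ztest_like(data, mu=5))           # sigma estimated from the sample
```

A result near 0 means the sample mean lies well above µ, a result near 1 means it lies well below µ, and exactly 0.5 corresponds to m = µ. This is the one-sided "greater" probability, not a complete two-sided z-test.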
[sc-dev] Re: [Issue 90759] ZTEST not same as Excel
Dear Calc team,

the following wording is ambiguous and it may be wrong altogether:

> function description:
> calculates the probability of a *sample* mean greater than the mean of
> the given *sample*.
[EMPHASIS ADDED]

So, does it compute the probabilities based on 2 samples? I doubt it.

Let me go back to basics. Let's say we have a population (P) with a population mean Mu(P), and a sample X with a sample mean mu(X).

The statistical hypotheses are:
H0: Mu(P) == mu(X)
Ha: Mu(P) either '<' or '>' mu(X) [the one-tailed version]

Depending on the tail used, it will be '<' or '>'. The 2-tailed version is Mu(P) '!=' mu(X).

I haven't followed the discussion recently, so I am unable to tell exactly what is computed. But I strongly suspect that it compares mu(X) against a population mean Mu(P).

Also, the phrase does not specify which statistic is used. It might be obvious that the z-statistic is used, but I would rather specify it explicitly. There are a lot of different statistics out there.

Sincerely,
Leonard Mada

-------- Original Message --------
> Date: 25 Aug 2009 07:20:41 -
> From: drk...@openoffice.org
> To: disco...@openoffice.org
> Subject: [Issue 90759] ZTEST not same as Excel
>
> To comment on the following update, log in, then open the issue:
> http://www.openoffice.org/issues/show_bug.cgi?id=90759
>
> --- Additional comments from drk...@openoffice.org Tue Aug 25 07:20:40 + 2009 ---
> Our proposal for the function wizard:
>
> function description:
> calculates the probability of a sample mean greater than the mean of
> the given sample.
>
> first parameter
> The given sample, drawn from a normally distributed population
>
> second parameter
> The known mean of the population
>
> third parameter
> The known standard deviation of the population. If omitted, the standard
> deviation of the given sample is used.