Dear Martin,

Helpful general advice, although it's perhaps worth mentioning that the geometric mean, defined e.g. naively as prod(x)^(1/length(x)), is necessarily 0 if there are any 0 values in x. That is, the geometric mean "works" in this case but isn't really informative.
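
A minimal R sketch of this point, using the naive definition above. The helper name geo_mean and the sample values are made up purely for illustration:

```r
# Naive geometric mean, as in the definition above (hypothetical helper name)
geo_mean <- function(x) prod(x)^(1 / length(x))

x_pos  <- c(1, 10, 100)   # made-up values, all positive
x_zero <- c(0, 10, 100)   # same values, but with one zero

gm_pos  <- geo_mean(x_pos)    # approximately 10
gm_zero <- geo_mean(x_zero)   # 0, whatever the other values are
```

A single zero forces the product, and hence the result, to 0, so the summary no longer reflects the remaining values.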

Best,
 John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2024-01-22 12:18 p.m., Martin Maechler wrote:

Rich Shepard
     on Mon, 22 Jan 2024 07:45:31 -0800 (PST) writes:

     > A statistical question, not specific to R.  I'm asking for
     > a pointer for a source of definitive descriptions of what
     > types of data are best summarized by the arithmetic,
     > geometric, and harmonic means.

Although this is off-topic:

I think it is a good question, not really only about
geo-chemistry, but about statistics in applied sciences (and
engineering for that matter).

It is something I am sure good applied statisticians in the 1980s
and 1990s would all have known the answer to:

Using the geometric mean instead of the arithmetic mean
is basically *equivalent* to first log-transforming the data
and then working with the transformed data:
not just for computing an average, but for more relevant modelling,
inference, etc.

John W Tukey (and several other greats of the time)
had the log transform among the "First aid transformations":

If the data for a continuous variable must all be positive it is
also typically the case that the distribution is considerably
skewed to the right.
In such a case, behave as a good human who sees another human in
health distress: apply First Aid -- do the things you learned to
do quickly without too much thought, because things must happen
fast -- hopefully to save the other's life.

Here: Do log-transform all such variables without further ado,
and only afterwards start your (exploratory and more) data analysis.

Now,  mean(log(y)) = log(geometricmean(y)),
where mean() is the arithmetic mean as in R
{mathematically; on the computer you need all.equal(), not '==' !!}
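
As a small sketch of this identity in R: geometricmean() is not a base R function, so it is defined here (as an assumption) via the naive product formula, and the data values are made up:

```r
# Hypothetical helper: naive geometric mean, assumes all y > 0
geometricmean <- function(y) prod(y)^(1 / length(y))

y <- c(0.5, 2, 8, 32)   # made-up positive, right-skewed values

lhs <- mean(log(y))            # arithmetic mean on the log scale
rhs <- log(geometricmean(y))   # log of the geometric mean

# Compare with a numerical tolerance rather than '=='
same <- isTRUE(all.equal(lhs, rhs))
```

Here all.equal() allows for floating-point rounding, which is why it is preferred over an exact '==' comparison.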

I.e., according to Tukey and all the other experienced applied
statisticians of the past, the geometric mean is the "best thing"
to do for such positive right-skewed data, in the same sense
that the log-transform is the best "a priori" transformation for
such data -- with even the one advantage that you need to fiddle
with zeroes when log-transforming, whereas the geometric mean
already works for zeroes.
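
To illustrate the remark about zeroes, here is a hedged sketch with made-up values; the variable names are chosen only for this example:

```r
y <- c(0, 10, 100)   # made-up values containing one zero

log_y    <- log(y)         # first element is -Inf
mean_log <- mean(log_y)    # -Inf: the log-scale mean is unusable as is

# The naive geometric mean is still computable on the raw values
gm_naive <- prod(y)^(1 / length(y))   # 0
```

So the log transform needs some adjustment for zeroes before averaging, while the product form returns a finite value (here 0) directly.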

Martin


     > As an aquatic ecologist I see regulators apply the
     > geometric mean to geochemical concentrations rather than
     > using the arithmetic mean. I want to know whether the
     > geometric mean of a set of chemical concentrations (e.g.,
     > in mg/L) is an appropriate representation of the expected
     > value. If not, I want to explain this to non-technical
     > decision-makers; if so, I want to understand why my
     > assumption is wrong.

     > TIA,

     > Rich

     > ______________________________________________
     > R-help@r-project.org mailing list -- To UNSUBSCRIBE and
     > more, see https://stat.ethz.ch/mailman/listinfo/r-help
     > PLEASE do read the posting guide
     > http://www.R-project.org/posting-guide.html and provide
     > commented, minimal, self-contained, reproducible code.
