Re: [R] skewness and kurtosis in e1071 correct?

2005-05-24 Thread Dirk Enzmann
To me your answer is opaque (but that seems to be rather a problem of 
language ;-) ). Perhaps my question has not been expressed clearly 
enough. Let me state it differently:


In the R package e1071 the formulas (implicit) used are (3) and (4) (see 
below), the standard deviation used in these formulas, however is based 
on (2) (see below). This seems to be inconsistent and my question is, 
whether there is a commonly used third definition of skewness and 
kurtosis in which the formulas for the biased skewness and kurtosis 
_but_ with the unbiased standard deviation are employed.


The standard deviation can be defined as the _sample_ statistic:

sd = 1/n * sum( (x - mean(x))^2 )  # (1)

and as the estimated population parameter:

sd = 1/(n-1) * sum( (x-mean(x))^2 )  # (2).

In R the function sd() calculates the latter.

In the same way, expressed via z-values skewness and kurtosis can be 
defined as the _sample_ statistic (also called biased estimator , see: 
http://www.mathdaily.com/lessons/Skewness ):


skewness = mean(z^3) # (3)

kurtosis = mean(z^4)-3   # (4)

with z = (x - mean(x))/sd(x)
with sd = 1/n * sum( (x - mean(x)^2 )
(thus: here sd is the _sample_ statistic, see (1) above!)

but they can also be defined as the estimated population parameters 
(also called unbiased, see: 
http://www.mathdaily.com/lessons/Kurtosis#Sample_kurtosis ):


skewness = n/((n-1)*(n-2)) * sum(z^3)  # (5)

kurtosis = n*(n+1)/((n-1)*(n-2)*(n-3)) * sum(z^4) - 
3*(n-1)^2/((n-2)*(n-3))  # (6)


with z = (x - mean(x))/sd(x)
with sd = 1/(n-1) * sum( (x - mean(x)^2 )
(thus: here sd is the estimated population parameter, see (2) 
above!. BTW: The R function scale() calculates the z-values based on 
this definition, as well.)



Campbell wrote:

This is probably an issue over definitions rather than the correct
answer.  To me skewness and kurtosis are functions of the distribution
rather than the population, they are equivalent to expectation rather
than mean.  For the normal distribution it makes no sense to estimate
them as the distribution is uniquely defined by its first two  moments. 
 However there are two defnitions of kurotsis as it is often

standardized such that the expectation is 0.


*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] skewness and kurtosis in e1071 correct? (correction)

2005-05-24 Thread Dirk Enzmann
I'm sorry, but my previous message, as often happens, some brackets were 
wrong:


Here are the correct formulas:

sd = 1/n * sum((x-mean(x))^2) # (1)

sd = 1/(n-1) * sum((x-mean(x))^2) # (2)

This also occured in the last paragraph.

Dirk

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] skewness and kurtosis in e1071 correct?

2005-05-24 Thread Dirk Enzmann
My last question to the R list is another example of asking too fast 
before reading (sorry). But by the way and because it might be 
interesting for others, an answer can be found in:


Joanes, D. N.  Gill, C. A. (1998) Comparing measures of sample skewness 
and kurtosis. Journal of the Royal Statistical Society (Series D): The 
Statistician 47 (1), 183189.


The measures discussed in this article are:

g1 : skewness as defined in formula (3) of my mail (and in STATA)
G1 : skewness as defined in formula (5) of my mail (and in SAS, SPSS, 
and EXCEL)

b1 : skewness as defined in the package e1071 (and in MINITAB and BMDP)

g2 : kurtosis as defined in formula (4) of my mail (and in STATA)
G2 : kurtosis as defined in formula (6) of my mail (and in SAS, SPSS, 
and EXCEL)

b2 : kurtosis as defined in the package e1071 (and in MINITAB and BMDP)



Off topic: I don't know why the thread of these mails are not included 
in the correct thread.


Dirk

--
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] skewness and kurtosis in e1071 correct?

2005-05-23 Thread Campbell
This is probably an issue over definitions rather than the correct
answer.  To me skewness and kurtosis are functions of the distribution
rather than the population, they are equivalent to expectation rather
than mean.  For the normal distribution it makes no sense to estimate
them as the distribution is uniquely defined by its first two  moments. 
 However there are two defnitions of kurotsis as it is often
standardized such that the expectation is 0.

HTH 

Phineas  

www.pwp.phineas.blueyonder.co.uk


 Dirk Enzmann [EMAIL PROTECTED] 05/23/05 3:17 AM 
I wonder whether the functions for skewness and kurtosis in the e1071 
package are based on correct formulas.

The functions in the package e1071 are:

# 
skewness - function (x, na.rm = FALSE)
{
 if (na.rm)
 x - x[!is.na(x)]
 sum((x - mean(x))^3)/(length(x) * sd(x)^3)
}
# 

and

# 
kurtosis - function (x, na.rm = FALSE)
{
 if (na.rm)
 x - x[!is.na(x)]
 sum((x - mean(x))^4)/(length(x) * var(x)^2) - 3
}
# 

However, sd() and var() are the estimated population parameters. To 
calculate the sample statistics of skewness and kurtosis, shouldn't one 
use the sample statistics of the standard deviation (and variance), as 
well? For example:

# 
# Function to calculate the sample statistic of skewness:
skew_s=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n  3)
  {
cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')
  }
  else
  {
  z = sqrt(n/(n-1))*scale(x)
  mean(z^3)
  }
}
# 

and

# 
# Function to calculate the sample statistic of kurtosis:
kurt_s=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n  4)
  {
cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')
  }
  else
  {
  z = sqrt(n/(n-1))*scale(x)
  mean(z^4)-3
  }
}
# 

Whereas, to calculate the (unbiased) estimated population parameter of 
skewness and kurtosis, the correction should also include the number of 
cases in the following way:

# 
# Function to calculate the unbiased populataion estimate of skewness:
skew=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n  3)
  {
cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')
  }
  else
  {
  z = scale(x)
  sum(z^3)*n/((n-1)*(n-2))
  }
}
# 

and

# 
# Function to calculate the unbiased population estimate of kurtosis:
kurt=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n  4)
  {
cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')
  }
  else
  {
  z = scale(x)
  sum(z^4)*n*(n+1)/((n-1)*(n-2)*(n-3))-3*(n-1)^2/((n-2)*(n-3))
  }
}
# 

Thus, it seems that the formulas used in the e1071 package are neither 
formulas for the sample statistics nor for the (unbiased) estimates of 
the population parameters. Is there another definition of kurtosis and 
skewness that I am not aware of?

Dirk

-- 
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
+49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] skewness and kurtosis in e1071 correct?

2005-05-22 Thread Dirk Enzmann
I wonder whether the functions for skewness and kurtosis in the e1071 
package are based on correct formulas.


The functions in the package e1071 are:

# 
skewness - function (x, na.rm = FALSE)
{
if (na.rm)
x - x[!is.na(x)]
sum((x - mean(x))^3)/(length(x) * sd(x)^3)
}
# 

and

# 
kurtosis - function (x, na.rm = FALSE)
{
if (na.rm)
x - x[!is.na(x)]
sum((x - mean(x))^4)/(length(x) * var(x)^2) - 3
}
# 

However, sd() and var() are the estimated population parameters. To 
calculate the sample statistics of skewness and kurtosis, shouldn't one 
use the sample statistics of the standard deviation (and variance), as 
well? For example:


# 
# Function to calculate the sample statistic of skewness:
skew_s=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  3)
 {
   cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')

 }
 else
 {
 z = sqrt(n/(n-1))*scale(x)
 mean(z^3)
 }
}
# 

and

# 
# Function to calculate the sample statistic of kurtosis:
kurt_s=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  4)
 {
   cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')

 }
 else
 {
 z = sqrt(n/(n-1))*scale(x)
 mean(z^4)-3
 }
}
# 

Whereas, to calculate the (unbiased) estimated population parameter of 
skewness and kurtosis, the correction should also include the number of 
cases in the following way:


# 
# Function to calculate the unbiased populataion estimate of skewness:
skew=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  3)
 {
   cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')

 }
 else
 {
 z = scale(x)
 sum(z^3)*n/((n-1)*(n-2))
 }
}
# 

and

# 
# Function to calculate the unbiased population estimate of kurtosis:
kurt=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  4)
 {
   cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')

 }
 else
 {
 z = scale(x)
 sum(z^4)*n*(n+1)/((n-1)*(n-2)*(n-3))-3*(n-1)^2/((n-2)*(n-3))
 }
}
# 

Thus, it seems that the formulas used in the e1071 package are neither 
formulas for the sample statistics nor for the (unbiased) estimates of 
the population parameters. Is there another definition of kurtosis and 
skewness that I am not aware of?


Dirk

--
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Skewness and Kurtosis

2004-10-27 Thread Vito Ricci
Hi,

in which R-package I could find skewness and kurtosis
measures for a distribution?

I built some functions:

gamma1-function(x)
{
m=mean(x)
n=length(x)
s=sqrt(var(x))
m3=sum((x-m)^3)/n
g1=m3/(s^3)
return(g1)
}

skewness-function(x)
{
m=mean(x)
me=median(x)
s=sqrt(var(x))
sk=(m-me)/s
return(sk)
}

bowley-function(x)
{
q-as.vector(quantile(x,prob=c(.25,.50,.75)))
b=(q[3]+q[1]-2*q[2])/(q[3]-q[2])
return(b)
}

b3-function(x)
{
m=mean(x)
me=median(x)
n=length(x)
d=sum(abs(x-me))/n
b=(m-me)/d
return(b)
}

but I'm looking for some already included in a
package.

Thanks in advance.
Best
Vito


=
Diventare costruttori di soluzioni

The business of the statistician is to catalyze 
the scientific learning process.  
George E. P. Box


Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese http://www.modugno.it/archivio/cat_palese.shtml

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Skewness and Kurtosis

2004-10-27 Thread Uwe Ligges
Vito Ricci wrote:
Hi,
in which R-package I could find skewness and kurtosis
measures for a distribution?
Both funcions are in package e1071.
Uwe Ligges
I built some functions:
gamma1-function(x)
{
m=mean(x)
n=length(x)
s=sqrt(var(x))
m3=sum((x-m)^3)/n
g1=m3/(s^3)
return(g1)
}
skewness-function(x)
{
m=mean(x)
me=median(x)
s=sqrt(var(x))
sk=(m-me)/s
return(sk)
}
bowley-function(x)
{
q-as.vector(quantile(x,prob=c(.25,.50,.75)))
b=(q[3]+q[1]-2*q[2])/(q[3]-q[2])
return(b)
}
b3-function(x)
{
m=mean(x)
me=median(x)
n=length(x)
d=sum(abs(x-me))/n
b=(m-me)/d
return(b)
}
but I'm looking for some already included in a
package.
Thanks in advance.
Best
Vito
=
Diventare costruttori di soluzioni
The business of the statistician is to catalyze 
the scientific learning process.  
George E. P. Box

Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese http://www.modugno.it/archivio/cat_palese.shtml
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Skewness and Kurtosis

2004-10-27 Thread Diethelm Wuertz
Hello,
fBasics has functions for skewness and kurtosis:
Comments on these functions are welcome!
Diethelm Wuertz
# **
kurtosis =
function (x, ...)
{   # A function implemented by D. Wuertz
   # FUNCTION:
  
   UseMethod(kurtosis)
}

# 
--

kurtosis.default =
function (x, na.rm = FALSE, ...)
{   # A function implemented by D. Wuertz
   # Description:
   #   Returns the value of the kurtosis of a
   #   distribution function. Missing values
   #   can be handled.
  
   # FUNCTION:
  
   # Warnings:
   if (!is.numeric(x)  !is.complex(x)  !is.logical(x)) {
   warning(argument is not numeric or logical: returning NA)
   return(as.numeric(NA))}
  
   # Remove NAs:
   if (na.rm) x = x[!is.na(x)]

   # Kurtosis:
   n = length(x)
   if (is.integer(x)) x = as.numeric(x)
   kurtosis = sum((x-mean(x))^4/var(x)^2)/length(x) - 3
  
   # Return Value:
   kurtosis 
}

# 
--

kurtosis.data.frame =
function (x, ...)
{   # A function implemented by D. Wuertz
   sapply(x, kurtosis, ...)
}

# 
**

skewness =
function (x, ...)
{   # A function implemented by D. Wuertz
   UseMethod(skewness)
}
# 
--

skewness.default =
function (x, na.rm = FALSE, ...)
{   # A function implemented by D. Wuertz
   # Description:
   #   Returns the value of the skewness of a
   #   distribution function. Missing values
   #   can be handled.
  
   # FUNCTION:
  
   # Warnings:
   if (!is.numeric(x)  !is.complex(x)  !is.logical(x)) {
   warning(argument is not numeric or logical: returning NA)
   return(as.numeric(NA))}
  
   # Remove NAs:
   if (na.rm) x = x[!is.na(x)]

   # Skewness:
   n = length(x)
   if (is.integer(x)) x = as.numeric(x)
   skewness = sum((x-mean(x))^3/sqrt(var(x))^3)/length(x)
  
   # Return Value:
   skewness 
}

# 
--

skewness.data.frame =
function (x, ...)
{   # A function implemented by D. Wuertz
   sapply(x, skewness, ...)
}

Vito Ricci wrote:
Hi,
in which R-package I could find skewness and kurtosis
measures for a distribution?
I built some functions:
gamma1-function(x)
{
m=mean(x)
n=length(x)
s=sqrt(var(x))
m3=sum((x-m)^3)/n
g1=m3/(s^3)
return(g1)
}
skewness-function(x)
{
m=mean(x)
me=median(x)
s=sqrt(var(x))
sk=(m-me)/s
return(sk)
}
bowley-function(x)
{
q-as.vector(quantile(x,prob=c(.25,.50,.75)))
b=(q[3]+q[1]-2*q[2])/(q[3]-q[2])
return(b)
}
b3-function(x)
{
m=mean(x)
me=median(x)
n=length(x)
d=sum(abs(x-me))/n
b=(m-me)/d
return(b)
}
but I'm looking for some already included in a
package.
Thanks in advance.
Best
Vito
=
Diventare costruttori di soluzioni
The business of the statistician is to catalyze 
the scientific learning process.  
George E. P. Box

Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese http://www.modugno.it/archivio/cat_palese.shtml
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html