[R] confidence intervals of proportions from complex surveys

2007-09-05 Thread Dirk Enzmann
This is partly an R and partly a general statistics question.

I'm trying to get confidence intervals of proportions (sometimes for 
subgroups) estimated from complex survey data. Because a function like 
prop.test() does not exist for the survey package I tried the following:

1) Define a survey object (PSU of clustered sample, population weights);
2) Use svyglm() of the package survey to estimate a binary logistic 
regression (family='binomial'): For the confidence interval of a single 
proportion regress the binary dependent variable on a constant (1), for 
confidence intervals of that variable for subgroups regress this 
variable on the groups (factor) variable;
3) Use predict() to obtain estimated logits and the respective standard 
errors (mod.dat specifiying either the constant or the subgroups):

pred=predict(model,mod.dat,type='link',se.fit=T)

and apply the following to obtain the proportion with its confidence 
intervals (for example, for conf.level=.95):

lo.e = pred[1:length(pred)]-qnorm((1+conf.level)/2)*SE(pred)
hi.e = pred[1:length(pred)]+qnorm((1+conf.level)/2)*SE(pred)
prop = 1/(1+exp(-pred[1:length(pred)]))
lo = 1/(1+exp(-lo.e))
hi = 1/(1+exp(-hi.e))

I think that in that way I get CI's based on asymptotic normality - 
either for a single proportion or split up into subgroups.

Question: Is this a correct or a defensible procedure? Or should I use a 
different approach? Note that this approach should also allow to 
estimate CI's for proportions of subgroups taking into account the 
complex survey design.

TIA,
Dirk


R version 2.5.1 Patched (2007-08-10 r42469)
i386-pc-mingw32

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to reply to a thread if receiving R-help mails in digest form

2006-08-13 Thread Dirk Enzmann
I receive R-help messages in digest form that makes it difficult to 
answer a post. I noticed that my answer is not added to the thread 
(instead, a new thread is started) although I use the same subject line 
(starting with Re: ) as the original post. Is there a solution (I 
prefer the digest to separate messages for several reasons and don't 
want to change my email reader)?

The way I answer post up to now is:
1) I press the reply button of my email program (Mozilla / Thunderbird, 
Windows)
2) I delete all contents of the digest except for the post (including 
name and mail address of the posting person) I want to answer so that 
the original question will be included (cited) in my answer.
3) I add the email address to the individual sender to cc: to the 
automatically generated address of the R-help list.
4) I replace the automatically generated subject line (for example Re: 
R-help Digest, Vol 42, Issue 13 by Re:  followed by a copy of the 
original subject line of the post.
5) I write my answer and send the mail to the mailing list.

It's not that this is tedious - the problem is that the thread is 
broken. Is there a better way even if I want to keep receiving messages 
in digest form? The posting guide is silent about this.

Dirk

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
+49-(0)40-42838.4591 (Billon)
fax:   +49-(0)40-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Geometrical Interpretation of Eigen value and Eigen vector

2006-08-12 Thread Dirk Enzmann
Arun,

have a look at:

http://149.170.199.144/multivar/eigen.htm

HTH,
Dirk

Arun Kumar Saha [EMAIL PROTECTED] wrote:

 It is not a R related problem rather than statistical/mathematical. However
 I am posting this query hoping that anyone can help me on this matter. My
 problem is to get the Geometrical Interpretation of Eigen value and Eigen
 vector of any square matrix. Can anyone give me a light on it?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NLME: Problem with plotting ranef vs a factor

2006-08-03 Thread Dirk Enzmann
Greg,

be careful using attach() and detach(). From the syntax snippets you 
showed it seems that you did create an object pcat (factor 
variable), but you did not change the respective variable in your 
data frame.

Try to remove pcat and see what happens do the results of lme()!

Dirk

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] standardized residuals (random effects) using nlme and ranef

2006-07-31 Thread Dirk Enzmann
 
 www.R-project.org/posting-guide.html and provide commented, minimal, 
 self-contained, reproducible code.
 
 Dirk Enzmann wrote:
 
 Using ranef() (package nlme, version 3.1-75) with an 'lme' object I 
 can obtain random effects for intercept and slope of a certain level 
 (say: 1) - this corresponds to (say level 1) residuals in MLWin. 
 Maybe I'm mistaken here, but the results are identical.

 However, if I try to get the standardized random effects adding the 
 paramter standard=T to the specification of ranef(), the results 
 differ considerably from the results of MLWin (although MLWin defines 
 standardized in the same way as divided by its estimated 
 (diagnostic) standard error).

 Why do the results differ although the estimates (random effects and 
 thus their variances) are almost identical? I noticed that lme() does 
 not compute the standard errors of the variances of the random effects 
 - for several reasons, but if this is true, how does ranef() calculate 
 the standardized random effects (the help says: 'standardized (i.e. 
 divided by the corresponding estimated standard error)').

 Is there a way to obtain similar results as in MLWin (or: should I 
 prefer the results of ranef() for certain reasons)?

 Dirk

 -
 R version: 2.3.1 Patched (2006-06-21 r38367)

-- 
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
+49-(0)40-42838.4591 (Billon)
fax:   +49-(0)40-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] conflict of package Matrix and summary of lme object

2006-07-24 Thread Dirk Enzmann
After loading the package Matrix (version 0.995-12), using the summary 
function with an lme (package nlme version 3.1-75) object results in an 
error message saying

Fehler in dim(x) : kein Slot des Namens Dim für dieses Objekt der 
Klasse correlation

(translated: 'Error in dim(x) : no slot of the name Dim for this 
object of class correlation)'.

Without loading Matrix this error message does not occur.

Any ideas?

--
R version: 2.3.1 Patched (2006-06-21 r38367)
Operating system: Windows XP (5.1 (Build 2600))
CPU: Pentium Model 2 Stepping 9

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
+49-(0)40-42838.4591 (Billon)
fax:   +49-(0)40-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] standardized random effects with ranef.lme()

2006-07-24 Thread Dirk Enzmann
Using ranef() (package nlme, version 3.1-75) with an 'lme' object I can 
obtain random effects for intercept and slope of a certain level (say: 
1) - this corresponds to (say level 1) residuals in MLWin. Maybe I'm 
mistaken here, but the results are identical.

However, if I try to get the standardized random effects adding the 
paramter standard=T to the specification of ranef(), the results 
differ considerably from the results of MLWin (although MLWin defines 
standardized in the same way as divided by its estimated (diagnostic) 
standard error).

Why do the results differ although the estimates (random effects and 
thus their variances) are almost identical? I noticed that lme() does 
not compute the standard errors of the variances of the random effects - 
for several reasons, but if this is true, how does ranef() calculate the 
standardized random effects (the help says: 'standardized (i.e. 
divided by the corresponding estimated standard error)').

Is there a way to obtain similar results as in MLWin (or: should I 
prefer the results of ranef() for certain reasons)?

Dirk

-
R version: 2.3.1 Patched (2006-06-21 r38367)


*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
+49-(0)40-42838.4591 (Billon)
fax:   +49-(0)40-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with read.spss() and as.data.frame(), or: alternative to subset()?

2005-09-21 Thread Dirk Enzmann
The selection problem can be solved by

dr2000=read.spss('myfile')
d=lapply(dr2000,subset,dr2000$RBINZ99  0)

however, there is still the problem that R crashes when using

d = as.data.frame(dr2000)

or

dr2000=read.spss('myfile',to.data.frame=T)

Any suggestions why? I checked whether all components of dr2000 are of 
the same length and the sort of object of each component. This is not 
the problem: Each component has the same length (9232) and there are 66 
components of the class 'character', 981 of the class 'factor', and 479 
of the class 'numeric'.


 Trying to select a subset of cases (rows of data) I encountered several 
 problems:
 
 Firstly, because I did not read the help to read.spss() thoroughly 
 enough, I treated the data read as a data frame. For example,
 
 dr2000 - read.spss('myfile.sav')
 d - subset(dr2000,RBINZ99  0)
 
 and thus received an error message (Object RBINZ99 not found), because 
 dr2000 is not a data.frame but a list (shown by class(dr2000)).
 
 d - subset(dr2000,dr2000$RBINZ99  0)
 
 didn' help either, because now d is empty (dim = NULL).
 
 Thus, I tried to use the option to.data.frame=T of read.spss():
 
 dr2000 - read.spss('myfile.sav',to.data.frame=T)
 
 However, now R crashes ('R for Windows GUI front-end has found an 
 error and must be closed') (the error message is in German).
 
 Finally, I tried again using read.spss() without the option 
 'to.data.frame=T' (as before) and tried to convert dr2000 to a data 
 frame by using
 
 d - as.data.frame(dr2000)
 
 However, R crashes again (with the same error message).
 
 Of course, I could use SPSS first and save only the cases with RBINZ99  
 0, but this is not always possible (all users of the data must have SPSS 
 available and we have to use different selection criteria). Is there 
 another possibility to solve the problem by using R? I want to select 
 certain rows (cases) based on the values of one variable of dr2000, 
 but keep all columns (variables) - although dr2000 is not a data frame?
 
 And: R should not crash but rather give a warning.
 
 
 R version 2.1.1 Patched (2005-07-15)
 Package Foreign Version 0.8-10
 
 Operating system: Windows XP Professional (5.1 (Build 2600))
 CPU: Pentium Model 2 Stepping 9
 RAM: 512 MB

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
+49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Problem with read.spss() and as.data.frame(), or: alternative to subset()?

2005-09-20 Thread Dirk Enzmann
Trying to select a subset of cases (rows of data) I encountered several 
problems:

Firstly, because I did not read the help to read.spss() thoroughly 
enough, I treated the data read as a data frame. For example,

dr2000 - read.spss('myfile.sav')
d - subset(dr2000,RBINZ99  0)

and thus received an error message (Object RBINZ99 not found), because 
dr2000 is not a data.frame but a list (shown by class(dr2000)).

d - subset(dr2000,dr2000$RBINZ99)

didn' help either, because now d is empty (dim = NULL).

Thus, I tried to use the option to.data.frame=T of read.spss():

dr2000 - read.spss('myfile.sav',to.data.frame=T)

However, now R crashes ('R for Windows GUI front-end has found an 
error and must be closed') (the error message is in German).

Finally, I tried again using read.spss() without the option 
'to.data.frame=T' (as before) and tried to convert dr2000 to a data 
frame by using

d - as.data.frame(dr2000)

However, R crashes again (with the same error message).

Of course, I could use SPSS first and save only the cases with RBINZ99  
0, but this is not always possible (all users of the data must have SPSS 
available and we have to use different selection criteria). Is there 
another possibility to solve the problem by using R? I want to select 
certain rows (cases) based on the values of one variable of dr2000, 
but keep all columns (variables) - although dr2000 is not a data frame?

And: R should not crash but rather give a warning.


R version 2.1.1 Patched (2005-07-15)
Package Foreign Version 0.8-10

Operating system: Windows XP Professional (5.1 (Build 2600))
CPU: Pentium Model 2 Stepping 9
RAM: 512 MB

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
+49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Problem with read.spss() and as.data.frame(), or: alternative to subset()? (2)

2005-09-20 Thread Dirk Enzmann
In my previous mail there is a typo. Instead of

d - subset(dr2000,dr2000$RBINZ99)

I used

d - subset(dr2000,dr2000$RBINZ99  0)


*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
+49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] using argument names (of indeterminate number) within a function

2005-07-19 Thread Dirk Enzmann
Although I tried to find an answer in the manuals and archives, I cannot 
solve this (please excuse that my English and/or R programming skills 
are not good enough to state my problem more clearly):

I want to write a function with an indeterminate (not pre-defined) 
number of arguments and think that I should use the ... construct and 
the match.call() function. The goal is to write a function that (among 
other things) uses cbind() to combine a not pre-defined number of 
vectors specified in the function call. For example, if my vectors are 
x1, x2, x3, ... xn, within the function I want to use cbind(x1, x2) or 
cbind(x1, x3, x5) or ... depending on the vector names I use in the 
funcion call. Additionally, the function has other arguments.

In the archives I found the following thread (followed by Marc Schwartz)

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/15186.html
[R] returning argument names from Peter Dalgaard BSA on 2003-04-10 (stdin)

that seems to contain the solution to my problem, but I am stuck because 
  sapply(match.call()[-1], deparse) gives me a vector of strings and I 
don't know how to use the names in this vector in the cbind() function.

Up to now my (clearly deficit) function looks like:

test - function(..., mvalid=1)
{
   args = sapply(match.call()[-1], deparse)
# and here, I don't know how the vector names in args
# can be used in the cbind() function to follow:
#
# temp - cbind( ???
   if (mvalid  1)
   {
#  here it goes on
   }
}

Ultimately, I want that the function can be called like
test(x1,x2,mvalid=1)
or
test(x1,x3,x5,mavlid=2)
and that within the function
cbind(x1,x2)
or cbind(x1,x3,x5)
will be used.

Can someone give and explain an example / a solution on how to proceed?

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
+49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] skewness and kurtosis in e1071 correct?

2005-05-24 Thread Dirk Enzmann
To me your answer is opaque (but that seems to be rather a problem of 
language ;-) ). Perhaps my question has not been expressed clearly 
enough. Let me state it differently:


In the R package e1071 the formulas (implicit) used are (3) and (4) (see 
below), the standard deviation used in these formulas, however is based 
on (2) (see below). This seems to be inconsistent and my question is, 
whether there is a commonly used third definition of skewness and 
kurtosis in which the formulas for the biased skewness and kurtosis 
_but_ with the unbiased standard deviation are employed.


The standard deviation can be defined as the _sample_ statistic:

sd = 1/n * sum( (x - mean(x))^2 )  # (1)

and as the estimated population parameter:

sd = 1/(n-1) * sum( (x-mean(x))^2 )  # (2).

In R the function sd() calculates the latter.

In the same way, expressed via z-values skewness and kurtosis can be 
defined as the _sample_ statistic (also called biased estimator , see: 
http://www.mathdaily.com/lessons/Skewness ):


skewness = mean(z^3) # (3)

kurtosis = mean(z^4)-3   # (4)

with z = (x - mean(x))/sd(x)
with sd = 1/n * sum( (x - mean(x)^2 )
(thus: here sd is the _sample_ statistic, see (1) above!)

but they can also be defined as the estimated population parameters 
(also called unbiased, see: 
http://www.mathdaily.com/lessons/Kurtosis#Sample_kurtosis ):


skewness = n/((n-1)*(n-2)) * sum(z^3)  # (5)

kurtosis = n*(n+1)/((n-1)*(n-2)*(n-3)) * sum(z^4) - 
3*(n-1)^2/((n-2)*(n-3))  # (6)


with z = (x - mean(x))/sd(x)
with sd = 1/(n-1) * sum( (x - mean(x)^2 )
(thus: here sd is the estimated population parameter, see (2) 
above!. BTW: The R function scale() calculates the z-values based on 
this definition, as well.)



Campbell wrote:

This is probably an issue over definitions rather than the correct
answer.  To me skewness and kurtosis are functions of the distribution
rather than the population, they are equivalent to expectation rather
than mean.  For the normal distribution it makes no sense to estimate
them as the distribution is uniquely defined by its first two  moments. 
 However there are two defnitions of kurotsis as it is often

standardized such that the expectation is 0.


*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] skewness and kurtosis in e1071 correct? (correction)

2005-05-24 Thread Dirk Enzmann
I'm sorry, but my previous message, as often happens, some brackets were 
wrong:


Here are the correct formulas:

sd = 1/n * sum((x-mean(x))^2) # (1)

sd = 1/(n-1) * sum((x-mean(x))^2) # (2)

This also occured in the last paragraph.

Dirk

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] skewness and kurtosis in e1071 correct?

2005-05-24 Thread Dirk Enzmann
My last question to the R list is another example of asking too fast 
before reading (sorry). But by the way and because it might be 
interesting for others, an answer can be found in:


Joanes, D. N.  Gill, C. A. (1998) Comparing measures of sample skewness 
and kurtosis. Journal of the Royal Statistical Society (Series D): The 
Statistician 47 (1), 183189.


The measures discussed in this article are:

g1 : skewness as defined in formula (3) of my mail (and in STATA)
G1 : skewness as defined in formula (5) of my mail (and in SAS, SPSS, 
and EXCEL)

b1 : skewness as defined in the package e1071 (and in MINITAB and BMDP)

g2 : kurtosis as defined in formula (4) of my mail (and in STATA)
G2 : kurtosis as defined in formula (6) of my mail (and in SAS, SPSS, 
and EXCEL)

b2 : kurtosis as defined in the package e1071 (and in MINITAB and BMDP)



Off topic: I don't know why the thread of these mails are not included 
in the correct thread.


Dirk

--
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] skewness and kurtosis in e1071 correct?

2005-05-22 Thread Dirk Enzmann
I wonder whether the functions for skewness and kurtosis in the e1071 
package are based on correct formulas.


The functions in the package e1071 are:

# 
skewness - function (x, na.rm = FALSE)
{
if (na.rm)
x - x[!is.na(x)]
sum((x - mean(x))^3)/(length(x) * sd(x)^3)
}
# 

and

# 
kurtosis - function (x, na.rm = FALSE)
{
if (na.rm)
x - x[!is.na(x)]
sum((x - mean(x))^4)/(length(x) * var(x)^2) - 3
}
# 

However, sd() and var() are the estimated population parameters. To 
calculate the sample statistics of skewness and kurtosis, shouldn't one 
use the sample statistics of the standard deviation (and variance), as 
well? For example:


# 
# Function to calculate the sample statistic of skewness:
skew_s=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  3)
 {
   cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')

 }
 else
 {
 z = sqrt(n/(n-1))*scale(x)
 mean(z^3)
 }
}
# 

and

# 
# Function to calculate the sample statistic of kurtosis:
kurt_s=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  4)
 {
   cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')

 }
 else
 {
 z = sqrt(n/(n-1))*scale(x)
 mean(z^4)-3
 }
}
# 

Whereas, to calculate the (unbiased) estimated population parameter of 
skewness and kurtosis, the correction should also include the number of 
cases in the following way:


# 
# Function to calculate the unbiased populataion estimate of skewness:
skew=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  3)
 {
   cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')

 }
 else
 {
 z = scale(x)
 sum(z^3)*n/((n-1)*(n-2))
 }
}
# 

and

# 
# Function to calculate the unbiased population estimate of kurtosis:
kurt=function(x)
{
 x = x[!is.na(x)]
 n = length(x)
 if (n  4)
 {
   cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')

 }
 else
 {
 z = scale(x)
 sum(z^4)*n*(n+1)/((n-1)*(n-2)*(n-3))-3*(n-1)^2/((n-2)*(n-3))
 }
}
# 

Thus, it seems that the formulas used in the e1071 package are neither 
formulas for the sample statistics nor for the (unbiased) estimates of 
the population parameters. Is there another definition of kurtosis and 
skewness that I am not aware of?


Dirk

--
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Source R code via menu File

2005-05-10 Thread Dirk Enzmann
If I use the first menu item Source R code of the menu File to call 
source() by searching the *.r files in my directory interactively, a 
window opens showing the active directory, but no files to choose (even 
if there are files with the extension .r). If I start entering the the 
name of a file that I know to be in the active directory, by entering a 
firt letter of the file name a list of files beginning with this letter 
shows up - but not a list of all files with the extension .r.

This happens if I use R version 2.1.0 patched (2005-04-18) running under 
Windows XP (German), the menu item is in German Lese R Code ein. In 
previous versions of R (e.g. version 2.0.1 patched (2004-11-17)) this 
never happend, i.e. in older versions I could choose a file from a list 
of all files with the extension .r in the current directory.

What has changed? Is this specific to my configuration (of R or of 
Windows) or is it a bug?

R: version 2.1.0 patched (2005-04-18)
Operating system: Windows XP (German)
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany
phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] par(new=TRUE) vs par(new=FALSE)

2005-03-14 Thread Dirk Enzmann
Just out of curiosity (and I hope that my question does not confuse 
those who intuitively understood the par(new=TRUE) statement right in 
the beginning):

I would like to know whether beginners (learning R) had the same 
difficulty as I had: Intuitively I thought that par(new=TRUE) would draw 
a new plot (and not into an already existing plot) and that 
par(new=FALSE) would not draw a new plot but draw into an already 
existing plot - opposed to what the par(new=TRUE) statement actually 
does. (It would be interesting to receive an answer also from those who 
meanwhile reached an expert status  in working with the R syntax). 
Perhaps my confusion is due to the fact that I am a trained right-handed 
that really is a left-handed (those people tend to mistake contrasts).

If there are a lot of people having this difficulty, I would like to 
know why the decision was made to define the mode of action of the 
par(new=TRUE) statement as it is now. Perhaps that helps me to 
understand better the way programmers tend to think.

*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany
phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Confidence intervals for rates (dependent events)

2005-02-08 Thread Dirk Enzmann
I need advice or opinions for the following problem:
In a sample a part of the respondents has experienced victimizing 
events. Only a part of these events have been reported to the police. 
The rate of events reported to the police is

number of experienced events / number of reported events.
Because the events cannot be assumed to be independent (some victims 
have a high rate of vicitimization because they are more prone to become 
a victim) the construction of a confidence interval for the rate of 
events reported becomes difficult (to me).

Question 1: If nevertheless a use a binomial test to construct a 
confidence interval (say: 0 of 13 events reported,

binom.test(0,13,0/13)
CI = 0 to 24.7 %), is it correct that the width of this interval is a 
lower bound and thus a conservative estimate?

Question 2: (a) My intuition tells me that multilevel modeling could be 
a solution to obtain a correct confidence interval by treating the 
events and the events reported as the first level and the victims as the 
second. Is this correct and how should I specify this model?

(b) Alternatively, could I treat the events (and events reported?) as 
coming from a negative binomial distribution and use this for 
constructing a confidence interval? How can this be done technically, 
for example by using nb.glm?

Thanks in advance,
Dirk
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany
phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Percentages in contingency tables *warning trivial question*

2004-12-16 Thread Dirk Enzmann
Being still unsatisfied with the CrossTable() function I modified the 
code so that the function will create an output similar to the SPSS 
procedure CROSSTABS. Most probably the code will not meet most R 
programmers' standards, perhaps someone else is willing to optimize it. 
Unfortunately, as an R beginner I am not able to write a documentation 
file (perhaps someone is willing to put some effort in it, too)- the 
parameters that can be used can be found next to function.

Including the function code here would cause nasty line breaks, you can 
find it at

http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/crosstabs.r
Dirk
At Mon, 13 Dec 2004 05:47:17 -0500 Chuck Cleland 
[EMAIL PROTECTED] wrote:

(snip)
 You might want to look at CrossTable() in the gmodels package
 of the gregmisc bundle.
(snip)
--
*
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany
phone: +49-040-42838.7498 (office)
   +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: [EMAIL PROTECTED]
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Zero inflated count models

2003-09-21 Thread Dirk Enzmann
Can someone show me how to specify zero inflated poisson and zero 
inflated negative poisson models in R? I would like to replicate the 
example given in Long (1997: Regression Models for Categorical and 
Limited Dependent Variables) in Chapter 8.5 (pp. 242-247).

TIA
Dirk
*
Dr. Dirk Enzmann
Criminological Research Institute of Lower Saxony
Luetzerodestr. 9
D-30161 Hannover
Germany
phone: +49-511-348.36.32
fax:   +49-511-348.36.10
email: [EMAIL PROTECTED]
http://www.kfn.de

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Zero inflated count models (2)

2003-09-21 Thread Dirk Enzmann
In my previous mail, meant to say negative binomial regression (not 
negative poisson), of course.

Dirk

*
Dr. Dirk Enzmann
Criminological Research Institute of Lower Saxony
Luetzerodestr. 9
D-30161 Hannover
Germany
phone: +49-511-348.36.32
fax:   +49-511-348.36.10
email: [EMAIL PROTECTED]
http://www.kfn.de

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Job Announcement

2003-09-19 Thread Dirk Enzmann
The Criminological Research Institute of Lower Saxony (KFN) in Germany 
(Hannover) seeks a quantitative methodologist (Psychologist/Sociologist) 
to work in a research project on the developmental consequences of 
incarceration of juvenile delinquents.

For more detailed information see:

http://www.kfn.de/KFN_180903.pdf

Application deadline is October 12, 2003, but applications will be
accepted until the position is filled.
More information about the Institute is available at our home page 
http://www.kfn.de

*
Dr. Dirk Enzmann
Criminological Research Institute of Lower Saxony
Luetzerodestr. 9
D-30161 Hannover
Germany
phone: +49-511-348.36.32
fax:   +49-511-348.36.10
email: [EMAIL PROTECTED]
http://www.kfn.de

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Update VR_7.1-6

2003-06-04 Thread Dirk Enzmann
The update of VR by downloading VR_7.1-6.zip and using install.packages
(from local zip files) fails with the following error message:

Error in file(file, r) : unable to open connection
In addition: Warning message:
cannot open file `VR/DESCRIPTION'

Other packages can be installed without problems, except of dse_2003.4-1
with a similar error message.

Why?

Operating System: Windows NT (4.0)
R Version: R 1.7.0

*
Dr. Dirk Enzmann
Criminological Research Institute of Lower Saxony
Luetzerodestr. 9
D-30161 Hannover
Germany

phone: +49-511-348.36.32
fax:   +49-511-348.36.10
email: [EMAIL PROTECTED]

http://www.kfn.de

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help