Fwd: [R] Enduring LME confusion or Psychologists and Mixed-Effects

2004-08-12 Thread Gijs Plomp
I will follow the suggestion of John Maindonald and present the problem 
by example with some data.

I also follow the advice to use mean scores, somewhat reluctantly 
though. I know it is common practice in psychology, but wouldn't it be 
more elegant if one could use all the data points in an analysis?

The data for 5 subjects (myData) are provided at the bottom of this 
message. It is a crossed within-subject design with three factors and 
reaction time (RT) as the dependent variable.

An initial repeated-measures model would be:
aov1 <- aov(RT ~ fact1*fact2*fact3 + Error(sub/(fact1*fact2*fact3)), data=myData)
Aov complains that the effects involving fact3 are unbalanced:
> aov1

Stratum 4: sub:fact3
Terms:
 fact3   Residuals
Sum of Squares  4.81639e-07 5.08419e-08
Deg. of Freedom   2   8
Residual standard error: 7.971972e-05
6 out of 8 effects not estimable
Estimated effects may be unbalanced

Presumably this is because fact3 has three levels while the other factors 
have only two, making this a 'non-orthogonal' design.

I then fit an equivalent mixed-effects model:
lme1 <- lme(RT ~ fact1*fact2*fact3, data=meanData2, random=~1|sub)
Subsequently I test if my factors had any effect on RT:
> anova(lme1)
                  numDF denDF   F-value p-value
(Intercept)           1    44 105294024  <.0001
fact1                 1    44        22  <.0001
fact2                 1    44         7  0.0090
fact3                 2    44        19  <.0001
fact1:fact2           1    44         9  0.0047
fact1:fact3           2    44         1  0.4436
fact2:fact3           2    44         1  0.2458
fact1:fact2:fact3     2    44         0  0.6660
Firstly, why are the F-values in the output whole numbers?
The effects are estimated over the whole range of the dataset and this 
is so because all three factors are nested under subjects, on the same 
level. This, however, seems to inflate the F-values compared to the 
anova(aov1) output, e.g.
> anova(aov1)

Error: sub:fact2
          Df     Sum Sq    Mean Sq F value Pr(>F)
fact2  1 9.2289e-08 9.2289e-08  2.2524 0.2078
Residuals  4 1.6390e-07 4.0974e-08


I realize that the (probably faulty) aov model may not be directly 
compared to the lme model, but my concern is whether the lme estimation of 
the effects is right, and if so, how can a naïve skeptic be convinced of 
this?

The suggestion to use simulate.lme() to find this out seems good, but I 
can't seem to get it working (from the thread "[R] lme: simulate.lme in R" 
it seems it may never work in R).

I have also followed the suggestion to fit the exact same model with 
lme4. However, the format of the anova output does not give me the 
estimates in the way nlme does. More importantly, the degrees of 
freedom in the denominator don't change; they're still large:
> library(lme4)
> lme4_1 <- lme(RT ~ fact1*fact2*fact3, random=~1|sub, data=myData)
> anova(lme4_1)
Analysis of Variance Table

                     Df    Sum Sq   Mean Sq Denom F value    Pr(>F)
fact1I                1 2.709e-07 2.709e-07    48 21.9205 2.360e-05 ***
fact2I                1 9.229e-08 9.229e-08    48  7.4665  0.008772 **
fact3L                1 4.906e-08 4.906e-08    48  3.9691  0.052047 .
fact3M                1 4.326e-07 4.326e-07    48 34.9972 3.370e-07 ***
fact1I:fact2I         1 1.095e-07 1.095e-07    48  8.8619  0.004552 **
fact1I:fact3L         1 8.988e-10 8.988e-10    48  0.0727  0.788577
fact1I:fact3M         1 1.957e-08 1.957e-08    48  1.5834  0.214351
fact2I:fact3L         1 3.741e-09 3.741e-09    48  0.3027  0.584749
fact2I:fact3M         1 3.207e-08 3.207e-08    48  2.5949  0.113767
fact1I:fact2I:fact3L  1 2.785e-09 2.785e-09    48  0.2253  0.637162
fact1I:fact2I:fact3M  1 7.357e-09 7.357e-09    48  0.5952  0.444206
---

I hope that by providing a sample of the data someone can help me out on 
the questions I asked in my previous mail:

 1. When aov's assumptions are violated, can lme provide the right
 model for within-subject designs where multiple fixed effects are
 NOT hierarchically ordered?

 2. Are the degrees of freedom in anova(lme1) the right ones to
 report? If so, how do I convince a reviewer that, despite the large
 number of degrees of freedom, lme does provide a conservative
 evaluation of the effects? If not, how does one get the right denDF
 in a way that can be easily understood?

If anyone thinks they can help better by looking at the entire data 
set, I very much welcome them to email me for further discussion.

In great appreciation of your help and work for the R-community,
Gijs Plomp
[EMAIL PROTECTED]

> myData
    sub fact1 fact2 fact3        RT
1   s1 C C G 0.9972709
2   s2 C C G 0.9981664
3   s3 C C G 0.9976909
4   s4 C C G 0.9976047
5   s5 C C G 0.9974346
6   s1 I C G 0.9976206
7   s2 I C G 0.9981980
8   s3 I C G 0.9980503
9   s4 I C G 0.9980620
10  s5 I C G 0.9977682
11  s1 C I G 0.9976633
12  s2 C  

[R] Help with generating data from a 'not quite' Normal distribution

2004-08-12 Thread Crabb, David
I would be very grateful for any help from members of this list for what
might be a simple problem...

We are trying to simulate the behaviour of a clinical measurement in a
series of computer experiments. This is simple enough to do in R if we
assume the measurements to be Gaussian, but their empirical distribution
has a much higher peak at the mean and much longer tails. (The 
distribution is quite symmetrical.) Can anyone suggest any 
distributions I could fit to this data, and better still how I can then
generate random data from this 'distribution' using R?

---
Dr. David Crabb
School of Science,
The Nottingham Trent University,
Clifton Campus, Nottingham. NG11 8NS
Tel: 0115 848 3275   Fax: 0115 848 6690

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help with generating data from a 'not quite' Normal distribution

2004-08-12 Thread Dimitris Rizopoulos
Hi David,

you could try a Student's t distribution with an appropriate number of
degrees of freedom and an extra scale parameter, i.e.,

?rt
rgt <- function(n, mu = 0, sigma = 1, df = stop("no 'df' arg"))
    mu + sigma * rt(n, df = df)
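[For what it's worth, a quick sketch (an editorial addition, not from the original post) of how such a scaled, shifted t generator might be checked; the function is repeated so the snippet is self-contained:]

```r
## Scaled, shifted t as a heavy-tailed "not quite Normal" generator
rgt <- function(n, mu = 0, sigma = 1, df = stop("no 'df' arg"))
    mu + sigma * rt(n, df = df)

set.seed(1)
x <- rgt(10000, mu = 100, sigma = 15, df = 3)
## the extreme quantiles lie much farther out than for rnorm(10000, 100, 15),
## while the centre of the distribution is more sharply peaked
quantile(x, c(0.001, 0.5, 0.999))
```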

I hope this helps.

Best,
Dimitris


Dimitris Rizopoulos
Doctoral Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
 http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm





[R] Help with generating data from a 'not quite' Normal distribution

2004-08-12 Thread Vito Ricci
Hi,

the Student's t distribution could be considered: it is symmetrical, but
with a low number of degrees of freedom it differs from the Normal
distribution in just the way you describe: a much higher peak at the
mean and much longer tails. Try:

rt(n, df)

where n = number of observations and df = degrees of freedom, to
simulate samples.

Best
Vito



=
Becoming builders of solutions

Visit the portal http://www.modugno.it/
and in particular the section on Palese http://www.modugno.it/archivio/cat_palese.shtml



[R] Help with generating data from a 'not quite' Normal distribution

2004-08-12 Thread Vito Ricci
Hi,

The Cauchy distribution could also be good:

rcauchy(n, location = 0, scale = 1)


Best
Vito




[R] Giving a first good impression of R to Social Scientists

2004-08-12 Thread Rau, Roland
Dear all,

in the coming Winter Semester, I will be a teaching assistant for a course
in Survival Analysis. My job will be to do the lab sessions. The software
used for these lab sessions will be R. Most of the students have a
background in social sciences and the only stats package they used so far is
most likely SPSS.
So I assume they might be quite surprised the first time they see R ("where
is my rectangular data window?", "where do I have to click to make a new
variable?", ...).

That is why I would like to ask the experts on this list whether any of you
have encountered a similar situation, and what you would advise to persuade
people quickly that it is worth learning a new piece of software.
I had imagined giving them a short presentation on the nice capabilities of
R, things which would be impossible or troublesome with conventional
software like SPSS.[1] The reason is that I want to create an atmosphere
where people have a positive attitude towards learning new software right
from the beginning. This would make it easier for me, and I guess the
students also learn more and faster if they have a positive first encounter
with R.

(Afterwards I plan to introduce them to the basics of R with the help of
Venables/Smith/R Core Team: An Introduction to R, Dalgaard Introductory
Statistics with R and Krause/Olson The Basics of S and S-Plus before
doing any survival analysis relevant exercises.)

I would appreciate any suggestions!

Thanks for your help,
Roland


[1] I originally thought of showing them how easy it is to estimate a
Kaplan-Meier survival curve in R in the presence of left truncation, whereas
I have seen no way so far to do that in SPSS (which could also be due to my
lack of knowledge of SPSS). But the KM estimator is a topic during the
course, so I cannot use this example.


+
This mail has been sent through the MPI for Demographic Rese...{{dropped}}



[R] Giving a first good impression of R to Social Scientists

2004-08-12 Thread Vito Ricci
Hi,

do you know there are several GUIs for R? See:

http://www.sciviews.org/_rgui/

and in particular:

http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/

R Commander is quite like the GUI of commercial software.
Have a look: it could help your pupils who are used to SPSS.
I myself prefer the command line, though; for me it is simpler.

Best

Vito
 



[R] RE: Giving a first good impression of R to Social Scientists

2004-08-12 Thread Rau, Roland
Hi,

> -----Original Message-----
> From: Vito Ricci [SMTP:[EMAIL PROTECTED]]
> do you know there are several GUIs for R? See:
> [...]
> R Commander is quite like the GUI of commercial software.
 
 
Yes, I do know R Commander. But I did not want to give them a
GUI, but rather expose them to the command line, after demonstrating that
the steep learning curve in the beginning is worth the effort for the final
results.

That is why I wanted to ask the list whether anyone has faced the same
situation of persuading students to use R. Are social-science students most
impressed by some nice graphs (e.g. filled.contour), or will they get a
more positive attitude if I use R as an "overgrown calculator", as in Peter
Dalgaard's book? Or should I write an SPSS script to perform a certain task
and then demonstrate how easy, compact, and elegant it is to do the same
job in R? Just telling them "We will use R during our course" without any
explanation would not be a good choice, in my opinion.
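[As an aside on the graphs option: one of the stock demos, adapted from the filled.contour help page, makes a quick one-screen example of the kind of plot meant here:]

```r
## a colourful surface in three lines -- hard to do this quickly in SPSS
x <- y <- seq(-4*pi, 4*pi, len = 27)
r <- sqrt(outer(x^2, y^2, "+"))
filled.contour(cos(r^2) * exp(-r/(2*pi)), axes = FALSE)
```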

As I have written before: I would like the students to trust me that
it is worth investing some extra energy in the beginning. I do not expect
to receive a prepared demonstration from any of you; I am more curious
about your teaching experiences and how you got people enthusiastic about
using this software.

Thanks,
Roland



+
This mail has been sent through the MPI for Demographic Rese...{{dropped}}



Re: [R] Giving a first good impression of R to Social Scientists

2004-08-12 Thread susana barbosa

Hi,

I have encountered big difficulties trying to persuade my undergraduate
students, who have only a very slight background in either statistics or
computing, to use R instead of SPSS.

I tried to start with a sort of very, very simple sample session, just to
show that R is not as complicated as they think and that it is worth trying.
The idea was to give them a kind of simple lab guide they could follow at
their own pace, and to stimulate them to search the help files, etc. I think
it is important that they get used to trying out the examples and
documentation to solve specific problems.


Best
Susana


--
Susana Barbosa
Departamento de Matematica Aplicada
Faculdade de Ciências, Universidade Porto
Rua do Campo Alegre, 687, 4169-007, Porto
Tel: 220 100 840
Fax: 220 100 809

---




[R] error using daisy() in library(cluster). Bug?

2004-08-12 Thread javier garcia - CEBAS
Hi,
I'm using the cluster package to examine multivariate data.
The data come from a connection to a Postgres database, and I wrote a short R 
script to do the analysis. With the cluster version included in R 1.8.0, daisy 
worked well for my data, but now, when I call daisy, I get the following 
messages:
-
Error in if (any(sx == 0)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
binary variable(s) 116 treated as interval scaled in: 
daisy(concentracion.data.frame, stand = TRUE)
-

All the variables in my data frame are numeric, although I've got NA values, 
and I've seen that if I do the analysis for a subset of the data frame, 
selecting just the columns with no NAs, the result is good.
Could this be a bug?

Thanks, and best regards

Javier



RE: [R] Enduring LME confusion or Psychologists and Mixed-Effects

2004-08-12 Thread Doran, Harold
Dear Gijs:
 
lme fits models using restricted maximum likelihood (REML) by default. So, I 
believe this is why you have different DF. If you include method="ML" in the 
modeling call, the DF should be similar to aov.
 
I think your lme code is incorrect given my understanding of your problem. 
You can use all the data points with a repeated-measures design: each subject 
is measured on multiple occasions, so observations are nested within 
individuals. You need to add a time variable to indicate the measurement 
occasion for each subject (e.g., t = {0,...,T}). You might try the following:
 
fm1.lme <- lme(RT ~ time*fact1*fact2*fact3, data=myData, random=~time|sub, method="ML")
 
Use summary() to get the entire output
 
hope this helps
 
Harold

-Original Message- 
From: [EMAIL PROTECTED] on behalf of Gijs Plomp 
Sent: Thu 8/12/2004 4:13 AM 
To: [EMAIL PROTECTED] 
Cc: 
Subject: Fwd: [R] Enduring LME confusion... or Psychologists and Mixed-Effects 




Re: [R] Help with generating data from a 'not quite' Normal distribution

2004-08-12 Thread Martin Maechler
>>>>> "Vito" == Vito Ricci <[EMAIL PROTECTED]>
>>>>>     on Thu, 12 Aug 2004 10:59:23 +0200 (CEST) writes:

    Vito> Hi, Also the Cauchy's distribution could be good:

    Vito> rcauchy(n, location = 0, scale = 1)

also is an exaggeration, after you already told him to use the
t-distribution family:

Cauchy = t-Dist(*, df = 1) !



I'd first try with the t distribution, using fitdistr() from
package MASS, e.g.,

   > x <- rt(1000, df = 1.5)
   > library(MASS)
   > fx <- fitdistr(x, densfun = "t")
   > fx
            m               s              df
     -0.01396785     1.04338151     1.57749052
   ( 0.04426267)  ( 0.04766543)  ( 0.10809543)

(so it *does* estimate location and scale in addition to the df).

If you read the help page
   ?fitdistr

you'll see in the example that estimating 'df' is said to be
problematic.
AFAIK it can be better to reparametrize, possibly using 1/df or
log(df) as new parameter.
{but then you can't use fitdistr() but rather mle() and the
 log likelihood or optim() directly}.
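[A rough sketch (an editorial addition, not from the original post) of what that reparametrized route could look like, optimizing the location-scale t log-likelihood directly over mu, log(sigma), and log(df) with optim():]

```r
set.seed(1)
x <- rt(1000, df = 1.5)

## negative log-likelihood of a location-scale t; sigma and df are
## log-parametrized so the optimizer works on an unconstrained scale
negll <- function(p) {
    mu <- p[1]; sigma <- exp(p[2]); df <- exp(p[3])
    -sum(dt((x - mu)/sigma, df = df, log = TRUE) - log(sigma))
}
fit <- optim(c(median(x), log(mad(x)), log(5)), negll)
c(m = fit$par[1], s = exp(fit$par[2]), df = exp(fit$par[3]))
```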

Martin Maechler



Re: RE: [R] Enduring LME confusion or Psychologists and Mixed-Effects

2004-08-12 Thread Prof Brian Ripley
On Thu, 12 Aug 2004, Doran, Harold wrote:

> lme fits models using restricted maximum likelihood by default. So, I
> believe this is why you have a different DF. If you include method="ML"
> in the modeling function the DF should be similar to aov.

It is REML and not ML that generalizes the classical multistratum AoV.
In any case, REML and ML use the same subspaces, just different methods 
for estimating variances within them.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Help with generating data from a 'not quite' Normal distribution

2004-08-12 Thread Prof Brian Ripley
On Thu, 12 Aug 2004, Martin Maechler wrote:

> If you read the help page
>    ?fitdistr
> 
> you'll see in the example that estimating 'df' is said to be
> problematic.
> AFAIK it can be better to reparametrize, possibly using 1/df or
> log(df) as new parameter.
> {but then you can't use fitdistr() but rather mle() and the
>  log likelihood or optim() directly}.

It is the use of ML for the df that is *in theory* problematic, not the
optimization per se.  See the reference, p.110, for some of the 
literature.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] error using daisy() in library(cluster). Bug?

2004-08-12 Thread Martin Maechler
¡Hola Javier!

since I am the maintainer of the cluster *package* (not library), I'm
interested in finding out more about this problem. I assume you now use
R 1.9.1.

Can you give us an example we can reproduce?
Give the exact R commands you use and 
maybe attach the save()d data file (*.rda) in a private e-mail?

Or do this on R-help and give an URL where one can download the
data (you can't attach such binary files for R-help).
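[For readers hitting the same thing, the kind of minimal, reproducible example being asked for might look like the following; whether this particular toy data frame triggers the same error as Javier's data is an assumption, since the real data would be needed to be sure:]

```r
library(cluster)

## small all-numeric data frame with NAs, standardized as in the report
d <- data.frame(a = c(1, 2, NA, 4, 5),
                b = c(NA, 0.3, 0.1, 0.2, 0.5))
daisy(d, stand = TRUE)
```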

Thank you,
Martin Maechler

>>>>> "javier" == javier garcia - CEBAS <[EMAIL PROTECTED]>
>>>>>     on Thu, 12 Aug 2004 12:53:28 +0200 writes:




[R] Re: R-help Digest, Vol 18, Issue 12

2004-08-12 Thread John Maindonald
The message for aov1 was "Estimated effects may be unbalanced".  The  
effects are not unbalanced.  The design is 'orthogonal'.

The problem is that there are not enough degrees of freedom to estimate  
all those error terms.  If you change the model to:

  aov1 <- aov(RT ~ fact1*fact2*fact3 + Error(sub/(fact1+fact2+fact3)),
              data = myData)

or to

  aov2 <- aov(RT ~ fact1*fact2*fact3 + Error(sub/((fact1+fact2+fact3)^2)),
              data = myData)

all is well.  This last model (aov2) seems to me to have an excessive  
number of error terms.

The lme model lme(RT ~ fact1*fact2*fact3, random = ~1|sub, data = myData)
is equivalent to

  aov0 <- aov(RT ~ fact1*fact2*fact3 + Error(sub), data = myData)

It can be verified that the estimated variance components can be used  
to reproduce the mean squares in the anova table.
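[A sketch of that verification (an editorial addition; column names as in Gijs's myData are assumed, and each subject contributes 2 x 2 x 3 = 12 observations):]

```r
library(nlme)
lme0 <- lme(RT ~ fact1*fact2*fact3, random = ~1|sub, data = myData)
vc   <- as.numeric(VarCorr(lme0)[, "Variance"])
vc[2]               # residual variance: the within-subject mean square
vc[2] + 12 * vc[1]  # should reproduce the sub-stratum mean square of aov0
```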

A conservative approach is to estimate, e.g., contrasts involving fact1  
for each subject separately, then compare these against SE estimates  
that have 4 df (5 subjects - 1).  This is the ultimate check -- are the  
claimed effects consistent across the 5 subjects?  The standard errors  
should, though, probably be based on some variety of averaged variance.
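[A sketch of that per-subject check (an editorial addition; the factor levels C/I are assumed, as in the myData excerpt):]

```r
## collapse over fact2 and fact3 to one mean per subject per fact1 level,
## then test the per-subject fact1 contrast against its 4-df SE
m <- aggregate(RT ~ sub + fact1, data = myData, FUN = mean)
m <- m[order(m$fact1, m$sub), ]                    # align subjects
d <- m$RT[m$fact1 == "I"] - m$RT[m$fact1 == "C"]   # one contrast per subject
t.test(d)   # 5 subjects => 4 df
```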

I do not know how to set up the equivalents of aov1 and aov2 (where the  
errors are indeed crossed) in lme4.

John Maindonald.
On 12 Aug 2004, at 8:03 PM, [EMAIL PROTECTED] wrote:
I will follow the suggestion of John Maindonald and present the  
problem by example with some data.

I also follow the advice to use mean scores, somewhat reluctantly  
though. I know it is common practice in psychology, but wouldnt it be  
more elegant if one could use all the data points in an analysis?

The data for 5 subjects (myData) are provided at the bottom of this  
message. It is a crossed within-subject design with three factors and  
reaction time (RT) as the dependent variable.

An initial repeated-measures model would be:
aov1-aov(RT~fact1*fact2*fact3+Error(sub/ 
(fact1*fact2*fact3)),data=myData)

Aov complains that the effects involving fact3 are unbalanced:
 aov1

Stratum 4: sub:fact3
Terms:
 fact3   Residuals
Sum of Squares  4.81639e-07 5.08419e-08
Deg. of Freedom   2   8
Residual standard error: 7.971972e-05
6 out of 8 effects not estimable
Estimated effects may be unbalanced

Presumably this is because fact3 has three levels and the other ones  
only two, making this a non-orthogonal design.

I then fit an equivalent mixed-effects model:
lme1 <- lme(RT~fact1*fact2*fact3,data=meanData2,random=~1|sub)
Subsequently I test if my factors had any effect on RT:
> anova(lme1)
                  numDF denDF   F-value p-value
(Intercept)           1    44 105294024  <.0001
fact1                 1    44        22  <.0001
fact2                 1    44         7  0.0090
fact3                 2    44        19  <.0001
fact1:fact2           1    44         9  0.0047
fact1:fact3           2    44         1  0.4436
fact2:fact3           2    44         1  0.2458
fact1:fact2:fact3     2    44         0  0.6660
Firstly, why are the F-values in the output whole numbers?
The effects are estimated over the whole range of the dataset and this  
is so because all three factors are nested under subjects, on the same  
level. This, however, seems to inflate the F-values compared to the  
anova(aov1) output, e.g.
> anova(aov1)

Error: sub:fact2
          Df     Sum Sq    Mean Sq F value Pr(>F)
fact2  1 9.2289e-08 9.2289e-08  2.2524 0.2078
Residuals  4 1.6390e-07 4.0974e-08


I realize that the (probably faulty) aov model may not be directly
compared to the lme model, but my concern is whether the lme estimation
of the effects is right, and if so, how a naive skeptic can be convinced
of this.

The suggestion to use simulate.lme() to find this out seems good, but I
can't seem to get it working (from the thread "[R] lme: simulate.lme in R"
it seems it may never work in R).

I have also followed the suggestion to fit the exact same model with
lme4. However, the format of the anova output does not give me the
estimation in the way nlme does. More importantly, the degrees of
freedom in the denominator don't change; they're still large:
> library(lme4)
> lme4_1 <- lme(RT~fact1*fact2*fact3,random=~1|sub,data=myData)
> anova(lme4_1)
Analysis of Variance Table

                     Df    Sum Sq   Mean Sq Denom F value    Pr(>F)
fact1I                1 2.709e-07 2.709e-07    48 21.9205 2.360e-05 ***
fact2I                1 9.229e-08 9.229e-08    48  7.4665  0.008772 **
fact3L                1 4.906e-08 4.906e-08    48  3.9691  0.052047 .
fact3M                1 4.326e-07 4.326e-07    48 34.9972 3.370e-07 ***
fact1I:fact2I         1 1.095e-07 1.095e-07    48  8.8619  0.004552 **
fact1I:fact3L         1 8.988e-10 8.988e-10    48  0.0727  0.788577
fact1I:fact3M         1 1.957e-08 1.957e-08    48  1.5834  0.214351
fact2I:fact3L         1 3.741e-09 3.741e-09    48  0.3027  0.584749
fact2I:fact3M         1 3.207e-08 3.207e-08    48  2.5949  0.113767
fact1I:fact2I:fact3L  1 2.785e-09 2.785e-09

RE: [R] Help with generating data from a 'not quite' Normal distribution

2004-08-12 Thread Kahra Hannu
Consider using the HyperbolicDist package. With it you can both fit the
hyperbolic distribution to your data and generate random numbers from the
fitted distribution. Hyperbolic distributions provide a good fit to financial
returns, which commonly exhibit high peaks and heavy tails.

Hannu Kahra  

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Crabb, David
Sent: Thursday, August 12, 2004 10:44 AM
To: [EMAIL PROTECTED]
Subject: [R] Help with generating data from a 'not quite' Normal
distribution


I would be very grateful for any help from members of this list for what
might be a simple problem...

We are trying to simulate the behaviour of a clinical measurement in a
series of computer experiments. This is simple enough to do in R if we
assume the measurements to be Gaussian, but their empirical distribution
has a much higher peak at the mean and much longer tails (the
distribution is quite symmetrical). Can anyone suggest any distributions
I could fit to this data, and better still, how I can then generate
random data from this 'distribution' using R?
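One sketch (mine, not from the thread) that matches this description -- symmetric, peaked, heavy-tailed -- is a location-scale t-distribution, which needs only base R plus MASS; `x` stands in for the real measurements:

```r
library(MASS)                     # for fitdistr()
x <- rnorm(500)                   # stand-in for the clinical measurements
fit <- fitdistr(x, "t")           # estimates location m, scale s, and df
p <- fit$estimate
sim <- p["m"] + p["s"] * rt(1000, df = p["df"])  # random draws from the fit
```

A small estimated df gives the higher peak and longer tails; as df grows the fit approaches the Gaussian.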

---
Dr. David Crabb
School of Science,
The Nottingham Trent University,
Clifton Campus, Nottingham. NG11 8NS
Tel: 0115 848 3275   Fax: 0115 848 6690

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Giving a first good impression of R to Social Scientists

2004-08-12 Thread Thomas Lumley
On Thu, 12 Aug 2004, Rau, Roland wrote:

 That is why I would like to ask the experts on this list if anyone of you has
 encountered a similar experience and what you could advise to persuade
 people quickly that it is worth learning a new software package?

One problem is that it may not be true.  Unless these people are going to
be doing their own statistics in the future (which is probably true only
for a minority) they might actually be better off with a point and click
interface.  I'm (obviously) not arguing that SPSS is a better statistical
environment than R, but it is easier to learn, and in 10 or 15 weeks they
may not get to see the benefits of R.


-thomas

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Giving a first good impression of R to Social Scientists

2004-08-12 Thread Barry Rowlingson
Thomas Lumley wrote:
On Thu, 12 Aug 2004, Rau, Roland wrote:
That is why I would like to ask the experts on this list if anyone of you has
encountered a similar experience and what you could advise to persuade
people quickly that it is worth learning a new software package?

 The usual way of teaching R seems to be bottom-up. Here's the command 
prompt, type some arithmetic, make some assignments, learn about 
function calls and arguments, write your own functions, write your own 
packages.

 Perhaps a top-down approach might help certain cases. People using 
point-n-click packages tend to use a limited range of analyses. Write 
some functions that do these analyses, or give them wrappers so that 
they get something like:

  myData = readDataFile("foo.dat")
   Read 4 variables: Z, Age, Sex, Disease
  analyseThis(myData, response=Z, covariate=Age)
  Z = 0.36 * Age, Significance level = 0.932
 or whatever. Really spoon feed the things they need to do. Make it 
really easy, foolproof.
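A sketch (mine, not from the thread) of what such a wrapper could look like; analyseThis() and its output format are the email's hypothetical names, and the "significance level" is taken here to be the slope's p-value:

```r
## Spoon-fed wrapper around lm(): one call, one readable line of output.
analyseThis <- function(data, response, covariate) {
  fit <- lm(reformulate(covariate, response), data = data)
  cat(sprintf("%s = %.2f * %s, Significance level = %.3f\n",
              response, coef(fit)[2], covariate,
              summary(fit)$coefficients[2, 4]))
  invisible(fit)   # the full model is still there for the curious
}
```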

 Then show them what's behind the analyseThis() function. How it's not 
even part of the R distribution. How easy you made it for a beginner to 
do a complex and novel analysis. Then maybe it'll click for them, and 
they'll see how having a programming language behind their statistics 
functions lets them explore in ways not thought possible with the 
point-n-click paradigm. Perhaps they'll start editing analyseThis() and 
write analyseThat(), start thinking for themselves.

 Or maybe they'll just stare at you blankly...
Baz
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] truly object oriented programming in R

2004-08-12 Thread Jason Liao
Good morning! I recently implemented a KD tree in JAVA for faster
kernel density estimation (part of the code follows). It went well. To
hook it with R, however, has proved more difficult. My question is: is
it possible to implement the algorithm in R? My impression seems to
indicate no, as the code requires a complete class-object framework that
R does not support. But is there an R package or something that may
make it possible? Thanks in advance for your help.

Java implementation of KD tree:

public class Kdnode {

    private double[] center;   // center of the bounding box
    private double diameter;   // maximum distance from center to
                               // anywhere within the bounding box
    private int numOfPoints;   // number of source data points in the
                               // bounding box

    private Kdnode left, right;

    public Kdnode(double[][] points, int split_dim, int[][] sortedIndices,
                  double[][] bBox) {
        // bBox: the bounding box, 1st row the lower bound, 2nd row the upper bound
        numOfPoints = points.length;
        int d = points[0].length;

        center = new double[d];
        for(int j = 0; j < d; j++) center[j] = (bBox[0][j] + bBox[1][j]) / 2.;
        diameter = get_diameter(bBox);

        if(numOfPoints == 1) {
            diameter = 0.;
            for(int j = 0; j < d; j++) center[j] = points[0][j];
            left = null;
            right = null;
        }
        else {
            int middlePoint = sortedIndices[split_dim][numOfPoints/2];
            double splitValue = points[middlePoint][split_dim];

            middlePoint = sortedIndices[split_dim][numOfPoints/2 - 1];
            double splitValue_small = points[middlePoint][split_dim];

            int left_size = numOfPoints/2;
            int right_size = numOfPoints - left_size;

            double[][] leftPoints = new double[left_size][d];
            double[][] rightPoints = new double[right_size][d];

            int[][] leftSortedIndices = new int[d][left_size];
            int[][] rightSortedIndices = new int[d][right_size];

            int left_counter = 0, right_counter = 0;
            int[] splitInfo = new int[numOfPoints];

            for(int i = 0; i < numOfPoints; i++) {
                if(points[i][split_dim] < splitValue) {
                    for(int j = 0; j < d; j++) leftPoints[left_counter][j] = points[i][j];
                    splitInfo[i] = right_counter;
                    left_counter++;
                }
                else {
                    for(int j = 0; j < d; j++) rightPoints[right_counter][j] = points[i][j];
                    splitInfo[i] = left_counter;
                    right_counter++;
                }
            }
            // modify appropriately the indices to correspond to the new lists
            for(int i = 0; i < d; i++) {
                int left_index = 0, right_index = 0;
                for(int j = 0; j < numOfPoints; j++) {
                    if(points[sortedIndices[i][j]][split_dim] < splitValue)
                        leftSortedIndices[i][left_index++] =
                            sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
                    else
                        rightSortedIndices[i][right_index++] =
                            sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
                }
            }

            // Recursively compute the kdnodes for the points in the two
            // splitted spaces
            double[][] leftBBox = new double[2][];
            double[][] rightBBox = new double[2][];

            for(int i = 0; i < 2; i++) {
                leftBBox[i] = (double[]) bBox[i].clone();
                rightBBox[i] = (double[]) bBox[i].clone();
            }

            leftBBox[1][split_dim] = splitValue_small;
            rightBBox[0][split_dim] = splitValue;

            int next_dim = (split_dim + 1) % d;
            left = new Kdnode(leftPoints, next_dim, leftSortedIndices, leftBBox);
            right = new 

Re: [R] truly object oriented programming in R

2004-08-12 Thread Gabor Grothendieck
Jason Liao jg_liao at yahoo.com writes:

: 
: Good morning! I recently implemented a KD tree in JAVA for faster
: kernel density estimation (part of the code follows). It went well. To
: hook it with R, however, has proved more difficult. My question is: is
: it possible to implement the algorithm in R? My impression seems to
: indicate no as the code requires a complete class-object framework that
: R does not support. But is there an R package or something that may
: make it possible? Thanks in advance for your help.

R comes with the S3 and S4 object systems out-of-the-box and there is an
addon package oo.R available at:

   http://www.maths.lth.se/help/R/R.classes/ 

that provides a more conventional OO system.  It's likely that one or more
of these would satisfy your requirements.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] linear constraint optim with bounds/reparametrization

2004-08-12 Thread Kahra Hannu
From Spencer Graves:

However, for an equality constraint, I've had good luck with an objective
function that adds something like the following:
constraintViolationPenalty*(A%*%theta-c)^2, where constraintViolationPenalty
is passed via ... in a call to optim.

I applied Spencer's suggestion to a set of eight different constrained
portfolio optimization problems. It seems to give a usable way to solve the
portfolio problem when the QP optimizer is not applicable. After all,
practical portfolio management is more an art than a science.

I may first run optim with a modest value for constraintViolationPenalty,
then restart it with the output of the initial run as starting values and
with a larger value for constraintViolationPenalty.

I wrote a loop that starts with a small value for the penalty and stops when
the change in the function value from increasing the penalty is less than
epsilon. I found that epsilon = 1e-06 provides reasonable accuracy with
respect to computational time.
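The loop described above might be sketched like this (mine, not from the thread); fn, A, c.vec, and par0 are hypothetical stand-ins for the portfolio objective, the constraint matrix, the constraint vector, and the starting weights:

```r
## Penalized objective: optim() passes penalty through to fn via ...
penalized <- function(theta, penalty)
  fn(theta) + penalty * sum((A %*% theta - c.vec)^2)

penalty <- 1; eps <- 1e-06; last <- Inf; par <- par0
repeat {
  fit <- optim(par, penalized, penalty = penalty)
  if (abs(fit$value - last) < eps) break   # value stabilized: stop
  last <- fit$value
  par <- fit$par                           # warm-start the next round
  penalty <- penalty * 10                  # tighten the constraint
}
```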

Spencer, many thanks for your suggestion.

Hannu Kahra

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] error using daisy() in library(cluster). Bug?

2004-08-12 Thread Martin Maechler
[Reverted back to R-help, after private exchange]

 MM == Martin Maechler [EMAIL PROTECTED]
 on Thu, 12 Aug 2004 17:12:01 +0200 writes:

 javier == javier garcia - CEBAS [EMAIL PROTECTED]
 on Thu, 12 Aug 2004 16:28:27 +0200 writes:

javier Martin; Yes I know that there are variables with all
javier five values 'NA'. I've left them as they are just
javier because of saving a couple of lines in the script,
javier and because I like to see that they are there,
javier although all values are 'NA'.  I don't expect they
javier are used in the analysis, but are they the source of
javier the problem?

MM yes, but only because of stand = TRUE.

MM Yes, one could imagine that it might be good when
MM standardizing these all NA variables would work

MM I'll think a bit more about it.  Thank you for the
MM example.

Ok. I've thought (and looked at the R code) a bit longer.
Also considered the fact (you mentioned) that this worked in R 1.8.0.
Hence, I'm considering the current behavior a bug.

Here is the patch (apply to cluster/R/daisy.q in the *source*,
 or at the appropriate place in cluster_installed/R/cluster):

--- daisy.q 2004/06/25 16:17:47 1.17
+++ daisy.q 2004/08/12 15:23:26
@@ -78,8 +78,8 @@
 if(all(type2 == "I")) {
if(stand) {
 x <- scale(x, center = TRUE, scale = FALSE) #-> 0-means
-sx <- colMeans(abs(x))
-if(any(sx == 0)) {
+   sx <- colMeans(abs(x), na.rm = TRUE) # can still have NA's
+   if(0 %in% sx) {
 warning(sQuote("x"), " has constant columns ",
 pColl(which(sx == 0)), "; these are standardized to 0")
 sx[sx == 0] <- 1


Thank you for helping to find and fix this bug.
Martin Maechler, ETH Zurich, Switzerland

javier On Thu 12 Aug 2004 15:11, MM wrote:

 Javier, I could well read your .RData and try your
 script to produce the same error from daisy().
 
 Your dataframe is of dimension 5 x 180 and has many
 variables that have all five values 'NA' (see below).
 
 You can't expect to use these, do you?  Martin

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] RE: Giving a first good impression of R to Social Scientists

2004-08-12 Thread Gabor Grothendieck
Rau, Roland Rau at demogr.mpg.de writes:

   Yes, I do know the R-Commander. But I did not want to give them a
 GUI but rather expose them to the command line after I demonstrated that the
 steep learning curve in the beginning is worth the effort for the final
 results.

Note that Rcmdr displays all the underlying generated R code as it runs
the analysis, so you are exposed to the command line.  This might pique
the interest of students wishing to learn more, while giving an
easy-to-use and immediately useful environment for those who just want
to get results in the shortest, most direct fashion.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Approaches to using RUnit

2004-08-12 Thread Seth Falcon
On Tue, Aug 10, 2004 at 04:53:49PM +0200, Klaus Juenemann wrote:
 If you don't organize your code into packages but source individual R
 files your approach to source the code at the beginning of a test file
 looks the right thing to do.

Appears to be working pretty well for me too ;-)

 We mainly use packages and the code we use to test packages A and B, 
 say, looks like 

SNIP

 We use the tests subdirectory of a package to store our RUnit tests
 even though this is not really according to R conventions.

In an off-list exchange with A.J. Rossini, we discussed an alternative
for using RUnit in a package.  The idea was to put the runit_*.R files
(containing test code) into somePackage/inst/runit/ and then put a
script, say dorunit.R, inside somePackage/tests/ that would create the
test suites, similar to the code you included in your mail.  The
advantage of this would be that the unit tests would run under R CMD
check.

In the next week or so I hope to package-ify some code and try this out.  


+ seth

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Giving a first good impression of R to Social Scientists

2004-08-12 Thread Liaw, Andy
 From: Barry Rowlingson
 
 Thomas Lumley wrote:
  On Thu, 12 Aug 2004, Rau, Roland wrote:
  
 That is why I would like to ask the experts on this list if 
 anyone of you has
 encountered a similar experience and what you could advise 
 to persuade
 people quickly that it is worth learning a new software package?
  
 
   The usual way of teaching R seems to be bottom-up. Here's 
 the command 
 prompt, type some arithmetic, make some assignments, learn about 
 function calls and arguments, write your own functions, write 
 your own 
 packages.
 
   Perhaps a top-down approach might help certain cases. People using 
 point-n-click packages tend to use a limited range of analyses. Write 
 some functions that do these analyses, or give them wrappers so that 
 they get something like:
 
myData = readDataFile(foo.dat)
 Read 4 variables: Z, Age, Sex, Disease
 
analyseThis(myData, response=Z, covariate=Age)
 
Z = 0.36 * Age, Significance level = 0.932
 
   or whatever. Really spoon feed the things they need to do. Make it 
 really easy, foolproof.

The problem is that the only `fool' that has been `proofed' against is the one
that the developer(s) had imagined.  One should never underestimate users'
ability to out-fool the developers' imagination...

Cheers,
Andy

 
   Then show them what's behind the analyseThis() function. 
 How its not 
 even part of the R distribution. How easy you made it for a 
 beginner to 
 do a complex and novel analysis. Then maybe it'll click for 
 them, and 
 they'll see how having a programming language behind their statistics 
 functions lets them explore in ways not thought possible with the 
 point-n-click paradigm. Perhaps they'll start editing 
 analyseThis() and 
 write analyseThat(), start thinking for themselves.
 
   Or maybe they'll just stare at you blankly...
 
 Baz
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] truly object oriented programming in R

2004-08-12 Thread Seth Falcon
For an overview of the OOP R package, see
http://cran.r-project.org/doc/Rnews/Rnews_2001-3.pdf

+ seth

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] truly object oriented programming in R

2004-08-12 Thread Thomas Lumley
On Thu, 12 Aug 2004, Jason Liao wrote:

 Good morning! I recently implemented a KD tree in JAVA for faster
 kernel density estimation (part of the code follows). It went well. To
 hook it with R, however, has proved more difficult. My question is: is
 it possible to implement the algorithm in R? My impression seems to
 indicate no as the code requires a complete class-object framework that
 R does not support. But is there an R package or something that may
 make it possible? Thanks in advance for your help.


This code doesn't seem to have any  requirement for a class-object
framework. The methods can all be written as functions, there isn't any
use of inheritance or polymorphism. The data structure can then be a list.

Now, you might want to make this list an object, to improve printing or to
make it easier to check that the functions don't get called with arguments
that aren't really k-d trees.  This is well within the facilities of even
the S3 method system.
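A sketch (mine, not from the thread) of that S3 wrapping for a k-d tree node:

```r
## Constructor: a plain list tagged with an S3 class.
kdnode <- function(center, diameter, left = NULL, right = NULL)
  structure(list(center = center, diameter = diameter,
                 left = left, right = right),
            class = "kdnode")

## Method dispatch on print(); argument checking via inherits().
print.kdnode <- function(x, ...)
  cat("kdnode: center =", x$center, " diameter =", x$diameter, "\n")
is.kdnode <- function(x) inherits(x, "kdnode")
```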

AFAICS the only class/object facility that Java provides and the
methods package doesn't is enforcement of private methods and data,
which has absolutely no impact on the complexity of programs (it can
affect how easy code *maintenance* is, because it forces you to decide
what is and isn't in your API, but that's a separate issue).

The old S3 class system is weaker, since it doesn't support function
polymorphism based on more than one of the arguments and doesn't have
reliable reflectance facilities.


-thomas



 Java implementation of KD tree:

 SNIP (quoted verbatim from the original message above)

Re: [R] truly object oriented programming in R

2004-08-12 Thread Jason Liao
Dear Thomas,
Thank you very much again for taking time to answer my questions. I am
sorry that my knowledge of R is limited, as I have only learned what is
necessary to do my work. In the KD tree we have a recursive data
structure in which each node has two child nodes, and this process
continues until the data points are fully divided. Does R's list support
this recursive data structure? If yes, can you give a sample program?
Regards,
Jason
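As a sketch (mine, not one of the thread's replies): R lists nest freely, so a node can hold its child nodes directly, and recursion terminates with NULL children:

```r
## A recursive node as a plain list; NULL marks a leaf's missing children.
node <- function(center, left = NULL, right = NULL)
  list(center = center, left = left, right = right)

tree <- node(0,
             left  = node(-1),
             right = node(1, left = node(0.5)))
tree$right$left$center   # walk the structure with $: gives 0.5
```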

--- Thomas Lumley [EMAIL PROTECTED] wrote:

 On Thu, 12 Aug 2004, Jason Liao wrote:
 
  Good morning! I recently implemented a KD tree in JAVA for faster
  kernel density estimation (part of the code follows). It went well.
 To
  hook it with R, however, has proved more difficult. My question is:
 is
  it possible to implement the algorithm in R? My impression seems to
  indicate no as the code requires a complete class-object framework
 that
  R does not support. But is there an R package or something that may
  make it possible? Thanks in advance for your help.
 
 
 This code doesn't seem to have any  requirement for a class-object
 framework. The methods can all be written as functions, there isn't
 any
 use of inheritance or polymorphism. The data structure can then be a
 list.
 
 Now, you might want to make this list an object, to improve printing
 or to
 make it easier to check that the functions don't get called with
 arguments
 that aren't really k-d trees.  This is well within the facilities of
 even
 the S3 method system.
 
 AFAICS the only class/object facility that Java provides and the
 methods package doesn't is enforcement of private methods and
 data,
 which has absolutely no impact on the complexity of programs (it can
 affect how easy code *maintenance* is, because it forces you to
 decide
 what is and isn't in your API, but that's a separate issue).
 
 The old S3 class system is weaker, since it doesn't support function
 polymorphism based on more than one of the arguments and doesn't have
 reliable reflectance facilities.
 
 
   -thomas
 
 
 
  Java implementation of KD tree:
 
  SNIP (quoted verbatim from the original message above)

[R] rgl snapshot command

2004-08-12 Thread Joern Kamradt
Hi,
I am using the rgl package for 3D display. Unfortunately, I am not able 
to get the snapshot command running.
I tried the following:

> example(rgl.surface)
rgl.sr> data(volcano)
rgl.sr> y <- 2 * volcano
rgl.sr> x <- 10 * (1:nrow(y))
rgl.sr> z <- 10 * (1:ncol(y))
rgl.sr> ylim <- range(y)
rgl.sr> ylen <- ylim[2] - ylim[1] + 1
rgl.sr> colorlut <- terrain.colors(ylen)
rgl.sr> col <- colorlut[y - ylim[1] + 1]
rgl.sr> rgl.clear()
rgl.sr> rgl.surface(x, z, y, color = col)
> rgl.snapshot(filename="./volcano.png", fmt="png")
[1] "failed"
Any help is highly appreciated
Joern

__
Joern Kamradt, MD
Cancer Genetic Branch
National Human Genome Research Institute
National Institutes of Health
50 South Drive, building 50, room 5147
Bethesda, MD 20892-8000, USA
Phone#: +1 (301) 496 5382
FAX#: +1 (301) 402 3241
Email: [EMAIL PROTECTED]
Website: www.genome.gov
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [R-pkgs] new package fortunes 1.0-3

2004-08-12 Thread Achim Zeileis
Dear useRs,

I used the summer months to work on all of my packages,
and so this is the first of a sequence of announcements
of new or updated packages. The new packages are new 
in the sense that previous versions had been on CRAN for
some months but hadn't been announced to the R community
via this list until now.

All packages are available from the CRAN master site
in source form - binary versions should be available
from the mirrors in the next days.


So the first announcement is for fortunes 1.0-3:

The fortunes package provides simple infrastructure for
reading fortunes from a .csv file and displaying them.
Furthermore, it contains a growing list of fortunes
related to R, mainly collected from the mailing lists
but also from quotes at conferences. The author list
contains me (as I've written the R code) as well as
the people who contributed quotes by sending me a mail.
The original authors of each quote are always given
in the respective fortune.

For those of you who want to see a quote each time they 
start up R: you can add to your .Rprofile something like

if(interactive()) { library(fortunes); fortune() }

If you want to create your own list of fortunes you can
simply add another fortune collection in .csv format.
Of course, it would also be great if you could contribute
some quotes to the package...simply send me an e-mail.

Enjoy!
Z


-
Package: fortunes
Version: 1.0-3
Date: 2004-08-10
Title: R Fortunes
Author: Achim Zeileis, fortune contributions from Torsten Hothorn, Peter
Dalgaard, Uwe Ligges, Kevin Wright, Martin Maechler
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: R Fortunes
Depends: R (>= 1.4.0)
License: GPL

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [R-pkgs] new package sandwich 0.1-3

2004-08-12 Thread Achim Zeileis
Dear useRs,

here is the announcement for the next new package:
sandwich 0.1-3.

sandwich provides heteroskedasticity (and autocorrelation)
consistent covariance matrix estimators (also called HC
and HAC estimators).

The former are implemented in the function vcovHC() (which
was available in strucchange  before - and independently
in hccm() in John Fox's car package).

And the latter are implemented in the function vcovHAC().
This implements sandwich-type estimators in a rather 
flexible way, allowing for user-defined weights or 
weight functions. It builds on some of the functionality
which was before available in Thomas Lumley's weave package
(not on CRAN). In particular it makes available the
class of WEAVE estimators introduced by Lumley  Heagerty (1999)
in the function weave() which is a convenience interface to
vcovHAC(). Furthermore, it implements the class of kernel
HAC estimators with automatic bandwidth-selection of
Andrews (1991) in the function kernHAC(), which is again a
convenience interface to vcovHAC().
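A minimal usage sketch (mine, not from the announcement) of the functions named above, on a throwaway lm() fit:

```r
library(sandwich)
fit <- lm(dist ~ speed, data = cars)  # any fitted model with estfun/bread support
vcovHC(fit)    # heteroskedasticity-consistent (HC) covariance
vcovHAC(fit)   # heteroskedasticity- and autocorrelation-consistent (HAC)
kernHAC(fit)   # Andrews' kernel HAC with automatic bandwidth selection
```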

Best wishes,
Z

-
Package: sandwich
Version: 0.1-3
Date: 2004-07-19
Title: Robust Covariance Matrix Estimators
Author: Thomas Lumley, Achim Zeileis
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: Model-robust standard error estimators for time series
 and longitudinal data.
Depends: zoo, R (>= 1.5.0)
License: GPL version 2



[R] [R-pkgs] new package zoo 0.2-0

2004-08-12 Thread Achim Zeileis
Dear useRs,

yet another new package: zoo 0.2-0.

zoo provides a simple S3 class and methods for totally
ordered indexed observations such as irregular time
series. Although there are other packages for irregular
time series available on CRAN (Giles Heywood's its 
package and irts() in Adrian Trapletti's tseries package)
I wrote this package because I needed something which
provides simple infrastructure for observations with
(almost) arbitrary indexes (and not only POSIXct time
stamps as in its() and irts()). And it was at least also
useful for Gabor Grothendieck who provided most of the
updates to this version.
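
A minimal illustrative sketch (made-up data) of the "almost arbitrary
index" idea, assuming the zoo(x, order.by) constructor described above:

```r
## An irregular series with a plain numeric (non-time) index.
library(zoo)
z <- zoo(rnorm(5), order.by = c(1, 3, 4, 8, 9))
z   # observations printed by their totally ordered index
```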

Best wishes,
Z



Package: zoo
Version: 0.2-0
Date: 2004-08-12
Title: Z's ordered observations
Author: Achim Zeileis, Gabor Grothendieck
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: A class with methods for totally ordered indexed
observations such as irregular time series.
Depends: R (>= 1.7.0)
License: GPL



[R] [R-pkgs] updated package strucchange 1.2-4

2004-08-12 Thread Achim Zeileis
Dear useRs,

the strucchange package for testing for structural change
has been updated: the current version is 1.2-4.
The most significant additions were two functions gefp()
and efpFunctional().

gefp() implements a class of generalized M-fluctuation
tests for testing for parameter instability or structural
change in general parametric models including generalized
linear models (GLMs). 

efpFunctional() provides infrastructure for inference based
on functionals applied to empirical fluctuation processes
such as automatic tabulation of critical values and a choice
of a suitable visualization method.

The theory behind both functions is described in Zeileis
& Hornik (2003); the implementation ideas are explained in
Zeileis (2004). Links to both papers are available from
my web page: http://www.ci.tuwien.ac.at/~zeileis/

Best wishes,
Z



Package: strucchange
Version: 1.2-4
Date: 2004-08-10
Title: Testing for Structural Change
Author: Achim Zeileis, Friedrich Leisch, Bruce Hansen,
Kurt Hornik, Christian Kleiber, Andrea Peters
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: Testing, dating and monitoring of structural change in
 linear regression relationships.
 strucchange features tests/methods from the generalized
 fluctuation test framework as well as from the F test (Chow
 test) framework. This includes methods to fit, plot and
 test fluctuation processes (e.g., CUSUM, MOSUM,
 recursive/moving estimates) and F statistics, respectively.
 It is possible to monitor incoming data online using
 fluctuation processes.
 Finally, the breakpoints in regression models with
 structural changes can be estimated together with
 confidence intervals. Emphasis is always given to methods
 for visualizing the data.
Depends: sandwich, zoo, R (>= 1.5.0)
License: GPL



[R] [R-pkgs] updated package ineq 0.2-4

2004-08-12 Thread Achim Zeileis
Dear useRs,

my last announcement is an update of the ineq package
for measuring inequality, concentration and poverty.
The current version is now 0.2-4.

Thanks to suggestions from Rein Halbersma the Pen()
function for plotting Pen's parade was improved and now
allows for much more flexibility. See the help page
for examples.

Best wishes,
Z

-

Package: ineq
Version: 0.2-4
Date: 2004-08-10
Title: Measuring inequality, concentration and poverty
Author: Achim Zeileis
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: Inequality, concentration and poverty measures;
 Lorenz curves (empirical and theoretical)
License: GPL



Re: [R] truly object oriented programming in R

2004-08-12 Thread Thomas Lumley
On Thu, 12 Aug 2004, Jason Liao wrote:

 Dear Thomas,
 Thank you very much again for taking time to answer my questions. I am
 sorry that my knoweldge of R is limited as I have only learned what is
 necessary to do my work. In the KD tree, we have this recursive data
 structure in that each knod has two children knods and this process
 continues until the data points are divided. Does R's list support this
 recursive data structure? If yes, can you give a sample program?

Yes, the elements of a list can be lists. For example, a simple binary
tree could have lists with elements left, right, key, and data

## create a new single node
newtree <- function(key, data) {
  list(left=NULL, right=NULL, key=key, data=data)
}

## add a node to a sorted tree
addnode <- function(tree, key, data) {
  if (key <= tree$key) {
    if (is.null(tree$left))
      tree$left <- newtree(data=data, key=key)
    else
      tree$left <- addnode(tree$left, key, data)
  } else {
    if (is.null(tree$right))
      tree$right <- newtree(data=data, key=key)
    else
      tree$right <- addnode(tree$right, key, data)  # recurse on the right subtree
  }
  return(tree)
}


## inorder traversal.  action() is any function that takes key and data
## arguments
applyinorder <- function(tree, action) {
  c(if (!is.null(tree$left))
      applyinorder(tree$left, action),
    action(tree$key, tree$data),
    if (!is.null(tree$right))
      applyinorder(tree$right, action))
}


## an example
> a <- newtree("R", "two optional method systems and first-class functions")
> a <- addnode(a, "Java", "compulsory object system")
> a <- addnode(a, "C", "No built-in support but that needn't stop you")
> a <- addnode(a, "C++", "If C++ is your hammer, everything looks like a thumb")
> applyinorder(a, function(key, data) paste(key, data, sep=": "))
[1] "C: No built-in support but that needn't stop you"
[2] "C++: If C++ is your hammer, everything looks like a thumb"
[3] "Java: compulsory object system"
[4] "R: two optional method systems and first-class functions"



Re: [R] truly object oriented programming in R

2004-08-12 Thread Thomas Lumley
On Thu, 12 Aug 2004, Seth Falcon wrote:

 The thing that's very different from, say, Java is that everything is an
 object in R --- there isn't a notion of a *reference* to an object,
 which is why in the above I had to say head - insertNode(...) where
 as in Java you could pass in a reference to head and have the method
 modify what it points to.

 I think there are some ways around this, at least syntactically, using
 various tricks with environment(), but I don't yet understand them well
 enough to comment further.


Yes, and there is support in packages for other object systems in addition
to the two built-in ones.

Some of us feel that if you want Java you know where to find it...


-thomas
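
The environment() tricks alluded to above can be sketched roughly as
follows (illustrative only, not code from the thread): an environment is
the one kind of R object with reference semantics, so a node stored in
one can be modified in place without reassignment.

```r
## A mutable node implemented as an environment (reference semantics).
newnode <- function(key, data) {
  e <- new.env()
  assign("key",  key,  envir = e)
  assign("data", data, envir = e)
  e
}
touch <- function(node) assign("data", "modified in place", envir = node)

n <- newnode("R", "original")
touch(n)
get("data", envir = n)   # "modified in place" -- no `n <- touch(n)` needed
```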



[R] .Random.seed error

2004-08-12 Thread Richard Urbano
I have this snippet of code from an example in Dr. Harrell's book Regression
Modeling Strategies, p. 501

 

n <- 2000

.Random.seed <- c(49,39,17,36,23,0,43,51,6,54,50,1)

age <- 50 + 12 * rnorm(n)

age

 

I get the error message:  Error in rnorm(n) : .Random.seed[1] is NOT a valid
RNG kind (code)

 

I have tried this on Windows and Linux  R versions 1.8.1, 1.9.0, and 1.9.1

If I comment out the .Random.seed line and call set.seed(49),  set.seed(39)
etc before each call to a random generator function, everyone is HAPPY.

 

Does anyone have a suggestion?

 

 

Richard dot urbano At Vanderbilt dot edu


[[alternative HTML version deleted]]



Re: [R] .Random.seed error

2004-08-12 Thread Prof Brian Ripley
On Thu, 12 Aug 2004, Richard Urbano wrote:

 I have this snippet of code from an example in Dr. Harrel's book Regression
 Modeling Strategies p 501

That's in a section called `S-PLUS Examples'.
^^
 n-2000
 .Random.seed <- c(49,39,17,36,23,0,43,51,6,54,50,1)
 age <- 50 + 12 * rnorm(n)
 age
 
 I get the error message:  Error in rnorm(n) : .Random.seed[1] is NOT a valid
 RNG kind (code)
 
 I have tried this on Windows and Linux  R versions 1.8.1, 1.9.0, and 1.9.1

But you did not try the S-PLUS examples in S-PLUS ...

 If I comment out the .Random.seed line and call set.seed(49),  set.seed(39)
 etc before each call to a random generator function, everyone is HAPPY.

 Does anyone have a suggestion?

Don't confuse S-PLUS and R.

Read the R documentation before posting, as the posting guide asks -- see 
?.Random.seed for the explanation of the format of .Random.seed.
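
For reference, a portable rewrite of the snippet for R would seed the
generator with set.seed() and let R manage .Random.seed itself (the seed
value below is arbitrary, kept from the S-PLUS example):

```r
## R version of the S-PLUS snippet: seed via set.seed(), not .Random.seed.
set.seed(49)
n <- 2000
age <- 50 + 12 * rnorm(n)
summary(age)
```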

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] Fwd: Timebased predictions in postgresql.

2004-08-12 Thread Nicolai Petri
Hi r-people :)

I'm sorry to disturb, but I must admit that I know amazingly little about R and
similar statistics packages/languages, and I'm kind of lost on where to start.
I'm currently working on a data-collection framework for PostgreSQL (the DB
doesn't really matter except that I hope to use PL/R) and I would like to be
able to predict future values, preferably 1 day or more ahead. The highest
resolution on the historic data is 4 minutes, but I'm already resampling that
to whatever I need, so if it would be better to use 30 min or another
resolution (because of performance) it would be perfectly OK.

The types of statistics in the database are typically network IO, CPU usage,
temperature, etc., and I will rarely have holes in the history.

Can anyone tell me how (or give me a hint) I can predict traffic or similar,
maybe one or more days ahead, when I have the data xx days back? (And how
many days / which interval would be optimal for the best performance/precision?)

You can also tell me it's impossible, but I think it could be really cool to 
present graphs of expected cpu-load or network IO to our users.

Can it be done in R (and PL/R) ?

Best regards,
Nicolai Petri
catpipe Systems Aps



RE: [R] hclust-segmentation fault

2004-08-12 Thread Medvedovic, Mario (medvedm)
Thanks a lot for the assistance. It was an installation mistake - apparently
we did not have BLAS libraries installed. The compiler did not seem to have
been a problem in this case, although it is an older version (gcc 3.3.1).
Mario.


-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Monday, August 09, 2004 11:51 AM
To: Medvedovic, Mario (medvedm)
Cc: '[EMAIL PROTECTED]'
Subject: RE: [R] hclust-segmentation fault


On Mon, 9 Aug 2004, Medvedovic, Mario (medvedm) wrote:

 Well, the use of debugger will take some time, but here is a 
simple code
 that invariably causes the fault. 
 Mario.
 
 indata <- matrix(rnorm(1000, 0, 1), ncol=10)
 ed <- dist(indata)
 hc.e <- hclust(ed, "average")

Works fine on R 1.9.1 on our dual Opteron 248 under FC2.

We know of some pertinent compiler bugs on x86_64, so is this 
gcc 3.3.3 
or later?

 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 Sent: Monday, August 09, 2004 11:14 AM
 To: Medvedovic, Mario (medvedm)
 Cc: '[EMAIL PROTECTED]'
 Subject: Re: [R] hclust-segmentation fault
 
 
 On Mon, 9 Aug 2004, Medvedovic, Mario (medvedm) wrote:
 
  I am getting the Segmentation fault when using hclust in 
 R-1.9.1 running
  under SuSe 9.0 64-bit kernel on a dual opteron system with 
 8G of RAM. 
  I was wandering if anybody could offer any insight?
 
 Please try to use the debugger to supply more information, or 
 give us some 
 code we can reproduce on a similar system to see if we can 
 reproduce the 
 segfault.
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595
 
 
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595




[R] Error Using pm.getabst()

2004-08-12 Thread larsenmtl
R Users:

After installing Bioconductor, RSXML and all the relevant Win32 DLLs (libxml2, zlib, 
iconv), I receive the following error message when using pm.getabst()

Error in xmlRoot(absts) : no applicable method for xmlRoot

I receive this when using the example from help(pm.getabst).  

Downloading the target XML file, parsing it with xmlTreeParse and applying xmlRoot 
returns no error.

Your thoughts/suggestions are appreciated.

Mark Larsen



[R] correlation structures in NLME

2004-08-12 Thread Michael Jerosch-Herold
I am using the latest version of R on a Windows machine and get the
following error when I try to initialize a correlation structure with
the function corAR1 in NLME. This example is taken from the book of
Pinheiro and Bates, so it should work. What is going wrong?

 library(nlme)
 data(Orthodont)
 cs1AR1 <- corAR1(0.8, form = ~ 1 | Subject)
 cs1AR1 <- initialize(cs1AR1, data = Orthodont)
Error in methodsPackageMetaName(C, name) : 
The name of the object (e.g,. a class or generic function) to
find in the meta-data must be a single string (got a character vector of
length 2)
In addition: Warning message: 
the condition has length > 1 and only the first element will be used
in: if (!is.na(match(Class, .BasicClasses))) return(newBasic(Class,  


Thank you!

Michael Jerosch-Herold



Re: [R] correlation structures in NLME

2004-08-12 Thread Prof Brian Ripley
On Thu, 12 Aug 2004, Michael Jerosch-Herold wrote:

 I am using the latest version of R on a Windows machine and get the
 following error when I try to initialize a correlation structure with
 the function corAR1 in NLME. This example is taken from the book of
 Pinheiro and Bates, so it should work. What is going wrong?

That book is about S(-PLUS), so there is no reason why it `should work' in
R.

  library(nlme)
  data(Orthodont)
  cs1AR1 <- corAR1(0.8, form = ~ 1 | Subject)
  cs1AR1 <- initialize(cs1AR1, data = Orthodont)
 Error in methodsPackageMetaName(C, name) : 
 The name of the object (e.g,. a class or generic function) to
 find in the meta-data must be a single string (got a character vector of
 length 2)
 In addition: Warning message: 
 the condition has length > 1 and only the first element will be used
 in: if (!is.na(match(Class, .BasicClasses))) return(newBasic(Class,  

Try

 > find("initialize")
[1] "package:methods"

so that's not right.

Have you looked in the R scripts in package nlme?  I think it is
Initialize() in R.
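
Indeed, nlme exports Initialize() (capital I) as the generic for
initializing correlation structures; a sketch of the corrected call,
following the thread's suggestion:

```r
## Use nlme's Initialize(), not methods::initialize().
library(nlme)
data(Orthodont)
cs1AR1 <- corAR1(0.8, form = ~ 1 | Subject)
cs1AR1 <- Initialize(cs1AR1, data = Orthodont)
corMatrix(cs1AR1)[[1]]   # AR(1) correlation matrix for the first subject
```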

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Error Using pm.getabst()

2004-08-12 Thread Robert Gentleman
You will almost surely do better to ask about Bioconductor packages on
the Bioconductor mailing list. Next, it is helpful to know what
versions of things you are using.

As for your problem, did you look to see what kind of object absts is?
There seems to be no default method for xmlRoot, and it is likely that
the call to create the absts object failed (prior to this). You might
want to try stepping through the commands, one at a time and checking
each step. Often, the problem arises because you have not properly set
up your connection to the internet and so none of the querying
software will work.

  Robert

On Thu, Aug 12, 2004 at 08:20:10PM +, [EMAIL PROTECTED] wrote:
 R Users:
 
 After installing Bioconductor, RSXML and all the relevant Win32 DLLs (libxml2, zlib, 
 iconv), I receive the following error message when using pm.getabst()
 
 Error in xmlRoot(absts) : no applicable method for xmlRoot
 
 I receive this when using the example from help(pm.getabst).  
 
 Downloading the target XML file, parsing it with xmlTreeParse and applying xmlRoot 
 returns no error.
 
 Your thoughts/suggestions are appreciated.
 
 Mark Larsen
 

-- 
+---+
| Robert Gentleman phone : (617) 632-5250   |
| Associate Professor  fax:   (617)  632-2444   |
| Department of Biostatistics  office: M1B20|
| Harvard School of Public Health  email: [EMAIL PROTECTED]|
+---+



[R] How to use the whole dataset (including between events) in Cox model (time-varying covariates) ?

2004-08-12 Thread Mayeul KAUFFMANN
Hello,

coxph does not use any information that is in the dataset between event
times (or death times), since computation only occurs at event times.

For instance, removing observations when there is no event at that time in
the whole dataset does not change the results:
 set.seed(1)
 data <- as.data.frame(cbind(
   start=c(1:5,1:5,1:4), stop=c(2:6,2:6,2:5),
   status=c(rep(0,7),1,rep(0,5),1),
   id=c(rep(1,5),rep(2,5),rep(3,4)), x1=rnorm(14)))
 data
start stop status id x1
1 1 2 0 1 -0.6264538
2 2 3 0 1 0.1836433
3 3 4 0 1 -0.8356286
4 4 5 0 1 1.5952808
5 5 6 0 1 0.3295078
6 1 2 0 2 -0.8204684
7 2 3 0 2 0.4874291
8 3 4 1 2 0.7383247
9 4 5 0 2 0.5757814
10 5 6 0 2 -0.3053884
11 1 2 0 3 1.5117812
12 2 3 0 3 0.3898432
13 3 4 0 3 -0.6212406
14 4 5 1 3 -2.2146999
coxph(Surv(start, stop, status) ~ cluster(id) + x1, data=data, robust=TRUE)
coxph(Surv(start, stop, status) ~ cluster(id) + x1,
      data=subset(data, stop %in% 4:5), robust=TRUE)  # the same !!! (except n)

First, some data is lost.
Second, this loss could be an important problem when  there is a
time-varying covariate that changes quicker than the frequency  of events.
Specifically, I have a covariate which has low values most of the time. It
sometimes jumps to high values and that is hypothesized as greatly
increasing the risk of an event.
With rare events, the effect of this covariate will only be measured at
event times. Chances are that the only time such a covariate is recorded
at high level, the individual for which it is measured as being high is
having an event.
This may bias the estimated coefficient.

Here is my question:
How to fully use the dataset?

(that is: how to have really _time-varying_ covariates (even if they
change step by step, not continuously), not covariates whose changes are
measured only at event time )

Ideally, the full dataset would be use to estimate the parameters, or at
least to estimate the standard error of the estimated parameters.
Any ideas ???
.
.
.

A second best (which might require less work) would be to use all the
dataset to assess the predictive power of the model.

Maybe by using the expected number of events for an individual over the
time interval that they were observed to be at risk
 predict(coxfit, type="expected")
and compare it with observed number of events
(does it use all data and takes into account all the baseline hazard, even
between events?)


Or, if not,  following Brian D. Ripley suggestion about the baseline
hazard: As an approximation you can smooth the fitted
baseline cumulative hazard (e.g. by package pspline) and ask for its
derivative (https://stat.ethz.ch/pipermail/r-help/2004-July/052376.html)

the following code could be use to estimate (and plot) a smooth baseline
hazard:
 t <- seq(min(data$start), max(data$stop), length=100)
 lines(t,
   predict(sm.spline(x=basehaz(coxfit)[,2], y=basehaz(coxfit)[,1], norder=2),
           t, 1))
# there is a problem with this code: one should add the constraint that the
# baseline hazard cannot be negative.

The following computes the parametric part of the cox model.
 risk <- predict(coxfit, type='risk') # gives exp(X'b)

something like
 baseline.hazard*risk
would give the true risk at any time (but it would be probably much harder
to compute)

which could help assess the predictive power of the model.
(still a lot of work)

Thanks in advance for any help or comment.

Mayeul KAUFFMANN
Univ. Pierre Mendes France
Grenoble - France



[R] coxph question

2004-08-12 Thread Mayeul KAUFFMANN
 I have many variables to test using cox model (coxph), and I am only
interested in those variables with p value less than 0.01. Is there a
quick way to do this automatically instead of looking at the output of
each variable?
 Chris

I guess you need covariate selection.
for a lengthy discussion of another method (AIC/BIC), look at last month
archive:
https://www.stat.math.ethz.ch/pipermail/r-help/2004-July/subject.html#53519

and try using
 library(MASS)
 stepAIC (...)

Most of the time, it starts by removing the covariate with the highest p-value.
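
An illustrative sketch (the dataset and formula below are placeholders,
not the original poster's model); NA rows are dropped up front so the
sample stays fixed across selection steps:

```r
## Backward AIC-based selection on a Cox model via MASS::stepAIC().
library(MASS)
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = na.omit(lung))
stepAIC(fit, direction = "backward")
```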


Mayeul KAUFFMANN
Univ. Pierre Mendes France
Grenoble - France



[R] pnorm, qnorm

2004-08-12 Thread David Duffy
Trenkler, Dietrich said:

 I found the following strange behavior  using qnorm() and pnorm():

  x <- 8.21; x - qnorm(pnorm(x))
 [1] 0.0004638484
  x <- 8.28; x - qnorm(pnorm(x))
 [1] 0.07046385
  x <- 8.29; x - qnorm(pnorm(x))
 [1] 0.08046385
  x <- 8.30; x - qnorm(pnorm(x))
 [1] -Inf

 qnorm(1-.Machine$double.eps)
 [1] 8.12589
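
In other words, the round trip fails because pnorm(8.30) is within half a
machine epsilon of 1 and rounds to exactly 1, so qnorm() returns Inf. A
sketch of the standard workaround: stay on the upper tail, where the tiny
probability remains representable:

```r
## Upper-tail round trip keeps precision where pnorm(x) would round to 1.
x <- 8.30
pnorm(x)                            # indistinguishable from 1 in doubles
p <- pnorm(x, lower.tail = FALSE)   # tiny but representable
qnorm(p, lower.tail = FALSE)        # recovers x to near machine accuracy
```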



Re: [R] Error Using pm.getabst()

2004-08-12 Thread lars
Robert,

Thank you for your reply.  I thought RSXML was an R (CRAN) package? I
realize your package is part of bioconductor so I'll try the
bioconductor mailing list as well.  

Also, and more importantly, I took your suggestion and stepped through the
calls.  It seems the failure in pm.getabst() occurs when creating absts
with the pubmed function.  When I'm back at work, I'll have to research
further why it fails.  Right now, while I'm at home, it works
flawlessly.

Any reason this would fail behind a corporate firewall as opposed to my
home network?

Oh and my apologies for posting so hastily without the full
information.  R - 1.9.1, annotate - 1.4.0, RSXML - 0.97, all on a
Windows2000 OS.

Thanks,

Mark Larsen


On Thu, 2004-08-12 at 17:28, Robert Gentleman wrote: 
 You will almost surely do better to ask about Bioconductor packages on
 the Bioconductor mailing list. Next, it is helpful to know what
 versions of things you are using.
 
 As for your problem, did you look to see what kind of object absts is?
 There seems to be no default method for xmlRoot, and it is likely that
 the call to create the absts object failed (prior to this). You might
 want to try stepping through the commands, one at a time and checking
 each step. Often, the problem arises because you have not properly set
 up your connection to the internet and so none of the querying
 software will work.
 
   Robert
 
 On Thu, Aug 12, 2004 at 08:20:10PM +, [EMAIL PROTECTED] wrote:
  R Users:
  
  After installing Bioconductor, RSXML and all the relevant Win32 DLLs (libxml2, 
  zlib, iconv), I receive the following error message when using pm.getabst()
  
  Error in xmlRoot(absts) : no applicable method for xmlRoot
  
  I receive this when using the example from help(pm.getabst).  
  
  Downloading the target XML file, parsing it with xmlTreeParse and applying xmlRoot 
  returns no error.
  
  Your thoughts/suggestions are appreciated.
  
  Mark Larsen
  



[R] Object oriented programming resources

2004-08-12 Thread Matthew Walker
Hi,
I'm looking for resources to read about the object-oriented features of R.
I have looked through the Manuals page on r-project.org.  The most 
useful of the documents seemed to be the draft of the R language 
definition.  However it had only about 6 pages on the topic. 

I have also used Google, but my problem here is that R appears in a 
*lot* of webpages!  I tried limiting the search by using 
site:r-project.org, but didn't find anything very useful.

Specifically, I'm trying to find information on member variables (I 
think that's the correct term), as I'd like to copy this concept from C++:

class a {
 ...
private:
 int x;  // I think the term for this is a member variable
};
Thanks for your thoughts,
Matthew


RE: [R] Object oriented programming resources

2004-08-12 Thread Vadim Ogranovich
The Bioconductor project posts a short tutorial A guide to using S4
Objects under Developer Page frame. I've found it useful. 

Note that R's S4-classes approach to OOP is very different from that of
C++ or Java. You will still find member vars, though; they are called slots.
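
A minimal S4 sketch of a slot (the class name below is hypothetical; note
that S4 has no private members in the C++ sense):

```r
## A slot plays the role of a member variable in an S4 class.
setClass("account", representation(balance = "numeric"))
a <- new("account", balance = 100)
a@balance   # slot access uses @, not $
```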

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Matthew Walker
 Sent: Thursday, August 12, 2004 7:56 PM
 To: [EMAIL PROTECTED]
 Subject: [R] Object oriented programming resources
 
 Hi,
 
 I'm looking for resources to read about the object-oriented 
 features of R.
 
 I have looked through the Manuals page on r-project.org.  
 The most useful of the documents seemed to be the draft of 
 the R language definition.  However it had only about 6 
 pages on the topic. 
 
 I have also used Google, but my problem here is that R appears in a
 *lot* of webpages!  I tried limiting the search by using 
 site:r-project.org, but didn't find anything very useful.
 
 Specifically, I'm trying to find information on member 
 variables (I think that's the correct term), as I'd like to 
 copy this concept from C++:
 
  class a {
    ...
  private:
    int x;  // I think the term for this is a member variable
  };
 
 Thanks for your thoughts,
 
 Matthew
 

