Re: [R] error code 5 from Lapack routine 'dsyevr'

2007-03-16 Thread Prof Brian Ripley
On Fri, 16 Mar 2007, Richard D. Morey wrote:

 While using the rmvnorm function, I get the error:

 Error in eigen(sigma, sym = TRUE) : error code 5 from Lapack routine
 'dsyevr'

 The same thing happens when I try the eigen() function on my covariance
 matrix. The matrix is a symmetric 111x111 matrix. Well, it is almost
 symmetric; there are slight deviations from symmetry (the largest is
 3e-18).  I have this in an MCMC loop, and it happens about once in every
 40 iterations or so.

 What does this error mean?

See the LAPACK code (in src/modules/lapack).

Internal LAPACK errors are usually problems with arithmetic accuracy, 
and as such are compiler- and CPU-specific.
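
A workaround often suggested for this situation (a sketch only, not part of
the reply above): force the matrix to be exactly symmetric before the
eigendecomposition, since the tiny asymmetries mentioned above are usually
what trigger the error.

sigma <- (sigma + t(sigma)) / 2       # symmetrise; removes the ~3e-18 deviations
ev <- eigen(sigma, symmetric = TRUE)  # 'sigma' is assumed to be the 111x111 covariance matrix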

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ARIMA standard error

2007-03-16 Thread Prof Brian Ripley
On Fri, 16 Mar 2007, Gad Abraham wrote:

 Andrew Robinson wrote:
 Hi Gad,

 try:


 > class(a)
 [1] "Arima"
 > getAnywhere(print.Arima)

 Thanks Andrew.

 For the record, the standard error is the square root of the diagonal of
 the covariance matrix a$var.coef (itself obtained through some magic):

 ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits = digits)

And for the record, ?arima does tell you about the component var.coef, and 
also suggests the vcov() method to extract the variance matrix.
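
For illustration, a sketch using the lh series from the datasets package (an
addition, not part of the original reply; any fitted ARIMA model would do):

fit <- arima(lh, order = c(1, 0, 0))
sqrt(diag(vcov(fit)))     # the standard errors shown by print.Arima
fit$var.coef              # the underlying variance matrix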

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Defining a floor value in a data.frame on a row-by-row basis

2007-03-16 Thread Feng, Ken [CIB-EQTY]
Hi,

I have a data frame x, and a vector y which limits x by rows
so that no element of x in that row is smaller than y.

For example,

> x <- as.data.frame( t( array( 1:9, dim=c(3,3) ) ) )
> y <- c( 2, 8, 0 )
> x
  V1 V2 V3
1  1  2  3
2  4  5  6
3  7  8  9
> y
[1] 2 8 0

=
I want to get this:

  V1 V2 V3
1  2  2  3
2  8  8  8
3  7  8  9

=

I tried:

x[ x<=y ] <- y

but R wasn't happy.

I know I can do this with a loop, but there has to be a way which I can avoid 
it...

Thanks in advance.

- Ken

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and clinical studies

2007-03-16 Thread Delphine Fontaine
Thanks for your answer, which was very helpful. I have another question:

I have read in this document  
(http://cran.r-project.org/doc/manuals/R-intro.pdf) that most of the  
programs written in R are ephemeral and that new releases are not  
always compatible with previous releases. What I would like to know is  
if R functions are already validated and if not, what should we do to  
validate an R function?

-- 
Delphine Fontaine


Quoting Soukup, Mat [EMAIL PROTECTED]:

 Delphine,

 Please see the following message posted a week ago:
 http://comments.gmane.org/gmane.comp.lang.r.general/80175.

 HTH,

 -Mat

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Delphine Fontaine
 Sent: Friday, March 09, 2007 8:29 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] R and clinical studies

 Does anyone know if for clinical studies the FDA would accept
 statistical analyses performed with R ?

 Delphine Fontaine

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help on sigmoid curve fitting

2007-03-16 Thread Hufkens Koen
Hi list,
 
I was wondering how I should go about fitting a sigmoid curve to a dataset. 
More specifically how I estimate parameters a and b in the following equation:
 
1 / (1 + exp(-(x - a) * b))
 
with b the steepness of the sigmoid curve and a the shift of its center relative 
to the center of the data frame. The fit is a function of x, the location within 
the input vector, and y, an ecological value.
 
So I would like to estimate parameters a and b and a goodness of fit/ F-value 
of some kind.
 
Any suggestions?
 
Kind regards,
Koen
 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How can i do the same thing in the China map?

2007-03-16 Thread usstata

The maps package has a function called match.map, which is for map coloring.
Its example is as follows:

# filled map showing Republican vote in 1900
# (figure 6 in the reference)
data(state, package = "datasets")
data(votes.repub)
state.to.map <- match.map("state", state.name)
x <- votes.repub[state.to.map, "1900"]
gray.colors <- function(n) gray(rev(0:(n - 1))/n)
color <- gray.colors(100)[floor(x)]
map("state", fill = TRUE, col = color); map("state", add = TRUE)

I want to do the same thing with a map of China, but I can't find the province 
names for China.
Who can help me?



a rookie

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] expm() within the Matrix package

2007-03-16 Thread Martin Maechler
>>>>> "Gad" == Gad Abraham [EMAIL PROTECTED]
>>>>>     on Fri, 16 Mar 2007 09:48:52 +1100 writes:

Gad Laura Hill wrote:
 Hi
 
 Could anybody give me a bit of advice on some code I'm having trouble 
with?
 
 I've been trying to calculate the loglikelihood of a function iterated 
over
 a data of time values and I seem to be experiencing difficulty when I use
 the function expm(). Here's an example of what I am trying to do
 
 
 y <- c(5,10)                 #vector of 2 survival times
 
 p <- Matrix(c(1,0),1,2)      #1x2 matrix
 
 Q <- Matrix(c(1,2,3,4),2,2)  #2x2 matrix
 
 q <- Matrix(c(1,2),2,1)      #2x1 matrix
 
 Loglik <- rep(0,length(y))   #creating empty vector of same length as y
 
 for(i in 1:length(y)){
 
 Loglik[i] <- log((p %*% (expm(Q*y[i]))) %*% q)  #calculating
 
 #  Loglikelihood for each y[i]
 }
 
 The code works perfectly if I use exp(Q*y[i]) but not for expm()

Gad You need to do a type conversion here, because you get the loglik as a 
Gad 1x1 Matrix, instead of a scalar:

Gad (after running your code)

 > log(p %*% expm(Q * y[i]) %*% q)
Gad 1 x 1 Matrix of class dgeMatrix
Gad [,1]
Gad [1,] 134.5565


Gad If you convert to numeric, you can then assign it to Loglik:
 > Loglik[1] <- as.numeric(log(p %*% expm(Q * y[i]) %*% q))
 > Loglik[1]
Gad [1] 134.5565


Hmm, I don't think that's Laura's problem
(and actually I don't know what her problem is) :

Assignment of a 1 x 1 matrix to a vector is not a problem,
and hence the  as.numeric(.) above  really has no effect :

> ll <- 1:2
> (m <- matrix(pi, 1,1))
         [,1]
[1,] 3.141593
> ll[1] <- m
> ll
[1] 3.141593 2.000000
 

Martin Maechler, ETH Zurich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] expm() within the Matrix package

2007-03-16 Thread Martin Maechler
>>>>> "Gad" == Gad Abraham [EMAIL PROTECTED]
>>>>>     on Fri, 16 Mar 2007 09:48:52 +1100 writes:

Gad Laura Hill wrote:
 Hi
 
 Could anybody give me a bit of advice on some code I'm having trouble 
with?
 
 I've been trying to calculate the loglikelihood of a function iterated 
over
 a data of time values and I seem to be experiencing difficulty when I use
 the function expm(). Here's an example of what I am trying to do
 
 
 y <- c(5,10)                 #vector of 2 survival times
 
 p <- Matrix(c(1,0),1,2)      #1x2 matrix
 
 Q <- Matrix(c(1,2,3,4),2,2)  #2x2 matrix
 
 q <- Matrix(c(1,2),2,1)      #2x1 matrix
 
 Loglik <- rep(0,length(y))   #creating empty vector of same length as y
 
 for(i in 1:length(y)){
 
 Loglik[i] <- log((p %*% (expm(Q*y[i]))) %*% q)  #calculating
 
 #  Loglikelihood for each y[i]
 }
 
 The code works perfectly if I use exp(Q*y[i]) but not for expm()

Gad You need to do a type conversion here, because you get the loglik as a 
Gad 1x1 Matrix, instead of a scalar:

Gad (after running your code)

 > log(p %*% expm(Q * y[i]) %*% q)
Gad 1 x 1 Matrix of class dgeMatrix
Gad [,1]
Gad [1,] 134.5565


Gad If you convert to numeric, you can then assign it to Loglik:
 > Loglik[1] <- as.numeric(log(p %*% expm(Q * y[i]) %*% q))
 > Loglik[1]
Gad [1] 134.5565



Gad -- 
Gad Gad Abraham
Gad Department of Mathematics and Statistics
Gad The University of Melbourne
Gad Parkville 3010, Victoria, Australia
Gad email: [EMAIL PROTECTED]
Gad web: http://www.ms.unimelb.edu.au/~gabraham

Gad __
Gad R-help@stat.math.ethz.ch mailing list
Gad https://stat.ethz.ch/mailman/listinfo/r-help
Gad PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
Gad and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rattle- MSACCESS database problem

2007-03-16 Thread Graham Williams
Received Thu 08 Mar 2007  2:38am +1100 from j.joshua thomas:
 library(RGtk2)
 library(rattle)
 rattle()
 
 I click the ODBC option; it asks for the DSN. I am a bit confused with this; I already
 put my *.mdb file in the C: drive.
 I tried putting the DSN name as "Microsoft Access driver" in the appropriate text
 box, but I couldn't locate the table.
 
 I tried the other way round: Open - locate the *.mdb in the C: drive - couldn't
 locate it.
 
 I tried RODBC as well, but I want to use rattle to data mine my database.
 
 I need someone's help.

Did it work with RODBC? Could you send what appears in the Log
tab. That will give me an idea of what is going on.

Regards,
Graham

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem installing R onto Solaris 2.10 system - need advice!!!!!

2007-03-16 Thread Jenny Barnes
Dear R-Help friends,

I am unable to get the latest version of R (2.4.1) to compile on my solaris 10 
system - has anybody else experienced this problem and are you able to offer me 
any advice? 

I appreciate your time, many thanks,

Jenny Barnes


Here are my CURRENT specifications:

platform   sparc-sun-solaris2.10 
arch   sparc 
os solaris2.10   
system sparc, solaris2.10
status   
major  2 
minor  3.1   
year   2006  
month  06
day01
svn rev38247 
language   R 
version.string Version 2.3.1 (2006-06-01)



~~
Jennifer Barnes
PhD student: long range drought prediction 
Climate Extremes Group
Department of Space and Climate Physics
University College London
Holmbury St Mary 
Dorking, Surrey, RH5 6NT
Tel: 01483 204149
Mob: 07916 139187
Web: http://climate.mssl.ucl.ac.uk

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Defining a floor value in a data.frame on a row-by-row basis

2007-03-16 Thread Uwe Ligges
* Feng, Ken [CIB-EQTY] [EMAIL PROTECTED] [070316 09:40]:
 Hi,
 
 I have a data frame x, and a vector y which limits x by rows
 so that no element of x in that row is smaller than y.
 
 For example,
 
 > x <- as.data.frame( t( array( 1:9, dim=c(3,3) ) ) )
 > y <- c( 2, 8, 0 )
 > x
   V1 V2 V3
 1  1  2  3
 2  4  5  6
 3  7  8  9
 > y
 [1] 2 8 0
 
 =
 I want to get this:
 
   V1 V2 V3
 1  2  2  3
 2  8  8  8
 3  7  8  9
 
 =
 
 I tried:
 
 x[ x<=y ] <- y
 
 but R wasn't happy.
 
 I know I can do this with a loop, but there has to be a way which I can avoid 
 it...

sapply(x, function(xc) ifelse(xc < y, y, xc))
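
An alternative sketch (an addition, not part of the original reply): because
pmax() recycles y down the columns of the matrix, it applies the same row-wise
floor in one call:

x <- as.data.frame(pmax(as.matrix(x), y))   # row i is floored at y[i]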

Uwe Ligges

 
 Thanks in advance.
 
 - Ken
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Complex superscript plot text

2007-03-16 Thread Uwe Ligges
* Bob Farmer [EMAIL PROTECTED] [070316 05:57]:
 Hi all:
 
 I would like to create a line of plot margin text (using mtext() ) that 
 features both a superscript and a subset of an object.  However, I 
 cannot seem to do both things at once, nor have I found a way to paste 
 the two results together.
 
 (I pull the object subset because this is part of a for-next loop to 
 make multiple graphs)
 
 This example doesn't give me what I want from mtext():
 
 GoodnessOfFits <- c(1, 0.75)
 
 graphNumber <- 1  #first loop of the for-next loop (not shown)
 
 x <- seq(-10, 10, 1)
 y <- (x^2)
 plot(x,y)
 lines(x, predict(lm(y ~ I(x^2))))
 mtext(text=
   expression(R^2 * ": " * GoodnessOfFits[graphNumber]))
 

plot(x,y)
lines(x, predict(lm(y ~ I(x^2))))
mtext(text = substitute(R^2 == GF, list(GF = GoodnessOfFits[graphNumber])),
      line = 1)
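
A closely related sketch (an addition, not from the original reply) that keeps
the colon of the original attempt, using bquote() to splice the value in:

mtext(bquote(R^2 * ": " * .(GoodnessOfFits[graphNumber])), line = 1)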
 

Uwe Ligges


 I recognize that in this example, I could extract the R-squared value 
 from the model in each loop, however this does not apply to my more 
 complicated real scenario.
 
 Any suggestions?
 
 Thanks.
 --Bob Farmer
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Complex superscript plot text

2007-03-16 Thread talepanda
try:

mtext(substitute(R^2 * ": " * GoodnessOfFits[i], list(i = graphNumber)))

HTH.

On 3/16/07, Bob Farmer [EMAIL PROTECTED] wrote:
 Hi all:

 I would like to create a line of plot margin text (using mtext() ) that
 features both a superscript and a subset of an object.  However, I
 cannot seem to do both things at once, nor have I found a way to paste
 the two results together.

 (I pull the object subset because this is part of a for-next loop to
 make multiple graphs)

 This example doesn't give me what I want from mtext():

 GoodnessOfFits <- c(1, 0.75)

 graphNumber <- 1  #first loop of the for-next loop (not shown)

 x <- seq(-10, 10, 1)
 y <- (x^2)
 plot(x,y)
 lines(x, predict(lm(y ~ I(x^2))))
 mtext(text=
   expression(R^2 * ": " * GoodnessOfFits[graphNumber]))


 I recognize that in this example, I could extract the R-squared value
 from the model in each loop, however this does not apply to my more
 complicated real scenario.

 Any suggestions?

 Thanks.
 --Bob Farmer

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help on sigmoid curve fitting

2007-03-16 Thread talepanda
you can use nls:

see

?nls

and its example.
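
For example, a sketch with simulated data and guessed starting values (an
addition, not part of the original reply; the data and start values are made up):

x <- 1:50
y <- 1 / (1 + exp(-(x - 25) * 0.3)) + rnorm(50, sd = 0.05)   # fake ecological response
fit <- nls(y ~ 1 / (1 + exp(-(x - a) * b)), start = list(a = 20, b = 0.1))
summary(fit)        # estimates and standard errors for a and b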

HTH.


On 3/16/07, Hufkens Koen [EMAIL PROTECTED] wrote:
 Hi list,

 I was wondering how I should go about fitting a sigmoid curve to a dataset.
 More specifically how I estimate parameters a and b in the following
 equation:

 1 / (1 + exp(-(x - a) * b))

 with b the steepness of the sigmoid curve and a the shift of its center relative to
 the center of the data frame. The fit is a function of x, the location within the
 input vector, and y, an ecological value.

 So I would like to estimate parameters a and b and a goodness of fit/
 F-value of some kind.

 Any suggestions?

 Kind regards,
 Koen




   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can i do the same thing in the China map?

2007-03-16 Thread Ray Brownrigg
usstata wrote:
 The maps package has a function called match.map, which is for map
 coloring.
 Its example is as follows:
 
 # filled map showing Republican vote in 1900
 # (figure 6 in the reference)
 data(state, package = "datasets")
 data(votes.repub)
 state.to.map <- match.map("state", state.name)
 x <- votes.repub[state.to.map, "1900"]
 gray.colors <- function(n) gray(rev(0:(n - 1))/n)
 color <- gray.colors(100)[floor(x)]
 map("state", fill = TRUE, col = color); map("state", add = TRUE)
 
 I want to do the same thing with a map of China, but I can't find the
 province names for China.
 Who can help me?
 
nobody
 
 
 a rookie
 
Tell us who you are, and you may get a more substantial reply.

Ray Brownrigg

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] gls bug?

2007-03-16 Thread simon bond
Dear R-help,

I found that the following code crashes R (version 2.4.0 in windows).


 x=rnorm(10,0.1,1)
 library(nlme)
 gls(x~0)



I quickly found a work-around for what I was trying to do, but I thought I 
should report this.


Best wishes


Simon Bond

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem to add legend to panel.histogram function

2007-03-16 Thread KOITA Lassana - STAC/ACE



Hi, R users,
The following script works well, but I have a problem adding a legend to the
graphs. For example, I would like to place one item at the top right of each
graph:
   brown color for the Gumbel density
   green color for the Weibull density

Thank you for your help

(See attached file: compar_densités.txt)

Lassana KOITA
Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique  /
Project Engineer Airport Safety Studies  Statistical analysis
Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical
Department
Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation
Headquarters
Tel: 01 49 56 80 60
Fax: 01 49 56 82 14
E-mail: [EMAIL PROTECTED]
http://www.stac.aviation-civile.gouv.fr/
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to use result of approxfun in a package?

2007-03-16 Thread Uwe Ligges
* Steve Buyske [EMAIL PROTECTED] [070315 19:05]:
 I am working on a project where we start with 2 long,  
 equal-length vectors, massage them in various ways, and end up with a  
 function mapping one interval to another. I'll call that function  
 f1. The last step in R is to generate f1 as the value of the  
 approxfun function. I would like to put f1 into a package, but  
 without having the package redo the creation of function f1. The  
 value of approxfun is a function; for example, I have
 
  > f1
 function (v)
 .C("R_approx", as.double(x), as.double(y), as.integer(n), xout = as.double(v),
    as.integer(length(v)), as.integer(method), as.double(yleft),
    as.double(yright), as.double(f), NAOK = TRUE, PACKAGE = "base")$xout
 <environment: 0x17719324>
 
 I don't really understand how this is stored, and in particular, how  
 I should handle it so as to include the function f1 in a package. I  
 would like the users to be able to load the package and use f1  
 directly, rather than re-generate it using approxfun.


approxfun() makes use of a nice lexical scoping feature:
If a function (approxfun) returns some other function (f1), then the latter is 
stored together with the environment of the first function (approxfun).
f1 requires not only v (its formal argument), but also the objects x, y, n, method, 
yleft, yright, and f. All of these are in the environment that was created 
at runtime of approxfun(), which is referenced as <environment: 0x17719324> in 
your particular case. 

You can save() the object f1 together with its environment as in:
save(f1, file = "f1.Rdata")

and reload it later with 
load("f1.Rdata")

In a package, this means you can either save it as data or you can specify 
the call to approxfun() directly in an R file that executes it during 
either loading the installed package or during installation if a saved 
image of the packages is used, depending on the setting in your 
DESCRIPTION file.
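
A small sketch illustrating the point (an addition, not part of the original
reply; toy data):

x <- 1:10; y <- sqrt(x)
f1 <- approxfun(x, y)
rm(x, y)                     # f1 still works: its x and y live in the closure's environment
f1(2.5)
save(f1, file = "f1.Rdata")  # the enclosing environment is serialized along with f1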

Uwe Ligges



 
 Thanks,
 Steve
 ---
 Steve Buyske (rhymes with nice key)
 Associate Research Professor
 Department of Statistics  Biostatistics
 Rutgers University
 Hill Center, 110 Frelinghuysen Rd
 Piscataway, NJ 08854
 [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem installing R onto Solaris 2.10 system - need advice!!!!!

2007-03-16 Thread Jenny Barnes
Dear Andrew and R-help,

Here is the error message that we got when trying to install R v2.4 (which we 
tried to install before this newer version 2.4.1 - I'm afraid I didn't save the 
error message from the latest attempt with the new version):

configure
make
...
...
...

f90: CODE: 0 WORDS, DATA: 0 WORDS
gcc -G -L/usr/local/lib -o stats.so init.o kmeans.o  ansari.o bandwidths.o
chisq
sim.o d2x2xk.o fexact.o kendall.o ks.o  line.o smooth.o  prho.o swilk.o 
ksmooth
.o loessc.o isoreg.o Srunmed.o Trunmed.o  dblcen.o distance.o
hclust-utils.o  nl
s.o  HoltWinters.o PPsum.o arima.o burg.o filter.o  mAR.o pacf.o starma.o
port.o
 family.o sbart.o bsplvd.o bvalue.o bvalus.o loessf.o ppr.o qsbart.o 
sgram.o si
nerp.o sslvrg.o stxwx.o  hclust.o kmns.o  eureka.o stl.o portsrc.o
-L../../../..
/lib -lRblas  -lg2c -lm -lgcc_s
mkdir ../../../../library/stats/libs
building package 'datasets'
mkdir ../../../library/datasets
mkdir ../../../library/datasets/R
mkdir ../../../library/datasets/data
Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library
'/tmp/R-2.4.0/library/stats/libs/stats.so'
:
  ld.so.1: R: fatal: relocation error: file
/tmp/R-2.4.0/library/stats/libs/stat
s.so: symbol __i_abs: referenced symbol not found
Execution halted
*** Error code 1

Many thanks,

Jenny


Hi Jenny,

advice: try posting the error message accompanying the failure to
compile.

Cheers

Andrew

On Fri, Mar 16, 2007 at 09:19:24AM +, Jenny Barnes wrote:
 Dear R-Help friends,
 
 I am unable to get the latest version of R (2.4.1) to compile on my solaris 
10 
 system - has anybody else experienced this problem and are you able to offer 
me 
 any advice? 
 
 I appreciate your time, many thanks,
 
 Jenny Barnes
 
 
 Here are my CURRENT specifications:
 
 platform   sparc-sun-solaris2.10 
 arch   sparc 
 os solaris2.10   
 system sparc, solaris2.10
 status   
 major  2 
 minor  3.1   
 year   2006  
 month  06
 day01
 svn rev38247 
 language   R 
 version.string Version 2.3.1 (2006-06-01)
 
 
 
 ~~
 Jennifer Barnes
 PhD student: long range drought prediction 
 Climate Extremes Group
 Department of Space and Climate Physics
 University College London
 Holmbury St Mary 
 Dorking, Surrey, RH5 6NT
 Tel: 01483 204149
 Mob: 07916 139187
 Web: http://climate.mssl.ucl.ac.uk
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/ 

~~
Jennifer Barnes
PhD student: long range drought prediction 
Climate Extremes Group
Department of Space and Climate Physics
University College London
Holmbury St Mary 
Dorking, Surrey, RH5 6NT
Tel: 01483 204149
Mob: 07916 139187
Web: http://climate.mssl.ucl.ac.uk

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] corAR1 in a random effects panel

2007-03-16 Thread Guillermo Villa
Hi everyone,

I am interested in estimating this type of random effects panel:

y_it = x'_it * beta + u_it + e_it

u_it = rho * u_i,t-1 + d_it,        rho belongs to (-1, 1)

where:

u and e are independent and normally zero-mean distributed.

d is also independently normally zero-mean distributed.

So, I want random effects for group i to be correlated in t, following an
AR(1) process.

I am using the lme command, including correlation=corAR1:

lme(asis ~ prec + pobl + gola + entr, random = ~1|codi,
    correlation = corAR1(0.8, form = ~temp|codi))

i = codi
t = temp

I am not sure whether the AR(1) process is applied to the random effects
(u_it) or the error term (e_it)... Any idea?

Thanks.

G



-- 
Guillermo Villa

Universidad Carlos III de Madrid
Business Economics Department
Office 6.0.26
Madrid, 126
28903, Getafe (Madrid)
SPAIN

Email: [EMAIL PROTECTED]
Phone: (+34) 916249772
Mobil: (+34) 655112743
Fax: (+34) 916249607
Skype: guillermo.villa
Website: www.guillermovilla.com

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bad points in regression

2007-03-16 Thread Alberto Monteiro
I have a question, maybe it's better to explain by example:

alpha <- 0.3
beta <- 0.4
sigma <- 0.5
err <- rnorm(100)
err[15] <- 5; err[25] <- -4; err[50] <- 10
x <- 1:100
y <- alpha + beta * x + sigma * err
ll <- lm(y ~ x)
plot(ll)

Now, the graphs clearly show that 15, 25 and 50 are the indexes
of the bad points. How can I retrieve this information from ll?

Alberto Monteiro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Segmentation fault in estimating structural equation models with the SEM package.

2007-03-16 Thread Snaebjorn Gunnsteinsson
Dear R-users,

 I am running a large number of simulations and estimating a  
structural equation model for each one using the SEM package. Each  
run of my program has around 8000 simulations. Most of the time the  
program completes all of them correctly but sometimes I get a  
segmentation fault in the sem routine and my program stops with the  
following error message:

  *** caught segfault ***
 address (nil), cause 'unknown'

 Traceback:
 1: nlm(if (analytic.gradient) objective.2 else objective.1,  
 start, hessian = TRUE, iterlim = maxiter, print.level = if  
 (debug) 2 else 0, typsize = typsize, .
 ..)
 2: sem.default(ram = ram, S = S, N = N, param.names = pars,  
 var.names = vars, fixed.x = fixed.x, debug = debug, ...)
 3: sem(ram = ram, S = S, N = N, param.names = pars, var.names =  
 vars, fixed.x = fixed.x, debug = debug, ...)
 4: sem.mod(modelspec, cor.or.cov.mat, N, obs.variables =  
 comb.set, warn = FALSE)
 5: sem(modelspec, cor.or.cov.mat, N, obs.variables = comb.set, warn  
 = FALSE)
 6: try(sem(modelspec, cor.or.cov.mat, N, obs.variables =  
 comb.set, warn = FALSE), silent = TRUE)
 7: eval.with.vis(expr, envir, enclos)
 8: eval.with.vis(ei, envir)
 9: source(analysis.base.R)
 aborting ...

Does anybody have an idea about how I could prevent this from  
happening or catch it so that my program could continue running?  
Could I, in some way, manage the memory myself to prevent this?

The model is correctly specified and clearly identified but the  
covariance matrix changes between simulations (it is a two latent  
variable measurement model with more than 4 observed variables on  
each LV).

Many thanks,
regards,
Snaebjorn

P.S. Using R 2.4.1 (2006-06-01) on x86_64 GNU/Linux
Using version 0.9-6 (2006/11/22) of the SEM package.


---
Snaebjorn Gunnsteinsson
JiVitA Science Fellow
The JiVitA Project (NIPHP-USAID)
Rangpur, Bangladesh
www.jivita.org
Mobile: +880-171-320-2560

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Can I scale the labels in a 'persp' graph?

2007-03-16 Thread salcaraz
Hi all:

I'm using 'persp' for 3D graphics.

I need the axis labels to be smaller than the default.

I see in 'help()' the information about 'par()'.

I have written:

par(..., cex.axis=0.5, cex.lab=0.5)
persp(...)

and the result doesn't change.

The question is: can I change the size of the labels in the persp graph?

Thank you in advance:

/salva





'cex.axis' The magnification to be used for axis annotation
   relative to the current setting of 'cex'. (Some functions
   such as 'points' accept a vector of values which are
   recycled.  Other uses will take just the first value if a
   vector of length greater than one is supplied.)

'cex.lab' The magnification to be used for x and y labels relative
   to the current setting of 'cex'.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad points in regression

2007-03-16 Thread Ted Harding
On 16-Mar-07 11:56:56, Alberto Monteiro wrote:
 I have a question, maybe it's better to explain by example:
 
 alpha <- 0.3
 beta <- 0.4
 sigma <- 0.5
 err <- rnorm(100)
 err[15] <- 5; err[25] <- -4; err[50] <- 10
 x <- 1:100
 y <- alpha + beta * x + sigma * err
 ll <- lm(y ~ x)
 plot(ll)
 
 Now, the graphs clearly show that 15, 25 and 50 are the indexes
 of the bad points. How can I retrieve this information from ll?
 
 Alberto Monteiro

ll is the output of a linear model fitted by lm(), and so has
several components (see ?lm in the section "Value"), one of
which is "residuals" (which can be abbreviated to "res").

So, in the case of your example,

  which(abs(ll$res) > 2)
  15 25 50 

extracts the information you want (and the 2 was inspired by
looking at the residuals plot from your plot(ll)).

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Mar-07   Time: 12:19:02
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad points in regression

2007-03-16 Thread Alberto Monteiro
Ted Harding wrote:
 
 alpha <- 0.3
 beta <- 0.4
 sigma <- 0.5
 err <- rnorm(100)
 err[15] <- 5; err[25] <- -4; err[50] <- 10
 x <- 1:100
 y <- alpha + beta * x + sigma * err
 ll <- lm(y ~ x)
 plot(ll)
 
 ll is the output of a linear model fiited by lm(), and so has
 several components (see ?lm in the section Value), one of
 which is residuals (which can be abbreviated to res).
 
 So, in the case of your example,
 
   which(abs(ll$res) > 2)
   15 25 50
 
 extracts the information you want (and the 2 was inspired by
 looking at the residuals plot from your plot(ll)).

Ok, but how can I grab those points _in general_? What is the
criterion that plot uses to mark those points as bad points?

names(ll)

gives:

 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

None of them include information about those bad points.

Alberto Monteiro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem installing R onto Solaris 2.10 system - need advice!!!!!

2007-03-16 Thread Prof Brian Ripley
It seems you are using f90, but FLIBS was computed using g77.  That would 
mean that something has been altered since R was configured, or that the 
way the Fortran compiler was specified was incorrect.

I think you need to start again and make use of only one set of compilers.
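
One way to do that (a sketch only, not part of the original reply; the exact
choice of compilers depends on the local setup) is to name a single toolchain
explicitly when configuring, and rebuild from a clean source tree:

./configure CC=gcc F77=g77 CXX=g++
make clean
make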

On Fri, 16 Mar 2007, Jenny Barnes wrote:

 Dear Andrew and R-help,

 Here is the error message that we got when trying to install R v2.4 (which we
 tried to install before this newer version 2.4.1 - I'm afraid I didn't save the
 error message from the latest attempt with the new version):

 configure
 make
 ...
 ...
 ...

 f90: CODE: 0 WORDS, DATA: 0 WORDS
 gcc -G -L/usr/local/lib -o stats.so init.o kmeans.o  ansari.o bandwidths.o
 chisq
 sim.o d2x2xk.o fexact.o kendall.o ks.o  line.o smooth.o  prho.o swilk.o
 ksmooth
 .o loessc.o isoreg.o Srunmed.o Trunmed.o  dblcen.o distance.o
 hclust-utils.o  nl
 s.o  HoltWinters.o PPsum.o arima.o burg.o filter.o  mAR.o pacf.o starma.o
 port.o
 family.o sbart.o bsplvd.o bvalue.o bvalus.o loessf.o ppr.o qsbart.o
 sgram.o si
 nerp.o sslvrg.o stxwx.o  hclust.o kmns.o  eureka.o stl.o portsrc.o
 -L../../../..
 /lib -lRblas  -lg2c -lm -lgcc_s
 mkdir ../../../../library/stats/libs
 building package 'datasets'
 mkdir ../../../library/datasets
 mkdir ../../../library/datasets/R
 mkdir ../../../library/datasets/data
 Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library
 '/tmp/R-2.4.0/library/stats/libs/stats.so'
 :
  ld.so.1: R: fatal: relocation error: file
 /tmp/R-2.4.0/library/stats/libs/stat
 s.so: symbol __i_abs: referenced symbol not found
 Execution halted
 *** Error code 1

 Many thanks,

 Jenny


 Hi Jenny,

 advice: try posting the error message accompanying the failure to
 compile.

 Cheers

 Andrew

 On Fri, Mar 16, 2007 at 09:19:24AM +, Jenny Barnes wrote:
 Dear R-Help friends,

 I am unable to get the latest version of R (2.4.1) to compile on my solaris
 10
 system - has anybody else experienced this problem and are you able to offer
 me
 any advice?

 I appreciate your time, many thanks,

 Jenny Barnes


 Here are my CURRENT specifications:

 platform   sparc-sun-solaris2.10
 arch   sparc
 os solaris2.10
 system sparc, solaris2.10
 status
 major  2
 minor  3.1
 year   2006
 month  06
 day01
 svn rev38247
 language   R
 version.string Version 2.3.1 (2006-06-01)



 ~~
 Jennifer Barnes
 PhD student: long range drought prediction
 Climate Extremes Group
 Department of Space and Climate Physics
 University College London
 Holmbury St Mary
 Dorking, Surrey, RH5 6NT
 Tel: 01483 204149
 Mob: 07916 139187
 Web: http://climate.mssl.ucl.ac.uk

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 --
 Andrew Robinson
 Department of Mathematics and StatisticsTel: +61-3-8344-9763
 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
 http://www.ms.unimelb.edu.au/~andrewpr
 http://blogs.mbs.edu/fishing-in-the-bay/

 ~~
 Jennifer Barnes
 PhD student: long range drought prediction
 Climate Extremes Group
 Department of Space and Climate Physics
 University College London
 Holmbury St Mary
 Dorking, Surrey, RH5 6NT
 Tel: 01483 204149
 Mob: 07916 139187
 Web: http://climate.mssl.ucl.ac.uk

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can I scale the labels in a 'persp' graph?

2007-03-16 Thread Duncan Murdoch
On 3/16/2007 8:02 AM, [EMAIL PROTECTED] wrote:
 Hi all:
 
 I'm using 'persp' for 3D graphics.
 
 I need the axis labels to be smaller than the default.
 
 I see in 'help()' the information about 'par()'.
 
 I have written:
 
 par(..., cex.axis=0.5, cex.lab=0.5)
 persp(...)
 
 and the result doesn't change.
 
 The question is: can I change the size of the labels in the persp graph?
 
 Thank you in advance:
 
 /salva
 
 
 
 
 
 'cex.axis' The magnification to be used for axis annotation
relative to the current setting of 'cex'. (Some functions
such as 'points' accept a vector of values which are
recycled.  Other uses will take just the first value if a
vector of length greater than one is supplied.)
 
 'cex.lab' The magnification to be used for x and y labels relative
to the current setting of 'cex'.

Those don't appear to be supported by persp, but cex is: e.g.

x <- 1:10
y <- 1:10
z <- outer(x,y,function(x,y) sin((x+y)/10))
persp(x,y,z, cex=0.5)

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad points in regression

2007-03-16 Thread Prof Brian Ripley
On Fri, 16 Mar 2007, Alberto Monteiro wrote:

 Ted Harding wrote:

 alpha <- 0.3
 beta <- 0.4
 sigma <- 0.5
 err <- rnorm(100)
 err[15] <- 5; err[25] <- -4; err[50] <- 10
 x <- 1:100
 y <- alpha + beta * x + sigma * err
 ll <- lm(y ~ x)
 plot(ll)

 ll is the output of a linear model fiited by lm(), and so has
 several components (see ?lm in the section Value), one of
 which is residuals (which can be abbreviated to res).

 So, in the case of your example,

   which(abs(ll$res) > 2)
   15 25 50

 extracts the information you want (and the 2 was inspired by
 looking at the residuals plot from your plot(ll)).

 Ok, but how can I grab those points _in general_? What is the
 criterion that plot uses to mark those points as bad points?

 names(ll)

 gives:

 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

 None of them include information about those bad points.

But it is the plot method that you are using, not the object ll.  If you 
examine stats::plot.lm you will see what it does: label the points with 
the 'id.n' largest (in absolute value) residuals (standardized residuals 
for types 2 and 3).

And ?plot.lm also tells you that.
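
To obtain those indexes programmatically (a sketch, not part of the original
reply), one can rank the standardized residuals in the same spirit as plot.lm:

r <- rstandard(ll)
head(order(abs(r), decreasing = TRUE), 3)   # the id.n = 3 most extreme points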

BTW, 'bad points' seems your own description: it does not appear in the R 
documentation.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad points in regression

2007-03-16 Thread Petr Pikal
Hi

you can check ?influence  or ?influence.measures to evaluate some 
regression diagnostics
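
For instance (a sketch, not part of the original reply, using the ll object
from the example above):

im <- influence.measures(ll)
summary(im)                       # cases flagged by any of the diagnostics
which(apply(im$is.inf, 1, any))   # their row indexes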

Regards
Petr


On 16 Mar 2007 at 9:56, Alberto Monteiro wrote:

From:   Alberto Monteiro [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Date sent:  Fri, 16 Mar 2007 09:56:56 -0200
Subject:[R] Bad points in regression

 I have a question, maybe it's better to explain by example:
 
 alpha <- 0.3
 beta <- 0.4
 sigma <- 0.5
 err <- rnorm(100)
 err[15] <- 5; err[25] <- -4; err[50] <- 10
 x <- 1:100
 y <- alpha + beta * x + sigma * err
 ll <- lm(y ~ x)
 plot(ll)
 
 Now, the graphs clearly show that 15, 25 and 50 are the indexes
 of the bad points. How can I retrieve this information from ll?
 
 Alberto Monteiro
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and clinical studies

2007-03-16 Thread Frank E Harrell Jr
Delphine Fontaine wrote:
 Thanks for your answer, which was very helpful. I have another question:
 
 I have read in this document  
 (http://cran.r-project.org/doc/manuals/R-intro.pdf) that most of the  
 programs written in R are ephemeral and that new releases are not  
 always compatible with previous releases. What I would like to know is  
 if R functions are already validated and if not, what should we do to  
 validate an R function?
 

In the sense in which most persons use the term 'validate', it means to 
show with one or more datasets that the function is capable of producing 
the right answer.  It doesn't mean that it produces the right answer for 
every dataset although we hope it does.  [As an aside, most errors are 
in the data manipulation phase, not in the analysis phase.]  So I think 
that instead of validating functions we should spend more effort on 
validating analyses [and validating analysis file derivation].  Pivotal 
analyses can be re-done a variety of ways, in R or in separate 
programmable packages such as Stata.

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Implementing trees in R

2007-03-16 Thread Yuk Lap Yip (Kevin)
Hi all,

I am rather new to R. Recently I have been trying to implement some 
tree algorithms in R. I used lists to model tree nodes. I thought 
something like this would work:

    parent <- list();
    child <- list();
    parent$child1 <- child;
    child$parent <- parent;

When I tried to check whether a node is its parent's first child 
using if (node$parent$child1 == node), it always returned false. Then 
I realized that it does not really work because parent$child1 <- child 
actually makes a copy of child instead of referencing it. I think one 
possible fix is to keep a list of node objects, and make references 
using the positions in the list. For example, I think the following 
would work:

    parent <- list();
    child <- list();
    nodes <- list(parent, child);
    parent$child1 <- 2;
    child$parent <- 1;

Then the first child test can be rewritten as if 
(nodes[[nodes[[nodeId]]$parent]]$child1 == nodeId). However, I would 
prefer not to implement trees in this way, as it requires the 
inconvenient and error-prone manipulations of node IDs.

May I know if there is a way to make object references to lists? Or 
are there other ways to implement tree data structures in R?

BTW, I checked how hclust was implemented, and noticed that it calls 
an external Fortran program. I would want a solution not involving any 
external programs.

Thanks.

-- 


God bless.

Kevin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad points in regression

2007-03-16 Thread Ted Harding
On 16-Mar-07 12:41:50, Alberto Monteiro wrote:
 Ted Harding wrote:
 
 alpha <- 0.3
 beta <- 0.4
 sigma <- 0.5
 err <- rnorm(100)
 err[15] <- 5; err[25] <- -4; err[50] <- 10
 x <- 1:100
 y <- alpha + beta * x + sigma * err
 ll <- lm(y ~ x)
 plot(ll)
 
 ll is the output of a linear model fiited by lm(), and so has
 several components (see ?lm in the section Value), one of
 which is residuals (which can be abbreviated to res).
 
 So, in the case of your example,
 
   which(abs(ll$res) > 2)
   15 25 50
 
 extracts the information you want (and the 2 was inspired by
 looking at the residuals plot from your plot(ll)).

 Ok, but how can I grab those points _in general_? What is the
 criterium that plot used to mark those points as bad points?

Ahh ... ! I see what you're after. OK, look at the plot method
for lm():

?plot.lm
  ## S3 method for class 'lm':
  plot(x, which = 1:4,
       caption = c("Residuals vs Fitted", "Normal Q-Q plot",
                   "Scale-Location plot", "Cook's distance plot"),
       panel = points,
       sub.caption = deparse(x$call), main = "",
       ask = prod(par("mfcol")) < length(which) && dev.interactive(),
       ...,
       id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75)


where (see further down):

  id.n: number of points to be labelled in each plot, starting with
the most extreme.

and note, in the default parameter-values listing above:

  id.n = 3

Hence, the 3 most extreme points (according to the criterion being
plotted in each plot) are marked in each plot.

So, for instance, try

  plot(ll,id.n=5)

and you will get points 10,15,25,28,50. And so on. But that
pre-supposes that you know how many points are exceptional.


What is meant by "extreme" is not stated in the help page ?plot.lm,
but can be identified by inspecting the code for plot.lm(), which
you can see by entering

  plot.lm

In your example, if you omit the line which assigns anomalous values
to err[15], err[25] and err[50], then you are likely to observe that
different points get identified on different plots. For instance,
I just got the following results for the default id.n=3:

[1] Residuals vs Fitted:   41,53,59
[2] Standardised Residuals:41,53,59
[3] sqrt(Stand Res) vs Fitted: 41,53,59
[4] Cook's Distance:   59,96,97


There are several approaches (with somewhat different outcomes)
to identifying outliers. If you apply one of these, you will
probably get the identities of the points anyway.

Again in the context of your example (where in fact you
deliberately set 3 points to have exceptional errors, thus
coincidentally the same as the default value 3 of id.n),
you could try different values for id.n and inspect the graphs
to see whether a given value of id.n marks some points that
do not look exceptional relative to the mass of the other points.

So, the above plot(ll,id.n=5) gave me one point, 10 on the
residuals plot, which apparently belonged to the general
distribution of residuals.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Mar-07   Time: 13:43:54
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting of all possible models

2007-03-16 Thread vaniscotte
Rense Nieuwenhuis wrote:
 Dear Lukas,

 although I'm intrigued by the purpose of what you are trying to do,  
 as mentioned by some of the other persons on this list, I liked the  
 challenge of writing such a function.

 I came up with the following during some train-traveling this morning:


 tum <- function(x)
   {
   tum <- matrix(data=NA, nrow=2^x, ncol=x)
   
   for (i in 1:x)
   {
   tum[,i] <- c(rep(NA,2^i/2),rep(i+1,2^i/2))
   }
   
   return(tum)
   }

 ###

 all.models <- function(model)
   {
   npred <- length(model$coefficients) - 1
   matr.model <- tum(npred)
   output <- matrix(data=NA, nrow=2^npred, ncol=1)
   
   for (t in 2:2^npred)
   {
   preds <- names(model$coefficients)
   interc <- names(model$coefficients)[1]
   form <- as.formula(paste(". ~",
     paste(preds[na.omit(matr.model[t, ])], collapse = "+")))

   model2 <- update(model, form)
   output[t,] <- mean(resid(model2)^2)
   }
   
   return(output)
   
   }

 ##


 As you can see, I used a helper-function (tum, "the ultimate matrix")  
 alongside the actual function. Also, I wrote it using lm instead of glm, but  
 I suppose that you can easily alter that. As well, the function now  
 returns just a regular fit measurement. But that is not all that  
 essential, I think.

 The main point is: it works! Using this on my G4 Mac, with an lm of 10  
 predictors and 18 cases, it returns the output quite fast (<1 minute).

 I hope you can put this to use. It needs some easy adapting to your  
 specific needs, but I don't expect that to be a problem. If you need  
 help with that, please contact me.

 I'd appreciate to hear from you, if this function is helpful in any way.

 sincerely,

 Rense Nieuwenhuis





 On Feb 27, 2007, at 8:46 , Indermaur Lukas wrote:

   
 Hi,
 Fitting all possible models (GLM) with 10 predictors will result in  
 loads of (2^10 - 1) models. I want to do that in order to get the  
 importance of variables (having an unbalanced variable design) by  
 summing the up the AIC-weights of models including the same  
 variable, for every variable separately. It's time consuming and  
 annoying to define all possible models by hand.

 Is there a command, or easy solution to let R define the set of all  
 possible models itself? I defined models in the following way to  
 process them with a batch job:

 # e.g. model 1
 preference <- formula(Y ~ Lwd + N + Sex + YY)
 # e.g. model 2
 preference_heterogeneity <- formula(Y ~ Ri + Lwd + N + Sex + YY)
 etc.
 etc.


 I appreciate any hint
 Cheers
 Lukas





 °°°
 Lukas Indermaur, PhD student
 eawag / Swiss Federal Institute of Aquatic Science and Technology
 ECO - Department of Aquatic Ecology
 Überlandstrasse 133
 CH-8600 Dübendorf
 Switzerland

 Phone: +41 (0) 71 220 38 25
 Fax: +41 (0) 44 823 53 15
 Email: [EMAIL PROTECTED]
 www.lukasindermaur.ch

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

 


   [[alternative HTML version deleted]]

   
 

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
   
Dear Lukas,

In order to define the set of all possible models, I used the function combos() 
of the hier.part package. 
The function all.regs() of the same package fits all the possible models.
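
A base-R sketch of the same idea (an addition, not from the original posts;
'mydata' and the predictor names are placeholders):

preds <- c("Lwd", "N", "Sex", "YY")        # hypothetical predictor names
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), length(preds)))[-1, ]  # all non-empty subsets
forms <- lapply(seq_len(nrow(subsets)),
                function(i) reformulate(preds[unlist(subsets[i, ])], response = "Y"))
## fits <- lapply(forms, function(f) glm(f, data = mydata))   # 'mydata' is assumed
## aics <- sapply(fits, AIC)                # AIC weights can then be built from these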

Good luck

Amélie Vaniscotte (Phd)

Department of Environmental Biology
UsC INRA EA3184MRT
Institute for Environmental Sciences and Technology

University of Franche-comté
Place Leclerc, 25030 Besançon cedex
FRANCE
Tel. :   +33 (0)381 665 714
Fax :  +33 (0)381 665 797
[EMAIL PROTECTED]: [EMAIL PROTECTED]
http://lbe.univ-fcomte.fr/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Implementing trees in R

2007-03-16 Thread Gabor Grothendieck
Lists are not good for this.  There is an example in section 3.3 of
the proto vignette of using proto objects for this.  That section
also references an S4 example, although it's pretty messy with S4.

You might want to look at the graph, RBGL and graphviz packages
in Bioconductor and the dynamicgraph, mathgraph and sna packages
on CRAN.
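
Another option (a sketch, not part of the original reply): environments in R
do have reference semantics, so tree nodes built from environments behave the
way the original code expected:

parent <- new.env()
child  <- new.env()
parent$child1 <- child                  # stores a reference, not a copy
child$parent  <- parent
identical(child$parent$child1, child)   # TRUE: the first-child test now works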

On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
 Hi all,

I am rather new to R. Recently I have been trying to implement some
 tree algorithms in R. I used lists to model tree nodes. I thought
 something like this would work:

 parent <- list();
 child <- list();
 parent$child1 <- child;
 child$parent <- parent;

When I tried to check whether a node is its parent's first child
 using if (node$parent$child1 == node), it always returned false. Then
  I realized that it does not really work because parent$child1 <- child
 actually makes a copy of child instead of referencing it. I think one
 possible fix is to keep a list of node objects, and make references
 using the positions in the list. For example, I think the following
 would work:

 parent <- list();
 child <- list();
 nodes <- list(parent, child);
 parent$child1 <- 2;
 child$parent <- 1;

Then the first child test can be rewritten as if
 (nodes[[nodes[[nodeId]]$parent]]$child1 == nodeId). However, I would
 prefer not to implement trees in this way, as it requires the
 inconvenient and error-prone manipulations of node IDs.

May I know if there is a way to make object references to lists? Or
 are there other ways to implement tree data structures in R?

BTW, I checked how hclust was implemented, and noticed that it calls
 an external Fortran program. I would want a solution not involving any
 external programs.

Thanks.

 --


God bless.

Kevin

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] corAR1 in a random effects panel

2007-03-16 Thread Guillermo Villa
(I am posting the same question again, as someone has replied to my previous
question with his own question...)

Hi everyone,

I am interested in estimating this type of random effects panel:

y_it = x'_it * beta + u_it + e_it

u_it = rho * u_i,t-1 + d_it,        rho belongs to (-1, 1)

where:

u and e are independent and normally zero-mean distributed.

d is also independently normally zero-mean distributed.

So, I want random effects for group i to be correlated in t, following an
AR(1) process.

I am using the lme command, including correlation=corAR1:

lme(asis ~ prec + pobl + gola + entr, random = ~1|codi,
    correlation = corAR1(0.8, form = ~temp|codi))

i = codi
t = temp

I am not sure whether the AR(1) process is applied to the random effects
(u_it) or the error term (e_it)... Any idea?

Thanks.

G



-- 
Guillermo Villa

Universidad Carlos III de Madrid
Business Economics Department
Office 6.0.26
Madrid, 126
28903, Getafe (Madrid)
SPAIN

Email: [EMAIL PROTECTED]
Phone: (+34) 916249772
Mobil: (+34) 655112743
Fax: (+34) 916249607
Skype: guillermo.villa
Website: www.guillermovilla.com


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Hierarchical partitioning

2007-03-16 Thread vaniscotte
Hello,

I am reiterating my question concerning hierarchical partitioning: is it 
possible to run it for a multinomial model?

I have adapted the function hier.part (hier.part package) to my 
multinomial model. I met a problem with the partition() function: I 
obtain NEGATIVE Independent contributions!
It seems to be a huge error!

Unfortunately, I cannot inspect the script, as it calls C code. 
Consequently this function is a black box for me.

Could anyone help me? I really need to investigate these independent 
contributions!
Thanks a lot,

Amélie Vaniscotte (Phd)

Department of Environmental Biology
UsC INRA EA3184MRT
Institute for Environmental Sciences and Technology

University of Franche-comté
Place Leclerc, 25030 Besançon cedex
FRANCE
Tel. :   +33 (0)381 665 714
Fax :  +33 (0)381 665 797
[EMAIL PROTECTED]: [EMAIL PROTECTED]
http://lbe.univ-fcomte.fr/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Implementing trees in R

2007-03-16 Thread Gabor Grothendieck
Let me rephrase that.  Lists do not support references but they
could be used to represent trees.

list(a = list(a = 1, b = list(2, 3, d = list(4, 5)), c = list(4, 5)))

is a tree whose top nodes are a, b, c and b contains three nodes
2, 3 and d and d contains 2 nodes.
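
As a small illustration of navigating such a nested-list tree, and of why the
identity test in the original post fails (the assignment copies the child, so
any comparison is by value, not by reference):

tr <- list(a = list(a = 1, b = list(2, 3, d = list(4, 5)), c = list(4, 5)))
tr$a$b$d[[2]]                    # 5 -- walk down the tree by name / position
names(tr$a)                      # "a" "b" "c"
identical(tr$a$c, list(4, 5))    # TRUE: equal by value, not a reference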

However, if you want to do it via references as requested then lists
are not appropriate.

On 3/16/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Lists are not good for this.  There is an example in section 3.3 of
 the proto vignette of using proto objects for this.  That section
 also references an S4 example although its pretty messy with S4.

 You might want to look at the graph, RBGL and graphviz packages
 in Bioconductor and the dynamicgraph, mathgraph and sna packages
 on CRAN.

 On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
  Hi all,
 
 I am rather new to R. Recently I have been trying to implement some
  tree algorithms in R. I used lists to model tree nodes. I thought
  something like this would work:
 
 parent - list();
 child - list();
 parent$child1 - child;
 child$parent - parent;
 
 When I tried to check whether a node is its parent's first child
  using if (node$parent$child1 == node), it always returned false. Then
  I realized that it does not really work because parent$child1 - child
  actually makes a copy of child instead of referencing it. I think one
  possible fix is to keep a list of node objects, and make references
  using the positions in the list. For example, I think the following
  would work:
 
 parent - list();
 child - list();
 nodes - list(parent, child);
 parent$child1 - 2;
 child$parent - 1;
 
 Then the first child test can be rewritten as if
  (nodes[[nodes[[nodeId]]$parent]]$child1 == nodeId). However, I would
  prefer not to implement trees in this way, as it requires the
  inconvenient and error-prone manipulations of node IDs.
 
 May I know if there is a way to make object references to lists? Or
  are there other ways to implement tree data structures in R?
 
 BTW, I checked how hclust was implemented, and noticed that it calls
  an external Fortran program. I would want a solution not involving any
  external programs.
 
 Thanks.
 
  --
 
 
 God bless.
 
 Kevin
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Implementing trees in R

2007-03-16 Thread Yuk Lap Yip (Kevin)
Hi Gabor,

Thanks for the suggestions. I tried to look for the proto vignette 
document but could not find it, could you tell me how to reach it?

Besides, is it possible to define my own node object type with a 
default behavior for the <- operator of its member variables being 
referencing rather than copying? Any good reference material/ similar 
code examples?

Thanks.

Gabor Grothendieck wrote:
 Lists are not good for this.  There is an example in section 3.3 of
 the proto vignette of using proto objects for this.  That section
 also references an S4 example although its pretty messy with S4.

 You might want to look at the graph, RBGL and graphviz packages
 in Bioconductor and the dynamicgraph, mathgraph and sna packages
 on CRAN.

 On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
 Hi all,

I am rather new to R. Recently I have been trying to implement some
 tree algorithms in R. I used lists to model tree nodes. I thought
 something like this would work:

parent - list();
child - list();
parent$child1 - child;
child$parent - parent;

When I tried to check whether a node is its parent's first child
 using if (node$parent$child1 == node), it always returned false. Then
 I realized that it does not really work because parent$child1 - child
 actually makes a copy of child instead of referencing it. I think one
 possible fix is to keep a list of node objects, and make references
 using the positions in the list. For example, I think the following
 would work:

parent - list();
child - list();
nodes - list(parent, child);
parent$child1 - 2;
child$parent - 1;

Then the first child test can be rewritten as if
 (nodes[[nodes[[nodeId]]$parent]]$child1 == nodeId). However, I would
 prefer not to implement trees in this way, as it requires the
 inconvenient and error-prone manipulations of node IDs.

May I know if there is a way to make object references to lists? Or
 are there other ways to implement tree data structures in R?

BTW, I checked how hclust was implemented, and noticed that it calls
 an external Fortran program. I would want a solution not involving any
 external programs.

Thanks.

 -- 


God bless.

Kevin

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 


God bless.

Kevin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ideas to speed up code: converting a matrix of integers to a matrix of normally distributed values

2007-03-16 Thread Matthew Keller
Hi all,

[this is a bit hard to describe, so if my initial description is
confusing, please try running my code below]

#WHAT I'M TRYING TO DO
I'd appreciate any help in trying to speed up some code. I've written
a script that converts a matrix of integers (usually between 1-10,000
- these represent allele names) into two new matrices of normally
distributed values (representing genetic effects), such that a given
number in the integer (allele name) matrix always corresponds to the
*same* randomly distributed (genetic) effects.

For example, every time my function sees the allele name 3782, it
converts that integer into the same two effects (e.g., -.372  1.727),
which follow normal distributions (these effects can be correlated;
below I've set their correlation to .5). I have an entire matrix of
integers, and am converting those into two entire matrices of effects.


#HOW I'VE DONE IT SO FAR
To get the correlations between the effects, I've used the mvrnorm
function from MASS

To convert the allele names to genetic effects, I've created a
function (make.effects) that resets set.seed() to the allele name
each time it's called.

To get the matrix of genetic effects, I use sapply.


#THE PROBLEM
The problem is that I often need to convert matrices that have 500K
cells, and do this over several hundred iterations, so it is quite
slow. If anyone has ideas to speed this up (e.g., some clever way to
convert a matrix of integers to a matrix of effects without using
sapply), I would be forever indebted.


##MY CODE

library(MASS)

##run this example to see what I'm talking about above

make.effects <- function(x,mn=0,var=1,cov=.5){
  set.seed(x)
  return(mvrnorm(1,mu=c(mn,mn),Sigma=matrix(c(var,cov,cov,var),nrow=2),empirical=FALSE))}

(alleles <- 
matrix(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009),ncol=4))

a12 <- array(sapply(alleles,make.effects),dim=c(2,nrow(alleles),ncol(alleles)))
(a1 <- a12[1,,])
(a2 <- a12[2,,])

#I've set the population correlation = .5; empirical corr=.4635
cor(as.vector(a1),as.vector(a2))

##run this to see that the code begins to get pretty slow with even a
3000x4 matrix

system.time({

alleles <- 
matrix(rep(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009),1000),ncol=4)

a12 <- array(sapply(alleles,make.effects),dim=c(2,nrow(alleles),ncol(alleles)))
a1 <- a12[1,,]
a2 <- a12[2,,]

})




-- 
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad points in regression

2007-03-16 Thread Bert Gunter
(mount soapbox...)

While I know the prior discussion represents common practice, I would argue
-- perhaps even plead -- that the "modern" (?? 30 years old now) alternative
of robust/resistant estimation be used, especially in the readily available
situation of least-squares regression. RSiteSearch("Robust") will bring up
numerous possibilities. rrcov and robustbase are at least two packages
devoted to this, but the functionality is available in many others (e.g.
rlm() in MASS).
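
As one rough illustration (reusing the simulated regression from the earlier
thread, with the same three contaminated points), rlm() drops in almost
anywhere lm() is used, and its final IWLS weights flag the down-weighted
observations:

library(MASS)
set.seed(1)
x <- 1:100
y <- 0.3 + 0.4 * x + 0.5 * rnorm(100)
y[c(15, 25, 50)] <- y[c(15, 25, 50)] + c(2.5, -2, 5)   # contaminate three points
coef(lm(y ~ x))                  # least squares, pulled by the bad points
fit.rob <- rlm(y ~ x)            # Huber M-estimation by default
coef(fit.rob)
which(fit.rob$w < 0.5)           # observations that were heavily down-weighted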

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374





-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
Sent: Friday, March 16, 2007 6:44 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Bad points in regression

On 16-Mar-07 12:41:50, Alberto Monteiro wrote:
 Ted Harding wrote:
 
 alpha <- 0.3
 beta <- 0.4
 sigma <- 0.5
 err <- rnorm(100)
 err[15] <- 5; err[25] <- -4; err[50] <- 10
 x <- 1:100
 y <- alpha + beta * x + sigma * err
 ll <- lm(y ~ x)
 plot(ll)
 
 ll is the output of a linear model fiited by lm(), and so has
 several components (see ?lm in the section Value), one of
 which is residuals (which can be abbreviated to res).
 
 So, in the case of your example,
 
   which(abs(ll$res) > 2)
   15 25 50
 
 extracts the information you want (and the 2 was inspired by
 looking at the residuals plot from your plot(ll)).

 Ok, but how can I grab those points _in general_? What is the
 criterium that plot used to mark those points as bad points?

Ahh ... ! I see what you're after. OK, look at the plot method
for lm():

?plot.lm
  ## S3 method for class 'lm':
  plot(x, which = 1:4,
       caption = c("Residuals vs Fitted", "Normal Q-Q plot",
                   "Scale-Location plot", "Cook's distance plot"),
       panel = points,
       sub.caption = deparse(x$call), main = "",
       ask = prod(par("mfcol")) < length(which) && dev.interactive(),
       ...,
       id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75)


where (see further down):

  id.n: number of points to be labelled in each plot, starting with
the most extreme.

and note, in the default parameter-values listing above:

  id.n = 3

Hence, the 3 most extreme points (according to the criterion being
plotted in each plot) are marked in each plot.

So, for instance, try

  plot(ll,id.n=5)

and you will get points 10,15,25,28,50. And so on. But that
pre-supposes that you know how many points are exceptional.


What is meant by "extreme" is not stated in the help page ?plot.lm,
but can be identified by inspecting the code for plot.lm(), which
you can see by entering

  plot.lm

In your example, if you omit the line which assigns anomalous values
to err[15], err[25] and err[50], then you are likely to observe that
different points get identified on different plots. For instance,
I just got the following results for the default id.n=3:

[1] Residuals vs Fitted:   41,53,59
[2] Standardised Residuals:41,53,59
[3] sqrt(Stand Res) vs Fitted: 41,53,59
[4] Cook's Distance:   59,96,97


There are several approaches (with somewhat different outcomes)
to identifying outliers. If you apply one of these, you will
probably get the identities of the points anyway.

Again in the context of your example (where in fact you
deliberately set 3 points to have exceptional errors, thus
coincidentally the same as the default value 3 of id.n),
you could try different values for id.n and inspect the graphs
to see whether a given value of id.n marks some points that
do not look exceptional relative to the mass of the other points.

So, the above plot(ll,id.n=5) gave me one point, 10 on the
residuals plot, which apparently belonged to the general
distribution of residuals.
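
If the goal is to grab the flagged points programmatically rather than reading
them off the plots, one hedged alternative (using the ll fit from the example
above) is to rank the observations by the same kinds of criteria yourself:

rs <- rstandard(ll)                          # standardised residuals
cd <- cooks.distance(ll)                     # Cook's distance
head(order(abs(rs), decreasing = TRUE), 5)   # the 5 most extreme residuals
which(cd > 4 / length(cd))                   # a common rule-of-thumb cutoff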

Hoping this helps,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Mar-07   Time: 13:43:54
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fast lookup in ragged array

2007-03-16 Thread Peter McMahan
Hello,

I'm running an algorithm for graph structural cohesion that requires  
a depth-first search of subgraphs of a rather large network. The  
algorithm will necessarily be redundant in the subgraphs it recurses  
to, so to speed up the process I implemented a check at each subgraph  
to see if it's been searched already.

This algorithm is very slow, and takes days to complete on a graph  
with about 200 nodes. It seems that a significant portion of the  
computation time is spent looking up the current subgraph in the list  
of searched subgraphs to see if it is redundant, and I'm wondering if  
there's a faster way to do this.

Currently, the list of searched subgraphs is a list  
(`theseSearchedBranches`) of unnamed numeric vectors. I'm checking  
against the list using the following command:

 if(list(v) %in% theseSearchedBranches){
     cat("Branch already searched: passing.\n\n")
     return(NULL)
 }

v is a sorted numeric, with length anywhere between 3 and 200.  
Because theseSearchedBranches gets quite long as the algorithm  
progresses, the %in% command is taking maybe one or two seconds per  
lookup.

So to reiterate, I have a list() that looks something like this:

[[1]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]  56 72 75 76 85 95 105 110 134 158 159 165 186

[[2]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]  56 72 75 76 85 95 105 110 134 147 159 165 186

[[3]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]  50 56 72 75 85 95 105 110 134 147 158 159 165 186

...
and so on for tens of thousands of entries, and I am trying to find  
some sort of fast equivalent for %in% to search it. I'm also not  
adding the vectors to the list in any particular order, as I don't  
think %in% would know how to take advantage of that anyway.

Is there a data structure other than list() that I can use that would  
be faster? Would it be better to just create a hashed env and add  
empty variables named 0.5.6.11.12...?
I know there are fast lookup algorithms out there that could take  
advantage of the fact that the items being searched are indiviually  
ordered numeric vectors, but I can't find any info about R  
implementations on the mailing lists or help. Is there any data type  
that implements a b-tree type of lookup scheme?
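
A minimal sketch of the hashed-environment idea, under the assumption that a
subgraph is identified by its sorted vertex vector v; lookup in a hashed
environment is roughly constant-time, unlike a linear %in% scan over a growing
list:

searched <- new.env(hash = TRUE, parent = emptyenv())
key <- function(v) paste(v, collapse = ".")

already.searched <- function(v) exists(key(v), envir = searched, inherits = FALSE)
mark.searched    <- function(v) assign(key(v), TRUE, envir = searched)

v <- c(0, 5, 6, 11, 12)
already.searched(v)   # FALSE
mark.searched(v)
already.searched(v)   # TRUE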

Any help would be greatly appreciated.

Thanks,
Peter

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to...

2007-03-16 Thread [EMAIL PROTECTED]
for example:
I have got these data, organized in a dataframe. 

            sample1 sample2 sample3 sample4 group
replicate1     1.00    0.02    0.35    0.50     A
replicate2     1.00    0.02    1.54    1.11     A
replicate3     1.00    0.02    1.54    1.11     A
replicate4     1.00    0.02    1.54    1.11     A
replicate5     1.00    0.10    0.18    0.72     B
replicate6  1000.00    0.75    0.86    7.26     B
replicate7  1000.00    0.75    0.18    0.36     B
replicate8  1000.00    0.75   12.09    0.74     B
replicate9  1000.00    0.75   12.09    0.84     C
replicate10 1000.00    0.98    0.65    0.50     C
replicate11    2.00    6.00    6.00    2.00     C
replicate12    6.00    6.00    2.00    6.00     C


The first four columns are the different samples I have to test with 
ANOVA; the last column gives the group of each entry. Using aov() I 
can run a test on each column, but I would like to run the ANOVAs for all 
the columns (in my case there are hundreds) in an automated way. I can't set up a 
working script with a loop in this case, surely because of my scarce 
knowledge of programming. Can you help me? 
The next problem is how to collect the results in a simple way, for example 
having them organized in a table such as


SAMPLE ANOVA
sample1 ok
sample2 ok
sample3 not significant



thank you so much
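
One hedged sketch of the usual loop-free approach (the data frame below simply
re-enters the table above; adjust the names to your real data):

dat <- data.frame(sample1 = c(1,1,1,1,1,1000,1000,1000,1000,1000,2,6),
                  sample2 = c(0.02,0.02,0.02,0.02,0.10,0.75,0.75,0.75,0.75,0.98,6,6),
                  sample3 = c(0.35,1.54,1.54,1.54,0.18,0.86,0.18,12.09,12.09,0.65,6,2),
                  sample4 = c(0.50,1.11,1.11,1.11,0.72,7.26,0.36,0.74,0.84,0.50,2,6),
                  group   = factor(rep(c("A","B","C"), each = 4)))

## one ANOVA per sample column, collecting the p-value for 'group'
pvals <- sapply(dat[ , 1:4],
                function(y) anova(aov(y ~ dat$group))[["Pr(>F)"]][1])

## collect the results in an exportable table
data.frame(SAMPLE  = names(pvals),
           p.value = pvals,
           ANOVA   = ifelse(pvals < 0.05, "ok", "not significant"))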


-- Initial Header ---

From  : Petr Pikal [EMAIL PROTECTED]
To  : [EMAIL PROTECTED] [EMAIL PROTECTED],R Help 
R-help@stat.math.ethz.ch
Cc  : 
Date  : Thu, 15 Mar 2007 20:38:25 +0100
Subject : Re: [R] how to...







 Hi
 
 I suppose you will not get usefull response for such poorly specified 
 question. 
 
 For automating procedures on data frames you can either do looping or 
 use lapply or maybe do.call can also provide some functionality.
 
 If you elaborate what you did and in what respect it was 
 unsatisfactory maybe you will get better answer.
 
 Anyway, before your next post you shall look to posting guide.
 
 Regards
 Petr
 
 
 
 On 15 Mar 2007 at 17:20, [EMAIL PROTECTED] wrote:
 
 Date sent:Thu, 15 Mar 2007 17:20:57 +0100
 From: [EMAIL PROTECTED] [EMAIL PROTECTED]
 To:   R Help R-help@stat.math.ethz.ch
 Subject:  [R] how to...
 
  I have to perform ANOVA's on many different data organized in a
  dataframe. I can run an ANOVA for each sample, but I've got hundreds
  of data and I would like to avoid manually carrying out each test. in
  addition, I would like to have the results organized in a simple way,
  for example in a table, wich could be easy to export. thank you for
  assistance
  
  simone 
  
  
  --
  Leggi GRATIS le tue mail con il telefonino i-mode  di Wind
  http://i-mode.wind.it
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html and provide commented,
  minimal, self-contained, reproducible code.
 
 Petr Pikal
 [EMAIL PROTECTED]
 



Re: [R] Implementing trees in R

2007-03-16 Thread Gabor Grothendieck
1. Here is your example redone using proto:

library(proto)

parent <- proto()
child <- proto(a = 1)
parent$child1 <- child
child$parent.env <- parent

# also just for illustration lets change a

parent$child1$a # 1
child$a <- 2
parent$child1$a # 2

2. To redefine $<- use S3 or S4 but it can be done
in conjunction with proto like this:

# constructor
node <- function(. = parent.frame(), ...)
   structure(proto(...), class = c("node", "proto"))

"$<-.node" <- function(this, s, value) {
    if (s == ".super")
        parent.env(this) <- value
    if (is.function(value))
        environment(value) <- this
    if (inherits(value, "node"))
        parent.env(value) <- this
    this[[as.character(substitute(s))]] <- value
    this
}


p <- node(a = 1)
p$child <- node(b = 2)
p$child$parent.env()
p # same



On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
 Hi Gabor,

Thanks for the suggestions. I tried to look for the proto vignette
 document but could not find it, could you tell me how to reach it?

Besides, is it possible to define my own node object type with a
 default behavior for the - operator of its member variables being
 referencing rather than copying? Any good reference material/ similar
 code examples?

Thanks.

 Gabor Grothendieck wrote:
  Lists are not good for this.  There is an example in section 3.3 of
  the proto vignette of using proto objects for this.  That section
  also references an S4 example although its pretty messy with S4.
 
  You might want to look at the graph, RBGL and graphviz packages
  in Bioconductor and the dynamicgraph, mathgraph and sna packages
  on CRAN.
 
  On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
  Hi all,
 
 I am rather new to R. Recently I have been trying to implement some
  tree algorithms in R. I used lists to model tree nodes. I thought
  something like this would work:
 
 parent - list();
 child - list();
 parent$child1 - child;
 child$parent - parent;
 
 When I tried to check whether a node is its parent's first child
  using if (node$parent$child1 == node), it always returned false. Then
  I realized that it does not really work because parent$child1 - child
  actually makes a copy of child instead of referencing it. I think one
  possible fix is to keep a list of node objects, and make references
  using the positions in the list. For example, I think the following
  would work:
 
 parent - list();
 child - list();
 nodes - list(parent, child);
 parent$child1 - 2;
 child$parent - 1;
 
 Then the first child test can be rewritten as if
  (nodes[[nodes[[nodeId]]$parent]]$child1 == nodeId). However, I would
  prefer not to implement trees in this way, as it requires the
  inconvenient and error-prone manipulations of node IDs.
 
 May I know if there is a way to make object references to lists? Or
  are there other ways to implement tree data structures in R?
 
 BTW, I checked how hclust was implemented, and noticed that it calls
  an external Fortran program. I would want a solution not involving any
  external programs.
 
 Thanks.
 
  --
 
 
 God bless.
 
 Kevin
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 --


God bless.

Kevin



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Implementing trees in R

2007-03-16 Thread Gabor Grothendieck
On 3/16/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 1. Here is your example redone using proto:

 library(proto)

 parent - proto()
 child - proto(a = 1)
 parent$child1 - child
 child$parent.env - parent

This last line should have been:

parent.env(child) <- parent



 # also just for illustration lets change a

 parent$child1$a # 1
 child$a - 2
 parent$child1$a # 2

 2. To redefine $- use S3 or S4 but it can be done
 in conjunction with proto like this:

 # constructor
 node - function(. = parent.frame(), ...)
   structure(proto(...), class = c(node, proto))

 $-.node - function(this, s, value) {
if (s == .super)
parent.env(this) - value
if (is.function(value))
environment(value) - this
if (inherits(value, node))
parent.env(value) - this
this[[as.character(substitute(s))]] - value
this
 }


 p - node(a = 1)
 p$child - node(b = 2)
 p$child$parent.env()
 p # same



 On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
  Hi Gabor,
 
 Thanks for the suggestions. I tried to look for the proto vignette
  document but could not find it, could you tell me how to reach it?
 
 Besides, is it possible to define my own node object type with a
  default behavior for the - operator of its member variables being
  referencing rather than copying? Any good reference material/ similar
  code examples?
 
 Thanks.
 
  Gabor Grothendieck wrote:
   Lists are not good for this.  There is an example in section 3.3 of
   the proto vignette of using proto objects for this.  That section
   also references an S4 example although its pretty messy with S4.
  
   You might want to look at the graph, RBGL and graphviz packages
   in Bioconductor and the dynamicgraph, mathgraph and sna packages
   on CRAN.
  
   On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
   Hi all,
  
  I am rather new to R. Recently I have been trying to implement some
   tree algorithms in R. I used lists to model tree nodes. I thought
   something like this would work:
  
  parent - list();
  child - list();
  parent$child1 - child;
  child$parent - parent;
  
  When I tried to check whether a node is its parent's first child
   using if (node$parent$child1 == node), it always returned false. Then
   I realized that it does not really work because parent$child1 - child
   actually makes a copy of child instead of referencing it. I think one
   possible fix is to keep a list of node objects, and make references
   using the positions in the list. For example, I think the following
   would work:
  
  parent - list();
  child - list();
  nodes - list(parent, child);
  parent$child1 - 2;
  child$parent - 1;
  
  Then the first child test can be rewritten as if
   (nodes[[nodes[[nodeId]]$parent]]$child1 == nodeId). However, I would
   prefer not to implement trees in this way, as it requires the
   inconvenient and error-prone manipulations of node IDs.
  
  May I know if there is a way to make object references to lists? Or
   are there other ways to implement tree data structures in R?
  
  BTW, I checked how hclust was implemented, and noticed that it calls
   an external Fortran program. I would want a solution not involving any
   external programs.
  
  Thanks.
  
   --
  
  
  God bless.
  
  Kevin
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
  --
 
 
 God bless.
 
 Kevin
 
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Duplicated non contiguous element in a list

2007-03-16 Thread Bruno C\.
Hello,

Given a vector I would like to rapidly identify duplicated non-contiguous 
elements.
Given for example
c(1,1,2,   3,2,   4,   5,   6,4) 
I would like to get:
FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE

In fact I need to check this on the columns of a matrix!
I can do that of course with loops, but is there any function already available?

Thanks


--
Passa a Infostrada. ADSL e Telefono senza limiti e senza canone Telecom
http://infostrada.it

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] log(1+exp(x)) function

2007-03-16 Thread Feriel Lahlali
Hello,

I would like to know whether R implements efficient approximations of the 
functions log(1+exp(x)) and log(1-exp(x)). It seems to be the case for the latter 
but I'd like to make sure.
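
For what it's worth, base R provides log1p() and expm1(), which are the usual
numerically stable building blocks for these expressions. A hedged sketch of the
standard constructions (the switch-over points are common rules of thumb, not
anything official):

log1pexp <- function(x)               # log(1 + exp(x))
    ifelse(x > 18, x + exp(-x), log1p(exp(x)))

log1mexp <- function(x)               # log(1 - exp(x)), for x < 0
    ifelse(x < -log(2), log1p(-exp(x)), log(-expm1(x)))

log1pexp(c(-50, 0, 50))               # no overflow for large x
log1mexp(c(-1e-10, -1, -50))          # no catastrophic cancellation near 0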

Thank you in advance.

Feriel Lahlali
Graduate student - France.


-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast lookup in ragged array

2007-03-16 Thread Peter McMahan
Well, I hadn't ever seen RBGL before, so that's great. I've been  
using igraph and sna mainly, but there are a few points lacking  
between these two. RBGL solves a lot of problems for me!

But I'm not sure it will solve this specific problem. Are you  
suggesting I use RBGL to do a depth-first search of all the  
subgraphs? For this particular depth-first search I'm not searching  
every subgraph, but just those that are constructed from a minimal  
cutset of the parent subgraph. At each level of the search, I have to  
compute graph cohesion (vertex connectivity), which can take  
considerable time. A lot of computation time is saved by only  
searching subgraphs obtained through cutsets. So a complete search of  
all the subgraphs won't work, but the redundancy I come across is I  
think unavoidable.

The particular algorithm I'm trying to implement is Moody and White's  
cohesive blocking, in which the end result is a nested set of all  
subgraphs with a higher cohesion (connectivity) than their parents.  
(see http://www.santafe.edu/research/publications/workingpapers/ 
00-08-049.pdf )

On Mar 16, 2007, at 11:00 AM, Robert Gentleman wrote:


 why not just use the graph package and RBGL at www.bioconductor.org


 Peter McMahan wrote:

 Hello,
 I'm running an algorithm for graph structural cohesion that  
 requires  a depth-first search of subgraphs of a rather large  
 network. The  algorithm will necessarily be redundant in the  
 subgraphs it recurses  to, so to speed up the process I  
 implemented a check at each subgraph  to see if it's been searched  
 already.
 This algorithm is very slow, and takes days to complete on a  
 graph  with about 200 nodes. It seems that a significant portion  
 of the  computation time is spent looking up the current subgraph  
 in the list  of searched subgraphs to see if it is redundant, and  
 I'm wondering if  there's a faster way to do this.
 Currently, the list of searched subgraphs is a list   
 (`theseSearchedBranches`) of unnamed numeric vectors. I'm  
 checking  against the list using the following command:
  if(list(v) %in% theseSearchedBranches){
  cat(Branch already searched: passing.\n\n)
  return(NULL)
  }
 v is a sorted numeric, with length anywhere between 3 and 200.   
 Because theseSearchedBranches gets quite long as the algorithm   
 progresses, the %in% command is taking maybe one or two seconds  
 per  lookup.
 So to reiterate, I have a list() that looks something like this:
 [[1]]
 [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
 [18]56 72 75 76 85 95 105 110 134 158 159 165 186
 [[2]]
 [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
 [18]56 72 75 76 85 95 105 110 134 147 159 165 186
 [[3]]
 [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
 [18]50 56 72 75 85 95 105 110 134 147 158 159 165 186
 ...
 and so on for tens of thousands of entries, and I am trying to  
 find  some sort of fast equivalent for %in% to search it. I'm also  
 not  adding the vectors to the list in any particular order, as I  
 don't  think %in% would know how to take advantage of that anyway.
 Is there a data structure other than list() that I can use that  
 would  be faster? Would it be better to just create a hashed env  
 and add  empty variables named 0.5.6.11.12...?
 I know there are fast lookup algorithms out there that could take   
 advantage of the fact that the items being searched are  
 indiviually  ordered numeric vectors, but I can't find any info  
 about R  implementations on the mailing lists or help. Is there  
 any data type  that implements a b-tree type of lookup scheme?
 Any help would be greatly appreciated.
 Thanks,
 Peter
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


 -- 
 Robert Gentleman, PhD
 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M2-B876
 PO Box 19024
 Seattle, Washington 98109-1024
 206-667-7700
 [EMAIL PROTECTED]


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MANOVA permutation testing

2007-03-16 Thread Gavin Simpson
On Fri, 2007-03-16 at 00:50 +, Tyler Smith wrote:
 Hi,
 
 I've got a dataset with 7 variables for 8 different species. I'd like
 to test the null hypothesis of no difference among species for these
 variables. MANOVA seems like the appropriate test, but since I'm
 unsure of how well the data fit the assumptions of equal
 variance/covariance and multivariate normality, I want to use a
 permutation test. 
 
 I've been through CRAN looking at packages boot, bootstrap, coin,
 permtest, but they all seem to be doing more than I need. Is the
 following code an appropriate way to test my hypothesis:
 
 result.vect - c()
 
 for (i in 1:1000){
   wilks - summary.manova(manova(maxent~sample(max.spec)),
test=Wilks)$stats[1,2]
   result.vect - c(res.vect,wilks)
 }
 
 maxent is the data, max.spec is a vector of species names. Comparing
 the result.vect with the wilks value for the unpermuted data suggests
 there are very significant differences among species -- but did I do
 this properly?
 

Hi Tyler,

(without knowing more about your data) I think you are almost there, but
the R code can be made much more efficient.

When you create your result vector, is is of length 0. Each time you add
a result, R has to copy the current result object, enlarge it and so on.
This all takes a lot of time. Better to allocate storage first, then add
each result in turn be replacement. E.g.:

Using an example from ?summary.manova

tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
Y <- cbind(tear, gloss, opacity)
rate <- factor(gl(2,10), labels = c("Low", "High"))

## define number of permutations
nperm <- 999
## allocate storage, here we want 999 + 1 for our observed stat
res <- numeric(nperm + 1)
## do the loop - the seq(along ...) bit avoids certain problems
for(i in seq(along = res[-1])) {
    ## here we replace the ith value in the vector res with the stat
    res[i] <- summary(manova(Y ~ sample(rate)),
                      test = "Wilks")$stats[1,2]
}
## now we append the observed stat onto the end of the result vector res
## we also store this in 'obs' for convenience
res[nperm+1] <- obs <- summary(manova(Y ~ rate),
                               test = "Wilks")$stats[1,2]

## this is the permutation p-value - the proportion of the nperm
## permutations + 1 that are greater than or equal to the
## observed stat 'obs'
sum(res >= obs) / (nperm + 1)

HTH,

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Duplicated non contiguous element in a list

2007-03-16 Thread Peter McMahan
Maybe not the most efficient, but here's a vector-level solution:

non.contig.dupes <- function(x){
 is.dup <- x %in% x[duplicated(x)]
 is.con <- c(x[-length(x)] == x[-1], FALSE) | c(FALSE, x[-length(x)] == x[-1])
 is.dup & !is.con}


On Mar 16, 2007, at 11:14 AM, Bruno C.. wrote:


 Hello,

 Given a vector I would like to rapidly identify duplicated non  
 contiguous elements.
 Given for example
 c(1,1,2,   3,2,   4,   5,   6,4)
 I would like to get:
 FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE

 In fact I need to check this on the columns of a matrix!
 I can do that of couse with loops but is there any function already  
 available?

 Thanks


 --
 Passa a Infostrada. ADSL e Telefono senza limiti e senza canone  
 Telecom
 http://infostrada.it

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ideas to speed up code: converting a matrix of integers to a matrix of normally distributed values

2007-03-16 Thread jim holtman
The vast majority of your time is spent in the function
mvrnorm (on my system 5.7 out of 6.1 seconds); in your example that is
12000 calls to the function.  To improve your speed you have to cut down the
number of calls to the function.  For example, how many unique integers do
you have, and can you do the calls just for those and then substitute the
matching values?  Here is what Rprof showed:

  total.time total.pct self.time self.pct
system.time 6.12  99.7  0.00  0.0
as.vector   6.06  98.7  0.18  2.9
FUN 6.06  98.7  0.12  2.0
array   6.06  98.7  0.10  1.6
lapply  6.06  98.7  0.00  0.0
sapply  6.06  98.7  0.00  0.0
eval6.04  98.4  0.06  1.0
mvrnorm 5.72  93.2  0.34  5.5
eigen   2.58  42.0  0.52  8.5

or another way of looking at it:

  0   6.1 root
  1.6.1 system.time
  2. .6.0 eval
  3. . .6.0 eval
  4. . . .6.0 array
  5. . . . .6.0 as.vector
  6. . . . . .6.0 sapply
  7. . . . . . .6.0 lapply
  8. . . . . . . .6.0 FUN
  9. . . . . . . . .5.7 mvrnorm
 10. . . . . . . . . .2.6 eigen
 11. . . . . . . . . . .1.2 sort.list
 12. . . . . . . . . . . .1.0 match.arg
 13. . . . . . . . . . . . .0.7 eval
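
A minimal sketch of how to collect this kind of Rprof output yourself
(assuming the alleles matrix and make.effects function from the quoted code
below):

Rprof("prof.out")                         # start the sampling profiler
a12 <- array(sapply(alleles, make.effects),
             dim = c(2, nrow(alleles), ncol(alleles)))
Rprof(NULL)                               # stop profiling
head(summaryRprof("prof.out")$by.total)   # entries sorted by total time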




On 3/16/07, Matthew Keller [EMAIL PROTECTED] wrote:

 Hi all,

 [this is a bit hard to describe, so if my initial description is
 confusing, please try running my code below]

 #WHAT I'M TRYING TO DO
 I'd appreciate any help in trying to speed up some code. I've written
 a script that converts a matrix of integers (usually between 1-10,000
 - these represent allele names) into two new matrices of normally
 distributed values (representing genetic effects), such that a given
 number in the integer (allele name) matrix always corresponds to the
 *same* randomly distributed (genetic) effects.

 For example, every time my function sees the allele name 3782, it
 converts that integer into the same two effects (e.g., -.372  1.727),
 which follow normal distributions (these effects can be correlated;
 below I've set their correlation to .5). I have an entire matrix of
 integers, and am converting those into two entire matrices of effects.


 #HOW I'VE DONE IT SO FAR
 To get the correlations between the effects, I've used the mvrnorm
 function from MASS

 To convert the allele names to genetic effects, I've created a
 function (make.effect) that resets the set.seed() to the allele name
 each time its called.

 To get the matrix of genetic effects, I use sapply.


 #THE PROBLEM
 The problem is that I often need to convert matrices that have 500K
 cells, and do this over several hundred iterations, so it is quite
 slow. If anyone has ideas to speed this up (e.g., some clever way to
 convert a matrix of integers to a matrix of effects without using
 sapply), I would be forever indebted.


 ##MY CODE

 library(MASS)

 ##run this example to see what I'm talking about above

 make.effects - function(x,mn=0,var=1,cov=.5){
 set.seed(x)

 return(mvrnorm(1,mu=c(mn,mn),Sigma=matrix(c(var,cov,cov,var),nrow=2),empirical=FALSE))}

 (alleles -
 matrix(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009),ncol=4))

 a12 - array(sapply(alleles,make.effects
 ),dim=c(2,nrow(alleles),ncol(alleles)))
 (a1 - a12[1,,])
 (a2 - a12[2,,])

 #I've set the population correlation = .5; empirical corr=.4635
 cor(as.vector(a1),as.vector(a2))

 ##run this to see that the code begins to get pretty slow with even a
 3000x4 matrix

 system.time({

 alleles -
 matrix(rep(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009),1000),ncol=4)

 a12 - array(sapply(alleles,make.effects
 ),dim=c(2,nrow(alleles),ncol(alleles)))
 a1 - a12[1,,]
 a2 - a12[2,,]

 })




 --
 Matthew C Keller
 Postdoctoral Fellow
 Virginia Institute for Psychiatric and Behavioral Genetics

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Create coef.table as in summary.glm (with Signif. codes)

2007-03-16 Thread Benjamin Dickgiesser
Hi,

how do I create a coef.table as for example returned by summary.glm?
I don't see where the significance codes are assigned during summary.glm.

Thank you,
Benjamin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bad points in regression [Broadcast]

2007-03-16 Thread Liaw, Andy
(My turn on the soapbox ...)

I'd like to add a bit of caveat to Bert's view.  I'd argue (perhaps even
plead) that robust/resistant procedures be used with care.  They should
not be used as a shortcut to avoid careful analysis of data.  I recalled
that in my first course on regression, the professor made it clear that
we're fitting models to data, not the other way around.  When the model
fits badly to (some of the) the data,  do examine and think carefully
about what happened.  Verify that bad data are indeed bad, instead of
using statistical criteria to make that judgment.  A scientific
colleague reminded me of this point when I tried to sell him the idea of
robust/resistant methods:  Don't use these methods as a crutch to stand
on badly run experiments (or poorly fitted models).

Cheers,
Andy

From: Bert Gunter
 
 (mount soapbox...)
 
 While I know the prior discussion represents common practice, 
 I would argue
 -- perhaps even plead -- that the modern(?? 30 years old 
 now) alternative of robust/resistant estimation be used, 
 especially in the readily available situation of 
 least-squares regression. RSiteSearch(Robust) will bring up 
 numerous possibilities.rrcov and robustbase are at least two 
 packages devoted to this, but the functionality is available 
 in many others (e.g.
 rlm() in MASS).
 
 Bert Gunter
 Genentech Nonclinical Statistics
 South San Francisco, CA 94404
 650-467-7374
 
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
 Sent: Friday, March 16, 2007 6:44 AM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Bad points in regression
 
 On 16-Mar-07 12:41:50, Alberto Monteiro wrote:
  Ted Harding wrote:
  
  alpha - 0.3
  beta - 0.4
  sigma - 0.5
  err - rnorm(100)
  err[15] - 5; err[25] - -4; err[50] - 10 x - 1:100 y 
 - alpha + 
  beta * x + sigma * err ll - lm(y ~ x)
  plot(ll)
  
  ll is the output of a linear model fiited by lm(), and so 
 has several 
  components (see ?lm in the section Value), one of which is 
  residuals (which can be abbreviated to res).
  
  So, in the case of your example,
  
which(abs(ll$res)2)
15 25 50
  
  extracts the information you want (and the 2 was inspired by 
  looking at the residuals plot from your plot(ll)).
 
  Ok, but how can I grab those points _in general_? What is the 
  criterium that plot used to mark those points as bad points?
 
 Ahh ... ! I see what you're after. OK, look at the plot 
 method for lm():
 
 ?plot.lm
   ## S3 method for class 'lm':
   plot(x, which = 1:4,
 caption = c(Residuals vs Fitted, Normal Q-Q plot,
   Scale-Location plot, Cook's distance plot),
   panel = points,
   sub.caption = deparse(x$call), main = ,
   ask = prod(par(mfcol))  length(which)  dev.interactive(),
   ...,
   id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75)
 
 
 where (see further down):
 
   id.n: number of points to be labelled in each plot, starting with
 the most extreme.
 
 and note, in the default parameter-values listing above:
 
   id.n = 3
 
 Hence, the 3 most extreme points (according to the criterion 
 being plotted in each plot) are marked in each plot.
 
 So, for instance3, try
 
   plot(ll,id.n=5)
 
 and you will get points 10,15,25,28,50. And so on. But that 
 pre-supposes that you know how many points are exceptional.
 
 
 What is meant by extremeis not stated in the help page 
 ?plot.lm, but can be identified by inspecting the code for 
 plot.lm(), which you can see by entering
 
   plot.lm
 
 In your example, if you omit the line which assigns anomalous 
 values to err[15[, err[25] and err[50], then you are likely 
 to observe that different points get identified on different 
 plots. For instance, I just got the following results for the 
 default id.n=3:
 
 [1] Residuals vs Fitted:   41,53,59
 [2] Standardised Residuals:41,53,59
 [3] sqrt(Stand Res) vs Fitted: 41,53,59
 [4] Cook's Distance:   59,96,97
 
 
 There are several approaches (with somewhat different 
 outcomes) to identifying outliers. If you apply one of 
 these, you will probably get the identities of the points anyway.
 
 Again in the context of your example (where in fact you 
 deliberately set 3 points to have exceptional errors, thus 
 coincidentally the same as the default value 3 of id.n), you 
 could try different values for id.n and inspect the graphs to 
 see whether a given value of id.n marks some points that do 
 not look exceptional relative to the mass of the other points.
 
 So, the above plot(ll,id.n=5) gave me one point, 10 on the 
 residuals plot, which apparently belonged to the general 
 distribution of residuals.
 
 Hoping this helps,
 Ted.
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 16-Mar-07   Time: 13:43:54
 -- XFMail 

[R] ERROR: 'latex' needed but missing on your system.

2007-03-16 Thread Joel J. Adamson
After successfully building R on Slackware Linux v11.0 I went to make
the documentation; the texi files went fine and then I hopefully issued

make dvi

after having gotten the warning to the effect of You cannot build the
DVI or PDF manuals during compilation.  And, as expected I got the
error

ERROR: 'latex' needed but missing on your system.

The problem is that latex is on my system and is in root's path:

/usr/src/R-2.4.1 Super-User  echo $PATH
/usr/share/texmf/bin/:/opt/kde/bin/:/uCsr/local/stata{sic}:/usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin:/bin:/usr/bin

I can issue latex from the command line as root (su'd to root, that
is) and it will run successfully.  Also, whereis latex turns up
empty.

I did not have this problem on PCLinuxOS 0.93a.

Thanks for any suggestions,
Joel
-- 
Joel J. Adamson
Biostatistician
Pediatric Psychopharmacology Research Unit
Massachusetts General Hospital
Boston, MA  02114
(617) 643-1432
(303) 880-3109






__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create coef.table as in summary.glm (with Signif. codes)

2007-03-16 Thread Benjamin Dickgiesser
I have found the function printCoefmat().
Thank you,
Benjamin
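
For the record, a minimal sketch of using it on a glm fit (the esoph example
here is just a stand-in):

fit <- glm(cbind(ncases, ncontrols) ~ agegp, data = esoph, family = binomial)
cf  <- coef(summary(fit))               # Estimate, Std. Error, z value, Pr(>|z|)
printCoefmat(cf, signif.stars = TRUE)   # prints the table with significance codes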

On 3/16/07, Benjamin Dickgiesser [EMAIL PROTECTED] wrote:
 Hi,

 how do I create a coef.table as for example returned by summary.glm?
 I don't see where the significance codes are assigned during summary.glm.

 Thank you,
 Benjamin


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] cumsum over varying column lengths

2007-03-16 Thread Murali Menon

Folks,

I have a matrix of historicalReturns, where entry (i, j) is the daily return 
corresponding to date i and equity j. I also have a matrix startOffset, 
where entry (1, k) is the row offset in historicalReturns where I entered 
into equity k.


So we have that NCOL(startOffset) = NCOL(historicalReturns).

Now I would like compute for each column in historicalReturns, the 
cumulative return 'returnsFromInception' for the equity starting from the 
startOffset date.


Is there a better way than as follows:


n <- NROW(historicalReturns)
returnsFromInception <- matrix(nrow = 1, ncol = 
NCOL(historicalInceptionDates))


for (i in 1 : NCOL(historicalReturns))
{
   cumReturn <- cumsum(historicalReturns[startOffset[1, i] : n, i])
   returnsFromInception[1, i] <- cumReturn[length(cumReturn)]
}

This works for me, but seems rather inelegant, and I don't like having to 
use a matrix for returnsFromInception and startOffset, where vectors would 
work.
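
Since only the last element of each cumsum is kept, sum() over the tail of each
column is enough. A hedged loop-free sketch, assuming startOffset holds one
offset per column of historicalReturns:

n <- nrow(historicalReturns)
returnsFromInception <- mapply(function(j, s) sum(historicalReturns[s:n, j]),
                               seq_len(ncol(historicalReturns)),
                               as.vector(startOffset))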


Thanks for your help.

Murali

_
It’s tax season, make sure to follow these few simple tips

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ideas to speed up code: converting a matrix of integers to a matrix of normally distributed values

2007-03-16 Thread Matthew Keller
Hi,

Many thanks to Jim and Martin for their suggestions. Using your ideas,
I came up with a solution that indexes rather than uses sapply (and
therefore calling up mvrnorm separately for each cell in the matrix).
The trick is to create a key matrix once, and then to use the
match() function each time I need to take the results from the key
matrix and place them in the appropriate spots in an 'effects' matrix.
If anyone is interested, here is a solution which speeds up the
process by a factor of 200 (!) :

unique.allele.seq <- 1:1

make.effects <- function(allele.seq, seed, mn = 0, var=1, cov=.5) {
   set.seed(seed)
   return(mvrnorm(length(allele.seq),
                  mu=c(mn,mn), Sigma=matrix(c(var,cov,cov,var),nrow=2),
                  empirical=FALSE))}

effects.key <- make.effects(unique.allele.seq, 123)

(alleles <- 
matrix(c(15,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009),ncol=4))
(indx <- match(alleles, unique.allele.seq))

(a1 <- matrix(effects.key[indx,1], ncol=ncol(alleles)))
(a2 <- matrix(effects.key[indx,2], ncol=ncol(alleles)))

#to check timing
system.time({
alleles <- 
matrix(rep(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009),1),ncol=4)
indx <- match(alleles, unique.allele.seq)

a1 <- matrix(effects.key[indx,1], ncol=ncol(alleles))
a2 <- matrix(effects.key[indx,2], ncol=ncol(alleles))})




On 3/16/07, jim holtman [EMAIL PROTECTED] wrote:
 Considering that the vast majority of your time is spent in the function
 mvrnorm (on my system 5.7  out of 6.1 seconds).  In your example that is
 12000 calls to the function.  To improve your speed you have to cut down the
 number of calls to the function.  For example, how many unique integers do
 you have and can to do the calls for those and then substitute matching
 values.  Here is what Rprof showed:

   total.time total.pct self.time self.pct
 system.time 6.12  99.7  0.00  0.0
 as.vector   6.06  98.7  0.18  2.9
 FUN 6.06  98.7  0.12  2.0
 array   6.06  98.7  0.10  1.6
 lapply  6.06  98.7  0.00  0.0
 sapply  6.06  98.7  0.00  0.0
 eval6.04  98.4  0.06  1.0
 mvrnorm 5.72  93.2  0.34  5.5
 eigen   2.58  42.0  0.52  8.5

 or another way of looking at it:

   0   6.1 root
   1.6.1 system.time
   2. .6.0 eval
   3. . .6.0 eval
   4. . . .6.0 array
   5. . . . .6.0 as.vector
   6. . . . . .6.0 sapply
   7. . . . . . .6.0 lapply
   8. . . . . . . .6.0 FUN
   9. . . . . . . . .5.7 mvrnorm
  10. . . . . . . . . .2.6 eigen
  11. . . . . . . . . . .1.2 sort.list
  12. . . . . . . . . . . .1.0 match.arg
  13. . . . . . . . . . . . .0.7 eval




 On 3/16/07, Matthew Keller [EMAIL PROTECTED] wrote:
 
  Hi all,
 
  [this is a bit hard to describe, so if my initial description is
  confusing, please try running my code below]
 
  #WHAT I'M TRYING TO DO
  I'd appreciate any help in trying to speed up some code. I've written
  a script that converts a matrix of integers (usually between 1-10,000
  - these represent allele names) into two new matrices of normally
  distributed values (representing genetic effects), such that a given
  number in the integer (allele name) matrix always corresponds to the
  *same* randomly distributed (genetic) effects.
 
  For example, every time my function sees the allele name 3782, it
  converts that integer into the same two effects (e.g., -.372  1.727),
  which follow normal distributions (these effects can be correlated;
  below I've set their correlation to .5). I have an entire matrix of
  integers, and am converting those into two entire matrices of effects.
 
 
  #HOW I'VE DONE IT SO FAR
  To get the correlations between the effects, I've used the mvrnorm
  function from MASS
 
  To convert the allele names to genetic effects, I've created a
  function (make.effect) that resets the set.seed() to the allele name
  each time it's called.
 
  To get the matrix of genetic effects, I use sapply.
 
 
  #THE PROBLEM
  The problem is that I often need to convert matrices that have 500K
  cells, and do this over several hundred iterations, so it is quite
  slow. If anyone has ideas to speed this up (e.g., some clever way to
  convert a matrix of integers to a matrix of effects without using
  sapply), I would be forever indebted.
 
 
  ##MY CODE
 
  library(MASS)
 
  ##run this example to see what I'm talking about above
 
  make.effects <- function(x,mn=0,var=1,cov=.5){
  set.seed(x)
 
 return(mvrnorm(1,mu=c(mn,mn),Sigma=matrix(c(var,cov,cov,var),nrow=2),empirical=FALSE))}
 
  (alleles <-
 matrix(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009),ncol=4))
 
  a12 <-
 array(sapply(alleles,make.effects),dim=c(2,nrow(alleles),ncol(alleles)))
  (a1 <- a12[1,,])
  (a2 <- a12[2,,])
 
  #I've set the population correlation = .5; empirical corr=.4635
  cor(as.vector 

[R] online course - Using R for Basic Statistics

2007-03-16 Thread Peter Bruce
Dr. John Verzani will present his online course, Using R for Introductory 
Statistics April 6 - May 4 at statistics.com.  Participants can ask 
questions and exchange comments with Dr. Verzani via a private discussion 
board throughout the period.

This course covers the use of R to summarize and graph data, calculate 
confidence intervals, test hypotheses, assess goodness-of-fit, and perform 
linear regression.

John Verzani is a member of the faculty at the College of Staten Island of 
the City University of New York, and the author of Using R for 
Introductory Statistics (CRC Press), on which this course is based.  His 
research interests and publications are in the area of superprocesses.

There are no set hours when you must be online, and we estimate you will 
need about 10-15 hours per week.

Register:  http://www.statistics.com/content/courses/advanceddoe/

Peter Bruce
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast lookup in ragged array

2007-03-16 Thread Seth Falcon
Peter McMahan [EMAIL PROTECTED] writes:

 Well, I hadn't ever seen RBGL before, so that's great. I've been  
 using igraph and sna mainly, but there are a few points lacking  
 between these two. RBGL solves a lot of problems for me!

 But I'm not sure it will solve this specific problem. Are you  
 suggesting I use RBGL to do a depth-first search of all the  
 subgraphs? For this particular depth-first search I'm not searching  
 every subgraph, but just those that are constructed from a minimal  
 cutset of the parent subgraph. At each level of the search, I have to  
 compute graph cohesion (vertex connectivity), which can take  
 considerable time. A lot of computation time is saved by only  
 searching subgraphs obtained through cutsets. So a complete search of  
 all the subgraphs won't work, but the redundancy I come across is I  
 think unavoidable.

Perhaps you will need a combination of graph/RBGL and some custom
memoization code to keep track of which subgraphs have already been
searched.

Some suggestions on that front:

Don't use a list, use an environment.  

 searchedBranched = new.env(hash=TRUE, parent=emptyenv(), size=X)

where X is an estimate of the number of branches you will search.
Using an environment implies you will need unique character names for
each subgraph.  Do you have that?  If not, you could concatenate node
names.  For a 200 node graph, that should be ok.
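
For illustration only (this sketch is not part of the original reply, and the
helper names keyFor/markSearched/alreadySearched are invented), the idea might
look like:

searchedBranched <- new.env(hash = TRUE, parent = emptyenv(), size = 1000L)

## key a subgraph by its sorted, concatenated node names
keyFor <- function(nodes) paste(sort(nodes), collapse = "|")
markSearched    <- function(nodes) assign(keyFor(nodes), TRUE, envir = searchedBranched)
alreadySearched <- function(nodes) exists(keyFor(nodes), envir = searchedBranched,
                                          inherits = FALSE)

markSearched(c("a", "c", "b"))
alreadySearched(c("b", "a", "c"))   # TRUE
alreadySearched(c("a", "d"))        # FALSE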

Hope that helps some.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast lookup in ragged array

2007-03-16 Thread Peter McMahan
Thanks, I'll give it a try. does R have a limit on variable name  
length? Also, is it better to over-estimate or under-estimate the  
size parameter?
This won't be too hard to implement, either, as I'm already keeping  
the list in a specific environment so all the subprocesses can find  
the same one.

On Mar 16, 2007, at 1:37 PM, Seth Falcon wrote:

 Peter McMahan [EMAIL PROTECTED] writes:

 Well, I hadn't ever seen RBGL before, so that's great. I've been
 using igraph and sna mainly, but there are a few points lacking
 between these two. RBGL solves a lot of problems for me!

 But I'm not sure it will solve this specific problem. Are you
 suggesting I use RBGL to do a depth-first search of all the
 subgraphs? For this particular depth-first search I'm not searching
 every subgraph, but just those that are constructed from a minimal
 cutset of the parent subgraph. At each level of the search, I have to
 compute graph cohesion (vertex connectivity), which can take
 considerable time. A lot of computation time is saved by only
 searching subgraphs obtained through cutsets. So a complete search of
 all the subgraphs won't work, but the redundancy I come across is I
 think unavoidable.

 Perhaps you will need a combination of graph/RBGL and some custom
 memoization code to keep track of which subgraphs have already been
 searched.

 Some suggestions on that front:

 Don't use a list, use an environment.

  searchedBranched = new.env(hash=TRUE, parent=emptyenv(), size=X)

 where X is an estimate of the number of branches you will search.
 Using an environment implies you will need unique character names for
 each subgraph.  Do you have that?  If not, you could concatenate node
 names.  For a 200 node graph, that should be ok.

 Hope that helps some.

 + seth

 -- 
 Seth Falcon | Computational Biology | Fred Hutchinson Cancer  
 Research Center
 http://bioconductor.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast lookup in ragged array

2007-03-16 Thread jim holtman
I just tried it and there seems to be a limit of about 256 characters in a
variable name:

 # worked
 assign(paste(sample(letters,256,T), collapse=''),123,env=x)
# failed at 257 characters
 assign(paste(sample(letters,257,T), collapse=''),123,env=x)
Error in assign(paste(sample(letters, 257, T), collapse = ""), 123, env = x)
:
symbol print-name too long
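
A self-contained restatement of the same check (the environment is created
explicitly here; the names are illustrative, and later versions of R have
raised this limit):

e <- new.env()
ok  <- paste(sample(letters, 256, replace = TRUE), collapse = "")
assign(ok, 123, envir = e)         # 256 characters: accepted here
bad <- paste(sample(letters, 257, replace = TRUE), collapse = "")
try(assign(bad, 123, envir = e))   # 257 characters: symbol print-name too long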



On 3/16/07, Peter McMahan [EMAIL PROTECTED] wrote:

 Thanks, I'll give it a try. does R have a limit on variable name
 length? Also, is it better to over-estimate or under-estimate the
 size parameter?
 This won't be too hard to implement, either, as I'm already keeping
 the list in a specific environment so all the subprocesses can find
 the same one.

 On Mar 16, 2007, at 1:37 PM, Seth Falcon wrote:

  Peter McMahan [EMAIL PROTECTED] writes:
 
  Well, I hadn't ever seen RBGL before, so that's great. I've been
  using igraph and sna mainly, but there are a few points lacking
  between these two. RBGL solves a lot of problems for me!
 
  But I'm not sure it will solve this specific problem. Are you
  suggesting I use RBGL to do a depth-first search of all the
  subgraphs? For this particular depth-first search I'm not searching
  every subgraph, but just those that are constructed from a minimal
  cutset of the parent subgraph. At each level of the search, I have to
  compute graph cohesion (vertex connectivity), which can take
  considerable time. A lot of computation time is saved by only
  searching subgraphs obtained through cutsets. So a complete search of
  all the subgraphs won't work, but the redundancy I come across is I
  think unavoidable.
 
  Perhaps you will need a combination of graph/RBGL and some custom
  memoization code to keep track of which subgraphs have already been
  searched.
 
  Some suggestions on that front:
 
  Don't use a list, use an environment.
 
   searchedBranched = new.env(hash=TRUE, parent=emptyenv(), size=X)
 
  where X is an estimate of the number of branches you will search.
  Using an environment implies you will need unique character names for
  each subgraph.  Do you have that?  If not, you could concatenate node
  names.  For a 200 node graph, that should be ok.
 
  Hope that helps some.
 
  + seth
 
  --
  Seth Falcon | Computational Biology | Fred Hutchinson Cancer
  Research Center
  http://bioconductor.org

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast lookup in ragged array

2007-03-16 Thread Seth Falcon
Peter McMahan [EMAIL PROTECTED] writes:

 Thanks, I'll give it a try. does R have a limit on variable name
 length?

If you are going to have very long names, you might be better off
computing a digest of some kind.  You could use the digest package to
compute an md5sum or the Ruuid package to generate a GUID.
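
For example (a sketch only, not from the original reply; keyFor() is an
invented helper and node names are assumed to be character):

library(digest)
keyFor <- function(nodes) digest(sort(nodes), algo = "md5")

searched <- new.env(hash = TRUE, parent = emptyenv())
assign(keyFor(c("a", "b", "c")), TRUE, envir = searched)
exists(keyFor(c("c", "a", "b")), envir = searched, inherits = FALSE)   # TRUE

The digest keeps the key length fixed no matter how large the subgraph gets.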

 Also, is it better to over-estimate or under-estimate the
 size parameter?

The environment will grow as needed.  If you overestimate, you will
use more memory than you need to.  Whether this is a problem depends
if you have extra memory available.  Underestimating means that the
underlying hashtable will need to be resized and this has a
performance impact.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cumsum over varying column lengths

2007-03-16 Thread jim holtman
This might work for you:

returnsFromInception <- sapply(1:NCOL(historicalReturns), function(x){
cumsum(historicalReturns[startOffset[1, x] : n, x])
})
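
If only the final cumulative return per equity is wanted, a plain vector can
be built directly (a sketch, not part of the reply above; it assumes the
historicalReturns and startOffset objects described in the quoted post):

returnsFromInception <- sapply(seq_len(NCOL(historicalReturns)), function(j)
    sum(historicalReturns[startOffset[1, j] : NROW(historicalReturns), j]))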



On 3/16/07, Murali Menon [EMAIL PROTECTED] wrote:

 Folks,

 I have a matrix of historicalReturns, where entry (i, j) is the daily
 return
 corresponding to date i and equity j. I also have a matrix startOffset,
 where entry (1, k) is the row offset in historicalReturns where I entered
 into equity k.

 So we have that NCOL(startOffset) = NCOL(historicalReturns).

 Now I would like compute for each column in historicalReturns, the
 cumulative return 'returnsFromInception' for the equity starting from the
 startOffset date.

 Is there a better way than as follows:


  n <- NROW(historicalReturns)
  returnsFromInception <- matrix(nrow = 1, ncol =
  NCOL(historicalInceptionDates))

  for (i in 1 : NCOL(historicalReturns))
  {
 cumReturn <- cumsum(historicalReturns[startOffset[1, i] : n, i])
 returnsFromInception[1, i] <- cumReturn[length(cumReturn)]
  }

 This works for me, but seems rather inelegant, and I don't like having to
 use a matrix for returnsFromInception and startOffset, where vectors would
 work.

 Thanks for your help.

 Murali

 _
 It's tax season, make sure to follow these few simple tips


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] online course - Using R for Basic Statistics

2007-03-16 Thread Peter Bruce
oops!  I gave the wrong link - the correct one is 
http://www.statistics.com/courses/Rstatistics/
pb




Dr. John Verzani will present his online course, Using R for Introductory 
Statistics April 6 - May 4 at statistics.com.  Participants can ask 
questions and exchange comments with Dr. Verzani via a private discussion 
board throughout the period.

This course covers the use of R to summarize and graph data, calculate 
confidence intervals, test hypotheses, assess goodness-of-fit, and perform 
linear regression.

John Verzani is a member of the faculty at the College of Staten Island of 
the City University of New York, and the author of Using R for 
Introductory Statistics (CRC Press), on which this course is based.  His 
research interests and publications are in the area of superprocesses.

There are no set hours when you must be online, and we estimate you will 
need about 10-15 hours per week.


Peter Bruce
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Duplicated non contiguous element in a list

2007-03-16 Thread Bruno C\.
Many thanks for the fast reply of Leeds
Here's all the anwer I got:

Here's the winner: Patrick Burns
with:
* any(duplicated(rle(x)$values))
followed by:

Peter McMahan:
non.contig.dupes <- function(x){
is.dup <- x %in% x[duplicated(x)]
is.con <- c(x[-length(x)]==x[-1],F) | c(F,x[-length(x)]==x[-1])
is.dup & !is.con}


then mine (size was 6 just for test):
function (z) {zz<-t(z);risp<-matrix(FALSE,1,6);
for (i in c(1,2,3,4,5,6)){
if (
(zz[i]%in%zz[-c(max(0,i-1),i,min(i+1,6))])
&&
(
( (i>1) && (i<6) && ((zz[i]!=zz[i-1]) || (zz[i]!=zz[i+1]) ))
||
( (i==1) && (zz[i]!=zz[i+1]))
||
( (i==6) && (zz[i]!=zz[i-1]))
)
)
{
risp[1,i]=TRUE;
}
}
risp;
}

ex aequo with Mark Leeds, whose script does not work in all situations:
temp <- diff(inmat)
temp <- temp[temp != 0]
duplicated(temp)
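
As a quick illustration of the winning one-liner (the test vectors here are
invented):

x <- c(1, 1, 2, 3, 2, 4)
any(duplicated(rle(x)$values))   # TRUE: 2 reappears after an intervening value
y <- c(1, 1, 2, 2, 3)
any(duplicated(rle(y)$values))   # FALSE: the repeats are contiguous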


Many thanks to All of You 


--
Passa a Infostrada. ADSL e Telefono senza limiti e senza canone Telecom
http://infostrada.it


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Probably simple function problem

2007-03-16 Thread John Kane
# I have a simple function problem. I thought that I
could write a function to modify a couple of vectors
but I am doing something wrong

#I have a standard cost vector called fuel and some
adjustments to the
#costs called adjusts.  The changes are completely
dependend on the length
#of the dataframe newdata  I then need to take the
modifed vectors and use
# them later. I need to do this several times and the
only change in the variables
# is the length of the data.frame.

# Can anyone suggest what I am doing wrong or am I
just misunderstanding what
# a function is supposed to do?

#Example:

adjusts <- c(.50, .70, .29, .27 , .40 , .26 , 125)
coal <- 1:6
newdata <- 1:10

fuel.costing <- function(fuel, utr, mydata) {
cppf <- cppm <- fuel ;
cppf[2] <- fuel[2]*(1-utr[2])*length(mydata) + utr[7]*
utr[2]*utr[5] ;
cppf[4] <- fuel[2]*(1-utr[4])*length(mydata) + utr[7]*
utr[4]*utr[6] ;
cppm[2] <- fuel[2]*(1-utr[1])*length(mydata) ;
cppm[4] <- fuel[2]*(1-utr[3])*length(mydata)
}

fuel.costing(coal, adjusts, newdata)


## original code for one place
cppf <- cppm <- coal ;
cppf[2] <- coal[2]*(1-adjusts[2])*length(newdata) +
adjusts[7]* adjusts[2]*adjusts[5] ;
cppf[4] <- coal[2]*(1-adjusts[4])*length(newdata) +
adjusts[7]* adjusts[4]*adjusts[6] ;
cppm[2] <- coal[2]*(1-adjusts[1])*length(newdata) ;
cppm[4] <- coal[2]*(1-adjusts[3])*length(newdata)

label(cppm) <- "cppm - SW coal costs adjusted"
label(cppf) <- "cppf - WW coal costs adjusted"

# Any help or suggests would be greatly appreciated.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Discriminating between experiments with xyplot (lattice)

2007-03-16 Thread Christel u. Frank Sarfert
Hi, 

suppose I have data from 3 experiments which show conversion as a function of 
time for different boundary conditions, e.g. pressure, temperature. I want to 
plot all experiments as conversion over time grouped according to the 
temperature. However, since I have more than one experiment performed at the 
same temperature (but different pressures), I end up unable to tell which curve 
belongs to which experiment. (An example with artificial data of the 
structure I use is given below. It shows three experiments where two 
experiments at temp = 250 but press = 1 and press = 0.5 are plotted within 
one group.)
My question is: Is there a way to identify which curve within the same group 
belongs to which experiment, e.g by plotting a label like the experiment 
number to the end point, or by choosing different symbols for the different 
experiments - while keeping the color encoding of the groups?

Your help is greatly appreciated!

Best regards 
  Frank


require(lattice)

## generating the data, I need
time <- seq(0,50,1)
conversion <- 2 * log(time + 1)
press <- rep(1,51)
temp <- rep(250,51)
experiment <- rep(1, 51)
v1 = as.data.frame(cbind(experiment,time,conversion,press,temp))

conversion <- 2.5 * log(time + 1)
press <- rep(1, 51)
temp <- rep(270,51)
experiment <- rep(2, 51)
v2 = as.data.frame(cbind(experiment,time,conversion,press,temp))

conversion <- 1.25 * log(time + 1)
press <- rep(0.5, 51)
temp <- rep(250,51)
experiment <- rep(3, 51)
v3 = as.data.frame(cbind(experiment,time,conversion,press,temp))

d <- rbind(v1,v2,v3)

## plotting
xyplot(conversion ~ time, data = d, groups = factor(temp), auto.key = T)

## result: the group for temp = 250 includes two experiments, which cannot be 
associated to the individual experiments 

-- 

C. u. F. Sarfert
Hauffstr. 8
70825 Korntal

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scatterplot brushing

2007-03-16 Thread Richard Valliant
Is there a package (other than xgobi which requires an X server) that
will do scatterplot brushing?  I see a mention in the mail archive of
R-orca by Anthony Rossini but it is not in the current list of
packages.
 
My OS is Windows XP version 5.1, service pack 2
R version 2.4.1 (2006-12-18)
 
Thanks
 

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Discriminating between experiments with xyplot (lattice)

2007-03-16 Thread Deepayan Sarkar
On 3/16/07, Christel u. Frank Sarfert [EMAIL PROTECTED] wrote:
 Hi,

 suppose I have data from 3 experiments which show conversion as a function of
 time for different boundary conditions, e.g. pressure, temperature. I want to
 plot all experiments as conversion over time grouped according to the
 temperature. However, since I have more than one experiment performed at the
 same temperature (but different pressures) I end up figuring out which curve
 belongs to which experiment. (An example with artificial data of the
 structure I use is given below. It shows three experiments where two
 experiments at temp = 250 but press = 1 and press = 0.5 are plotted within
 one group.)
 My question is: Is there a way to identify which curve whithin the same group
 belongs to which experiment, e.g by plotting a label like the experiment
 number to the end point, or by choosing different symbols for the different
 experiments - while keeping the color encoding of the groups?

 Your help is greatly appreciated!

 Best regards
   Frank


 require(lattice)

 ## generating the data, I need
 time - seq(0,50,1)
 conversion - 2 * log(time + 1)
 press - rep(1,51)
 temp - rep(250,51)
 experiment - rep(1, 51)
 v1 = as.data.frame(cbind(experiment,time,conversion,press,temp))

 conversion - 2.5 * log(time + 1)
 press - rep(1, 51)
 temp - rep(270,51)
 experiment - rep(2, 51)
 v2 = as.data.frame(cbind(experiment,time,conversion,press,temp))

 conversion - 1.25 * log(time + 1)
 press - rep(0.5, 51)
 temp - rep(250,51)
 experiment - rep(3, 51)
 v3 = as.data.frame(cbind(experiment,time,conversion,press,temp))

 d - rbind(v1,v2,v3)

You want to use make.groups rather than rbind here, so that you retain
information on which experiment each row is coming from:

 dd <- make.groups(v1,v2,v3)
 str(dd)
'data.frame':   153 obs. of  6 variables:
 $ experiment: num  1 1 1 1 1 1 1 1 1 1 ...
 $ time  : num  0 1 2 3 4 5 6 7 8 9 ...
 $ conversion: num  0.00 1.39 2.20 2.77 3.22 ...
 $ press : num  1 1 1 1 1 1 1 1 1 1 ...
 $ temp  : num  250 250 250 250 250 250 250 250 250 250 ...
 $ which : Factor w/ 3 levels "v1","v2","v3": 1 1 1 1 1 1 1 1 1 1 ...

Now how you choose to plot this is up to you. One simple possibility
is to create a new factor encoding both experiment and temperature,
e.g.,

xyplot(conversion ~ time, data = dd,
  groups = (which:factor(temp))[,drop=TRUE],
  auto.key = T)

Another is to condition on one, e.g.,

xyplot(conversion ~ time | factor(temp), data = dd, groups = which,
auto.key = T)

It is possible to use one variable for color and another for
plotting character, but the code involved would be somewhat less
elegant (mostly because the support functions are not already built
in). Let me know if you really want that; I can come up with some
sample code.

-Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Probably simple function problem

2007-03-16 Thread Jason Barnhart
You didn't specify the exact nature of the problem, so I guess that you 
want it to return the vector cppm.

Add return(cppm) as the final line of the function.

Here's what I think you want; it replicates your original code.

 adjusts <- c(.50, .70, .29, .27 , .40 , .26 , 125)
 coal <- 1:6
 newdata <- 1:10

 fuel.costing <- function(fuel, utr, mydata) {
+ cppf <- cppm <- fuel ;
+ cppf[2] <- fuel[2]*(1-utr[2])*length(mydata) + utr[7]*
+ utr[2]*utr[5] ;
+ cppf[4] <- fuel[2]*(1-utr[4])*length(mydata) + utr[7]*
+ utr[4]*utr[6] ;
+ cppm[2] <- fuel[2]*(1-utr[1])*length(mydata) ;
+ cppm[4] <- fuel[2]*(1-utr[3])*length(mydata)
+ return(cppm)
+ }

 my.cppm <- fuel.costing(coal, adjusts, newdata)
 my.cppm
[1]  1.0 10.0  3.0 14.2  5.0  6.0
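
Since the original code also builds cppf, a variant that returns both adjusted
vectors in a list may be closer to what is needed (a sketch based on the code
above, not part of the original reply; fuel.costing2 is an illustrative name):

fuel.costing2 <- function(fuel, utr, mydata) {
    n <- length(mydata)
    cppf <- cppm <- fuel
    cppf[2] <- fuel[2] * (1 - utr[2]) * n + utr[7] * utr[2] * utr[5]
    cppf[4] <- fuel[2] * (1 - utr[4]) * n + utr[7] * utr[4] * utr[6]
    cppm[2] <- fuel[2] * (1 - utr[1]) * n
    cppm[4] <- fuel[2] * (1 - utr[3]) * n
    list(cppf = cppf, cppm = cppm)   # both vectors come back together
}
costs <- fuel.costing2(coal, adjusts, newdata)
costs$cppm                           # same values as my.cppm above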

HTH
-jason

- Original Message - 
From: John Kane [EMAIL PROTECTED]
To: R R-help r-help@stat.math.ethz.ch
Sent: Friday, March 16, 2007 2:00 PM
Subject: [R] Probably simple function problem


# I have a simple function problem. I thought that I
 could write a function to modify a couple of vectors
 but I am doing something wrong

 #I have a standard cost vector called fuel and some
 adjustments to the
 #costs called adjusts.  The changes are completely
 dependend on the length
 #of the dataframe newdata  I then need to take the
 modifed vectors and use
 # them later. I need to do this several times and the
 only change in the variables
 # is the length of the data.frame.

 # Can anyone suggest what I am doing wrong or am I
 just misunderstanding what
 # a function is supposed to do?

 #Example:

 adjusts - c(.50, .70, .29, .27 , .40 , .26 , 125)
 coal - 1:6
 newdata - 1:10

 fuel.costing - function(fuel, utr, mydata) {
 cppf - cppm - fuel ;
 cppf[2] - fuel[2]*(1-utr[2])*length(mydata) + utr[7]*
 utr[2]*utr[5] ;
 cppf[4] - fuel[2]*(1-utr[4])*length(mydata) + utr[7]*
 utr[4]*utr[6] ;
 cppm[2] - fuel[2]*(1-utr[1])*length(mydata) ;
 cppm[4] - fuel[2]*(1-utr[3])*length(mydata)
 }

 fuel.costing(coal, adjusts, newdata)


 ## original code for one place
 cppf - cppm - coal ;
 cppf[2] - coal[2]*(1-adjusts[2])*length(newdata) +
 adjusts[7]* adjusts[2]*adjusts[5] ;
 cppf[4] - coal[2]*(1-adjusts[4])*length(newdata) +
 adjusts[7]* adjusts[4]*adjusts[6] ;
 cppm[2] - coal[2]*(1-adjusts[1])*length(newdata) ;
 cppm[4] - coal[2]*(1-adjusts[3])*length(newdata)

 label(cppm) - cppm -  SW coal costs adjusted 
 label (cppf) - cppf - WW coal costs adjusted 

 # Any help or suggests would be greatly appreciated.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Rownames are always character?

2007-03-16 Thread Mike Prager
Gurus,

Can I rely on the rownames() function, when applied to a matrix,
always returning either NULL or an object of type character?  It
seems that row names can be entered as integers, but as of now
(R 2.4.1 on Windows) the function returns character vectors, not
numbers (which is my desired result).

I am using elements of the returned vector to index the matrix.
e.g.,

nams <- rownames(mymat)
for (thisnam in nams) {
myvec <- mymat[thisnam, ]
# ... more code ...
}


-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Implementing trees in R

2007-03-16 Thread Yuk Lap Yip (Kevin)
Gabor,

Thanks. That helps a lot.

Gabor Grothendieck wrote:
 On 3/16/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 1. Here is your example redone using proto:

 library(proto)

 parent <- proto()
 child <- proto(a = 1)
 parent$child1 <- child
 child$parent.env <- parent

 This last line should have been:

 parent.env(child) <- parent



 # also just for illustration lets change a

 parent$child1$a # 1
 child$a <- 2
 parent$child1$a # 2

 2. To redefine $<- use S3 or S4 but it can be done
 in conjunction with proto like this:

 # constructor
 node <- function(. = parent.frame(), ...)
   structure(proto(...), class = c("node", "proto"))

 "$<-.node" <- function(this, s, value) {
if (s == ".super")
parent.env(this) <- value
if (is.function(value))
environment(value) <- this
if (inherits(value, "node"))
parent.env(value) <- this
this[[as.character(substitute(s))]] <- value
this
 }


 p <- node(a = 1)
 p$child <- node(b = 2)
 p$child$parent.env()
 p # same
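
For completeness, the first-child test from the original question can also be
made to work with plain environments, which have reference semantics (a sketch
independent of the proto code above; newNode is an invented helper):

newNode <- function() new.env(parent = emptyenv())
parent <- newNode()
child  <- newNode()
parent$child1 <- child
child$parent  <- parent
identical(child$parent$child1, child)   # TRUE: environments are not copied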



 On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
  Hi Gabor,
 
 Thanks for the suggestions. I tried to look for the proto vignette
  document but could not find it, could you tell me how to reach it?
 
 Besides, is it possible to define my own node object type with a
  default behavior for the - operator of its member variables being
  referencing rather than copying? Any good reference material/ similar
  code examples?
 
 Thanks.
 
  Gabor Grothendieck wrote:
   Lists are not good for this.  There is an example in section 3.3 of
   the proto vignette of using proto objects for this.  That section
   also references an S4 example although its pretty messy with S4.
  
   You might want to look at the graph, RBGL and graphviz packages
   in Bioconductor and the dynamicgraph, mathgraph and sna packages
   on CRAN.
  
   On 3/16/07, Yuk Lap Yip (Kevin) [EMAIL PROTECTED] wrote:
   Hi all,
  
  I am rather new to R. Recently I have been trying to 
 implement some
   tree algorithms in R. I used lists to model tree nodes. I thought
   something like this would work:
  
  parent - list();
  child - list();
  parent$child1 - child;
  child$parent - parent;
  
  When I tried to check whether a node is its parent's first child
   using if (node$parent$child1 == node), it always returned 
 false. Then
   I realized that it does not really work because parent$child1 
 - child
   actually makes a copy of child instead of referencing it. I 
 think one
   possible fix is to keep a list of node objects, and make references
   using the positions in the list. For example, I think the following
   would work:
  
  parent - list();
  child - list();
  nodes - list(parent, child);
  parent$child1 - 2;
  child$parent - 1;
  
  Then the first child test can be rewritten as if
   (nodes[[nodes[[nodeId]]$parent]]$child1 == nodeId). However, I 
 would
   prefer not to implement trees in this way, as it requires the
   inconvenient and error-prone manipulations of node IDs.
  
  May I know if there is a way to make object references to 
 lists? Or
   are there other ways to implement tree data structures in R?
  
  BTW, I checked how hclust was implemented, and noticed that 
 it calls
   an external Fortran program. I would want a solution not 
 involving any
   external programs.
  
  Thanks.
  
   --
  
  
  God bless.
  
  Kevin
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
  --
 
 
 God bless.
 
 Kevin
 
 


-- 


God bless.

Kevin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rownames are always character?

2007-03-16 Thread Mike Prager
Mike Prager [EMAIL PROTECTED] wrote:

 Gurus,
 
 Can I rely on the rownames() function, when applied to a matrix,
 always returning either NULL or an object of type character?  It
 seems that row names can be entered as integers, but as of now
 (R 2.4.1 on Windows) the function returns character vectors, not
 numbers (which is my desired result).

(To clarify my point on this Friday afternoon:  the observed
behavior is my desired result.  I'm just asking, can I count on
it?)

 I am using elements of the returned vector to index the matrix.
 e.g.,
 
 nams - rownames(mymat)
 for (thisnam in nams) {
   myvec - mymat[thisnam, ]
   # ... more code ...
 }

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Revisiting multiple plots

2007-03-16 Thread Jonathan Wang
Suppose I create a multiple plot with zoo, using:

index <- ISOdatetime(2004, rep(1:2, 5), sample(28, 10), 0, 0, 0)
foo <- zoo(rnorm(10), index)
for (i in 1:9) {
  data <- rnorm(10)
  z1 <- zoo(data, index)
  foo <- cbind(foo, z1)
}
plot(foo)

This creates 10 plots on one device, one for each column in foo.

Now I want to go back and use abline to draw a line at the mean on each of
my 10 plots. How do I select the appropriate set of axes to draw my line on?

Thanks,
Jonathan

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and clinical studies

2007-03-16 Thread Cody_Hamilton

I agree that most problems arise in the data management / file derivation
phase.  From my reading of 21 CFR 11, it appears that this document focuses
primarily on data management (as well as on software directly involved in a
medical device) rather than on validation of statistical functions.  I
believe this point has been made previously on the R-help list.

With regards to validating functions, I have often wondered how one can
validate a function when one cannot see what it is doing.  You could
certainly compare calculations from one package to the same calculations
from another package, but then you must purchase (ouch!) and know how to
properly use two software packages instead of one.  And I suppose they
could both be wrong!  Is not peer-review the best form of validation?  . .
. I suspect I may be preaching to the choir here.

I would love nothing more than to migrate our stat group over to R from
SAS.  Based on my experience with R/Splus, the language seems more
extendable, flexible, and has much better graphics (as has been pointed out
many times on this list).  It also has available the many contributions of
generous R users.  However, it has been hard to win pure SAS users onto R
(even if it saves the company money!).  One can't send the biostat group
off to R training like one would to SAS classes.  Learning R requires
initiative from the user (which is not necessarily a bad thing).  I
considered encouraging the purchase of Splus as an intermediate step
(hoping that its proprietary nature would soothe fears regarding open
source software), but that option was not as cheap as I thought.

Regards,
-Cody




Delphine Fontaine wrote:
 Thanks for your answer which was very helpfull. I have another question:

 I have read in this document
 (http://cran.r-project.org/doc/manuals/R-intro.pdf) that most of the
 programs written in R are ephemeral and that new releases are not
 always compatible with previous releases. What I would like to know is
 if R functions are already validated and if not, what should we do to
 validate a R function ?


In the sense in which most persons use the term 'validate', it means to
show with one or more datasets that the function is capable of producing
the right answer.  It doesn't mean that it produces the right answer for
every dataset although we hope it does.  [As an aside, most errors are
in the data manipulation phase, not in the analysis phase.]  So I think
that instead of validating functions we should spend more effort on
validating analyses [and validating analysis file derivation].  Pivotal
analyses can be re-done a variety of ways, in R or in separate
programmable packages such as Stata.

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] CPU usage on Windows

2007-03-16 Thread Jonathan Wang
I'm using R with emacs & ESS on Windows. When I create a plot, sometimes R
will seem to get stuck in a busy loop, i.e. it will use 100% of my CPU.
However the system is still somewhat responsive, and the problem usually
goes away if I create a new device with windows(). If I then close this
device, making the first device active again, sometimes R will get stuck in
the busy loop again.

Has anybody heard of this behavior, or, better yet, have a solution?

Thanks,
Jonathan

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Revisiting multiple plots

2007-03-16 Thread Gabor Grothendieck
Try this (or use xyplot.zoo and write a panel function for that):

library(zoo)
set.seed(1)
tt <- as.Date(paste(2004, rep(1:2, 5), sample(28, 10), sep = "-"))
foo <- zoo(matrix(rnorm(100), 10), tt)

pnl <- function(x, y, ...) {
lines(x, y, ...)
abline(h = mean(y))
}
plot(foo, panel = pnl)
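
The xyplot.zoo route mentioned above might look like this (a sketch, assuming
the foo object constructed above):

library(lattice)
xyplot(foo, panel = function(x, y, ...) {
    panel.lines(x, y, ...)
    panel.abline(h = mean(y))
})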


On 3/16/07, Jonathan Wang [EMAIL PROTECTED] wrote:
 Suppose I create a multiple plot with zoo, using:

 index - ISOdatetime(2004, rep(1:2, 5), sample(28, 10), 0, 0, 0)
 foo - zoo(rnorm(10), index)
 for (i in 1:9) {
  data - rnorm(10)
  z1 - zoo(data, index)
  foo - cbind(foo, z1)
 }
 plot(foo)

 This creates 10 plots on one device, one for each column in foo.

 Now I want to go back and use abline to draw a line at the mean on each of
 my 10 plots. How do I select the appropriate set of axes to draw my line on?

 Thanks,
 Jonathan

[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scatterplot brushing

2007-03-16 Thread hadley wickham
Have a look at iplots.  (and rggobi is the updated version of xggobi)
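
For instance (assuming iplots and its Java dependencies install cleanly; the
iris data is only an example):

install.packages("iplots")
library(iplots)
iplot(iris$Sepal.Length, iris$Sepal.Width)   # brushable, linked scatterplot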

Hadley

On 3/16/07, Richard Valliant [EMAIL PROTECTED] wrote:
 Is there a package (other than xgobi which requires an X server) that
 will do scatterplot brushing?  I see a mention in the mail archive of
 R-orca by Anthony Rossini but it is not in the current list of
 packages.

 My OS is Windows XP version 5.1, service pack 2
 R version 2.4.1 (2006-12-18)

 Thanks


 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scatterplot brushing

2007-03-16 Thread hadley wickham
I should mention that it's very easy to install rggobi and ggobi on
windows these days.  GGobi has a nice installer,
http://www.ggobi.org/downloads, and rggobi is on cran.

Hadley

On 3/16/07, hadley wickham [EMAIL PROTECTED] wrote:
 Have a look at iplots.  (and rggobi is the updated version of xggobi)

 Hadley

 On 3/16/07, Richard Valliant [EMAIL PROTECTED] wrote:
  Is there a package (other than xgobi which requires an X server) that
  will do scatterplot brushing?  I see a mention in the mail archive of
  R-orca by Anthony Rossini but it is not in the current list of
  packages.
 
  My OS is Windows XP version 5.1, service pack 2
  R version 2.4.1 (2006-12-18)
 
  Thanks
 
 
  [[alternative HTML version deleted]]
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CPU usage on Windows

2007-03-16 Thread Duncan Murdoch
On 3/16/2007 6:56 PM, Jonathan Wang wrote:
 I'm using R with emacs  ESS on Windows. When I create a plot, sometimes R
 will seem to get stuck in a busy loop, i.e. it will use 100% of my CPU.
 However the system is still somewhat responsive, and the problem usually
 goes away if I create a new device with windows(). If I then close this
 device, making the first device active again, sometimes R will get stuck in
 the busy loop again.
 
 Has anybody heard of this behavior, or, better yet, have a solution?

I've heard of a number of problems with Emacs on Windows.  I wouldn't 
recommend using it.  As far as I can see, it makes a number of 
assumptions about the OS that just aren't true about Windows.

If you can reproduce the behaviour outside of Emacs, I'll investigate.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast lookup in ragged array

2007-03-16 Thread Peter McMahan
That's a good point. What's the overhead on digests like that? Also,  
does that open up the possibility, exceedingly small though it may  
be, of misidentifying a branch as already searched and missing a  
qualifying subgraph?

On Mar 16, 2007, at 2:02 PM, Seth Falcon wrote:

 Peter McMahan [EMAIL PROTECTED] writes:

 Thanks, I'll give it a try. does R have a limit on variable name
 length?

 If you are going to have very long names, you might be better off
 computing a digest of some kind.  You could use the digest package to
 compute an md5sum or the Ruuid package to generate a GUID.

 Also, is it better to over-estimate or under-estimate the
 size parameter?

 The environment will grow as needed.  If you overestimate, you will
 use more memory than you need to.  Whether this is a problem depends
 if you have extra memory available.  Underestimating means that the
 underlying hashtable will need to be resized and this has a
 performance impact.

 + seth

 -- 
 Seth Falcon | Computational Biology | Fred Hutchinson Cancer  
 Research Center
 http://bioconductor.org

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Font in JavaGD() pdf()

2007-03-16 Thread Kubovy Michael
Dear r-helpers,

When I do an xYplot and display the result in a JavaGD() window, the  
font is sans-serif (presumably Helvetica). When I send the figure to  
a pdf, I get a serif font (presumably Times). How do I ensure that  
the font in the pdf is indeed the default sans serif?

  sessionInfo()
R version 2.4.1 (2006-12-18)
i386-apple-darwin8.8.1

locale:
C

attached base packages:
[1] grid  datasets  stats graphics  grDevices  
utils methods   base

other attached packages:
 coda  gmodels lme4   Matrix   HH  
multcomp  mvtnorm  vcd
 0.10-7 2.13.1  0.9975-13  0.9975-11 1.18-1 
0.991-8  0.7-5  1.0-2
   colorspaceHmisc   xtable latticeExtra  lattice  
gridBase MASS  JGR
   0.95  3.2-1  1.4-3  0.1-4 
0.14-16  0.4-3 7.2-32 1.4-15
   iplots   JavaGDrJava
  1.0-5  0.3-6 0.4-14



_
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS: P.O.Box 400400Charlottesville, VA 22904-4400
Parcels:Room 102Gilmer Hall
 McCormick RoadCharlottesville, VA 22903
Office:B011+1-434-982-4729
Lab:B019+1-434-982-4751
Fax:+1-434-982-4766
WWW:http://www.people.virginia.edu/~mk9y/



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Discriminating between experiments with xyplot (lattice)

2007-03-16 Thread Deepayan Sarkar
Here's one possibility:


d <- make.groups(v1,v2,v3)

## create interaction dropping unused levels
d$intg <- with(d, which:factor(temp))[, drop=TRUE]

## extract experiment and temp information from levels
intg.expt <- sapply(strsplit(levels(d$intg), ":", fixed = TRUE), "[[", 1)
intg.temp <- sapply(strsplit(levels(d$intg), ":", fixed = TRUE), "[[", 2)

## find a suitable color vector (where colors are repeated when the
## temperature is)
temp.unique <- unique(intg.temp)
col.temp <-
rep(trellis.par.get("superpose.symbol")$col,
length = length(temp.unique))
col.intg <- col.temp[match(intg.temp, temp.unique)]


xyplot(conversion ~ time, data = d, groups = intg,
   intg.expt = intg.expt,
   panel = panel.superpose,
   panel.groups = function(x, y, ..., group.number, intg.expt) {
   panel.xyplot(x, y, ...)
   panel.text(tail(x, 1), tail(y, 1), intg.expt[group.number], pos = 4)
   },
   col = col.intg,
   key = list(text = list(temp.unique), points = list(col =
col.temp, pch = 1)))


Hope that helps,

Deepayan


On 3/16/07, Frank Sarfert [EMAIL PROTECTED] wrote:
 Hi Deepayan,

 many thanks for your quick reply.
 I actually cannot condition whith the experiment number, because in my real 
 life data, I would like to condition with pressure and I have many more 
 experiments (not just 3). So ideally, I would have a plot with a label near 
 the end point of the curve which says which experiment it comes from.  So, if 
 you could really write  some code  -  that'd be great! Many thanks in advance!

 Best regards
 Frank


 -Ursprüngliche Nachricht-
 Von: Deepayan Sarkar [EMAIL PROTECTED]
 Gesendet: 16.03.07 22:47:47
 An: Christel u. Frank Sarfert [EMAIL PROTECTED]
 CC: r-help@stat.math.ethz.ch
 Betreff: Re: [R] Discriminating between experiments with xyplot (lattice)


 On 3/16/07, Christel u. Frank Sarfert [EMAIL PROTECTED] wrote:
  Hi,
 
  suppose I have data from 3 experiments which show conversion as a function 
  of
  time for different boundary conditions, e.g. pressure, temperature. I want 
  to
  plot all experiments as conversion over time grouped according to the
  temperature. However, since I have more than one experiment performed at the
  same temperature (but different pressures) I end up figuring out which curve
  belongs to which experiment. (An example with artificial data of the
  structure I use is given below. It shows three experiments where two
  experiments at temp = 250 but press = 1 and press = 0.5 are plotted within
  one group.)
  My question is: Is there a way to identify which curve whithin the same 
  group
  belongs to which experiment, e.g by plotting a label like the experiment
  number to the end point, or by choosing different symbols for the different
  experiments - while keeping the color encoding of the groups?
 
  Your help is greatly appreciated!
 
  Best regards
Frank
 
 
  require(lattice)
 
  ## generating the data, I need
  time - seq(0,50,1)
  conversion - 2 * log(time + 1)
  press - rep(1,51)
  temp - rep(250,51)
  experiment - rep(1, 51)
  v1 = as.data.frame(cbind(experiment,time,conversion,press,temp))
 
  conversion - 2.5 * log(time + 1)
  press - rep(1, 51)
  temp - rep(270,51)
  experiment - rep(2, 51)
  v2 = as.data.frame(cbind(experiment,time,conversion,press,temp))
 
  conversion - 1.25 * log(time + 1)
  press - rep(0.5, 51)
  temp - rep(250,51)
  experiment - rep(3, 51)
  v3 = as.data.frame(cbind(experiment,time,conversion,press,temp))
 
  d - rbind(v1,v2,v3)

 You want to use make.groups rather than rbind here, so that you retain
 information on which experiment each row is coming from:

  dd - make.groups(v1,v2,v3)
  str(dd)
 'data.frame':   153 obs. of  6 variables:
  $ experiment: num  1 1 1 1 1 1 1 1 1 1 ...
  $ time  : num  0 1 2 3 4 5 6 7 8 9 ...
  $ conversion: num  0.00 1.39 2.20 2.77 3.22 ...
  $ press : num  1 1 1 1 1 1 1 1 1 1 ...
  $ temp  : num  250 250 250 250 250 250 250 250 250 250 ...
  $ which : Factor w/ 3 levels v1,v2,v3: 1 1 1 1 1 1 1 1 1 1 ...

 Now how you choose to plot this is up to you. One simple possibility
 is to create a new factor encoding both experiment and temperature,
 e.g.,

 xyplot(conversion ~ time, data = dd,
   groups = (which:factor(temp))[,drop=TRUE],
   auto.key = T)

 Another is to condition on one, e.g.,

 xyplot(conversion ~ time | factor(temp), data = dd, groups = which,
 auto.key = T)

 It is possible to use one one variable for color and another for
 plotting character, but the code involved would be somewhat less
 elegant (mostly because the support functions are not already built
 in). Let me know if you really want that; I can come up with some
 sample code.

 -Deepayan



 _
 Der WEB.DE SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
 http://smartsurfer.web.de/?mc=100071distributionid=0066




Re: [R] Rownames are always character?

2007-03-16 Thread Charilaos Skiadas
On Mar 16, 2007, at 6:26 PM, Mike Prager wrote:

 Mike Prager [EMAIL PROTECTED] wrote:

 Gurus,

 Can I rely on the rownames() function, when applied to a matrix,
 always returning either NULL or an object of type character?  It
 seems that row names can be entered as integers, but as of now
 (R 2.4.1 on Windows) the function returns character vectors, not
 numbers (which is my desired result).

 (To clarify my point on this Friday afternoon:  the observed
 behavior is my desired result.  I'm just asking, can I count on
 it?)

I would venture to guess that rownames() would always be returning  
something that you would then be able to use for indexing, to  
retrieve particular entries. The help page also implies that the  
return value will always be a character vector, or NULL:

If do.NULL is FALSE, a character vector (of length NROW(x) or NCOL 
(x)) is returned in any case, prepending prefix to simple numbers, if  
there are no dimnames or the corresponding component of the dimnames  
is NULL.

I would think you can count on this about as much as you can count on
the sum function always adding up its arguments, or something of that
sort.
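
A small illustration (made-up matrix) of that coercion:

m <- matrix(1:4, nrow = 2, dimnames = list(c(10, 20), NULL))
rownames(m)               # "10" "20": character, even though supplied as numbers
m["10", ]                 # indexing with the returned names works
as.numeric(rownames(m))   # convert explicitly if numeric values are needed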

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: Re: CPU usage on Windows

2007-03-16 Thread Richard M. Heiberger
I can't imagine using Windows without Emacs.
In particular, the Windows ports of Emacs are very aware
of the operating system and usually make the right assumptions.

The type of behavior you are noticing can probably be cured by typing C-g in the
*R* buffer in emacs.  The most likely cause is that the R process in Emacs
is waiting for the plot to finish and is querying the plotting device.
Most of that excess CPU usage is from the query loop.  The C-g tells Emacs and R
to stop waiting.

If C-g doesn't stop the 100% CPU utilization, then it is most likely something
about the specific plot you are drawing.  We will need to see a reproducible
example to say more.

Rich

 Original message 
Date: Fri, 16 Mar 2007 19:37:14 -0400
From: Duncan Murdoch [EMAIL PROTECTED]  
Subject: Re: [R] CPU usage on Windows  
To: Jonathan Wang [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch

On 3/16/2007 6:56 PM, Jonathan Wang wrote:
 I'm using R with emacs  ESS on Windows. When I create a plot, sometimes R
 will seem to get stuck in a busy loop, i.e. it will use 100% of my CPU.
 
 Has anybody heard of this behavior, or, better yet, have a solution?

I've heard of a number of problems with Emacs on Windows.  I wouldn't 
recommend using it.  As far as I can see, it makes a number of 
assumptions about the OS that just aren't true about Windows.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.