Re: [R] AICc vs AIC for model selection
Spencer Graves wrote: Regarding AIC.c, have you tried RSiteSearch("AICc") and RSiteSearch("AIC.c")? This produced several comments that looked to me like they might help answer your question. Beyond that, I've never heard of the forecast package, and I got zero hits for RSiteSearch("best.arima"), so I can't comment directly on your question.

The forecast package is at http://www-personal.buseco.monash.edu.au/~hyndman/Rlibrary/forecast/ -- Kjetil

Do you have only one series or multiple? If you have only one, I think it would be hard to justify more than a simple AR(1) model; almost anything else would likely be overfitting. If you have multiple series, have you considered using 'lme' in the 'nlme' package? Are you familiar with Pinheiro and Bates (2000), Mixed-Effects Models in S and S-PLUS (Springer)? If not, I encourage you to spend some quality time with this book. My study of it has been amply rewarded, and I believe yours will be too. Best wishes, Spencer Graves

Sachin J wrote: Hi, I am using the 'best.arima' function from the forecast package to obtain point forecasts for a time series data set. The documentation says it uses the AIC value to select the best ARIMA model, but in my case the sample size is very small: 26 observations (demand data). Is it right to use the AIC for model selection in this case, or should I use AICc instead? If so, how can I modify best.arima to change the selection criterion? Any pointers would be of great help. Thanks in advance. Sachin
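As a footnote to this thread (my sketch, not from the original exchange): AICc can be computed by hand from any fitted arima object via the standard small-sample correction AICc = AIC + 2k(k+1)/(n-k-1), where k counts the estimated parameters and n is the sample size.

AICc <- function(fit) {
    k <- length(coef(fit)) + 1        # estimated parameters, +1 for sigma^2
    n <- length(residuals(fit))       # sample size used in the fit
    AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}
fit <- arima(lh, order = c(1, 0, 0))  # 'lh' is a small built-in series
AICc(fit)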
[R] combinatorial programming problem
Hola! I am programming an S3 class symarray for storing the results of functions symmetric in their k arguments. The intended use is for association indices for more than two variables, for instance co-resistance against antibiotics. There is one programming problem I haven't solved: making an inverse of the index function indx(), see the code below. It could for instance return the original k indices in strictly increasing order, to make indx() formally invertible. Any ideas? Kjetil

Code:

# Implementing an S3 class for symarrays with array rank r for dimension
# [k, k, ..., k] with k repeated r times. We do not store the diagonal.
# Storage requirement is choose(k, r), where r = array rank, k = maximum index.
symarray <- function(data = NA, dims = c(1, 1)) {
    r <- dims[1]
    k <- dims[2]
    if (r > k) stop("symarray needs dimension larger than array rank")
    len <- choose(k, r)
    out <- data[1:len]
    attr(out, "dims") <- dims
    class(out) <- "symarray"
    out
}

# Index calculation:
indx <- function(inds, k) {
    r <- length(inds)
    if (r == 1) return(inds)
    if (inds[1] == 1) {
        indx(inds[-1] - 1, k - 1)
    } else {
        indx(c(inds[1] - 1, seq(from = k - r + 2, by = 1, to = k)), k) +
            indx(inds[-1] - inds[1], k - inds[1])
    }
}

# Methods for indexing and assignment:
"[.symarray" <- function(x, inds, drop = FALSE) {
    dims <- attr(x, "dims")
    k <- dims[2]
    inds <- indx(inds, k)
    res <- NextMethod("[")
    res
}

"[<-.symarray" <- function(x, inds, value) {
    dims <- attr(x, "dims")
    k <- dims[2]
    inds <- indx(inds, k)
    res <- NextMethod("[<-")
    res
}
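One brute-force way to invert indx(), workable for moderate k and r (a sketch of mine, assuming indx() assigns a distinct position to each strictly increasing index tuple, which is what the class requires): tabulate all tuples with combn() once and invert by lookup. A proper combinatorial unranking algorithm would avoid the table for large k.

inv.indx <- function(i, r, k) {
    all.inds <- combn(k, r)                          # columns: increasing tuples
    pos <- which(apply(all.inds, 2, indx, k = k) == i)
    all.inds[, pos]                                  # the tuple that maps to i
}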
Re: [R] access list component names with lapply
Thomas Girke wrote: I have a question regarding accessing the names of list components when applying a function with lapply. Here is an example that demonstrates what I'd like to do. I have a list like this one:

mylist <- list(a = letters[1:10], b = letters[10:1], c = letters[1:3])

This is how I do it, iterating over the list and its names in parallel (inside lapply the function sees only the bare component, so names(x) is NULL there):

mapply(function(x, y) myfun(x, y), mylist, names(mylist))

Kjetil

Now I would like to append the names of the list components to their corresponding vectors with the c() function. I thought this could be done with the following command, but it doesn't work:

lapply(mylist, function(x) c(names(x), x))

I know how to do this in a for loop, but lapply runs so much faster over large lists. Any help on this simple problem will be highly appreciated. Thomas
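For this exact example, the mapply call written out and runnable (SIMPLIFY = FALSE keeps the result as a list):

mylist <- list(a = letters[1:10], b = letters[10:1], c = letters[1:3])
mapply(function(x, nm) c(nm, x), mylist, names(mylist), SIMPLIFY = FALSE)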
Re: [R] Intercepts in linear models.
Rolf Turner wrote: A colleague asked me if there is a way to specify with a ***variable*** (say ``cflag'') whether there is an intercept in a linear model. She had in mind something like lm(y ~ x - cflag),

Something like lm(if (cflag) y ~ x - 1 else y ~ x, ...) -- Kjetil

where cflag could be 0 or 1; if it's 0 an intercept should be fitted, if it's 1 then no intercept. This doesn't work ``of course''. The cflag just gets treated as another predictor, and because it is of the wrong length an error is generated. The best I could come up with was

lm(as.formula(paste("y ~ x -", cflag)))

which is kludgy. It can (of course!) also be done using an if statement. Is there a sexier way? cheers, Rolf Turner
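The if-statement route wrapped up as a small function (a sketch; the function and data names are illustrative, not from the thread):

fit.with.flag <- function(cflag, data) {
    fm <- if (cflag) y ~ x - 1 else y ~ x   # cflag = 1 means no intercept
    lm(fm, data = data)
}
d <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
coef(fit.with.flag(0, d))   # intercept fitted
coef(fit.with.flag(1, d))   # no intercept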
Re: [R] Classifying time series by shape over time
Andreas Neumann wrote: Dear all, I have hundreds of thousands of univariate time series of the form: character seriesid, vector of Date, vector of integer (some exemplary data is at the end of the mail). I am trying to find the ones which somehow have a shape over time that looks like the histogram of a (skewed) normal distribution: hist(rnorm(200, 10, 2)). The mean is not interesting, i.e. it does not matter whether the first nonzero observation happens in the 2nd or the 40th month of observation. All that matters is: the series should start sometime, the hits per month increase, at some point they decrease, and then they more or less disappear. Short example (hits at consecutive months, Dates omitted):

1. series: 0 0 0 2 5 8 20 42 30 19 6 1 0 0 0 -> good
2. series: 0 3 8 9 20 6 0 3 25 67 7 1 0 4 60 20 10 0 4 -> bad

Series 1 would be an ideal case of what I am looking for. Graphical inspection would be easy but is not an option due to the huge number of series.

Does the function turnpoints() in package pastecs help? -- Kjetil

Questions: 1. Which (if any) of the many packages that handle time series is appropriate for my problem? 2. Which general approach seems the most straightforward and best supported by R? Is there a way to test the time series directly (preferably)? Or do I need to type-cast them as some kind of histogram data and then test against the pdf of e.g. a normal distribution (but how)? Or something totally different? Thank you for your time, Andreas Neumann

Data examples (id1 is good, id2 is bad):

id1: dates, hits
1 2004-12-01 3
2 2005-01-01 4
3 2005-02-01 10
4 2005-03-01 6
5 2005-04-01 35
6 2005-05-01 14
7 2005-06-01 33
8 2005-07-01 13
9 2005-08-01 3
10 2005-09-01 9
11 2005-10-01 8
12 2005-11-01 4
13 2005-12-01 3

id2: dates, hits
1 2001-01-01 6
2 2001-02-01 5
3 2001-03-01 5
4 2001-04-01 6
5 2001-05-01 2
6 2001-06-01 5
7 2001-07-01 1
8 2001-08-01 6
9 2001-09-01 4
10 2001-10-01 10
11 2001-11-01 0
12 2001-12-01 3
13 2002-01-01 6
14 2002-02-01 5
15 2002-03-01 1
16 2002-04-01 2
17 2002-05-01 4
18 2002-06-01 4
19 2002-07-01 0
20 2002-08-01 1
21 2002-09-01 0
22 2002-10-01 2
23 2002-11-01 2
24 2002-12-01 2
25 2003-01-01 2
26 2003-02-01 3
27 2003-03-01 7
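A rough automated screen in the spirit of the turnpoints() suggestion (my sketch, not from the thread): smooth each series with a short moving average and count turning points; an ideal rise-then-fall shape keeps at most one interior peak after smoothing.

looks.unimodal <- function(hits, span = 5) {
    s <- stats::filter(hits, rep(1 / span, span))   # centred moving average
    s <- s[!is.na(s)]
    turns <- sum(diff(sign(diff(s))) != 0)          # slope sign changes
    turns <= 1
}
good <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
bad  <- c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4)
looks.unimodal(good)   # TRUE
looks.unimodal(bad)    # FALSE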
Re: [R] reading in only one column from text file
mark salsburg wrote: How do I manipulate the read.table function to read in only the 2nd column?

See the colClasses argument of read.table(). -- Kjetil
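For example (a sketch, assuming a whitespace-separated file "myfile.txt" with four columns of which the second is numeric): "NULL" entries in colClasses make read.table skip those columns entirely.

dat <- read.table("myfile.txt",
                  colClasses = c("NULL", "numeric", "NULL", "NULL"))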
Re: [R] what is scale function? Is it for variable transformation?
Kum-Hoe Hwang wrote: Howdy. I have read R books about the scale function for variable transformation. According to one book the scale function leads to better regression results. Or am I wrong? I hope somebody can tell me about the scale function. Is it for variable transformation?

Did you try to read ?scale? By default it centers each variable and rescales it to unit standard deviation. Look at:

x <- rnorm(100, 5, 10)
mean(x)
[1] 4.304616
sd(x)
[1] 9.926883
x <- scale(x)
mean(x)
[1] 5.39499e-17
sd(x)
[1] 1

Kjetil

-- Kum-Hoe Hwang
Re: [R] error function
Nongluck Klibbua wrote: Hi all, does anyone know which command to use for the error function (erf)? Thanks.

The function erf is in the package NORMT3 (on CRAN), as help.search("error function") could have told you. Kjetil
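The error function can also be written in one line from pnorm(), using the standard identity erf(x) = 2*Phi(x*sqrt(2)) - 1, so no extra package is strictly needed:

erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erf(1)   # 0.8427008, matching tabulated values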
Re: [R] Npmc for doing post-hoc after Kruskal
Farrel Buchinsky wrote: I followed the threads that enquired about doing post-hoc tests after Kruskal testing. It took me to npmc, but npmc is giving an output I do not understand. I noticed a thread entitled "npmc function: 'x' must be atomic", but there never appeared to be a resolution.

npmc(npmcinput)
Error in sort(unique.default(x), na.last = TRUE) : 'x' must be atomic

Here is the data frame I created and called npmcinput. To prove that I have the data set up correctly, I ran kruskal.test(OtoMax ~ grp):

Kruskal-Wallis rank sum test
data: OtoMax by grp
Kruskal-Wallis chi-squared = 59.4028, df = 3, p-value = 7.885e-13

Rows of (grp, OtoMax): 1 1 0; 2 1 1; 3 1 0; 4 1 1; 5 1 1; 6 1 1; 7 1 1; 8 1 1; 9 1 1; 10 1 2; 11 1 1; 12 1 1; 13 1 1; 14 1 1; 15 1 2; 16 1 1; 17 1 1; 18 1 1; 19 1 2; 20 1 2; 21 1 0; 22 1 1; 23 1 1; 24 1 2; 25 1 1; 26 1 1; 27 1 1; 28 1 2; 29 3 4; 30 3 4; 31 3 2; 32 3 2; 33 3 2; 34 3 3; 35 3 2; 36 3 2; 37 3 2; 38 3 3; 39 3 2; 40 3 2; 41 3 4; 42 3 3; 43 3 2; 44 3 4; 45 3 3; 46 3 4; 47 3 2; 48 3 2; 49 3 2; 50 3 3; 51 3 2; 52 3 3; 53 3 0; 54 3 4; 55 3 2; 56 3 2; 57 2 2; 58 2 2; 59 2 2; 60 2 2; 61 2 2; 62 2 2; 63 2 3; 64 2 2; 65 2 2; 66 2 3; 67 2 2; 68 2 2; 69 2 2; 70 2 2; 71 2 2; 72 2 2; 73 2 3; 74 2 2; 75 2 3; 76 2 3; 77 2 3; 78 2 2; 79 2 3; 80 2 4; 81 2 2; 82 2 2; 83 2 2; 84 2 2; 85 0 1; 86 0 1; 87 0 0; 88 0 1; 89 0 0; 90 0 0; 91 0 0; 92 0 0; 93 0 1; 94 0 0

Subsequently I renamed grp to be "class" and OtoMax to be "var", and everything seemed to work. Is that really the way one has to do it? As a general pointer to my overall R capabilities: does the documentation say that one has to name them class and var?

Yes, the documentation says so! It is easier to help if you can post your example data in a way which can be cut-and-pasted into R, for instance by using dput() and posting the result. -- Kjetil

Has my e-mail transgressed R etiquette? -- Don't think so, except for the missing output from dput()!

Farrel Buchinsky, MD, Pediatric Otolaryngologist, Allegheny General Hospital, Pittsburgh, PA
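The renaming step as code (a sketch; as the reply says, npmc() requires the data frame's columns to be named literally "class" and "var"):

npmcinput <- data.frame(class = grp, var = OtoMax)   # grp, OtoMax as posted
library(npmc)
npmc(npmcinput)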
Re: [R] CCF and Lag questions
Pallavi Naresh wrote: I am new to R and new to time series modeling. I have a set of variables (var1, var2, var3, var4, var5) for which I have historical yearly data, and I am trying to use this data to produce a prediction of var1, 3 years into the future. I have a few basic questions:

1) I am able to read in my data and convert it to a time series format using 'ts':

data_ts <- ts(data, start = 1988, end = 2005, frequency = 1, deltat = 1)

However, I am not able to refer to the individual columns or variables of my new time series object. For example, I am able to reference 'var1' by typing data$var1, but I cannot do the same with data_ts$var1. I don't see how I can use data_ts without being able to reference the individual columns in my dataset.

So you have a multiple time series; type class(data_ts) to see. This is a matrix, not a data.frame, so you need data_ts[, "var1"] in place of data_ts$var1.

2) Since I'm trying to build a multivariate time series model, I want to find the correlations of var1 with my other variables (var2, ... etc.) and their lagged values. But since I'm trying to produce a forecast 3 years into the future, I want to find the ccf between var1 and my other variables lagged 3 years. I tried doing ccf(var1, lag(var2, 3)), but I get the following error: Error in na.fail.default(ts.union(as.ts(x), as.ts(y))) : missing values in object

Try ccf(data_ts[, "var1"], data_ts[, "var2"]). If you have missing values in the object, try ccf(data_ts[, "var1"], data_ts[, "var2"], na.action = na.pass). Does anyone know how to use the lag function and ccf together? Is it necessary? -- Kjetil

3) Suppose var1 and var2 are both of length 20. I would expect the correlation at the fourth lag of ccf(var1, var2) to be the same as lag zero of ccf(var1[1:17], var2[4:20]), but they are not. Can someone explain why not?

4) How do I interpret the negative lags produced by the ccf function?
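On questions 2 and 4 (a sketch using the data_ts object from the question): ccf() already reports the cross-correlation at every lag, so an explicit lag() call is rarely needed; by ?ccf's convention, the value at lag k estimates cor(x[t+k], y[t]). The small discrepancy in question 3 arises because ccf() standardizes with the full-series means and variances rather than those of the trimmed subseries.

r <- ccf(data_ts[, "var1"], data_ts[, "var2"],
         na.action = na.pass, plot = FALSE)
r$acf[r$lag == 3]   # estimates cor(var1[t+3], var2[t]): var2 leading by 3 years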
Re: [R] Increase Decimal Accuracy?
Ann Hess wrote: Is there any way to increase the decimal accuracy for probability distributions? For example (in R), 1 - pchisq(90, 5) gives [1] 0, but in Maple I find that the value is 0.67193e-17.

Look at this (computing the upper tail directly avoids the cancellation in 1 - p when p is within 1e-16 of 1):

1 - pchisq(90, 5)
[1] 0
pchisq(90, 5, lower = FALSE)
[1] 6.71932e-18

Kjetil

I need to compare some really small p-values... is there a way to increase the decimal-place accuracy beyond 1e-16 (which seems to be the limit)? Any help would be appreciated. Ann
Re: [R] linear lists, creation, insertion, deletion, traversal *someone?*
Christian Hoffmann wrote: Hi, in a second try I will ask this list to give me some useful pointers. Linear lists, as described e.g. by N. Wirth in "Algorithms and Data Structures", seem not to be implemented in S/R, although in Lisp we have cons, car, cdr.

This has been discussed on the list a (long) time ago, so you can search the archives. I remember L. Tierney opined they should be implemented as an (S4) class. -- Kjetil

Nevertheless I want to implement an algorithm using such linear lists, porting a trusted program to R (S). I need (from Modula/Oberon):

pSC* = POINTER TO SC;
SC* = RECORD Next*: pSC; ... END;

# generate and fill a linked list:
VAR p, pA, Kol: pSC;
NEW(Kol); pA := Kol;
while (...) {
    NEW(p); pA.Next := p; pA := p; ...
}

Mapping this list management to R's list() is possible, but may not be so transparent. Insertions into and deletions from a list may be more convoluted. Does anybody have experience with this? I could not find any reference in the R resources; the Blue Book is also mute here. Thank you, Christian
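A minimal sketch of a mutable singly linked list in base R (mine, not from the thread): environments have reference semantics, so they can play the role of Wirth's POINTER TO RECORD. ('next' is a reserved word in R, hence the field name 'nxt'.)

new.node <- function(value) {
    node <- new.env()
    node$value <- value
    node$nxt <- NULL
    node
}
insert.after <- function(node, value) {   # splice a new node in after 'node'
    n <- new.node(value)
    n$nxt <- node$nxt
    node$nxt <- n
    n
}
traverse <- function(front) {             # collect values front to back
    out <- list()
    p <- front
    while (!is.null(p)) {
        out <- c(out, list(p$value))
        p <- p$nxt
    }
    out
}
front <- new.node(1)
insert.after(front, 3)
insert.after(front, 2)    # list is now 1, 2, 3
traverse(front)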
Re: [R] How to do a proc summary in R?
Emilie Berthiaume wrote: Hi, I'm a SAS user trying to convert myself to R, but I still have problems with some pretty simple commands. First I wanted to add up the number of red-tailed hawks seen per day (julian day) per year, so I tried:

RTyrday <- tapply(RThr, list(year, julian), sum)

Try RT <- as.data.frame.table(RTyrday) -- Kjetil

And then I tried the following regression:

mod1 <- glm(RTyrday ~ julian + year, family = gaussian(link = identity), data = RT)

(here it is simpler to use lm() -- Kjetil)

which didn't work, since my vectors RTyrday and julian don't have the same length. My question is: how can I create a new data sheet with the output of my tapply call? Something I could have done in SAS by giving an "output out" to my proc summary. Thank you, Emilie

Emilie Berthiaume, Graduate Student, Biology Department, Sherbrooke University, Sherbrooke, Québec, CANADA
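The whole pipeline in one place (a sketch; RThr, year and julian are the poster's objects, and the responseName argument just names the value column):

RTyrday <- tapply(RThr, list(year, julian), sum)
RT <- as.data.frame.table(RTyrday, responseName = "RThr")
names(RT)[1:2] <- c("year", "julian")
RT$year   <- as.numeric(as.character(RT$year))     # factor -> numeric
RT$julian <- as.numeric(as.character(RT$julian))
mod1 <- lm(RThr ~ julian + year, data = RT)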
Re: [R] Compute a correlation matrix from an existing covariance matrix
Marc Bernard wrote: Dear all, I am wondering if there is an R function to convert a covariance matrix to a correlation matrix. I have a covariance matrix sigma, and I want to compute the corresponding correlation matrix R from sigma.

Does cov2cor() do what you want? -- Kjetil

Thank you very much, Bernard
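Example usage (cov2cor() ships with base R's stats package):

sigma <- matrix(c(4, 2, 2, 3), 2, 2)   # a toy covariance matrix
cov2cor(sigma)                          # unit diagonal, off-diagonal 2/sqrt(12)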
Re: [R] How to get around heteroscedasticity with non-linear least squares in R?
Quin Wills wrote: I am using nls to fit dose-response curves, but I am not sure how to approach more robust regression in R to get around the problem of my errors showing increased variance with increasing dose. My understanding is that rlm or lqs would not be a good idea here. I'm fairly new to regression work, so apologies if I'm missing something obvious.

Package sfsmisc has rnls() (robust nls), which might be of use. -- Kjetil
Re: [R] How to read more than 1 table?
Atte Tenkanen wrote: Question 1) I want to read many csv tables and run the same commands for all of them. Is there some simpler solution than the one below?

for (g in 1:6) {
    if (g == 1) { k = "kt1_0057" }
    if (g == 2) { k = "kt1_0101" }
    if (g == 3) { k = "kt1_0613" }
    if (g == 4) { k = "staten" }
    if (g == 5) { k = "tenpenny" }
    if (g == 6) { k = "fiddrunk" }
    TABLE = read.table(paste("/home/user/", k, ".csv", sep = ""),
                       sep = ",", na.strings = ".", header = FALSE, fill = TRUE)
    print(TABLE)
}

Put your filenames in a character vector filenames:

filenames <- paste(c("kt1_0057", ...), ".csv", sep = "")
tables <- lapply(filenames, function(x)
    read.table(file = x, na.strings = ".", ...))

and if you want to repeat the same task for your six tables, do it with lapply() or sapply().

Question 2) Is it possible to create new variables, for example with the assistance of a for loop, without initialising them beforehand?

This way: sapply(1:10, function(i) your.task(i)). sapply() will do the initialization for you! -- Kjetil
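Putting the pieces together (a sketch; the path and file stems are taken from the question):

stems <- c("kt1_0057", "kt1_0101", "kt1_0613", "staten", "tenpenny", "fiddrunk")
filenames <- paste("/home/user/", stems, ".csv", sep = "")
tables <- lapply(filenames, read.table,
                 sep = ",", na.strings = ".", header = FALSE, fill = TRUE)
names(tables) <- stems
lapply(tables, print)   # or any other per-table task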
Re: [R] function for prediting garch
oliver wee wrote: hello, for my time series data I was able to successfully fit an ARIMA model (Box-Jenkins) and a GARCH model and estimate their parameters. I was also able to forecast future values of the series from the fitted ARIMA model using the predict() function. However, I'm not sure what the correct function call is to forecast future values of my series using both the fitted ARIMA model and the fitted GARCH model. Using predict() didn't give me the result I was looking for, and I can't find any documentation using help.search.

You should have given reproducible code! In my understanding, (g)arch is applied to an uncorrelated series without autocorrelations, such as the residuals from a properly estimated ARIMA model. So to get predictions for the original series, you need to
1) predict with the ARIMA model,
2) fit a garch model to the residuals,
3) predict the residuals,
4) modify the prediction from 1) with the prediction from 3).
Kjetil

I think what I am looking for is akin to the garchsim and garchpred commands in Matlab. Any help is appreciated. Thanks!
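A rough sketch of steps 1 and 2 (mine; 'y' stands for the poster's series, and tseries::garch() is one available univariate GARCH fitter). Note that since the conditional mean forecast of a pure GARCH process is zero, step 4 typically changes the prediction intervals rather than the point forecasts:

library(tseries)
fit.arima <- arima(y, order = c(1, 0, 1))   # step 1: ARIMA fit
pred <- predict(fit.arima, n.ahead = 3)     # ARIMA point forecasts
fit.garch <- garch(residuals(fit.arima))    # step 2: GARCH(1,1) on residuals
pred$pred                                   # point forecasts stay the same;
                                            # the GARCH fit supplies the
                                            # time-varying forecast variances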
Re: [R] read.table
Diethelm Wuertz wrote: Thanks a lot, that works fine! Next problem: if I had my own package, and the file test.csv were located in the data directory, how could I use the function data() to get data(test), resulting in:

test
    %y-%m-%d VALUE
1 1999-01-01   100
2 2000-12-31   999

Again thanks in advance, Diethelm Wuertz

Also put in the data subdirectory a file test.R with the commands to read test.csv. -- Kjetil

Phil Spector wrote: Look at the check.names= argument of read.table -- you want to set it to FALSE. But remember that you'll have to put quotes around the name whenever you use it, as in x$'%y-%m-%d'. -- Phil Spector, Statistical Computing Facility, Department of Statistics, UC Berkeley

On Tue, 14 Feb 2006, Diethelm Wuertz wrote: I have a file named test.csv with the following 3 lines:

%y-%m-%d;VALUE
1999-01-01;100
2000-12-31;999

read.table("test.csv", header = TRUE, sep = ";") delivers:

   X.y..m..d VALUE
1 1999-01-01   100
2 2000-12-31   999

I would like to see the following:

    %y-%m-%d VALUE
1 1999-01-01   100
2 2000-12-31   999

Note, readLines("test.csv", 1) delivers [1] "%y-%m-%d;VALUE". Is this possible??? Thanks DW
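A sketch of such a data/test.R file: as I understand the data() mechanism, .R files in data/ are source()d with the working directory temporarily set to data/, so a relative path suffices.

# contents of data/test.R
test <- read.table("test.csv", header = TRUE, sep = ";",
                   check.names = FALSE)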
Re: [R] how I can perform Multivariate Garch analysis in R
Arun Kumar Saha wrote: Dear advisor, hope I am not disturbing you. Can you tell me how I can perform multivariate GARCH analysis in R? Also, it is my humble request, please let me know some resource materials on multivariate GARCH analysis itself.

You could try help.search("garch") or RSiteSearch("garch"). But for me this only leads to univariate garch (at least two implementations). -- Kjetil

Sincerely yours, Arun Kumar Saha, M.Sc. [C.U.], Statistician [Analyst], Transgraph Consulting [www.transgraph.com], Hyderabad, INDIA
Re: [R] LCA in e1071
Lisa Wang wrote: Hello there, in the e1071 package the 'lca' function only works with binary data for latent class analysis. Am I right? I would like to test the agreement amongst 6 raters for nominal data on a scale from 1-4, and conduct a latent class analysis. What package and what function may I use? Thank you very much. Lisa Wang, Princess Margaret Hospital

Package irr (on CRAN) might help. -- Kjetil
Re: [R] R: mean from list
[EMAIL PROTECTED] wrote: great!! thanks very much,

mean(unlist(lapply(listdata, function(z) z$c)))

works well.

WHY unlist(lapply(...)) when sapply(...) is simpler? -- Kjetil

And what about getting the average table $a (displaying the average elements across all 1000 matrices)? Could you please help me? I am struggling with this... thanks in advance, Roberto

mean(unlist(lapply(x, function(z) z$d))) should do the trick.

On Sun, 12 Feb 2006 20:06:12 +0000, [EMAIL PROTECTED] wrote: hi all, I have a simple problem that I am not able to solve. I have a list called datalist with the following structure:

[...]
[[10]]$a
    -1  0  1
-1  31  5  2
 0   6  7  5
 1   1  7 36

[[10]]$b
    -1  0  1
-1  31  5  2
 0   6  7  5
 1   1  7 36

[[10]]$c
[1] 0.855

[[10]]$d
[1] 0.855
[...]

with [[1]] ... [[100]]. How can I get the mean value of datalist[[x]]$d, where x represents all elements from 1 to 1000? thanks in advance, Roberto Furlan, University of Turin

Rajarshi Guha, http://jijo.cjb.net
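For the average of the $a tables (a sketch; names follow the post, and Reduce() needs R >= 2.6.0): sum them element-wise and divide by the count, which works because all the $a matrices share the same dimensions.

mean(sapply(datalist, function(z) z$c))    # mean of the scalar $c components
a.mean <- Reduce("+", lapply(datalist, function(z) z$a)) / length(datalist)
a.mean                                     # element-wise average table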
Re: [R] putting text in the corner
Fernando Mayer wrote: You can get the x and y position of any place in an open graphics device with the mouse cursor, using the function locator(), and even assign these values to an object, as in:

xy <- locator(1)

You will have a list with the x and y positions. Then you can use text(xy$x, xy$y, ...).

Or even text(locator(1), "test", ...) -- Kjetil

See ?locator. HTH, Fernando Mayer.

Thomas Steiner escreveu: I want to write some text in a corner of my plot. Is it possible to get the xlim and ylim of an open window? Or is there anything similar to legend(x = "bottomright", inset = 0.01, legend = ...) for text(x = 1, y = 2, "test")? Thomas
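A non-interactive alternative (my sketch): read the current plot limits from par("usr") and place the text just inside a corner.

plot(1:10)
usr <- par("usr")                                  # c(x1, x2, y1, y2)
text(usr[2], usr[4], "test", adj = c(1.1, 1.3))    # inside the top-right corner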
Re: [R] New user: Custom probability distribution
Jan Danielsson wrote: Hello, given a probability function p(x) = 12 / (25(x+1)), x = 0, 1, 2, 3, we generate the following values:

C1   C2
0  0.48
1  0.24
2  0.16
3  0.12

Now I'm supposed to create 50 random values using this table. In MiniTab, I simply selected Calc -> Random Data -> Discrete, selected the columns, and it created 50 random values in a new column. [1] How do I do the same thing in R?

sample(0:3, 50, replace = TRUE, prob = c(0.48, 0.24, 0.16, 0.12))

Kjetil

[1] You may be wondering why I'm telling you this. It's because if I were in your shoes, I'd think "Oh, someone wants me to solve his homework." I have already solved it using MiniTab, but I want to be able to use R instead (since I prefer NetBSD).
Re: [R] rob var/cov + LAD regression
Angelo Secchi wrote: Hi, after looking around I was not able to find info on these two questions:
1. Is there a function to get a jackknife-corrected var/cov estimate (as described in MacKinnon and White 1985) in a standard OLS regression?
2. Does R possess a LAD (Least Absolute Deviation) regression function?
Any help? Thanks

1. Not sure, but look at package sandwich (on CRAN). 2. This can be done with package quantreg (on CRAN). -- Kjetil
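Sketches for both pointers (mine; sandwich's "HC3" type is a jackknife-style correction in the MacKinnon-White (1985) spirit, and LAD regression is quantile regression at tau = 0.5):

library(sandwich)
ols <- lm(stack.loss ~ ., data = stackloss)   # built-in example data
vcovHC(ols, type = "HC3")                     # jackknife-corrected var/cov

library(quantreg)
lad <- rq(stack.loss ~ ., tau = 0.5, data = stackloss)
summary(lad)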
Re: [R] power and sample size for a GLM with poisson response variable
Craig A Faulhaber wrote: Hi all, I would like to estimate power and the necessary sample size for a GLM with a response variable that has a Poisson distribution. Do you have any suggestions for how I can do this in R? Thank you for your help. Sincerely, Craig

Package asypow (on CRAN), or simulation. -- Kjetil
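A small simulation sketch (mine; the effect size, link and covariate distribution are illustrative assumptions): simulate under the alternative, refit, and take the rejection rate as the power estimate.

power.sim <- function(n, beta = 0.3, nsim = 500, alpha = 0.05) {
    rej <- replicate(nsim, {
        x <- rnorm(n)
        y <- rpois(n, exp(0.5 + beta * x))
        fit <- glm(y ~ x, family = poisson)
        summary(fit)$coefficients["x", "Pr(>|z|)"] < alpha
    })
    mean(rej)   # estimated power at sample size n
}
power.sim(50)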
Re: [R] can I do this with read.table??
Douglas Grove wrote: Hi, I'm trying to figure out if there's an automated way to get read.table to read in my data and *not* convert the character columns into anything, just leave them alone.

Did you read the help page? What about the argument as.is=TRUE? See also the argument colClasses. -- Kjetil

What I'm referring to as 'character columns' are columns in the data that are quoted. For columns of alphabetic strings (that aren't TRUE or FALSE) I can suppress conversion to factor with as.is=TRUE, but what I'd like to stop is the conversion of quoted numbers of the form "01", "02", ..., into numeric form. By an 'automated way', I mean one that does not involve me having to know which columns in the data are the ones I want kept as they are. This doesn't seem like an unreasonable thing to want to do. After all, say I've got the data.frame:

A <- data.frame(a = 1:3, b = I(c("01", "02", "03")))

I can export this to a text file with the simple command

write.table(A, "A.txt", sep = "\t", row.names = FALSE, quote = TRUE)

but I cannot find an equally simple mechanism for reading this data back in from A.txt that allows me to reconstruct my data.frame A. Is this an unreasonable thing to expect? Thanks, Doug
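One way to keep the quoted numbers as strings on re-import (a sketch; it does require knowing the column types, which is the part the poster hoped to avoid):

A <- data.frame(a = 1:3, b = I(c("01", "02", "03")))
write.table(A, "A.txt", sep = "\t", row.names = FALSE, quote = TRUE)
A2 <- read.table("A.txt", header = TRUE, sep = "\t",
                 colClasses = c("integer", "character"))
A2$b   # "01" "02" "03", leading zeros preserved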
Re: [R] R vs. Excel (R-squared)
Cleber N. Borges wrote: I was quite interested in this thread (discussion), since I am a chemistry student and I work with mixture designs, which are models without an intercept. I paid close attention to the following affirmation:

'Thus SST, the corrected total sum of squares, should be used when you have a model with an intercept term but the uncorrected total sum of squares should be used when you do not have an intercept term.' (Douglas Bates)

I have as reference the book "Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data", second edition, by John A. Cornell (Professor of Statistics at the University of Florida). In this book, p. 42, item 2.7, THE ANALYSIS OF VARIANCE TABLE, I have the model below:

y(x) = 11.7 x1 + 9.4 x2 + 16.4 x3 + 19.0 x1x2 + 11.4 x1x3 - 9.6 x2x3

with the following ANOVA table (pred = predicted, exp = experimental):

Source of variation  D.F.  SS                               MS
Regression           p-1   SSR = \sum(y_pred - y_mean)^2    SSR/(p-1)
Residual             N-p   SSE = \sum(y_exp - y_pred)^2     SSE/(N-p)
Total                N-1   SST = \sum(y_exp - y_mean)^2

And in many other books I always see the ANOVA table of mixture systems with SST, the corrected total sum of squares (N-1 degrees of freedom). I would like to ask: 1) Which approach (point of view) is more adequate?

With a mixture model, although you do not have an intercept term directly in the model, it is there, hidden, as the sum of the design variables representing the mixture is 1! So it is correct to use the corrected sum of squares. -- Kjetil

2) Could someone indicate some reference on this subject? Thanks a lot. Regards, Cleber N. Borges

Dados:
x1   x2   x3   y
1    0    0    11
1    0    0    12.4
0.5  0.5  0    15
0.5  0.5  0    14.8
0.5  0.5  0    16.1
0    1    0    8.8
0    1    0    10
0    0.5  0.5  10
0    0.5  0.5  9.7
0    0.5  0.5  11.8
0    0    1    16.8
0    0    1    16
0.5  0    0.5  17.7
0.5  0    0.5  16.4
0.5  0    0.5  16.6

## Model
d.lm <- lm(y ~ -1 + x1*x2*x3 - x1:x2:x3, data = Dados)
### ANOVA like in the book
d.aov <- aov(y ~ x1*x2*x3 - x1:x2:x3, data = Dados)
SSR (fitted model) = 128.296

Douglas Bates wrote: On 1/24/06, Lance Westerhoff wrote: Hi- On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote: Lance Westerhoff writes: Hello All- I found an inconsistency between the R-squared reported in Excel vs. that in R, and I am wondering which (if any) may be correct and whether this is a known issue. While it certainly wouldn't surprise me if Excel is just flat out wrong, I just want to make sure, since the R-squared reported in R seems surprisingly high. Please let me know if this is the wrong list. Thanks!

[Peter Dalgaard:] Excel is flat out wrong. As the name implies, R-squared values cannot be less than zero (adjusted R-squared can, but I wouldn't think that is what Excel does).

[Lance Westerhoff:] I had thought the same thing, but then I came across the following site, which states: "Note that it is possible to get a negative R-square for equations that do not contain a constant term. If R-square is defined as the proportion of variance explained by the fit, and if the fit is actually worse than just fitting a horizontal line, then R-square is negative. In this case, R-square cannot be interpreted as the square of a correlation." Since R^2 = 1 - (SSE/SST), I guess you can have SSE > SST, which would result in an R^2 less than 0. However, it still seems very strange, which made me wonder what is going on in Excel, needless to say! http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/ch_fitt9.html

[Douglas Bates:] This seems to be a case of using the wrong formula.
R^2 should measure the amount of variation for which the given model accounts relative to the amount of variation for which the *appropriate* null model does not account. If you have a constant or intercept term in a linear model, then the null model for comparison is the one with the intercept only. If you have a linear model without an intercept term, then the appropriate null model for comparison is the model that predicts all the responses as zero. Thus SST, the corrected total sum of squares, should be used when you have a model with an intercept term, but the uncorrected total sum of squares should be used when you do not have an intercept term. It is disappointing to see the MathWorks propagating such an elementary misconception.
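Bates's point can be checked numerically (a sketch; summary.lm() itself switches the null model, using sum(y^2) in place of sum((y - mean(y))^2) when the intercept is absent):

x <- 1:10
y <- rnorm(10, mean = 20)          # nearly constant, far from the origin
f0 <- lm(y ~ x - 1)                # no-intercept fit
sse <- sum(residuals(f0)^2)
1 - sse / sum(y^2)                 # matches summary(f0)$r.squared
1 - sse / sum((y - mean(y))^2)     # Excel-style: strongly negative here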
Re: [R] finding index of maximum value in vector
Jean-Luc Fontaine wrote: I found:

max.col(matrix(c(1, 3, 2), nrow = 1))

Is there a more concise/elegant way?

which.max

Thanks, Jean-Luc Fontaine, http://jfontain.free.fr/
Re: [R] Age of an object?
Philippe Grosjean wrote: Martin Maechler wrote:

Trevor Hastie, on Tue, 13 Dec 2005 12:51:34 -0800, writes:
Trevor: It would be nice to have a date stamp on an object.

One way to do this with important objects is to use the comment() function (in package base):

comment(myobj) <- "made last sunday of 2005"

Kjetil

Trevor: In S/S-PLUS this was always available, because objects were files.

Are you sure about "always available"? In any case, objects have not been single files for a long time, at least for S+ on Windows, and AFAIK also on unixy versions recently. This topic has come up before. IIRC, the answer was that for many of us it doesn't make sense most of the time.

I remember it was discussed several times. I don't remember why it was considered too difficult to do.

If you work with *.R files ('scripts') in order to ensure reproducibility, you will rerun, often source(), these files, and the age of the script file is really more interesting. Also, I *always* use the equivalent of q(save = "no") and almost only use save() to particularly save the results of expensive computations (often simulations).

OK, now let me give examples where having such information would ease the work greatly: you have a (graphical) view of the content of an object (for instance, the one using the view button in R Commander), or you have a graphical object explorer that keeps a cache to speed up display of information about objects in a given workspace (for instance, the SciViews-R object explorer). What a wonderful feature it would be to tell whether an object has changed since the last query. In the view, one could have a visual clue whether it is up to date; in the object explorer, I could update information only for objects that have changed...

Trevor: I have looked around, but I presume this information is not available.

I assume you will get other answers, more useful to you, which will be based on a class of objects which carry a 'creation-time' attribute.

Yes, but that would work only for objects designed that way, and only if the methods that manipulate the object do the required housework to update the 'last-changed' attribute (the question was about last access to an object, not about its creation date, so 'last-changed' is a better attribute here). If you access the object directly with, let's say, [EMAIL PROTECTED] <- newvalue, that attribute is not updated, isn't it? Best, Philippe Grosjean

Martin Maechler, ETH Zurich
Re: [R] Age of an object?
Philippe Grosjean wrote: Martin Maechler wrote:

Trevor Hastie, on Tue, 13 Dec 2005 12:51:34 -0800, writes:
Trevor: It would be nice to have a date stamp on an object.

Following up on my post of a few minutes ago, I tried to write a timestamp function

timestamp <- function(obj, moretext) {
    comment(obj) <- paste(Sys.time(), moretext, sep = "\n")
}

but this doesn't work:

myobj <- 1:10
timestamp(myobj, "test")
Error in timestamp(myobj, "test") : object "obj" not found

Kjetil
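Whatever produced that particular error message, there is a more basic obstacle (my note, not from the thread): replacement functions such as comment<- operate on a local copy inside the function body, so the stamp is lost when the function returns. A version that returns the stamped object and is reassigned by the caller works; the name timestamp2 also avoids any clash with other functions called timestamp.

timestamp2 <- function(obj, moretext = "") {
    comment(obj) <- paste(Sys.time(), moretext, sep = "\n")
    obj                                 # return the stamped copy
}
myobj <- 1:10
myobj <- timestamp2(myobj, "test")      # reassign to keep the stamp
comment(myobj)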
Re: [R] Polytopic Vector Analysis (PVA)
Melanie Edwards wrote: I am curious whether anybody has developed, or is developing, this capability within R. I have not found any software yet that has this capability, and I am not sure whether it is too new a method and nobody is actually using it,

Googling around, I found that PVA is a variant of factor analysis restricted to finding non-negative factors. It is not a new method, although maybe the name is. This has been, and is, used for instance in air-quality monitoring to identify sources of pollution, and if you have some prior information about possible sources, that could maybe be used too. I guess this could be called 'source unmixing' or something similar, which indicates a similarity with independent component analysis. Maybe it is enough to restrict the optimization to non-negative values? help.search("factor analysis") shows that factor analysis is implemented multiple times for R, so maybe there is something already, and if not, maybe it is simple to adapt. -- Kjetil Halvorsen

or whether there are other means to get the same analysis that I do not know of. Any information regarding developments or use of this method would be helpful. Melanie Edwards
Re: [R] Generation of missiing values in a time serie...
Gabor Grothendieck wrote: Yes, this is the definition of a time series and therefore of a zoo object. A time series is a mathematical function, i.e. it assigns a single element of the range to each element of the domain. This data does not describe a time series. Since nobody else has mentiones it on this thread: Tha CRAN package pastecs has function `regul' to regularize irregular time series. maybe that is what the original poster want. Kjetil Also it has no underlying regularity as the warning message states. To use as.ts one wants a series with an underlying regularity that has gaps and then as.ts will fill in the gaps with NAs. If we don't have an underlying regularity the question is not well posed but its likely we want to discretize time. The zoo command itself is somewhat forgiving, at least in this case, i.e. it allows one to specify this illegal zoo object with non-unique times for purposes of discretization; however, such a zoo object should not be used other than to get a legal zoo object out. For example, in the following we round the times to one decimal place and then within each set of values at the same discretized time take the last one. (Alternately specify mean instead of tail, 1 if the average is prefered.) Then we convert that to a ts object: as.ts(aggregate(z, round(time(z), 1), tail, 1)) Time Series: Start = c(123, 2) End = c(123, 8) Frequency = 10 time flow seq ts x rtt size 123.1 123.12570 967 123.1257 13394 0.798205 1472 123.2 123.24110 969 123.2411 12680 0.796258 1472 123.3 NA NA NA NANA NA NA 123.4 NA NA NA NANA NA NA 123.5 123.47260 970 123.4726 12680 0.796258 1472 123.6 123.58860 971 123.5886 12680 0.796258 1472 123.7 123.70460 972 123.7046 12680 0.796258 1472 On 12/13/05, Alvaro Saurin [EMAIL PROTECTED] wrote: I think I have found the error. It appears when there are two entries with the same time. Using as input file: - CUT # Output format for PCKs: # TIME FLOW P [+-] SEQ TS X RTT SIZE # 123.125683 0 P + 967 123.125683 13394 0.798205 1472 123.241137 0 P + 968 123.241137 12680 0.796258 1472 123.241137 0 P + 969 123.241137 12680 0.796258 1472 123.472631 0 P + 970 123.472631 12680 0.796258 1472 123.588613 0 P + 971 123.588613 12680 0.796258 1472 123.704594 0 P + 972 123.704594 12680 0.796258 1472 - CUT I run fhe following code: - CUT h_types - list (0, 0, NULL, NULL, 0, 0, 0, 0, 0) h_names - list (time, flow, seq, ts, x, rtt, size) pcks_file- pipe (grep ' P ' data, r) pcks - scan (pcks_file, what = h_types, comment.char = '#', fill = TRUE) mat_df - data.frame (pcks[1:2], pcks[5:9]) mat - as.matrix (mat_df) colnames (mat) - h_names z - zoo (mat, mat [,time]) - CUT The dput of 'z' shows: - CUT structure(c(123.125683, 123.241137, 123.241137, 123.472631, 123.588613, 123.704594, 0, 0, 0, 0, 0, 0, 967, 968, 969, 970, 971, 972, 123.125683, 123.241137, 123.241137, 123.472631, 123.588613, 123.704594, 13394, 12680, 12680, 12680, 12680, 12680, 0.798205, 0.796258, 0.796258, 0.796258, 0.796258, 0.796258, 1472, 1472, 1472, 1472, 1472, 1472 ), .Dim = c(6, 7), .Dimnames = list(c(1, 2, 3, 4, 5, 6), c(time, flow, seq, ts, x, rtt, size)), index = structure(c(123.125683, 123.241137, 123.241137, 123.472631, 123.588613, 123.704594), .Names = c(1, 2, 3, 4, 5, 6)), class = zoo) - CUT If I try a 'as.ts(z)', it fails. If I remove the duplicate entry, I can convert it to a TS with no problem. Is this made intentionally? Because then I have to filter the input matrix... 
But, anyway, the output matrix, after filtering, doesn't seem regular:

- CUT
as.ts(z)
Time Series: Start = 1, End = 5, Frequency = 1
      time  flow seq       ts     x      rtt size
1 123.1257     0 967 123.1257 13394 0.798205 1472
2 123.2411     0 969 123.2411 12680 0.796258 1472
3 123.4726     0 970 123.4726 12680 0.796258 1472
4 123.5886     0 971 123.5886 12680 0.796258 1472
5 123.7046     0 972 123.7046 12680 0.796258 1472
Warning message: 'x' does not have an underlying regularity in: as.ts.zoo(z)
- CUT

Weird...

On 13 Dec 2005, at 16:33, Gabor Grothendieck wrote: Please provide a reproducible example. Note that dput(x) will output an R object in a way that can be copied and pasted into another session.

On 12/13/05, Alvaro Saurin wrote: On 13 Dec 2005, at 13:08, Gabor Grothendieck wrote: Your variable mat is not a matrix; it's a data frame. Check it with class(mat). Here is an example:

x <- cbind(A = 1:4, B = 5:8)
tt <- c(1, 3:4, 6)
library(zoo)
x.zoo <- zoo(x, tt)
x.ts <- as.ts(x.zoo)

Fixed, but anyway it fails:

h_types <- list(0, 0, NULL, NULL, 0, 0, 0, 0, 0)
h_names <- list("time", "flow", "seq", "ts", "x",
Re: [R] store and retrieve object names in a vector
zhihua li wrote: hi netters, suppose I have a series of objects X1, X2, B1, C1, ... They all have the same dimensions. I want to combine them into one by using cbind:

y <- cbind(X1, X2, B1, C1, ...)

but I don't want to type the names of these objects one by one. Instead, I've put their names into a vector:

x <- c("X1", "X2", "B1", "C1", ...)

I used y <- cbind(x), but what I got is a matrix of the names, not a combination of matrices. Anybody know how to handle this? Thanks a lot!

Something like

do.call(cbind, lapply(x, get))   # not tested

should work! -- Kjetil
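A quick check that the idiom works (a sketch):

X1 <- matrix(1:4, 2)
X2 <- matrix(5:8, 2)
x <- c("X1", "X2")
y <- do.call(cbind, lapply(x, get))
y   # a 2 x 4 matrix: the columns of X1 followed by those of X2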
Re: [R] Help: chisq.test
Thanjavur Bragadeesh wrote: I have two groups of patients (improved or not improved), named the X and Y groups respectively, after being treated with 5 different drugs:

X <- c(43, 52, 25, 48, 57)
Y <- c(237, 198, 245, 212, 233)

When I run chisq.test(cbind(X, Y))

This takes cbind(X, Y) as a contingency table, which is what you want.

I get a p-value of 0.0024, but if I run chisq.test(X, Y) I get a p-value of 0.22, not significant at 5%.

This is the same as chisq.test(table(X, Y)), which is the test on the contingency table table(X, Y):

     Y
X     198 212 233 237 245
  25    0   0   0   0   1
  43    0   0   0   1   0
  48    0   1   0   0   0
  52    1   0   0   0   0
  57    0   0   1   0   0

which is not what you want. -- Kjetil

What is the difference between the two? thanks, bragadeesh
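The two calls side by side, runnable as-is:

X <- c(43, 52, 25, 48, 57)
Y <- c(237, 198, 245, 212, 233)
chisq.test(cbind(X, Y))   # 5 x 2 contingency table of counts: the right test
chisq.test(X, Y)          # tests independence of table(X, Y): not wanted here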
Re: [R] Broken links on CRAN
Doran, Harold wrote: Sorry, didn't think about that. The mirror I used was http://lib.stat.cmu.edu/R/CRAN/. I checked other mirrors and they did work fine.

I just tried what you did, but with Firefox, and there were no problems. -- Kjetil

-----Original Message----- From: Peter Dalgaard, Sent: Monday, December 05, 2005 10:11 AM, Subject: Re: [R] Broken links on CRAN

Doran, Harold writes: Dear list, when I click on the link to download a reference manual for a package on CRAN, I get an error message that the file is damaged and could not be repaired. I randomly chose various packages, and the same error message appears. Are the links actually broken? I have also restarted my machine and closed and re-opened Acrobat. I am using Windows XP, Acrobat Professional 6.0.0.5, and Explorer 6.0.2900.2180. Harold

One mirror or all of them? If one, which one? Two randomly chosen package PDFs on the CRAN master (Vienna) gave no problems for me.
Re: [R] plotting question
Ed Wang wrote: Yes, I have gone through the manual. My best reference for plotting has been examples, either from the list archive or from searches on the internet. Nothing in the introductory manual could get me to what I have been able to do so far, but that is limited to plotting components of the time series returned from an STL call. This is why I am asking for examples or references to examples from anyone willing to share them. For some of us not very familiar with S+ etc., the documentation with R is not enough. While I can plot two time series one above the other using par(mfrow = ...), I'd prefer to put two time series in one plot, in different colours and using two different symbols, which I cannot do using calls to plot().

What about making the two time series into an mts (multiple time series) object, with

my.mts <- cbind(ts.1, ts.2)

or maybe

my.ts <- ts.union(ts.1, ts.2)   # this latter does not assume a common time base

Then:

plot(my.ts, plot.type = "single", col = c("red", "blue"))

Kjetil

Thanks. "A man is not old until regrets take the place of dreams." -- John Barrymore

From: Berton Gunter: ?lines, ?points. "An Introduction to R" (and numerous other books on R) explains this. Have you read it? -- Bert Gunter, Genentech Non-Clinical Statistics, South San Francisco, CA. "The business of the statistician is to catalyze the scientific learning process." -- George E. P. Box
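A runnable demo with built-in monthly series (a sketch):

two <- ts.union(ldeaths, fdeaths)
plot(two, plot.type = "single", col = c("red", "blue"), lty = 1:2)
legend("topright", c("ldeaths", "fdeaths"),
       col = c("red", "blue"), lty = 1:2)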
Re: [R] squared coherency and cross-spectrum
Spencer Graves wrote: I haven't seen a reply, so I will comment even though I've never used coherency/coherence nor spectrum. RSiteSearch("coherence") produced 13 hits, the third of which looked like it might be relevant to your question (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/37640.html). RSiteSearch("coherency") produced 12 hits, at least some of which look like they might help you. In my cursory review, it looked like at least one of the coherence/coherency hits also mentioned the co-spectrum. Whether that's true or not, the examples with ?spectrum include the statement "for multivariate examples see the help for spec.pgram". If you still have a question for this list after reviewing these references, PLEASE do read the posting guide, www.R-project.org/posting-guide.html. I believe that people who follow that posting guide more closely tend to receive quicker, more useful answers than those who don't. I hope you won't mind if I now ask you a question: what can you get from coherency and the co-spectrum that you can't get as easily from the autocorrelation and partial autocorrelation functions, including the cross-correlations?

Although the question is not addressed to me, I'll try to answer, as I am planning to use these techniques myself. What I hope to get from them is on what time scale two time series are correlated; more succinctly, the high-frequency variation we can see in both series (ground and satellite measurements of the same phenomenon): is it correlated? -- Kjetil

hope this helps. spencer graves

Ling Jin wrote: Hi all, I have two time series, each of length 354. I tried to calculate the coherency^2 between them, but the value I get is always 1. On a website it says: "Note that if the ensemble averaging were to be omitted, the coherency (squared) would be 1, independent of the data." Does anyone know how to specify this properly in R in order to get a more useful coherency? The examples in the help do give coherencies that are not 1, but I did not notice any special specification. The next question is on the co-spectrum. When I supply the spectrum function with multiple time series, it only gives me the spectrum (smoothed periodogram) of the individual series. Is there any way I can get the cross-spectrum? I believe R has calculated it, but I could not find it in the returned values. Attached is the smoothed periodogram of the two time series. Thanks a lot! Ling
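On the mechanics (my sketch, with built-in monthly series standing in for the poster's data): for a multivariate input, spec.pgram() returns the squared coherency and the cross-spectrum phase, and the spans argument provides the smoothing ("ensemble averaging") without which the raw coherency is identically 1.

sp <- spec.pgram(ts.union(mdeaths, fdeaths), spans = c(5, 7), plot = FALSE)
str(sp$coh)     # squared coherency between the two series
str(sp$phase)   # cross-spectrum phase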
Re: [R] Impaired boxplot functionality - mean instead of median
P Ehlers wrote: I'd like to add two comments to Martin's sensible response. 1. I've seen several intro-stats textbooks that define a boxplot to have whiskers to the extreme data values and then define Tukey's boxplot as a modified boxplot. I wish authors wouldn't do that. 2. I've also seen boxplots used for sample sizes as small as -- are you ready for it? -- n = 2!! (Admittedly, only in plots comparing several groups.) The help page for stripchart() points out that stripcharts are a good alternative to boxplots when sample sizes are small. My own rule-of-thumb: n > 20 for single boxplots, n > 12 for multiple boxplots. Would it make sense to have an option to replace boxes with dotplots for only those groups with number of observations less than nmin=20 (say)? Kjetil Peter Ehlers Martin Maechler wrote: Boxplots were invented by John W. Tukey and I think should be counted among the top small but smart achievements from the 20th century. Very wisely he did *not* use mean and standard deviations. Even though it's possible to draw boxplots that are not boxplots (and people only recently explained how to do this with R on this mailing list), I'm arguing very strongly against this. If I see a boxplot - I'd want it to be a boxplot and not have the silly (please excuse) 10%/90% whiskers which declare 20% of the points as outliers {in the boxplot sense}. If you want the mean +/- sd plot, do *not* misuse boxplots for them, please! Martin Maechler, ETH Zurich Evgeniy == Evgeniy Kachalin [EMAIL PROTECTED] on Thu, 01 Dec 2005 19:04:47 +0300 writes: Evgeniy Hello to all users and wizards. Evgeniy I am regularly using the 'boxplot' function or its analogue - 'bwplot' from Evgeniy the 'lattice' library. [there's the lattice *package* !] Evgeniy But they are, as far as I understand, totally Evgeniy flawed in functionality: they miss the ability to select what they would Evgeniy draw 'in the middle' - median, mean. What the box means - standard Evgeniy error, 90% or something else. What the whiskers mean - 100%, 99% or Evgeniy something else. Evgeniy Is there any way to realize it? Or is there any other good data Evgeniy visualization function for comparing means of various data groups? Evgeniy Ideally I would like to have a bit more customised function for doing Evgeniy that. For example, 'boxplot(a~b,data=d,mid='mean'). Evgeniy -- Evgeniy Evgeniy, ICQ 38317310. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
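A rough sketch of Kjetil's nmin idea above (the data and nmin are made up; boxes for the large groups, plain jittered points for the small one):

set.seed(1)
x <- c(rnorm(30), rnorm(25, 1), rnorm(5, 2))
g <- factor(rep(c("a", "b", "c"), c(30, 25, 5)))
nmin <- 20
n <- table(g)
plot(NA, xlim = c(0.5, nlevels(g) + 0.5), ylim = range(x),
     xaxt = "n", xlab = "", ylab = "x")
axis(1, at = seq_len(nlevels(g)), labels = levels(g))
for (i in seq_len(nlevels(g))) {
  xi <- x[as.integer(g) == i]
  if (n[i] >= nmin) boxplot(xi, at = i, add = TRUE, axes = FALSE)
  else points(jitter(rep(i, length(xi)), amount = 0.1), xi)
}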
Re: [R] help with R
Ed Wang wrote: Morning, I've downloaded the precompiled R 2.1.1 version and am using Windows XP on my office workstation. As mentioned previously, I've resorted to batch jobs to avoid the hanging that occurs when I try to plot the 3690 length vector of data. I can do a build from the source and change specific parameters in the makefile if people feel it is warranted. Based on Berton's suggestion to look at the range of packages available I think stl() might be as appropriate a function to use to identify all three components of the time series data I have: underlying trend, seasonality over a full year period (periodicity of one year, or 246 days in my case), and residual (which I have no expectation that it will necessarily be ~N(0,\sigma^2)). For the following dataset (15 years, 246 days/year = 3690 days of data) what reasonable parameters for running stl() would folks suggest? I've not had any luck with getting stl() to return any useful information. It continues to stop with the statement 'series is not periodic or has less than two periods' using stl(zHO, s.window=1, s.degree=2, l.window=246) or the obvious ways I might try running stl() (i.e. plot(stl(zHO))). It's possible I've not properly specified the length of expected periodicity as a parameter (246 days in my case). In creating the time series (ts) object you need myts <- ts(mydata, frequency=246) Kjetil All suggestions are welcome! I'm trying to avoid going back and fitting a linear model with 245 dummy variables. Thanks. Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
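A sketch of the ts()/stl() combination (a simulated placeholder stands in for the real 3690 daily values; s.window = "periodic" is the simplest choice and other settings may well be better):

mydata <- as.numeric(arima.sim(list(ar = 0.8), 3690))  # placeholder series
zHO <- ts(mydata, frequency = 246)         # 15 "years" of 246 trading days
fit <- stl(zHO, s.window = "periodic")     # needs at least two full periods
plot(fit)                                  # trend, seasonal, remainder panels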
[R] Question on KalmanSmooth
I am trying to use KalmanSmooth to smooth a time series fitted by arima (and with missing values), but the $smooth component of the output baffles me. Look at the following example:

testts <- arima.sim(list(ar=0.9), n=100)
testts[6:14] <- NA
testmod <- arima(testts, c(1,0,0))
testsmooth <- KalmanSmooth(testts, testmod$model)
par(mfrow=c(2,1))
plot(testsmooth$smooth, type="l")
plot(testsmooth$var, type="l")

Look at the lower panel plot: the uncertainty of the smoothed values is first lowered, then is highest at the end (of the smoothed part, indexes 6:14). Can anybody explain this, or is this an error? Kjetil Halvorsen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] coherency-Time Series
[EMAIL PROTECTED] wrote: hello! My name is Stefanos, from Athens. I'm a new user of R and I'm studying multivariate time series. I can't find in the help menu how to calculate the cross spectrum and the coherency of 2 time series. Would you like to help me? Thanks __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html See ?spectrum, and especially the component $coh of the output. Kjetil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] multilevel models and sample size
ronggui wrote: It is not a pure R question, but I hope someone can give me advice. I want to analyse my data with a multilevel model. The data has 2 levels; the second level has 52 units and each second-level unit has 19-23 units. I think the sample size is quite small, but just now I can't make the sample size much bigger. So I want to ask: if I use the multilevel model to analyse the data set, will it be acceptable? Or unacceptable because of the small sample size? This kind of question I usually try to answer by simulation, which is very easy in R. Kjetil Thank you very much! ronggui 2005-11-28 -- Deparment of Sociology Fudan University My new mail address is [EMAIL PROTECTED] Blog:http://sociology.yculblog.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
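For what it is worth, a bare-bones simulation of the kind Kjetil describes (every variance component and effect size below is invented; replace them with values plausible for the real study):

library(nlme)
set.seed(42)
ngrp <- 52; m <- 20                       # 52 level-2 units, ~20 level-1 units each
sim1 <- function() {
  id <- factor(rep(1:ngrp, each = m))
  u  <- rnorm(ngrp, sd = 0.5)             # level-2 random intercepts (assumed sd)
  x  <- rnorm(ngrp * m)
  y  <- 1 + 0.3 * x + u[id] + rnorm(ngrp * m)
  dat <- data.frame(y, x, id)
  fit <- lme(y ~ x, random = ~ 1 | id, data = dat)
  intervals(fit, which = "fixed")$fixed["x", ]
}
res <- replicate(200, sim1())
mean(res["lower", ] < 0.3 & res["upper", ] > 0.3)  # coverage of the 95% CI for the slope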
Re: [R] 'For each file in folder F do....'
Ron Piccinini wrote: Hello, I have 2700 text files in a folder and need to apply the same program/procedure to each individually. I'm trying to find how to code something like: For each file in Folder do {Procedure} is there an easy way to do this? other suggestions?

files <- list.files()
results <- lapply(files, yourprocessing)

where yourprocessing is a function taking as argument a file name and returning whatever you want. Kjetil I have tried to list all the file names in a vector e.g. listfiles[1:10,1] 1 H:/Rtest/AXP.txt 2 H:/Rtest/BA.txt 3 H:/Rtest/C.txt 4 H:/Rtest/CAT.txt 5 H:/Rtest/DD.txt 6 H:/Rtest/DIS.txt 7 H:/Rtest/EK.txt 8 H:/Rtest/GE.txt 9 H:/Rtest/GM.txt 10 H:/Rtest/HD.txt but R doesn't like statements of type read.table(file=listfiles[1,1]) since 'file' must be a character string or connection... Any thoughts? Many thanks in advance, Ron Piccinini. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R: pp plot
Dr. Herwig Meschke wrote: To construct a nonparametric (1-alpha) confidence set for an arbitrary CDF F, you can use the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality (e.g., see Wasserman, L. (2005). All of Statistics. 2nd, corr. printing. NY: Springer, p. 99). With n=sample size and eps=sqrt(log(2/alpha)/(2*n)), the lower and upper limits are pmax(F-eps ,0) and pmin(F+eps, 1). Disadvantage: the sample size must be large. Herwig There is also ecdf.ksCI in CRAN package sfsmisc. Kjetil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
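A small sketch of the DKW band around an empirical CDF (simulated data; alpha = 0.05):

set.seed(1)
x <- rnorm(100)
n <- length(x); alpha <- 0.05
eps <- sqrt(log(2 / alpha) / (2 * n))
Fhat <- ecdf(x)
plot(Fhat, main = "ecdf with DKW 95% band")
s <- knots(Fhat)
lines(s, pmin(Fhat(s) + eps, 1), type = "s", lty = 2)  # upper limit
lines(s, pmax(Fhat(s) - eps, 0), type = "s", lty = 2)  # lower limit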
Re: [R] Can't figure out warning message
Ravi Varadhan wrote: Hi, I apologize for the previous posting, where the message was not formatted properly. Here is a better version: I have written the following function to check whether a vector has elements satisfying monotonicity.

is.monotone <- function(vec, increase=T){
  ans <- TRUE
  vec.nomis <- vec[!is.na(vec)]
  if (increase & any(diff(vec.nomis, 1) < 0, na.rm=T)) ans <- FALSE
  if (!increase & any(diff(vec.nomis, 1) > 0, na.rm=T)) ans <- FALSE
  ans
}

This works correctly, but I get this error message as below.

x <- 2:10
is.monotone(x)
[1] TRUE
Warning messages:
1: the condition has length > 1 and only the first element will be used in: if (increase & any(diff(vec.nomis, 1) < 0, na.rm = T)) ans <- FALSE
2: the condition has length > 1 and only the first element will be used in: if (!increase & any(diff(vec.nomis, 1) > 0, na.rm = T)) ans <- FALSE

Try to double the &: use && in place of &. Kjetil

I am unable to see why the condition should have a length greater than 1, since any should give me a single logical value. Can any one tell me what is going on here? (I am using version 2.1.1 on Windows). Thanks very much, Ravi. -- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: mailto:[EMAIL PROTECTED] [EMAIL PROTECTED] -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Use of paste with apply()
I was surprised by:

test <- matrix(as.character(1:4), 2)
test
     [,1] [,2]
[1,] "1"  "3"
[2,] "2"  "4"
apply(test, 1, paste, sep="+")
     [,1] [,2]
[1,] "1"  "2"
[2,] "3"  "4"
apply(test, 1, paste, sep="*")
     [,1] [,2]
[1,] "1"  "2"
[2,] "3"  "4"
te <- matrix(1:4, 2)
te
     [,1] [,2]
[1,]    1    3
[2,]    2    4
apply(te, 1, sum)
[1] 4 6

Why doesn't paste behave in apply as sum? Kjetil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
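The difference is that paste()'s sep= only matters when two or more vectors are supplied; joining the elements of a single vector, which is what each row becomes inside apply(), needs collapse=. A quick check:

paste(c("1", "3"), sep = "+")       # unchanged: only one vector given
paste(c("1", "3"), collapse = "+")  # "1+3"
test <- matrix(as.character(1:4), 2)
apply(test, 1, paste, collapse = "+")   # "1+3" "2+4"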
Re: [R] OLS variables
John Fox wrote: Dear Leaf, I assume that you're using lm() to fit the model, and that you don't really want *all* of the interactions among 20 predictors: You'd need quite a lot of data to fit a model with 2^20 terms in it, and might have trouble interpreting the results. If you know which interactions you're looking for, then why not specify them directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the other hand, if you want to include all interactions, say, up to three-way, and you've put the variables in a data frame, then lm(y ~ .^3, data=DataFrame) will do it. This is nice with factors, but with continuous variables, and a need for a response-surface type of model, it will not do. For instance, with variables x, y, z in data frame dat lm( y ~ (x+z)^2, data=dat ) gives a model with the terms x, z and x:z, not the square terms. There is a need for a semi-automatic way to get these, for instance, use poly() or polym() as in: lm(y ~ polym(x,z,degree=2), data=dat) Kjetil There are many terms in this model, however, if not quite 2^20. The introductory manual that comes with R has information on model formulas in Section 11. I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leaf Sun Sent: Sunday, November 06, 2005 3:11 AM To: r-help@stat.math.ethz.ch Subject: [R] OLS variables Dear all, Is there any simple way in R that I can put all the interactions of the variables in the OLS model? e.g. I have a bunch of variables, x1, x2, ..., x20. I expect them to have interactions (e.g. x1*x2, x3*x4*x5...) with some combinations (2-way or higher dimensions). Is there any way that I can write the model simpler? Thanks! Leaf __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
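A small sketch contrasting the two formulas (made-up data):

set.seed(1)
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + dat$x + 0.5 * dat$z + 0.25 * dat$x^2 + rnorm(50, sd = 0.2)
coef(lm(y ~ (x + z)^2, data = dat))                # x, z and x:z only
coef(lm(y ~ polym(x, z, degree = 2), data = dat))  # adds x^2, z^2 as well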
Re: [R] FDR analyses: minimum number of features
Spencer Graves wrote: Two thoughts on this: 1. Your FDR (Not Franklin Delano Roosevelt) sounds like another name for Type I error rate. It is certainly not the same as type I error rate. Type I error rate is the proportion of true nulls which are rejected, while the FDR is the proportion of rejected null hypothesis which really are true nulls! To me FDR seems like a more promising avenue to multiple testing than the old familywise error rate. Who knows what is a family? Kjetil The definition of reasonably reliable FDRs would seem to relate to the status of the literature on this issue among researchers in genotyping. As more reports of FRDs in genotyping are published, I would expect that methodology for estimation and the standard for accuracy would similarly evolve. 2. Have you tried the Bioconductor (www.bioconductor.org/) listserve? They might be able to say something more useful than a general list like this. spencer graves Dupont, William wrote: Dear List, We are planning a genotyping study to be analyzed using false discovery rates (FDRs) (See Storey and Tibshirani PNAS 2003; 100:9440-5). I am interested in learning if there is any consensus as to how many features (ie. how many P values) need to be studied before reasonably reliable FDRs can be derived. Does anyone know of a citation where this is discussed? Bill Dupont William D. Dupont phone: 615-343-4100 URL http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/WilliamDupont __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
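To make the FDR-versus-Type-I distinction concrete, here is a toy comparison of a per-test cutoff and Benjamini-Hochberg FDR control with p.adjust() (the p-value mix is invented):

set.seed(1)
p <- c(runif(950), rbeta(50, 0.2, 20))     # 950 true nulls, 50 shifted toward 0
sum(p < 0.05)                              # naive per-test cutoff
sum(p.adjust(p, method = "BH") < 0.05)     # discoveries with FDR held at 5%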
Re: [R] significance of spectal peak with spectrum()
Sebastian Leuzinger wrote: thanks a lot. I am interested in the more complex case where the interest is about a specific frequency being significant, not at least one frequency being significantly different from the backgrond white noise. I have discussed this issue with very knowledgable people in the field who could not help me either. I would be interested in any references as well. http://bayes.wustl.edu/ where you can download Bretthorst's book Bayesian Spectrum Analysis and Parameter Estimation Kjetil On Friday 16 September 2005 10:36, you wrote: Sebastian Leuzinger wrote: the null hypothesis would be: one particular frequency peak is not significantly different from the background noise. So you want to know, e.g., whether there is something going on at 1000 Hz? This is difficult: If you are considering the periodogram to be a density, then you do not know the distribution of the value of a single frequency, because it depends on the stuff going on at other frequencies. Second point is (and already asked): Kind of [background] noise? The only really easy test is for the Null signal is white noise, hence H1 is at least one non-white-noisy frequency. [If somebody knows a really good book or papers that cover other cases than the trivial one mentioned above, I am very interested to hear about them, BTW.] If you have another kind of noise (such as blue or pink noise), things become even worse. Uwe Ligges On Friday 16 September 2005 09:28, you wrote: Sebastian Leuzinger wrote: Hello, has anybody got a simple recepie to test the significance level of the peaks after using spectrum() ? What is you null hypothesis? - Kind of noise? - One particular frequency is noisy or all noisy? - ... Uwe Ligges (R-version 2.0.1, linux SuSE9.3) -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Multivariate Skew Normal distribution
(Ted Harding) wrote: On 01-Sep-05 Caio Lucidius Naberezny Azevedo wrote: Hi all, Could anyone tell me if there is any package (or function) that generates values from a multivariate skew normal distribution? Thanks all, Caio Hello, Caio Please tell us what you mean by skew normal distribution. Since normal (i.e. gaussian) distributions are not skew, you Well, but then somebody (Azzalini?) coined the term skew-normal, which you can read about at http://azzalini.stat.unipd.it//SN or simply do library(help=sn) # after installing from CRAN. This family is obtained by skewing a normal family, hence the name. You can also skew a t-family or whatever other symmetric family you like. I found this useful for modelling. Kjetil presumably mean something different from what you said, so unless we understand this more clearly it is unlikely that anyone can make a suggestion which would meet your needs. With best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Sep-05 Time: 15:02:37 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
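A minimal sketch, assuming the sn package's rmsn() sampler for the multivariate skew-normal (argument names as I remember the package documentation; check ?rmsn after installing):

library(sn)                              # install.packages("sn") first
Omega <- matrix(c(1, 0.5, 0.5, 1), 2)    # scale matrix
alpha <- c(4, -1)                        # skewness parameters
x <- rmsn(n = 500, xi = c(0, 0), Omega = Omega, alpha = alpha)
plot(x)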
Re: [R] coercing created variables into a new data frame using na.omit()
Prof Brian Ripley wrote: I don't know if you can read your message, but I find it exceedingly difficult and there seem to be several typos. Please use the space and return keys ... and only send a message once. Your problem is perhaps that you are not looking at the data frame, but at the variable in the workspace. attach()ing data frames is convenient but error-prone (as you have found). rm(new.variable) should solve this, but it is better to cultivate a different style. For example

with(data.frame1, {
  # commands to create value
  data.frame1$new.variable <- value
})
data.frame3 <- na.omit(data.frame1)

That cannot possibly work, as assignment within with() is local to with's environment. I have used superassignment for this (<<-), but that cannot possibly be good style? Look at the following:

test <- data.frame(a=1:5, b=1:5)
test
  a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
with(test, test$c <- 1:5)
test
  a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
with(test, test$c <<- 1:5)
test
  a b c
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5

So what is the best style here? Kjetil I think too that the creation of the value can be vectorized simply, generalizing something like value <- (age1 - 7)*(weight - mw) On Fri, 12 Aug 2005, sp219 wrote: Hi, I am an R newbie and one thing I am having trouble with is binding variables that I have created within one data frame into a new data frame when using na.omit(). To illustrate this problem I will give the example I am working on and the approach I have been using:-

data.frame1 <- filepath
attach(data.frame1)
# create a new variable using a function
new.variable <- rep(1, length(weight3))
for (x in 1:length(new.variable))
{f <- (((age1[x]-7)*(weight[x]-mw)) + ((age2[x]-7)*(weight2[x]-mw)) + ((age3[x]-7)*(weight3[x]-mw))) / (((age1[x]-7)^2) + ((age2[x]-7)^2) + ((age3[x]-7)^2));
new.variable[x] <- f}
# then bind it into the existing old data frame
data.frame2 <- cbind(data.frame1, new.variable)
rm(data.frame1)
attach(data.frame2)
# everything o.k. so far but now the problem part... I basically want to remove all the rows with NA in the new data frame including corresponding rows in the new variable
data.frame3 <- na.omit(data.frame2)
rm(data.frame2)
attach(data.frame3)

length of new.variable has not changed but the length of all the other variables in data.frame2 has? Could someone please provide an explanation or an alternative route if possible? Any suggestions much appreciated, Thankyou, Simon Pickett Simon Pickett Centre for Ecology and Conservation Biology University of Exeter in Cornwall Tremough Campus Penryn Cornwall TR10 9EZ UK Tel: 01326371852 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] computationally singular
Weiwei Shi wrote: I think the problem might be caused by two variables being very correlated. Should I check the cov matrix and try to delete some? But I am just not quite sure of your reply. Could you detail it with some steps? thanks, Why not do principal component analysis? To identify the zero variance linear combination(s) look at the near-zero eigenvalues. Also, it *might* make sense to calculate a mahalanobis distance replacing the matrix inverse with a pseudoinverse. Kjetil weiwei On 8/8/05, Christian Hennig [EMAIL PROTECTED] wrote: Once I had a situation where the reason was that the variables were scaled to extremely different magnitudes. 1e-25 is a *very* small number but still there is some probability that it may help to look up standard deviations and to multiply the variable with the smallest st.dev. by 1e20 or something. Best, Christian On Mon, 8 Aug 2005, Weiwei Shi wrote: Hi, I have a dataset which has around 138 variables and 30,000 cases. I am trying to calculate a mahalanobis distance matrix for them and my procedure is like this: Suppose my data is stored in mymatrix

S <- cov(mymatrix)  # this is fine
D <- sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S))
Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 1.09501e-25

I understand the error message but I don't know how to trace down which variables caused this so that I can sacrifice them if there are not a lot. Again, not sure if it is due to some variables and not sure if dropping variables is a good idea either. Thanks for help, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html *** NEW ADDRESS! *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
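A sketch of both suggestions on a small stand-in for mymatrix (the 1e-10 tolerance is arbitrary; for brevity the distances below are to the mean, but the row-by-row sapply() works the same way with inverted = TRUE):

library(MASS)                          # for ginv()
set.seed(1)
mymatrix <- cbind(a = rnorm(100), b = rnorm(100))
mymatrix <- cbind(mymatrix, c = mymatrix[, "a"] + mymatrix[, "b"])  # exact dependence
S <- cov(mymatrix)
e <- eigen(S, symmetric = TRUE)
which(e$values / max(e$values) < 1e-10)          # near-zero eigenvalues
zapsmall(e$vectors[, e$values / max(e$values) < 1e-10])  # the degenerate combination
Sinv <- ginv(S)                        # pseudoinverse instead of solve()
D2 <- mahalanobis(mymatrix, colMeans(mymatrix), Sinv, inverted = TRUE)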
Re: [R] use of NA's
Robert Kinley wrote: with NAs it's always safest to use the construction if(is.na(foo)) rather than if(foo==NA) cheers Bob Kinley And if it is not clear why, try: NA == NA [1] NA Kjetil tom wright [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 05/08/2005 12:30 To r-help@stat.math.ethz.ch cc Subject [R] use of NA's Can someone please explain why this works:

d <- c(0,2,3,2,0,3,4,0,0,0,0,0)
d.mat <- matrix(data=d, nrow=4, ncol=3, byrow=TRUE)
for(i in 1:length(d.mat[1,])){
+ d.mat[,i][d.mat[,i]==0] <- mean(d.mat[,i][d.mat[,i] > 0])
+ }

Whereas:

d <- c(0,2,3,2,0,3,4,0,0,0,0,0)
d.mat <- matrix(data=d, nrow=4, ncol=3, byrow=TRUE)
d.mat[d.mat==0] <- NA
for(i in 1:length(d.mat[1,])){
+ d.mat[,i][d.mat[,i]==NA] <- mean(d.mat[,i], na.rm=TRUE)
+ }

doesn't. Thanks Tom __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Binary outcome with non-absorbing outcome state
John Sorkin wrote: I am trying to model data in which subjects are followed through time to determine if they fall, or do not fall. Some of the subjects fall once, some fall several times. Follow-up time varies from subject to subject. I know how to model time to the first fall (e.g. Cox Proportional Hazards, Kaplan-Meir analyses, etc.) but I am not sure how I can model the data if I include the data for those subjects who fall more than once. I would appreciate suggestions about a models that I could use, how I would quantify the follow-up time, how I account for the imbalance in the data (some subjects would contribute one outcome measure, others multiple measures), etc. Many thanks, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC and University of Maryland School of Medicine Claude Pepper OAIC University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 410-605-7119 -- NOTE NEW EMAIL ADDRESS: [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html help.search(recurrent) leads to CRAN pakage survrec. You couls also have a look at CRAN package eha and at Lindsey's package event -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] comparing strength of association instead of strength of evidence?
Weiwei Shi wrote: Dear all: I still need some further help since I think the question itself might be very interesting (i hope so:) : the question is on chisq.test, my concern is which criteria should be used here to evaluate the independence. The reason i use this old subject of the email is, b/c I think the problem here is about how to look at p.value, which evaluate the strength of evidence instead of association. If i am wrong, please correct me. the result looks like this: index word.comb id in.class0 in.class1 p.value odds.ratio 1 1 TOTAL|LAID 54|241 2 4 0.0004997501 0.00736433 2 2 THEFT|RECOV 52|53 40751 146 0.0004997501 4.17127643 3 3 BOLL|ACCID 10|21 36825 1202 0.0004997501 0.44178546 4 4 LAB|VANDAL 8|55 24192 429 0.0004997501 0.82876099 5 5 VANDAL|CAUS 55|59 80164 0.0004997501 0.18405918 6 6AI|TOTAL 9|54 194945 0.0034982509 0.63766766 7 7AI|RECOV 9|53 238561 0.0004997501 0.57547012 8 8 THEFT|TOTAL 52|54 33651 110 0.0004997501 4.56174408 the target is to look for best subset of word.comb to differentiate between class0 and class1. p.value is obtained via p.chisq.sim[i] - as.double(chisq.test(tab, sim=TRUE, B=myB)$p.value) and B=2 (I increased B and it won't help. the margin here is class0=2162792 class1=31859 ) So, in conclusion, which one I should use first, odds.ratio or p.value to find the best subset. If your goal is to discriminate between two different classes, why not calculate directly a measure of discriminative ability, such as probability of correct classification? Kjetil I read some on feature selection in text categorization (A comparative study on feature selection in text categorization might be a good ref.). Anyone here has other suggestions? thanks, weiwei On 6/24/05, Gabor Grothendieck [EMAIL PROTECTED] wrote: On 6/24/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote: Weiwei Shi wrote: Hi, I asked this question before, which was hidden in a bunch of questions. I repharse it here and hope I can get some help this time: I have 2 contingency tables which have the same group variable Y. I want to compare the strength of association between X1/Y and X2/Y. I am not sure if comparing p-values IS the way even though the probability of seeing such weird observation under H0 defines p-value and it might relate to the strength of association somehow. But I read the following statement from Alan Agresti's An Introduction to Categorical Data Analysis : Chi-squared tests simply indicate the degree of EVIDENCE for an associationIt is sensible to decompose chi-squared into components, study residuals, and estimate parameters such as odds ratios that describe the STRENGTH OF ASSOCIATION. Here are some things you can do: tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) library(epitools) # on CRAN ?odds.ratio Help for 'odds.ratio' is shown in the browser library(help=epitools) # on CRAN tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 odds.ratio(11266, 125, 2151526, 31734) Error in fisher.test(tab) : FEXACT error 40. Out of workspace. # so this are evidently for tables with smaller counts library(vcd) # on CRAN ?oddsratio Help for 'oddsratio' is shown in the browser oddsratio( tab1) # really is logodds ratio [1] 0.2807548 plot(oddsratio( tab1) ) library(help=vcd) # on CRAN Read this for many nice functions. fourfoldplot(tab1) mosaicplot(tab1) # not really usefull for this table Also has a look at function Crosstable in package gmodels. 
To decompose the chisqure you can program yourselves: decomp.chi - function(tab) { rows - rowSums(tab) cols - colSums(tab) N - sum(rows) E - rows %o% cols / N contrib - (tab-E)^2/E contrib } decomp.chi(tab1) [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 So you can easily see what cell contributes most to the overall chisquared. Kjetil Can I do this decomposition in R for the following example including 2 contingency tables? tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) tab2 [,1][,2] [1,] 43571 2119221 [2,]52 31807 Here are a few more ways of doing this using chisq.test, glm and assocplot: ## chisq.test ### tab1.chisq - chisq.test(tab1) # decomposition of chisq resid(tab1.chisq)^2 [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 # same with(tab1.chisq, (observed - expected)^2/expected) [,1] [,2] [1,] 0.1451026
Re: [R] What method I should to use for these data?
luan_sheng wrote: -Original Message- From: luan_sheng [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 07, 2005 9:46 PM To: (r-help@stat.math.ethz.ch) Cc: ([EMAIL PROTECTED]) Subject: What method I should to use for these data? Dear R user: I am studying the allele data of two populations. The following is the data:

      a1     a2     a3     a4     a5     a6     a7     a8     a9     a10    a11    a12    a13    a14    a15    a16    a17
pop1  0.0217 0.0000 0.0109 0.0435 0.0435 0.0000 0.0109 0.0543 0.1739 0.0761 0.1413 0.1522 0.1087 0.0870 0.0435 0.0217 0.0109
pop2  0.0213 0.0213 0.0000 0.0000 0.0000 0.0426 0.1702 0.2128 0.1596 0.1809 0.0957 0.0745 0.0106 0.0106 0.0000 0.0000 0.0000

a1, a2, a3, ..., a17 is the frequency of 17 alleles; the sum is 1. I want to test the significance of the distribution of the 17 alleles between the two populations. How can I do this? I want to use chisquare, is it right for these data? If you want to use chisquare, you need the counts and not only the proportions. Whether that is right can be answered only if we know your hypothesis. Kjetil can anyone help me ? Thanks!! luan Yellow Sea Fisheries Research Institute , Chinese Academy of Fishery Sciences , Qingdao , 266071 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
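If the counts are available, the test could look like this (the sample sizes n1 and n2 below are hypothetical; with this many near-zero cells the simulated p-value is safer than the asymptotic one):

p1 <- c(0.0217, 0.0000, 0.0109, 0.0435, 0.0435, 0.0000, 0.0109, 0.0543, 0.1739,
        0.0761, 0.1413, 0.1522, 0.1087, 0.0870, 0.0435, 0.0217, 0.0109)
p2 <- c(0.0213, 0.0213, 0.0000, 0.0000, 0.0000, 0.0426, 0.1702, 0.2128, 0.1596,
        0.1809, 0.0957, 0.0745, 0.0106, 0.0106, 0.0000, 0.0000, 0.0000)
n1 <- 92; n2 <- 94                       # hypothetical numbers of sampled alleles
tab <- rbind(pop1 = round(p1 * n1), pop2 = round(p2 * n2))
chisq.test(tab, simulate.p.value = TRUE, B = 10000)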
Re: [R] rlm/M/MM-estimator questions
Christian Hennig wrote: Hi list, 1) How can the MM-estimator method="MM" in function rlm be tuned to 85% efficiency? It seems that there is a default tuning to 95%. I presume, but am not sure, that the MM-estimator uses phi=phi.bisquare as default and the tuning constant could be set by adding a parameter c=... Is this true? Which value to use for 85%? (In principle I should be able to figure that out theoretically, but it would be much easier if somebody already knew the constant or a straightforward way to compute it.) I have done this once, but cannot find the code or remember the constant. But given the constant, it is easy to do this in R. rlm has an argument psi with default psi.huber:

psi.huber
function (u, k = 1.345, deriv = 0)
{
    if (!deriv) return(pmin(1, k/abs(u)))
    abs(u) <= k
}
<environment: namespace:MASS>

Use this argument with psi=function(u, k=your.value, deriv=0) psi.huber(u, k, deriv) 2) The M-estimator with bisquare uses rescaled MAD of the residuals as scale estimator according to the rlm help page. Does this mean that a scale estimator is used which is computed from least squares residuals? Are M-estimator and residual scale estimator iterated until convergence of them both? (Does this converge?) Not sure about this at the moment. Or what else? What does rescaled mean? rescaled means multiplied by the constant which makes it a consistent estimator under the normal model, the default in the R mad function Kjetil Thank you, Christian *** NEW ADDRESS! *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
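If I remember the efficiency tables correctly, 85% efficiency for the bisquare corresponds to a tuning constant near c = 3.44 (the default c = 4.685 gives 95%). Assuming rlm() passes extra arguments through to the psi function, as its help page suggests, the call would be something like:

library(MASS)
fit <- rlm(stack.loss ~ ., data = stackloss, method = "MM", c = 3.44)
summary(fit)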
Re: [R] comparing strength of association instead of strength of evidence?
Weiwei Shi wrote: Hi, I asked this question before, which was hidden in a bunch of questions. I repharse it here and hope I can get some help this time: I have 2 contingency tables which have the same group variable Y. I want to compare the strength of association between X1/Y and X2/Y. I am not sure if comparing p-values IS the way even though the probability of seeing such weird observation under H0 defines p-value and it might relate to the strength of association somehow. But I read the following statement from Alan Agresti's An Introduction to Categorical Data Analysis : Chi-squared tests simply indicate the degree of EVIDENCE for an associationIt is sensible to decompose chi-squared into components, study residuals, and estimate parameters such as odds ratios that describe the STRENGTH OF ASSOCIATION. Here are some things you can do: tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) library(epitools) # on CRAN ?odds.ratio Help for 'odds.ratio' is shown in the browser library(help=epitools) # on CRAN tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 odds.ratio(11266, 125, 2151526, 31734) Error in fisher.test(tab) : FEXACT error 40. Out of workspace. # so this are evidently for tables with smaller counts library(vcd) # on CRAN ?oddsratio Help for 'oddsratio' is shown in the browser oddsratio( tab1) # really is logodds ratio [1] 0.2807548 plot(oddsratio( tab1) ) library(help=vcd) # on CRAN Read this for many nice functions. fourfoldplot(tab1) mosaicplot(tab1) # not really usefull for this table Also has a look at function Crosstable in package gmodels. To decompose the chisqure you can program yourselves: decomp.chi - function(tab) { rows - rowSums(tab) cols - colSums(tab) N - sum(rows) E - rows %o% cols / N contrib - (tab-E)^2/E contrib } decomp.chi(tab1) [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 So you can easily see what cell contributes most to the overall chisquared. Kjetil Can I do this decomposition in R for the following example including 2 contingency tables? tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) tab2 [,1][,2] [1,] 43571 2119221 [2,]52 31807 BTW, is there some good forum on the theory of statistics? r-help is a good one but I don't want to bother people by asking some questions weakly associated with R here. Thanks, -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] chisq test and fisher exact test
Weiwei Shi wrote: Hi, I have a text mining project and currently I am working on feature generation/selection part. My plan is selecting a set of words or word combinations which have better discriminant capability than other words in telling the group id's (2 classes in this case) for a dataset which has 2,000,000 documents. One approach is using contrast-set association rule mining while the other is using chisqr or fisher exact test. An example which has 3 contingency tables for 3 words as followed (word coded by number): tab[,,1:3] , , 1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 , , 2 [,1][,2] [1,] 43571 2119221 [2,]52 31807 , , 3 [,1][,2] [1,] 427 2162365 [2,]5 31854 I have some questions on this: 1. What's the thumb of rule to use chisq test instead of Fisher exact test. I have a vague memory which said for each cell, the count needs to be over 50 if chisq instead of fisher exact test is going to be used. In the case of word 3, I think I should use fisher test. However, running chisq like below is fine: tab[,,3] [,1][,2] [1,] 427 2162365 [2,]5 31854 chisq.test(tab[,,3]) Pearson's Chi-squared test with Yates' continuity correction data: tab[, , 3] X-squared = 0.0963, df = 1, p-value = 0.7564 but running on the whole set of words (including 14240 words) has the following warnings: p.chisq-as.double(lapply(1:N, function(i) chisq.test(tab[,,i])$p.value)) There were 50 or more warnings (use warnings() to see the first 50) warnings() Warning messages: 1: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 2: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 3: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 4: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 2. So, my second question is, is this warning b/c I am against the assumption of using chisq. But why Word 3 is fine? How to trace the warning to see which word caused this warning? 3. My result looks like this (after some mapping treating from number id to word and some words are stemmed here, like ACCID is accident): of[1:50,] map...2. p.fisher 21 ACCID 0.00e+00 30 CD 0.00e+00 67ROCK 0.00e+00 104 CRACK 0.00e+00 111 CHIP 0.00e+00 179 GLASS 0.00e+00 84BACK 4.199878e-291 395 DRIVEABL 5.335989e-287 60 CAP 9.405235e-285 262 WINDSHIELD 2.691641e-254 13 IV 3.905186e-245 110 HZ 2.819713e-210 11CAMP 9.086768e-207 2 SHATTER 5.273994e-202 297ALP 1.678521e-177 162BED 1.822031e-173 249BCD 1.398391e-160 493 RACK 4.178617e-156 59CAUS 7.539031e-147 3.1 question: Should I use two-sided test instead of one-sided for fisher test? I read some material which suggests using two-sided. 3.2 A big question: Even though the result looks very promising since this is case of classiying fraud cases and the words selected by this approach make sense. However, I think p-values here just indicate the strength to reject null hypothesis, not the strength of association between word and class of document. So, what kind of statistics I should use here to evaluate the strength of association? odds ratio? Any suggestions are welcome! Thanks! You can use chisq.test with sim=TRUE, or call it as usual first, see if there is a warning, and then recall with sim=TRUE. Kjetil -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
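(sim= works by partial matching; the full argument name is simulate.p.value.) A sketch of the call-first-then-retry idea, catching the warning programmatically; p.safe is just an illustrative name:

p.safe <- function(t, B = 2000) {
  warned <- FALSE
  p <- withCallingHandlers(
    chisq.test(t)$p.value,
    warning = function(w) { warned <<- TRUE; invokeRestart("muffleWarning") }
  )
  if (warned) chisq.test(t, simulate.p.value = TRUE, B = B)$p.value else p
}
p.safe(matrix(c(427, 5, 2162365, 31854), 2))   # word 3 from the post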
Re: [R] glm with variance = mu+theta*mu^2?
Spencer Graves wrote: How might you fit a generalized linear model (glm) with variance = mu+theta*mu^2 (where mu = mean of the exponential family random variable and theta is a parameter to be estimated)? This appears in Table 2.7 of Fahrmeir and Tutz (2001) Multivariate Statisticial Modeling Based on Generalized Linear Models, 2nd ed. (Springer, p. 60), where they compare log-linear model fits to cellular differentiation data based on quasi-likelihoods between variance = phi*mu (quasi-Poisson), variance = phi*mu^2 (quasi-exponential), and variance = mu+theta*mu^2. The quasi function accepted for the family argument in glm generates functions variance, validmu, and dev.resids. I can probably write functions to mimic the quasi function. However, I have two questions in regard to this: (1) I don't know what to use for dev.resids. This may not matter for fitting. I can try a couple of different things to see if it matters. (2) Might someone else suggest something different, e.g., using something like optim to solve an appropriate quasi-score function? Thanks, spencer graves Since nobody has answerd this I will try. The variance function mu+theta*mu^2 is the variance function of the negative binomial family. If this variance function is used to construct a quasi-likelihood, the resulting quasi- likelihood is identical to the negative binomial likelihood, so for fitting we can simly use glm.nb from MASS, which will give the correct estimated values. However, in a quasi-likelihood setting the (co)varince estimation from glm.nb is not appropriate, and from the book (fahrmeir ..) it seems that the estimation method used is a sandwich estimator, so we can try the sandwich package. This works but the numerical results are somewhat different from the book. Any comments on this? my code follows: library(Fahrmeir) library(help=Fahrmeir) library(MASS) cells.negbin - glm(y~TNF+IFN+TNF:IFN, data=cells, family=negative.binomial(1/0.215)) summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 confint(cells.negbin) Waiting for profiling to be done... 
2.5 % 97.5 % (Intercept) 3.0383197319 3.7890206510 TNF 0.0091335087 0.0238915483 IFN 0.0023292566 0.0170195707 TNF:IFN -0.0001996824 0.960427 library(sandwich) Loading required package: zoo vcovHC( cells.negbin ) (Intercept) TNF IFN TNF:IFN (Intercept) 0.01176249372 -0.0001279740135 -0.0001488223001 0.0212541999 TNF -0.00012797401 0.039017282 0.021242875 -0.0019793137 IFN -0.00014882230 0.021242875 0.054314079 -0.0013277626 TNF:IFN 0.0212542 -0.001979314 -0.001327763 0.0002370104 cov2cor(vcovHC( cells.negbin )) (Intercept)TNFIFNTNF:IFN (Intercept) 1.000 -0.5973702 -0.5887923 0.1272950 TNF -0.5973702 1.000 0.4614542 -0.6508822 IFN -0.5887923 0.4614542 1.000 -0.3700671 TNF:IFN 0.1272950 -0.6508822 -0.3700671 1.000 cells.negbin2 - glm.nb( y~TNF+IFN+TNF:IFN, data=cells) summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 confint( cells.negbin2 ) Waiting for profiling to be done... 2.5 %97.5 % (Intercept) 3.0864669072
Re: [R] Replacing for loop with tapply!?
Sander Oom wrote: Dear all, We have a large data set with temperature data for weather stations across the globe (15000 stations). For each station, we need to calculate the number of days a certain temperature is exceeded. So far we used the following S code, where mat88 is a matrix containing rows of 365 daily temperatures for each of 15000 weather stations:

m <- 37
n <- 2
outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
for(i in 1:nrow(mat88)) {
  # i <- 3
  row1 <- as.data.frame(df88[i, ])
  temprow37 <- select.rows(row1, row1 > m)
  temprow39 <- select.rows(row1, row1 > m + n)
  temprow41 <- select.rows(row1, row1 > m + 2 * n)
  outmat88[i, 1] <- max(row1, na.rm = T)
  outmat88[i, 2] <- count.rows(temprow37)
  outmat88[i, 3] <- count.rows(temprow39)
  outmat88[i, 4] <- count.rows(temprow41)
}
outmat88

What you need is not tapply but apply. Something like apply(mat88, 1, function(x) sum(x > 30)) where your threshold should replace 30 and the `1' refers to rows. For multiple thresholds: apply(mat88, 1, function(x) c(sum(x > 20), sum(x > 25), sum(x > 30))) Kjetil We have transferred the data to a more potent Linux box running R, but still hope to speed up the code. I know a for loop should be avoided when looking for speed. I also know the answer is in something like tapply, but my understanding of these commands is still too limited to see the solution. Could someone show me the way!? Thanks in advance, Sander. -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Random seed problem in MCMC coupling of chains
Gorjanc Gregor wrote: Thanks to Paul and Gabor for additional tips/examples. Actually, I find Pauls suggestion with setRNG also nice and is exactly what I wanted. Paul, if I understand this correctly, your suggestion with setRNG does not alter RNG flow, it just takes care that chains really have equal seeds. I remember that I have read somewhere that destroying RNG flow over and over to get real randomness is not a good idea. Can someone confirm this? That's in Brian Ripley's Simulation book, and certainly in other places. Kjetil niter - 3 nchain - 2 for (i in 1:niter) { # iterations tmpSeed - setRNG() for (j in 1:nchain) { # chains setRNG(tmpSeed) a - runif(1) cat(iter:, i, chain:, j, runif:, a, \n) } } iter: 1 chain: 1 runif: 0.8160078 iter: 1 chain: 2 runif: 0.8160078 iter: 2 chain: 1 runif: 0.4909793 iter: 2 chain: 2 runif: 0.4909793 iter: 3 chain: 1 runif: 0.4425924 iter: 3 chain: 2 runif: 0.4425924 [... removed other stuff ...] Lep pozdrav / With regards, Gregor Gorjanc -- University of Ljubljana Biotechnical FacultyURI: http://www.bfro.uni-lj.si/MR/ggorjan Zootechnical Department mail: gregor.gorjanc at bfro.uni-lj.si Groblje 3 tel: +386 (0)1 72 17 861 SI-1230 Domzale fax: +386 (0)1 72 17 888 Slovenia, Europe -- One must learn by doing the thing; for though you think you know it, you have no certainty until you try. Sophocles ~ 450 B.C. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R and MLE
Ajay Narottam Shah wrote: I learned R MLE in the last few days. It is great! I wrote up my explorations as http://www.mayin.org/ajayshah/KB/R/mle/mle.html I will be most happy if R gurus will look at this and comment on how it can be improved. I have a few specific questions: * Should one use optim() or should one use stats4::mle()? I felt that mle() wasn't adding much value compared with optim, and in addition, I wasn't able to marry my likelihood functions to it. * One very nice feature of mle() is that you can specify a few parameters which should be fixed in the estimation. How can one persuade optim() to behave like that? give optim() a function to optimize which do not depend on those parameters ... * Can one use deriv() and friends to get analytical derivatives of these likelihood functions? I found I wasn't able to make headway when I was using vector/matrix notation. I think the greatness of R lies in a lovely vector/matrix notation, and it seems like a shame to have to not use that when trying to do deriv(). * For iid problems, the computation of the likelihood function and it's gradient vector are inherently parallelisable. How would one go about doing this within R? Kjetil -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
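A tiny sketch of the closure trick Kjetil mentions for holding a parameter fixed in optim() (normal negative log-likelihood; sigma pinned at 1):

set.seed(1)
x <- rnorm(100, mean = 2, sd = 1)
nll <- function(mu, sigma) -sum(dnorm(x, mu, sigma, log = TRUE))
# the function handed to optim() simply does not depend on sigma:
fit <- optim(par = 0, fn = function(mu) nll(mu, sigma = 1), method = "BFGS")
fit$par            # close to 2; sigma never enters the optimization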
Re: [R] function and apply
jose silva wrote: Dear all Its very difficult to read your post due to all the blank lines, so I cannot really read it very well, but what about writing your function as (not tried) fun - function(x) c(x[1], or if that doesn't work look into ?mapply But you should really look into the settings of your mail program! Kjetil I think my problem is not complicated but I'm having difficulties to solve it. v is a vector: v=c(p1 , p2 , p3 , p4), and f is a function: f : v - w , where w=c(p1 , p2*(1-p1) , p3*(1-p2)*(1-p1) , p4*(1-p3)*(1-p2)*(1-p1)) I write the function f as: f- function(w,x,y,z) {c(w,x*(1-w),y*(1-x)*(1-w),z*(1-y)*(1-x)*(1-w))} f(a,b,c,d) it works well. But now I want to apply f to each row of a data frame with 4 columns: d-data.frame(a=seq(1,10), b=seq(11,20), c=seq(21,30),d=seq(31,40)) t(apply(d,1,f)) is not working? I think each element of each row is not corresponding to w,x,y,z and now I'm lost? Can someone help me? thks J. Silva [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
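One way to finish the truncated suggestion is to rewrite f so it takes the whole row as one vector; the cumprod() form below (fv is an illustrative name, not from the original post) reproduces w exactly, and mapply() gives the column-wise alternative:

fv <- function(x) x * c(1, cumprod(1 - x)[-length(x)])
d <- data.frame(a = seq(1, 10), b = seq(11, 20), c = seq(21, 30), d = seq(31, 40))
t(apply(d, 1, fv))
# mapply works too, with one argument per column:
mapply(function(w, x, y, z) c(w, x*(1-w), y*(1-x)*(1-w), z*(1-y)*(1-x)*(1-w)),
       d$a, d$b, d$c, d$d)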
Re: [R] geometric mean regression
Poizot Emmanuel wrote: Hi, is it possible to perform a geometric mean regression with R ? Thanks. As has been said on this list before, "This is R, there is no if, only how", but if you actually wanted to ask how it is possible, it would help if you explained what geometric mean regression is. Kjetil Emmanuel Poizot Cnam/Intechmer B.P. 324 50103 Cherbourg Cedex Phone (Direct) : (00 33)(0)233887342 Fax : (00 33)(0)233887339 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
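In case the question was "how": geometric mean regression (also known as reduced major axis regression) has slope sign(r)*sd(y)/sd(x), so a bare-bones version is easy to write (gmr is an illustrative name):

gmr <- function(x, y) {
  b <- sign(cor(x, y)) * sd(y) / sd(x)   # RMA slope
  a <- mean(y) - b * mean(x)             # line through the means
  c(intercept = a, slope = b)
}
set.seed(1)
x <- rnorm(30); y <- 2 + 0.5 * x + rnorm(30, sd = 0.3)
gmr(x, y)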
Re: [R] Dynamic Dictionary Data Type?
Prof Brian Ripley wrote: On Thu, 2 Jun 2005, hadley wickham wrote: An environment is a hash table, and copying occurs only on modification, when any language would have to copy in this context. Caveat: the default is new.env(hash=FALSE), so an environment is a hash table in one sense but not necessarily so in the strict sense. Yes, I'm aware that copying only occurs on modification. However, it was my understanding that a <- list(a=1); a$b <- 4 would create a new copy of a, Thomas was talking about environments, not lists (and $ works differently for them). whereas in Java say HashMap a = new HashMap(); a.put("a", 1); a.put("b", 2); wouldn't create a copy of a. (please bear in mind my java syntax is very rusty!) Caching data implies updating at each step, thus possibly creating n copies of the list. Is that wrong? It depends what you mean by `copy'. If you expand a hash table you at some point need to re-arrange the hashing of the entries. That's what will happen with an R environment or an R list too. The possible difference is that you might expect the table to have some room for expansion, and in your list example you did not give any. R's operations on lists make more copies than are strictly necessary, but it is not clear that this is more costly than the housekeeping that would otherwise be necessary. In a$b <- 4, the wrapper VECSXP is recreated and the pointers copied across, but the list elements are not copied. For a$a <- 4 it is probable that no copying is done (although if a2 <- a had been done previously, the pending recursive copy would then be done). (It is also possible that I have overlooked something in the rather complex code used to do these subassignments.) The original poster asked for caching of results. Here is an example using new.env():

memo <- function(fun) {
  mem <- new.env()
  function(x) {
    if (exists(as.character(x), envir=mem))
      get(as.character(x), envir=mem, inherits=FALSE)
    else {
      val <- fun(x)
      assign(as.character(x), value=val, envir=mem)
      val
    }
  }
}

fib <- memo(function(n) if(n <= 1) 1 else fib(n-1) + fib(n-2))
system.time( fib(300) )
[1] 0.01 0.00 0.02 NA NA
ls(get("mem", env=environment(fib)))
*output suppressed*

To compare:

system.time( {fib2 <- function(n) if(n <= 1) 1 else fib2(n-1) + fib2(n-2); fib2(30)} )
[1] 8.07 0.08 12.75 NA NA

(there is (at least) one problem with this solution: if the global workspace contains a variable `6`, it gives an error. Why?) Kjetil -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
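Regarding the `6` puzzle: presumably it is because exists() defaults to inherits = TRUE, so it finds a `6` in the global environment (a parent of mem), after which get(..., inherits = FALSE) fails since mem itself has no `6` yet. Restricting exists() to mem should fix it (memo2 below is an illustrative variant, not from the original post):

memo2 <- function(fun) {
  mem <- new.env()
  function(x) {
    key <- as.character(x)
    if (exists(key, envir = mem, inherits = FALSE))  # inherits=FALSE is the fix
      get(key, envir = mem, inherits = FALSE)
    else {
      val <- fun(x)
      assign(key, val, envir = mem)
      val
    }
  }
}
`6` <- 42                                  # reproduces the clash
fib3 <- memo2(function(n) if (n <= 1) 1 else fib3(n-1) + fib3(n-2))
fib3(10)                                   # now unaffected by the global `6`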
Re: [R] Dynamic Dictionary Data Type?
Duncan Murdoch wrote: Gabor Grothendieck wrote: On 6/2/05, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Thu, 2 Jun 2005, hadley wickham wrote:

An environment is a hash table, and copying occurs only on modification, when any language would have to copy in this context. Caveat: the default is new.env(hash=FALSE), so an environment is a hash table in one sense but not necessarily so in the strict sense.

Can you expand on this? When would one use hash = TRUE vs. the default?

It's an optimization question. hash = TRUE uses more memory and takes longer to set up, but will give faster searches in big environments. The current default assumes that environments are small, so the effort of building the hash table is not worthwhile, and it does linear searches. I suspect that we might have the default wrong (or perhaps should make the transition from linear to hash search automatically at a certain threshold size), but I haven't done the testing necessary to show this. Duncan Murdoch

An example, following my example in a post some minutes ago:

memo <- function(fun, hash) {
  mem <- new.env(hash = hash)
  function(x) {
    if (exists(as.character(x), envir = mem))
      get(as.character(x), envir = mem, inherits = FALSE)
    else {
      val <- fun(x)
      assign(as.character(x), value = val, envir = mem)
      val
    }
  }
}
fib1 <- memo(function(n) if (n <= 1) 1 else fib1(n - 1) + fib1(n - 2), TRUE)
fib2 <- memo(function(n) if (n <= 1) 1 else fib2(n - 1) + fib2(n - 2), FALSE)
system.time(fib1(400))
[1] 0.06 0.00 1.06 NA NA
system.time(fib2(400))
[1] 0.00 0.00 1.44 NA NA

Kjetil
[R] Incompatibility with VGAM
I just discovered that when the VGAM package (not on CRAN) is loaded, glm() doesn't work. This is because VGAM defines a function family() which gets found by glm() in place of the family() function from stats. VGAM's family() then returns an object which doesn't have a $family component (it has a component $vfamily). I thought namespaces should protect us from this happening?

Kjetil
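Two hedged workarounds until the masking is sorted out; whether the first suffices depends on exactly where glm() performs the name lookup, so treat it as a guess (the formula is purely illustrative), while the second certainly restores the stats version:

fit <- glm(y ~ x, family = stats::poisson())  # pass a fully built family object
## or
detach("package:VGAM")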
Re: [R] how to make legends on histograms
Andreas Zankl wrote: I have a plot with histograms for several datasets superimposed. How can I add a legend that indicates which dataset uses which line type? Thanks, Andreas

?legend
?locator

Kjetil
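A minimal sketch of combining the two help pages (the data are made up; locator(1) lets you click where the legend should go):

set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100, mean = 2)
hist(x1, border = "black", main = "Two histograms")
hist(x2, border = "red", add = TRUE)
legend(locator(1), legend = c("dataset 1", "dataset 2"),
       lty = 1, col = c("black", "red"))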
Re: [R] comparing glm models - lower AIC but insignificant coefficients
Constantinos Antoniou wrote: Hello, I am a new R user and I am trying to estimate some generalized linear models (glm). I am trying to compare a model with a gaussian distribution and an identity link function, and a poisson model with a log link function. My problem is that while the gaussian model has a significantly lower (i.e. better) AIC (Akaike Information Criterion), most of the coefficients are not significant. On the other hand, the poisson model has a higher (i.e. worse) AIC, but almost all the coefficients are extremely significant (except for one that still has p=0.07). Summary output of the two models follows... [sorry for the large number of independent variables, but the issue is less pronounced with fewer covariates]. My question is two-fold:

- AIC supposedly can be used to compare non-nested models (although there are concerns, and I have also seen a couple in this list's archives). Is this a case where AIC is not a good measure to compare the two models? If so, is there another measure (besides choosing the model with the significant coefficients)? [These are time-series data, so I am also looking at acf/pacf of the residuals.]

The topic of using AIC to compare non-nested models has been discussed on the list, please search. But even if AIC can be used to compare non-nested models, the AIC as calculated by R is not suited. The AIC includes an arbitrary additive constant, as the log-likelihood does. This additive constant usually depends on constants in the density which are inconsequential for AIC and may be omitted. And even if they were included, it seems doubtful to me that this would help for a comparison of Poisson and normal models, since the underlying measure is different! The experts can comment on that. That said, I would tend to use Poisson if I had count data and a Poisson model looked remotely sensible. That will give a more interpretable model, which seems more important than purely data-analytic considerations. And lastly, if the Poisson assumptions seem reasonable, there will be non-constant variance, and if you use a normal model you should use weighted least squares or transform the response (square root). If you try that, maybe you will see that the normal model gives lower p-values for the coefficients. Also make a plot of residuals versus fitted values!

- Could the very high significance of the coefficients in the poisson model hint at some issue?

Maybe that the model fits better than the normal?

Kjetil

Thanking you in advance, Costas

+++ POISSON - LOG LINK +++

Call:
glm(formula = TotalDeadInjured[3:48] ~ -1 + Month[3:48] + sin(pi * Month[3:48]/6) +
    cos(pi * Month[3:48]/6) + sin(pi * Month[3:48]/12) + cos(pi * Month[3:48]/12) +
    ThousandCars[3:48] + monthcycle[3:48] + TotalDeadInjured[1:46] +
    I((TotalDeadInjured[1:46])^2) + I((TotalDeadInjured[1:46])^3),
    family = poisson(link = log))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.6900  -1.1901  -0.1847   0.9477   4.3967

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)
Month[3:48]                   -7.712e-02  5.530e-03 -13.947  < 2e-16 ***
sin(pi * Month[3:48]/6)       -1.419e-01  2.759e-02  -5.144 2.68e-07 ***
cos(pi * Month[3:48]/6)       -8.407e-02  1.799e-02  -4.672 2.99e-06 ***
sin(pi * Month[3:48]/12)      -2.776e-02  1.558e-02  -1.782 0.074702 .
cos(pi * Month[3:48]/12)       5.195e-02  1.608e-02   3.232 0.001231 **
ThousandCars[3:48]             2.733e-02  2.255e-03  12.118  < 2e-16 ***
monthcycle[3:48]               6.307e-02  6.546e-03   9.635  < 2e-16 ***
TotalDeadInjured[1:46]        -2.925e-02  8.460e-03  -3.457 0.000546 ***
I((TotalDeadInjured[1:46])^2)  1.218e-04  3.613e-05   3.370 0.000750 ***
I((TotalDeadInjured[1:46])^3) -1.640e-07  4.961e-08  -3.306 0.000946 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 78694.70 on 46 degrees of freedom
Residual deviance: 130.03 on 36 degrees of freedom
AIC: 476.08

Number of Fisher Scoring iterations: 4

+ GAUSSIAN ++

Call:
glm(formula = TotalDeadInjured[3:48] ~ -1 + Month[3:48] + sin(pi * Month[3:48]/6) +
    cos(pi * Month[3:48]/6) + sin(pi * Month[3:48]/12) + cos(pi * Month[3:48]/12) +
    ThousandCars[3:48] + monthcycle[3:48] + TotalDeadInjured[1:46] +
    I((TotalDeadInjured[1:46])^2) + I((TotalDeadInjured[1:46])^3),
    family = gaussian(link = identity))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-61.326  -12.012   -1.756   14.204   78.991

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
Month[3:48]                   -8.111e+00  2.115e+00  -3.835 0.000487 ***
sin(pi * Month[3:48]/6)       -2.639e+01  1.095e+01  -2.409 0.021246 *
cos(pi * Month[3:48]/6)
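To make the advice above concrete, a hedged sketch; the names only loosely follow the posted model (y is the count response, x1 and x2 stand in for the covariates):

fit.pois <- glm(y ~ x1 + x2, family = poisson)
fit.sqrt <- lm(sqrt(y) ~ x1 + x2)   # square root is variance-stabilizing for counts
plot(fitted(fit.pois), residuals(fit.pois, type = "deviance"))
plot(fitted(fit.sqrt), residuals(fit.sqrt))

If the two residual plots look comparably flat, the models are at least both defensible descriptions of the mean, and the interpretability argument for the Poisson carries more weight.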
Re: [R] Suppressing warning messages
Tolga Uzuner wrote: How do I suppress the following?

Warning messages:
1: the condition has length > 1 and only the first element will be used in: if (strike == forward) atmvol(forward, t, alpha, beta, rho, upsilon) else {
2: the condition has length > 1 and only the first element will be used in: if (x(z) == 0) 1 else z/x(z)

Maybe better to understand what generates the warning! To make sure you stay uninformed, say options(warn = -1).

Kjetil
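If the warnings really are harmless, it is safer to silence them locally than globally; two sketches:

op <- options(warn = -1)   # switch warnings off ...
## ... the offending calls ...
options(op)                # ... and restore the old setting

suppressWarnings(as.numeric("abc"))   # or wrap just the one call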
Re: [R] eigenvalues of a circulant matrix
Globe Trotter wrote: Good point: the Bellman reference is a book: Introduction to Matrix Analysis by Bellman (1960), McGraw-Hill Series in Matrix Theory, republished much later by SIAM.

Kjetil

--- Robin Hankin [EMAIL PROTECTED] wrote: Hi everyone. The following webpage gives a definition of circulant matrix which agrees with the definition given in the magic package: http://mathworld.wolfram.com/CirculantMatrix.html best wishes rksh

On May 3, 2005, at 08:06 am, Mulholland, Tom wrote: Well, since I know nothing about this topic I have lurked so far, but here's my two bob's worth. Firstly, I tried to make sense of Brian's initial reply. I have got no idea who Bellman is, and you have not referenced (his/her) work in a way I can access the issues you refer to. So I assumed that's exactly what Brian was talking about. Secondly:

toeplitz(1:4)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    1    2    3
[3,]    3    2    1    2
[4,]    4    3    2    1
require(magic)
circulant(4)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    4    1    2    3
[3,]    3    4    1    2
[4,]    2    3    4    1

So they are obviously two different things. Although I think you may have implied (not stated) that the particular combination you were using resulted in both being exactly the same. It does appear as if in this case the (X) matrix is circulant. But then I'm no expert in even such simple things. Then I had no idea where I was going. So I tried the variations in eigen. I ran your code

x <- scan("h:/t.txt")
y <- x[c(109:216, 1:108)]
X <- toeplitz(y)

and then

X[is.na(X)]
numeric(0)

So I didn't get any NAs.

t1 <- eigen(X)$vectors
t2 <- eigen(X, symmetric = TRUE)$vectors
identical(t1, t2)
[1] TRUE

Then

t2 <- eigen(X, symmetric = TRUE, EISPACK = TRUE)$vectors
identical(t1, t2)
[1] FALSE

So there's obviously more than one way of getting the vectors. Does the second one make more sense to you? I also noticed in the eigen help that there are references to issues such as IEEE 754 arithmetic ("They may also differ between methods and between platforms.") and "or Hermitian if complex". All of these are out of my competence, but they do signal to me that there are issues which may relate to hardware, digital arithmetic and other things of that ilk. I added the comment about complex numbers because I have a vague idea that they are related to the imaginary parts that you refer to. So, not coming to any conclusion that makes sense to me, and given that there are often threads about supposed inaccuracies that have answers such as "the digits you see are not always what are held by the machine", I set my options(digits = 22) and noticed that some of the numbers are still going at the 22nd decimal place, suggesting that the machine might be incapable of producing perfectly accurate results using digital arithmetic. My other big sphere of ignorance is complex numbers.
So I tried

X <- toeplitz(complex(real = y))
t1 <- eigen(X)$vectors
t1[1:20]
 [1]  0.068041577278880341+0i -0.068041577140546913+0i  0.068041576864811659+0i -0.068041576452430155+0i
 [5]  0.068041575907139579+0i -0.068041575231135451+0i  0.068041574435267163+0i -0.068041573525828514+0i
 [9]  0.068041572538722991+0i -0.068041571498323253+0i  0.068041570619888622+0i -0.068041570256170081+0i
[13]  0.068041568759931989+0i -0.068041566476633147+0i  0.068041563560502477+0i -0.06804156305007+0i
[17]  0.06804138765813+0i -0.068041549792984865+0i  0.068041544123969511+0i -0.068041537810956801+0i
t2[1:20]
 [1]  0.068041381743976906 -0.068041381743976850  0.068041381743976781 -0.068041381743976753  0.068041381743976587
 [6] -0.068041381743976725  0.068041381743976920 -0.068041381743976836  0.068041381743976892 -0.068041381743976781
[11]  0.068041381743976781 -0.068041381743977392  0.068041381743976725 -0.068041381743976753  0.068041381743976753
[16] -0.068041381743976698  0.068041381743976587 -0.068041381743976642  0.068041381743976698 -0.068041381743976490

Which is again different. I have no idea what I'm doing, but you do seem to get slightly different answers depending upon which method you use. I do not know if one is superior to the others or where one draws the line in terms of accuracy. Tom

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Globe Trotter
Sent: Tuesday, 3 May 2005 10:51 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] eigenvalues of a circulant matrix

OK, here we go: I am submitting two attachments. The first is the datafile called kinv used to create my circulant matrix, using the following commands:

x <- scan("kinv")
y <- x[c(109:1, 0:108)]
X <- toeplitz(y)
eigen(X)
write(X, ncol = 216, file = "test.dat")

reports the following columns full of NaN's: 18, 58, 194, 200. (Note that eigen(X, symmetric = TRUE) makes no difference and I get the same as above). The second attachment contains only the eigenvectors obtained
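A note that may help anyone landing on this thread: the eigenvalues of a genuine circulant matrix are available in closed form as the discrete Fourier transform of its first row, so fft() computes them directly and sidesteps the numerical issues of a general eigensolver. A small self-contained check (conventions for which row or column generates the circulant vary between packages, hence the comparison of sorted moduli):

y <- c(4, 1, 2, 3)
X <- rbind(c(4, 1, 2, 3),
           c(3, 4, 1, 2),
           c(2, 3, 4, 1),
           c(1, 2, 3, 4))     # circulant: each row is the previous one shifted right
sort(Mod(fft(y)))             # DFT of the first row
sort(Mod(eigen(X)$values))    # agrees with the eigenvalue moduli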
Re: [R] How to prove R as good (Was: (no subject))
rene.raupp wrote: Does anybody know of any work comparing R with other (commercial) statistical software (like Minitab, SPSS, SAS)? I work in a Brazilian government bureau and I intend to use R as our preferred statistical software, but I have to show it's as good as the others.

Sorry, that will be difficult. Couldn't it do to prove it is better?

Kjetil

I also intend to use Weka, and for this one I have the same problem. Can anyone help me? Thanks, René M. Raupp
Re: [R] Roots of quadratic system.
John Janmaat wrote: Hello, I have a system of quadratic equations (the result of a Hamiltonian optimization) for which I need to find the roots. Is there a package and/or function which will find the roots of a quadratic system?

Certainly you could use solve, see ?solve. Alternatively you could go for a computer algebra system with an implementation of Gröbner bases and use a symbolic method.

Kjetil

Note that I am not optimizing, but rather solving the first-order conditions which come from a Hamiltonian. I am basically looking for something in R that will do the same thing as fsolve in Matlab. Thanks, John.

Dr. John Janmaat, Department of Economics, Acadia University. Tel: 902-585-1461
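In the absence of a direct fsolve equivalent, one workable sketch is to code the first-order conditions as equation values and minimize their sum of squares with optim(); a root of the system is a global minimum with value 0 (the two-equation system below is purely illustrative):

## Hypothetical system: x^2 + y^2 - 1 = 0 and x - y = 0
f <- function(p) {
  eq1 <- p[1]^2 + p[2]^2 - 1
  eq2 <- p[1] - p[2]
  eq1^2 + eq2^2
}
sol <- optim(c(0.5, 0.5), f)
sol$par    # close to c(1, 1)/sqrt(2); check that sol$value is near 0

As with any local method, several starting points are needed to find all roots of a polynomial system; the Gröbner basis route finds them all at once but requires outside software.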
[R] Warning from Rcmd check - data could not find data set
This is rw2010 from CRAN. When running Rcmd check on a package I get:

Warning in utils::data(list = al, envir = data_env) : data set 'vowel.test' not found
Warning in utils::data(list = al, envir = data_env) : data set 'vowel.train' not found
Warning in utils::data(list = al, envir = data_env) : data set 'waveform.test' not found
Warning in utils::data(list = al, envir = data_env) : data set 'waveform.train' not found

However, I have no problem with this when using the package. These datasets are loaded, multiple datasets at a time, under another name: data(vowel) loads the first two in the list above. Could it be this (which should be allowed; it is mentioned in Writing R Extensions), or is it something else, or a bug?

Kjetil
Re: [R] can test the if relationship is significant in cancor?
ronggui wrote: I have tried hard to find the answer by Google, but I cannot find any solution, so I want to ask:

1. Can we test whether the canonical relationship is significant after using cancor?

One reference is T. W. Anderson, An Introduction to Multivariate Statistical Analysis, second edition, pages 497-498.

2. If it can, how?

Following the reference above:

cancor.test <- function(obj, N) {
  ## obj is the object returned from cancor
  ## N is the sample size, which is not contained in the cancor object!
  p1 <- NROW(obj$xcoef)
  p2 <- NROW(obj$ycoef)
  p <- p1 + p2
  r <- length(obj$cor)
  ## Bartlett's modification of minus twice the log-likelihood,
  bartlett <- -(N - 0.5 * (p + 3)) * sum(log(1 - obj$cor^2))
  ## which is approximately chi-squared with p1*p2 degrees of freedom:
  list(bartlett = bartlett,
       p.value = pchisq(bartlett, df = p1 * p2, lower.tail = FALSE))
}

This tests if ALL the canonical correlations are zero. Does anybody know how good this approximation is, and how dependent it is on multivariate normality?

Kjetil

3. If not, is it under-developed, or is there no need to do it, or is there no good way to do it? I hope my question is not too silly.
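A quick usage sketch on simulated independent data, where the test should (usually) not reject; note that the sample size N must be supplied by hand:

set.seed(1)
N <- 100
x <- matrix(rnorm(N * 3), N, 3)
y <- matrix(rnorm(N * 2), N, 2)
cc <- cancor(x, y)
cancor.test(cc, N)   # expect a large p-value here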
Re: [R] Data set for loglinear analysis
Warfield Jr., Joseph D. wrote: Dear users, I need to perform a loglinear analysis of a real data set for a course project. I need a real data set with contingency tables in at least 3 dimensions, each with more than 2 levels. Thanks, Joe Warfield

Do data(package = "datasets") and look. Maybe data(UCBAdmissions)?

Kjetil
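And once a table is picked, a minimal starting point for the loglinear analysis itself; UCBAdmissions is 2 x 2 x 6 and Titanic is 4 x 2 x 2 x 2, so neither has more than 2 levels on every margin, but they illustrate the mechanics:

library(MASS)
fit <- loglm(~ Admit + Gender + Dept, data = UCBAdmissions)  # mutual independence
fit    # likelihood-ratio and Pearson tests of fit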
Re: [R] custom loss function + nonlinear models
Christian Mora wrote: Hi all; I'm trying to fit a reparameterization of the asymptotic regression model as shown in Ratkowsky (1990), page 96:

Y ~ y1 + (y2-y1)*(1 - ((y2-y3)/(y3-y1))^(2*(X-x1)/(x2-x1))) / (1 - ((y2-y3)/(y3-y1))^2)

where y1, y2, y3 are the expected values for X=x1, X=x2, and X=average(x1,x2), respectively. I tried first with Statistica v7 by LS and the Gauss-Newton algorithm without success (no convergence: predictors are redundant). Then I tried with the option CUSTOM LOSS FUNCTION and several algorithms like Quasi-Newton, Simplex, Hooke-Jeeves, among others. In all these cases the model converged to some values for the parameters in it. My question (after searching the help pages) is: is there such a thing implemented in R, or can it be easily implemented? In other words, is it possible to define which loss function to use and the algorithm to find the parameter estimates? Thanks, Christian

Try directly with optim().

Kjetil
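To spell out the optim() route (a sketch; the loss function, starting values and data names are all placeholders to be adapted):

model <- function(p, X, x1, x2) {
  y1 <- p[1]; y2 <- p[2]; y3 <- p[3]
  r <- (y2 - y3)/(y3 - y1)
  y1 + (y2 - y1) * (1 - r^(2*(X - x1)/(x2 - x1))) / (1 - r^2)
}
loss <- function(p, X, Y, x1, x2) sum(abs(Y - model(p, X, x1, x2)))  # e.g. L1 loss
## fit <- optim(c(y1 = 1, y2 = 10, y3 = 6), loss, X = X, Y = Y, x1 = 0, x2 = 5)

Any loss can be plugged in here, and optim's method argument selects the algorithm (Nelder-Mead, BFGS, simulated annealing, ...), which is exactly the "custom loss plus choice of algorithm" combination asked about.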
Re: [R] 'skewing' a normal random variable
Ashraf Chaudhary wrote: Hi All; The following question is directed more to the statisticians on the list. Suppose I have a normal random variable. Is there a transformation that I can apply so that the new variable is slightly positively skewed? I greatly appreciate your help. Ashraf

Look at the sn package on CRAN (skew-normal distributions) and the web page of its author.

Kjetil
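Two sketches of the idea, one with sn (if installed; the argument names follow recent versions of the package, older ones used location/scale/shape) and one bare-hands transform; the parameter values are arbitrary:

library(sn)
z <- rsn(1000, xi = 0, omega = 1, alpha = 3)   # alpha > 0 gives positive skew

x <- rnorm(1000)
y <- x + 0.1 * x^2    # a mild quadratic transform also skews to the right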
Re: [R] Where can I found the package ordinal ?
Peter Dalgaard wrote: [EMAIL PROTECTED] writes: Hello, dear all: I want to install the package ordinal, but I don't see the package listed under package sources. I tried to search for it with Google, and found this: http://euridice.tue.nl/~plindsey/rlibs.html but the link does not work. Where can I find the package ordinal? Is it still available?

Try http://fdg-popgen146.unimaas.nl/~plindsey/rlibs.html (I really don't know why I should even take the time Googling for Jim & Pat's current whereabouts. They clearly believe in neither version control, CRAN, nor following standard terminology.)

By the way, ordinal does not compile under rw2010dev.

Kjetil
Re: [R] Bivariate lognormal distribution
Vicky Landsman wrote: Thanks to Prof. Ripley, Kjetil and Spencer Graves for the help. I will be more specific. I have to simulate a bivariate lognormal pair (Y1, Y0) where E(Y1)=X'b, E(Y0)=X'd, Var(Y1)=c1, Var(Y0)=c0, X is a data matrix, and b and d are vectors of parameters. Vicky.

You didn't specify the dependence!

Kjetil

----- Original Message ----- From: Spencer Graves. To: Prof Brian Ripley. Cc: Vicky Landsman; R-help list. Sent: Friday, March 25, 2005 4:40 PM. Subject: Re: [R] Bivariate lognormal distribution

I hope Professor Ripley will correct me if I'm mistaken, but the documentation for mvrnorm in library(MASS) says it will "Simulate from a Multivariate Normal Distribution". If you want the density function or probabilities or quantiles, you can get those from library(mvtnorm). Just for completeness, to use normal routines for a lognormal, you need to take the logarithms of your numbers (which must be all positive; zeros and negative numbers become NA), then compute the mean vector and variance matrix of the logs, compute probabilities on the log scale, then back-transform by exponentiating to get the results back onto the original scale. Hope this helps. Spencer Graves

Prof Brian Ripley wrote: On Thu, 24 Mar 2005, Vicky Landsman wrote: Is there a package that enables one to create bivariate log-normal variables? Just exponentiate each of a bivariate normal pair. You can get the latter from mvrnorm in package MASS.
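A sketch of the exponentiation route for this setup, assuming the stated means and variances refer to the log scale and that the missing dependence is supplied as a log-scale correlation rho (all names are illustrative):

library(MASS)
lmu <- cbind(X %*% b, X %*% d)           # log-scale means, one row per observation
Sigma <- matrix(c(c1, rho * sqrt(c1 * c0),
                  rho * sqrt(c1 * c0), c0), 2, 2)
Z <- mvrnorm(nrow(X), c(0, 0), Sigma)    # correlated log-scale errors
Y <- exp(lmu + Z)                        # columns are (Y1, Y0)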
Re: [R] mixtures as outcome variables
Kjetil Brinchmann Halvorsen wrote: Jason W. Martinez wrote: Dear R-users, I have an outcome variable and I'm unsure about how to treat it. Any advice?

If you only concentrate on the relative proportions, these are called compositional data. If your data are in mydata (n x 4), you obtain the compositions by

sweep(mydata, 1, apply(mydata, 1, sum), "/")

There are not (AFAIK) specific functions/packages in R for compositional data, but you can try googling. Aitchison has a monograph (Chapman & Hall) and a paper in JRSS B. One way to start might be lm's or anova on the symmetric logratio transform of the compositions. The R function lm can take a multivariate response, but some extra programming will be needed for interpretation. With simulated data:

slr <- function(y) {
  ## y should sum to 1
  v <- log(y)
  return(v - mean(v))
}
testdata <- matrix(rgamma(120, 2, 3), 30, 4)
str(testdata)
num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
comp <- sweep(testdata, 1, apply(testdata, 1, sum), "/")
## To get the symmetric logratio transform:
comp <- t(apply(comp, 1, slr))
## Observe:
apply(cov(comp), 1, sum)
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
lm(comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
               [,1]     [,2]     [,3]     [,4]
(Intercept)  0.17606  0.06165 -0.03783 -0.19988

Followup:

mmod <- manova(comp ~ x)
summary(mmod)
Error in summary.manova(mmod) : residuals have rank 3 < 4

So the manova() function cannot be used directly. I guess MANOVA for compositional data should be a straight extension, but it must be programmed; standard manova cannot be used.

Kjetil
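A hedged note on the rank error: the slr-transformed rows sum to zero by construction, so the four response columns are linearly dependent and the residual matrix has rank at most 3. Dropping one column makes the response full rank, and since the MANOVA tests are invariant under nonsingular linear transformations of the response, the result should not depend on which column is dropped:

mmod <- manova(comp[, -4] ~ x)   # x being the (hypothetical) grouping variable
summary(mmod)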
Re: [R] Bivariate lognormal distribution
Vicky Landsman wrote: Dear experts! Is there a package that enables one to create bivariate log-normal variables?

What do you mean by "create"? If you mean simulate, why not use mvrnorm from MASS, and then exponentiate?

Kjetil

Thanks a lot, Vicky Landsman.
Re: [R] mixtures as outcome variables
Jason W. Martinez wrote: Dear R-users, I have an outcome variable and I'm unsure about how to treat it. Any advice? I have spending data for each county in the state of California (N=58). Each county has been allocated money to spend on any one of the following four categories: A, B, C, and D. Each county may spend the money in any way they see fit. This also means that the county need not spend all the money that was allocated to it. The data structure looks something like the one below:

COUNTY     A        B       C        D        Total
alameda    2534221  192     2835475  3063249  9988537
alpine     3174     850004555855232
amador     0        0       0        0        0

The goal is to explain variation in spending patterns, which are presumably the result of characteristics of each county. I might treat the problem like a simple linear regression problem for each category, but by definition, money spent in one category takes away from the amount of money that can be spent in any other category, and each county is not allocated the same amount of money to spend. I have constructed proportions of the amount spent on each category and have conducted quasibinomial regression on each dependent outcome, but that does not seem very convincing to me. Would anyone have any advice about how to treat an outcome variable of this sort? Thanks for any hints! Jason

If you only concentrate on the relative proportions, these are called compositional data. If your data are in mydata (n x 4), you obtain the compositions by

sweep(mydata, 1, apply(mydata, 1, sum), "/")

There are not (AFAIK) specific functions/packages in R for compositional data, but you can try googling. Aitchison has a monograph (Chapman & Hall) and a paper in JRSS B. One way to start might be lm's or anova on the symmetric logratio transform of the compositions. The R function lm can take a multivariate response, but some extra programming will be needed for interpretation. With simulated data:

slr <- function(y) {
  ## y should sum to 1
  v <- log(y)
  return(v - mean(v))
}
testdata <- matrix(rgamma(120, 2, 3), 30, 4)
str(testdata)
num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
comp <- sweep(testdata, 1, apply(testdata, 1, sum), "/")
## To get the symmetric logratio transform:
comp <- t(apply(comp, 1, slr))
## Observe:
apply(cov(comp), 1, sum)
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
lm(comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
               [,1]     [,2]     [,3]     [,4]
(Intercept)  0.17606  0.06165 -0.03783 -0.19988

summary(lm(comp ~ 1))

Response Y1 :
Call: lm(formula = Y1 ~ 1)
Residuals:
     Min       1Q   Median       3Q      Max
-1.29004 -0.46725 -0.07657  0.55834  1.20551
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]   0.1761     0.1265   1.391    0.175
Residual standard error: 0.6931 on 29 degrees of freedom

Response Y2 :
Call: lm(formula = Y2 ~ 1)
Residuals:
    Min      1Q  Median      3Q     Max
-1.2982 -0.5711 -0.1355  0.5424  1.6598
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  0.06165    0.15049    0.41    0.685
Residual standard error: 0.8242 on 29 degrees of freedom

Response Y3 :
Call: lm(formula = Y3 ~ 1)
Residuals:
     Min       1Q   Median       3Q      Max
-1.97529 -0.41115  0.03666  0.42785  0.88567
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,] -0.03783    0.11623  -0.325    0.747
Residual standard error: 0.6366 on 29 degrees of freedom

Response Y4 :
Call: lm(formula = Y4 ~ 1)
Residuals:
    Min      1Q  Median      3Q     Max
-2.8513 -0.3955  0.2815  0.5939  1.2475
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  -0.1999     0.1620  -1.234    0.227
Residual standard error: 0.8872 on 29 degrees of freedom

Sorry for not being of more help!

Kjetil
Re: [R] Question on statistics
Roy Werkman wrote: Yes, it is discrete, but the underlying distribution is Gaussian.

I guess you mean what some people call the superpopulation distribution.

Kjetil

Just got the following from a colleague:

Var(mean of finite population) = ((N - n)/(N - 1)) * var(population) / n

This should be it... Greetings, Roy

-----Original Message----- From: Liaw, Andy. Sent: Wednesday, March 23, 2005 2:17 PM. Subject: RE: [R] Question on statistics

If the sample is drawn with replacement from the finite population, then the usual formula applies (assuming iid samples); i.e., var(sample mean) = var(population) / n. There's some problem in your description: a finite population, I believe, is necessarily discrete (since there are only N possible values), so it cannot be Gaussian (i.e., normal). Andy

From: Roy Werkman. Ehh, by limited distribution, I meant to say a population of N points. ... Hi, can anyone help me with the following (although not directly related to R functionality)? I have been looking on the internet but cannot find the answer. My question: what is the variance of the mean of a limited distribution (total N points, normally distributed), when I have a small sample of that distribution (n < N)? Your help would be very welcome. Thanks, Roy
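A quick simulation check of the finite-population formula (the sizes are arbitrary; note that var() uses the n-1 divisor, so the N-divisor population variance needs a (N-1)/N correction):

set.seed(1)
N <- 1000; n <- 30
pop <- rnorm(N)
sigma2 <- var(pop) * (N - 1)/N                    # population variance, N divisor
theory <- (N - n)/(N - 1) * sigma2 / n
means <- replicate(10000, mean(sample(pop, n)))   # sampling without replacement
c(theory = theory, simulated = var(means))        # should agree closely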
Re: [R] Generating Interaction Factors (combinations of Data Frame columns)
Thomas Hopper wrote: I'm starting to do a fair amount of DOE in my day job and need to generate full- and fractional-factorial designs. One of the things I'd like to do is generate all possible interaction effects, given the main effects. I've been searching through the documentation, packages and mailing list archives, but the closest I can find are combn() in package combinat and combine() and combinations() in gregmisc, none of which actually produces the results I want. Given a data frame with columns labeled A, B, C and D, I would like to generate a data frame with columns that are the combinations of the columns in the original data frame. The output columns would be A*B, A*C, A*D, A*B*C, A*B*D, ..., A*B*C*D. Alternatively, I'd want to generate the interactions for a given level (2-factor or 3-factor). If such a function already exists, I'd be more than happy to use it. If it doesn't, I can write it, but I would appreciate a little help with the algorithm for generating the combinations... how do I loop through the given factors to generate all possible combinations? Thanks, Tom

library(BHH2)  # on CRAN
?ffDesMatrix
?ffFullMatrix

Kjetil
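Another route that stays in base R: model.matrix() expands all interactions up to a chosen order from a one-line formula, with no explicit looping over combinations (a sketch with made-up numeric columns; the same works for factors):

df <- data.frame(A = rnorm(8), B = rnorm(8), C = rnorm(8), D = rnorm(8))
m2 <- model.matrix(~ (A + B + C + D)^2, df)   # main effects + all 2-factor interactions
m4 <- model.matrix(~ (A + B + C + D)^4, df)   # everything up to A:B:C:D
colnames(m4)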
Re: [R] Hazard function or cumulative Hazard function in R
yassir rabhi wrote: Hi, I'm a student from Canada, and I work in survival analysis. I want to know whether there is a hazard function or cumulative hazard function in R or not. I know how to program it, but it is easier if they already exist in R. Thanks, Yassir

library(survival)

Kjetil
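To point at the specific pieces (a sketch using the lung data shipped with survival): survfit() estimates the survival curve, from which a simple cumulative hazard estimate is -log(S(t)), and basehaz() extracts the cumulative baseline hazard from a Cox fit:

library(survival)
fit <- survfit(Surv(time, status) ~ 1, data = lung)
H <- -log(fit$surv)            # cumulative hazard from the survival estimate
plot(fit$time, H, type = "s")
cox <- coxph(Surv(time, status) ~ age, data = lung)
head(basehaz(cox))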
Re: [R] Basic questions about RMySQL
De la Vega Góngora Jorge wrote: Hello, please forgive me if I am asking something that is well documented. I have read the documentation, but there are points that are not clear to me. I am not an expert in R or in databases, but if someone directs me to a tutorial, I will appreciate it.

1. In my understanding, I can install and use RMySQL without having to install MySQL on my PC, to have access to and to create new tables. Is this right?

I doubt very much that RMySQL will be of any use without having MySQL installed!

Kjetil

2. I have created a c:\my.cnf file to access a database I have, but without installing the server, where can I define the user, password and host to establish a connection? Thanks in advance. Jorge de la Vega Gongora, Banco de Mexico, web: http://www.stat.umn.edu/~jvega
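For the record, once a MySQL server (local or remote) is reachable, the usual connection sketch looks like this; host, user and password are placeholders, and connection details can also live in a my.cnf group named through the group argument:

library(RMySQL)
con <- dbConnect(MySQL(), user = "me", password = "secret",
                 dbname = "mydb", host = "some.host")
dbListTables(con)
dbDisconnect(con)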
Re: leap years (Was: [R] lm and time series)
A followup: how do people treat daily time series when there is a yearly cycle? For the moment I just left out the February 29ths, but that doesn't look really good. Is the only way to include them to use irregular time series?

Kjetil

Gabor Grothendieck wrote: From: Matthieu Cornec. I create a multivariate time series containing NA values (that could come directly from an imported file). I want to compute a linear regression and obtain a time series for both residuals and fitted values. I have tried the trick ts.intersect, without success. Could you help me out of this? Example:

y <- ts(1:10 + rnorm(10))
x <- ts(1:10)
datats <- cbind(y, lagx = lag(x))

Notice that datats could come directly from an imported file; that is why I did not use ts.intersect(y, lagx = lag(x)).

fit <- lm(y ~ lagx, data = datats, na.action = na.omit)

but how do I get a time series of residuals instead of a vector residuals(fit)? Matthieu Cornec

ts is used for regular time series. Removing NAs, other than at the beginning or end, means it's probably best to model it as an irregular time series and so to use an irregular time series package. Below it is done in zoo. Also review the comments in my post to your previous question along these lines and, in particular, be sure you read the zoo vignette referenced there, which has 15 pages of examples of time series manipulations.

library(zoo)
## set up test data with NAs
set.seed(1)
x <- zoo(1:10)
y <- x + rnorm(10)
y[5] <- x[2] <- NA
## create a multivariate zoo series without NAs
## Note: if you want to fill in NAs rather than omit them see ?na.locf
z <- na.omit(merge(y, lagx = lag(x, -1)))
## run lm
## (This also works: z.lm <- lm(I(y ~ lagx), z), but the syntax is experimental.)
z.lm <- lm(y ~ lagx, as.data.frame(z))
## get fitted and resid using the fact that their time base is that of z
z.fit <- z.resid <- z[, 1]
z.fit[] <- fitted(z.lm)
z.resid[] <- resid(z.lm)
## We can just use the zoo series already created. It's not really
## necessary to convert it to ts, but if for some reason we want a
## ts series the following creates one.
## (This uses the facts that we know y starts at 1 and is regularly spaced
## and the other series have a subset of the time base of y.)
ts(coredata(merge(y, x, z.fit, z.resid)))
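One answer to the leap-year question at the top, sketched with zoo: give the series a calendar-date index, and February 29 becomes an ordinary observation rather than a hole in a period-365 grid; the yearly cycle can then enter a model through the day of year:

library(zoo)
d <- seq(as.Date("2004-01-01"), as.Date("2005-12-31"), by = "day")
z <- zoo(rnorm(length(d)), d)               # made-up daily data, includes 2004-02-29
doy <- as.numeric(format(index(z), "%j"))   # day of year
fit <- lm(coredata(z) ~ sin(2*pi*doy/365.25) + cos(2*pi*doy/365.25))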
Re: [R] Rconsole wishlist
Duncan Mackay wrote: Sorry, yes, Rgui under WinXP (SP2). But while Windows date-stamps the history file

file.info(".Rhistory")
          size isdir mode               mtime               ctime
.Rhistory 5377 FALSE  666 2005-03-04 10:37:52 2005-03-04 10:37:52
                        atime
.Rhistory 2005-03-04 13:54:11

the problem is that there can be multiple sessions stored in .Rhistory, and the session dates aren't stored there. Moreover, it seems to me that the history buffer can also overflow without warning after long sessions or many repeated sessions, so that you can inadvertently lose parts of your command log.

Yes. But you can define an environment variable R_HISTSIZE (or some similar name; do an R site search to find it, it is not defined on my machine now, and it should really be in the help file for savehistory()) to avoid the problem. (Is this right, anyone?)

Perhaps it would be preferable for R to save each session's command history in a separate history file, along the lines of

No, it is better to have just one file, as now.

Kjetil

.Last <- function() {
  savefilename <- paste("Rhistory", date())
  savefilename <- gsub(" ", "_", savefilename)
  savefilename <- gsub(":", ".", savefilename)
  savefilename <- paste(savefilename, ".txt", sep = "")
  if (interactive()) try(savehistory(savefilename))
  cat("Current history saved in file:", savefilename, "\n")
}

but this doesn't address any overflow issues. Duncan

-----Original Message----- From: Liaw, Andy. Sent: Friday, 4 March 2005 11:14 AM. Subject: RE: [R] Rconsole wishlist

I'm guessing you're talking about Rgui on Windows, but please don't leave us guessing. If you run R under ESS/(X)Emacs, you have the entire session, which can be saved in a (transcript) file. Does your OS not put date stamps on files?

file.info(".Rhistory")
          size isdir mode               mtime               ctime
.Rhistory 1025 FALSE  666 2005-03-03 19:27:31 2004-08-13 10:45:09
                        atime
.Rhistory 2005-03-03 19:27:31

Andy

From: Duncan Mackay. Hi all, wouldn't it be nice (??!!) if R automatically issued a warning message when the R console buffer was about to fill, so that you could save all your output into a text file? (I know about sink(), but I think it would be good to have an easier mechanism to save a complete record of messages and function output.) And in a similar vein, wouldn't it also be nice if R automatically entered a date stamp into the history file?

Dr. Duncan Mackay, School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide S.A. 5001, AUSTRALIA. Ph (08) 8201 2627, Fax (08) 8201 3015, http://www.scieng.flinders.edu.au/biology/people/mackay_d/index.html
Re: [R] draw random samples from empirical distribution
bogdan romocea wrote: Dear useRs, I have an empirical distribution (not normal etc.) and I want to draw random samples from it. One solution I can think of is to compute, let's say, 100 quantiles, then use runif() to draw a random number Q between 1 and 100, and finally run runif() again to pull a random value from the quantile Q. Is there perhaps a better/more elegant way of doing this? Thank you, b.

sample(mysample, n, replace = TRUE)

Kjetil
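If something smoother than pure resampling is wanted, two further sketches (x being the observed sample): the inverse-CDF idea from the question collapses to a single quantile() call, and resampling plus normal noise amounts to drawing from a kernel density estimate:

x.q   <- quantile(x, runif(1000), type = 8, names = FALSE)  # interpolated inverse CDF
x.kde <- sample(x, 1000, replace = TRUE) + rnorm(1000, 0, bw.nrd0(x))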
Re: [R] Bayesian stepwise
Spencer Graves wrote: Does anyone know of research on fully Bayesian stepwise procedures, assuming that models not considered by the stepwise search would essentially have zero posterior probability? I need to analyze the results of ad hoc experiments run in manufacturing with crazy confounding and possible supersaturation (i.e., more potentially explanatory variables than runs), when each run is very expensive in both time and money. There have to be ways to summarize concisely and intelligently what the data can tell us and what remains uncertain,

What about chapter 8, section 5, page 363 in Wu & Hamada, Experiments: Planning, Analysis, and Parameter Design Optimization, titled "A Bayesian Variable Selection Strategy for Designs with Complex Aliasing"?

Kjetil

including the level of partial confounding between alternative explanations. I think I've gotten reasonable results with my own modification of Venables & Ripley's stepAIC to compute an approximate posterior over tested models using the AICc criterion described, e.g., by Burnham and Anderson (2002), Model Selection and Multi-Model Inference (Springer). Preliminary simulations showed that when I used the naive prior (that all models are equally likely, including the null model), the null model is usually rejected when true. What a surprise! I think I can fix that using a more intelligent prior. I also think I can evaluate the partial confounding between alternative models by studying the correlation matrix between the predictions of alternative models. Comments? Thanks, Spencer Graves

Frank E Harrell Jr wrote: Smit, Robin wrote: I am hoping to get some advice on the following: I am looking for an automatic variable selection procedure to reduce the number of potential predictor variables (~50) in a multiple regression model. I would be interested in using forward stepwise regression with the partial F test. I have looked into possible R functions but could not find this particular approach. There is a function (stepAIC) that uses the Akaike criterion or Mallows' Cp criterion. In addition, the drop1 and add1 functions came closest to what I want, but with them I cannot perform the required procedure. Do you have any ideas? Kind regards, Robin Smit, Business Unit TNO Automotive, Environmental Studies Testing, PO Box 6033, 2600 JA Delft, THE NETHERLANDS

Robin, if you are looking for a method that does not offer the best predictive accuracy and that violates every aspect of statistical inference, you are on the right track. See http://www.stata.com/support/faqs/stat/stepwise.html for details.
Re: [R] Warnings by functions mean(), median()
[EMAIL PROTECTED] wrote: Hello, the following functions don't work correctly with my data: median(), geo.mean(). My data files contain more than 10,000 lines and six columns from a flow-cytometer measurement. I need the arithmetic and geometric means and the median. For the calculation of the geometric mean I wrote the following function:

geo.mean <- function(x)
{
  n <- length(x)
  gm <- prod(x)^(1/n)

This is probably what gives the NaN below. exp(mean(log(x))) would be more to the point.

Kjetil

  return(gm)
}

The median() error tells me "need numeric data". The data are numeric. The function geo.mean() gave out [1] NaN. What are the reasons and what are the solutions? I am a newbie and urgently need information. Thanks. Here is a short output with the results:

9997   385.42  68.54   9.82  124.09  23.93  138.24
9998   342.89  73.65 133.35 1134.19 345.99 1876.88
9999   316.23  76.35  48.26  421.70 129.80  873.79
10000  291.64 103.66   6.85  107.46  26.42  189.38
10001    0.00   0.00   0.00    0.00   0.00    0.00

mean(data)
      FSC       SSC       FL1       FL2      FL32       FL4
375.94880  73.76219  50.73413 434.42837 110.06393 637.34980
geo.mean(data)
[1] NaN
median(data)
Error in median(data) : need numeric data
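For the archives, the log-scale version suggested above in full: prod(x) over 10,000 values overflows to Inf or underflows to 0 (and Inf times a zero entry is NaN), while the log-scale form is stable; note also that with zeros in the data the geometric mean is 0, and median() works on a single column, not a whole data frame:

geo.mean <- function(x) exp(mean(log(x)))   # 0 if any x == 0, NaN if any x < 0
apply(data, 2, geo.mean)    # column-wise geometric means
apply(data, 2, median)      # column-wise medians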
Re: [R] (arbitrary) precision
Gabor Grothendieck wrote: Michael Grottke (Michael.Grottke at duke.edu) writes: I am currently using R for fitting a model to various data sets (minimizing the negative log-likelihood) and calculating a number of metrics based on the parameter estimates. Within these calculations, I have steps of the form

log(log(1 + x)),

where x can be very small (e.g., around 1e-16). Unfortunately, the precision of doubles does not suffice for coping with the difference in the orders of magnitude of 1 and x: 1 + x is rounded to 1. One way to solve this problem seems to be to use an arbitrary-precision library implemented in C and call the respective routines for calculating the logarithm(s) from within R. My questions are as follows:
1. Is there any better/more direct way to solve the problem?
2. Is there any arbitrary-precision library you can suggest in particular?

The approximation log(1+x) = x would be accurate to several decimal places in your case, so your expression would reduce to log(log(1+x)) = log(x).

see also ?log1p

Kjetil
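A quick illustration of the difference at this magnitude (1e-16 is below the spacing of doubles around 1, so 1 + x rounds to exactly 1):

x <- 1e-16
log(1 + x)      # 0, since 1 + x == 1 in double precision
log1p(x)        # 1e-16, accurate
log(log1p(x))   # usable where log(log(1 + x)) gives -Inf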
Re: [R] Profiles graphics in a contingency table
Campo Elías PARDO wrote: Dear users, how can I obtain a profiles graphic for a contingency table, similar to Excel? Campo Elias PARDO

If you mean profiles as in correspondence analysis, just do (assuming your table is mytab):

N <- sum(mytab)
Nr <- rowSums(mytab)
Nc <- colSums(mytab)
perfRow <- mytab
for (i in 1:NROW(mytab)) {
  perfRow[i, ] <- perfRow[i, ]/Nr[i]
}
perfRow

and then you can just plot them, maybe using lines(). But have a look at ?mosaicplot.

Kjetil
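The loop can also be replaced by a single prop.table() call, and matplot() draws one line per profile; a sketch on a built-in table, chosen only for illustration:

mytab <- margin.table(UCBAdmissions, c(3, 1))    # Dept x Admit, as an example
perfRow <- prop.table(mytab, 1)                  # row profiles in one call
matplot(as.matrix(perfRow), type = "b")          # profiles across the row categories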
Re: [R] Displaying a distribution -- was: Combining two histograms
Berton Gunter wrote: Rug plots cannot show replicated values (spikes at discrete values) in the data distribution. Quantile plots do.

But you can use rug(jitter(x)).

Kjetil

Bert Gunter, Genentech Non-Clinical Statistics, South San Francisco, CA. "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message----- From: Gabor Grothendieck. Sent: Wednesday, February 02, 2005 9:20 AM. Subject: Re: [R] Displaying a distribution -- was: Combining two histograms

Berton Gunter writes: May I take this off topic a little to seek collective wisdom (and so feel free to reply privately). The catalyst is Deepayan's remark: "Histograms were appropriate for drawing density estimates by hand in the good old days, but I can imagine very few situations where I would not prefer to use smoother density estimates when I have the computational power to do so. Deepayan" Generally, I agree; but the appearance and thus one's perception and interpretation of both histograms and density plots depend upon the parameters chosen for the display (bin boundaries for histograms; bandwidth and kernel for density plots).

If the problem is distrust of the procedure, one can simultaneously display the raw data with a rug plot; see example(rug).
Re: [R] Finding runs of TRUE in binary vector
Sean Davis wrote: I have a binary vector and I want to find all regions of that vector that are runs of TRUE (or FALSE).

a <- rnorm(10)
b <- a < 0.5
b
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

My function would return something like a list:

region[[1]]  1, 3
region[[2]]  5, 5
region[[3]]  7, 10

Any ideas besides looping and setting starts and ends directly? Thanks, Sean

?rle

Kjetil
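Spelling out the rle() hint (a sketch): cumulative sums of the run lengths give the run end points, start = end - length + 1, and the run values select the TRUE runs:

true.runs <- function(b) {
  r <- rle(b)
  end <- cumsum(r$lengths)
  start <- end - r$lengths + 1
  cbind(start, end)[r$values, , drop = FALSE]
}
b <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
true.runs(b)   # rows (1,3), (5,5), (7,10), as asked for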
Re: [R] Conflicts using Rcmdr, nlme and lme4
Douglas Bates wrote: I can guarantee that this is the last time that I will fundamentally redesign the computational methods and data structures used in fitting mixed-effects models. This is the fifth time I have done it from scratch, and after this one I am quitting the business.

I hope that doesn't mean you are quitting completely from statistical computing and R?

Kjetil
Re: [R] Data Simulation in R
Uwe Ligges wrote: Doran, Harold wrote: Thanks. But I think I am doing that. I use rm() and gc() as the code moves along. The datasets are stored as a list. Is there a way that I can save the list object and call each dataset within the list one at a time, or must the entire list be in memory at once?

The list is in memory, and must be, to access its elements. Either save the list elements to separate files, or even better, make use of a database. Uwe Ligges

Or, since the data are simulated, why can't you just (re-)simulate each dataset right before using it and then delete it, saving the random seed so you can re-simulate if needed?

Kjetil

Harold

-----Original Message----- From: Uwe Ligges. Sent: Wednesday, January 19, 2005 5:51 AM. Subject: Re: [R] Data Simulation in R

Doran, Harold wrote: Dear List: A few weeks ago I posted some questions regarding data simulation and received some very helpful comments, thank you. I have modified my code accordingly and have made some progress. However, I am now facing a new challenge along similar lines. I am attempting to simulate 250 datasets and then run the data through a linear model. I use rm() and gc() as I move along to clean up the workspace and preserve memory. However, my aim is to use sample sizes of 5,000 and 10,000. By any measure this is a huge task. My machine has 2GB RAM and a Pentium 4 2.8 GHz processor with Windows XP. I have the following in the target section of the Windows shortcut: --max-mem-size=1812M. With such large samples, R is unable to perform the analysis, at least with the code I have developed. It halts when it runs out of memory. A colleague subsequently constructed the simulation using another software program with a similar computer and, while it took overnight (and then some), the program produced the desired results. I am curious whether such large simulations are out of the grasp of R, or whether my code is not adequately organized to perform the simulation. I would appreciate any thoughts or advice.

Don't hold all datasets (and results, if they are big) in memory at the same time!!! Either generate them when you use them and delete them afterwards, or save them to disc and only load them one by one for further analyses. Also, you might want to call gc() after you have removed large objects...
Uwe Ligges

Harold

library(MASS)
library(nlme)
mu <- c(100, 150, 200, 250)
Sigma <- matrix(c(400, 80, 80, 80,
                  80, 400, 80, 80,
                  80, 80, 400, 80,
                  80, 80, 80, 400), 4, 4)
mu2 <- c(0, 0, 0)
Sigma2 <- diag(64, 3)
sample.size <- 5000
N <- 250  # Number of datasets

# Take a single draw from the VL distribution
vl.error <- mvrnorm(n = N, mu2, Sigma2)

# Step 1: Create Data
Data <- lapply(seq(N), function(x)
  as.data.frame(cbind(1:10, mvrnorm(n = sample.size, mu, Sigma))))

# Step 2: Add Vertical Linking Error
for (i in seq(along = Data)) {
  Data[[i]]$V6 <- Data[[i]]$V2
  Data[[i]]$V7 <- Data[[i]]$V3 + vl.error[i, 1]
  Data[[i]]$V8 <- Data[[i]]$V4 + vl.error[i, 2]
  Data[[i]]$V9 <- Data[[i]]$V5 + vl.error[i, 3]
}

# Step 3: Restructure for Longitudinal Analysis
long <- lapply(Data, function(x)
  reshape(x, idvar = Data[[i]]$V1,
          varying = list(c(names(Data[[i]])[2:5]), c(names(Data[[i]])[6:9])),
          v.names = c("score.1", "score.2"), direction = "long"))

# Clean up workspace
rm(Data, vl.error)
gc()

# Step 4: Run GLS
glsrun1 <- lapply(long, function(x)
  gls(score.1 ~ I(time - 1), data = x,
      correlation = corAR1(form = ~ 1 | V1), method = "ML"))

# Extract intercepts and slopes
int1 <- sapply(glsrun1, function(x) x$coefficient[1])
slo1 <- sapply(glsrun1, function(x) x$coefficient[2])

# Clean up workspace
rm(glsrun1)
gc()

glsrun2 <- lapply(long, function(x)
  gls(score.2 ~ I(time - 1), data = x,
      correlation = corAR1(form = ~ 1 | V1), method = "ML"))

# Extract intercepts and slopes
int2 <- sapply(glsrun2, function(x) x$coefficient[1])
slo2 <- sapply(glsrun2, function(x) x$coefficient[2])

# Clean up workspace
rm(glsrun2)
gc()

# Print Results
cat("Original Standard Errors", "\n",
    "Intercept", "\t", sd(int1), "\n", "Slope", "\t", "\t", sd(slo1), "\n")
cat("Modified Standard Errors", "\n",
    "Intercept", "\t", sd(int2), "\n", "Slope", "\t", "\t", sd(slo2), "\n")

rm(list = ls())
gc()
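A sketch of the re-simulate-rather-than-store suggestion from earlier in the thread: record the RNG state before each dataset is generated, keep only the fitted summaries, and regenerate any individual dataset on demand (object names follow the posted code):

set.seed(1)
seeds <- vector("list", N)
for (i in seq(N)) {
  seeds[[i]] <- .Random.seed        # snapshot of the RNG state
  dat <- as.data.frame(mvrnorm(n = sample.size, mu, Sigma))
  ## ... fit the model on dat, keep only the coefficients ...
  rm(dat); gc()
}
## Later, to reproduce dataset i exactly:
## .Random.seed <- seeds[[i]]
## dat.i <- as.data.frame(mvrnorm(n = sample.size, mu, Sigma))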
Re: [R] way off topic
Erin Hodgess wrote: Dear R People: Here is another off-topic question, please: does anyone know where to find some archaeological data (carbon dating), please?

http://lib.stat.cmu.edu/DASL/ has archaeology as a topic.

Kjetil

When I googled, I got research papers but no data. Thanks. Sincerely, Erin Hodgess, Associate Professor, Department of Computer and Mathematical Sciences, University of Houston - Downtown