Re: [R] AICc vs AIC for model selection
Spencer Graves wrote: Regarding AIC.c, have you tried RSiteSearch("AICc") and RSiteSearch("AIC.c")? This produced several comments that looked to me like they might help answer your question. Beyond that, I've never heard of the forecast package, and I got zero hits for RSiteSearch("best.arima"), so I can't comment directly on your question.

The forecast package is at http://www-personal.buseco.monash.edu.au/~hyndman/Rlibrary/forecast/ -- Kjetil

Do you have only one series or multiple? If you have only one, I think it would be hard to justify more than a simple AR(1) model; almost anything else would likely be overfitting. If you have multiple series, have you considered using 'lme' in the 'nlme' package? Are you familiar with Pinheiro and Bates (2000), Mixed-Effects Models in S and S-PLUS (Springer)? If not, I encourage you to spend some quality time with this book. My study of it has been amply rewarded, and I believe yours will be too. Best wishes, Spencer Graves

Sachin J wrote: Hi, I am using the 'best.arima' function from the forecast package to obtain point forecasts for a time series data set. The documentation says it uses the AIC value to select the best ARIMA model, but in my case the sample size is very small: 26 observations (demand data). Is it right to use the AIC for model selection in this case, or should I use AICc instead? If so, how can I modify best.arima to change the selection criterion? Any pointers would be of great help. Thanks in advance. Sachin
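As a footnote to this thread (my sketch, not from the original exchange): AICc can be computed by hand from any fitted arima object via the standard small-sample correction AICc = AIC + 2k(k+1)/(n-k-1), where k counts the estimated parameters and n is the sample size.

AICc <- function(fit) {
    k <- length(coef(fit)) + 1        # estimated parameters, +1 for sigma^2
    n <- length(residuals(fit))       # sample size used in the fit
    AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}
fit <- arima(lh, order = c(1, 0, 0))  # 'lh' is a small built-in series
AICc(fit)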
[R] combinatorial programming problem
Hola! I am programming an S3 class symarray for storing the results of functions symmetric in their k arguments. The intended use is for association indices for more than two variables, for instance co-resistance against antibiotics. There is one programming problem I haven't solved: making an inverse of the index function indx(), see the code below. It could for instance return the original k indices in strictly increasing order, to make indx() formally invertible. Any ideas? Kjetil

Code:

# Implementing an S3 class for symarrays with array rank r for dimension
# [k, k, ..., k] with k repeated r times. We do not store the diagonal.
# Storage requirement is choose(k, r), where r = array rank, k = maximum index.
symarray <- function(data = NA, dims = c(1, 1)) {
    r <- dims[1]
    k <- dims[2]
    if (r > k) stop("symarray needs dimension larger than array rank")
    len <- choose(k, r)
    out <- data[1:len]
    attr(out, "dims") <- dims
    class(out) <- "symarray"
    out
}

# Index calculation:
indx <- function(inds, k) {
    r <- length(inds)
    if (r == 1) return(inds)
    if (inds[1] == 1) {
        indx(inds[-1] - 1, k - 1)
    } else {
        indx(c(inds[1] - 1, seq(from = k - r + 2, by = 1, to = k)), k) +
            indx(inds[-1] - inds[1], k - inds[1])
    }
}

# Methods for indexing and assignment:
"[.symarray" <- function(x, inds, drop = FALSE) {
    dims <- attr(x, "dims")
    k <- dims[2]
    inds <- indx(inds, k)
    res <- NextMethod("[")
    res
}

"[<-.symarray" <- function(x, inds, value) {
    dims <- attr(x, "dims")
    k <- dims[2]
    inds <- indx(inds, k)
    res <- NextMethod("[<-")
    res
}
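One brute-force way to invert indx(), workable for moderate k and r (a sketch of mine, assuming indx() assigns a distinct position to each strictly increasing index tuple, which is what the class requires): tabulate all tuples with combn() once and invert by lookup. A proper combinatorial unranking algorithm would avoid the table for large k.

inv.indx <- function(i, r, k) {
    all.inds <- combn(k, r)                          # columns: increasing tuples
    pos <- which(apply(all.inds, 2, indx, k = k) == i)
    all.inds[, pos]                                  # the tuple that maps to i
}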
Re: [R] access list component names with lapply
Thomas Girke wrote: I have a question regarding accessing the names of list components when applying a function with lapply. Here is an example that demonstrates what I'd like to do. I have a list like this one:

mylist <- list(a = letters[1:10], b = letters[10:1], c = letters[1:3])

This is how I do it, iterating over the list and its names in parallel (inside lapply the function sees only the bare component, so names(x) is NULL there):

mapply(function(x, y) myfun(x, y), mylist, names(mylist))

Kjetil

Now I would like to append the names of the list components to their corresponding vectors with the c() function. I thought this could be done with the following command, but it doesn't work:

lapply(mylist, function(x) c(names(x), x))

I know how to do this in a for loop, but lapply runs so much faster over large lists. Any help on this simple problem will be highly appreciated. Thomas
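For this exact example, the mapply call written out and runnable (SIMPLIFY = FALSE keeps the result as a list):

mylist <- list(a = letters[1:10], b = letters[10:1], c = letters[1:3])
mapply(function(x, nm) c(nm, x), mylist, names(mylist), SIMPLIFY = FALSE)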
Re: [R] Intercepts in linear models.
Rolf Turner wrote: A colleague asked me if there is a way to specify with a ***variable*** (say ``cflag'') whether there is an intercept in a linear model. She had in mind something like lm(y ~ x - cflag),

Something like lm(if (cflag) y ~ x - 1 else y ~ x, ...) -- Kjetil

where cflag could be 0 or 1; if it's 0 an intercept should be fitted, if it's 1 then no intercept. This doesn't work ``of course''. The cflag just gets treated as another predictor, and because it is of the wrong length an error is generated. The best I could come up with was

lm(as.formula(paste("y ~ x -", cflag)))

which is kludgy. It can (of course!) also be done using an if statement. Is there a sexier way? cheers, Rolf Turner
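The if-statement route wrapped up as a small function (a sketch; the function and data names are illustrative, not from the thread):

fit.with.flag <- function(cflag, data) {
    fm <- if (cflag) y ~ x - 1 else y ~ x   # cflag = 1 means no intercept
    lm(fm, data = data)
}
d <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
coef(fit.with.flag(0, d))   # intercept fitted
coef(fit.with.flag(1, d))   # no intercept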
Re: [R] Classifying time series by shape over time
Andreas Neumann wrote: Dear all, I have hundreds of thousands of univariate time series of the form: character seriesid, vector of Date, vector of integer (some exemplary data is at the end of the mail). I am trying to find the ones which somehow have a shape over time that looks like the histogram of a (skewed) normal distribution: hist(rnorm(200, 10, 2)). The mean is not interesting, i.e. it does not matter whether the first nonzero observation happens in the 2nd or the 40th month of observation. All that matters is: the series should start sometime, the hits per month increase, at some point they decrease, and then they more or less disappear. Short example (hits at consecutive months, Dates omitted):

1. series: 0 0 0 2 5 8 20 42 30 19 6 1 0 0 0 -> good
2. series: 0 3 8 9 20 6 0 3 25 67 7 1 0 4 60 20 10 0 4 -> bad

Series 1 would be an ideal case of what I am looking for. Graphical inspection would be easy but is not an option due to the huge number of series.

Does the function turnpoints() in package pastecs help? -- Kjetil

Questions: 1. Which (if any) of the many packages that handle time series is appropriate for my problem? 2. Which general approach seems the most straightforward and best supported by R? Is there a way to test the time series directly (preferably)? Or do I need to type-cast them as some kind of histogram data and then test against the pdf of e.g. a normal distribution (but how)? Or something totally different? Thank you for your time, Andreas Neumann

Data examples (id1 is good, id2 is bad):

id1: dates, hits
1 2004-12-01 3
2 2005-01-01 4
3 2005-02-01 10
4 2005-03-01 6
5 2005-04-01 35
6 2005-05-01 14
7 2005-06-01 33
8 2005-07-01 13
9 2005-08-01 3
10 2005-09-01 9
11 2005-10-01 8
12 2005-11-01 4
13 2005-12-01 3

id2: dates, hits
1 2001-01-01 6
2 2001-02-01 5
3 2001-03-01 5
4 2001-04-01 6
5 2001-05-01 2
6 2001-06-01 5
7 2001-07-01 1
8 2001-08-01 6
9 2001-09-01 4
10 2001-10-01 10
11 2001-11-01 0
12 2001-12-01 3
13 2002-01-01 6
14 2002-02-01 5
15 2002-03-01 1
16 2002-04-01 2
17 2002-05-01 4
18 2002-06-01 4
19 2002-07-01 0
20 2002-08-01 1
21 2002-09-01 0
22 2002-10-01 2
23 2002-11-01 2
24 2002-12-01 2
25 2003-01-01 2
26 2003-02-01 3
27 2003-03-01 7
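A rough automated screen in the spirit of the turnpoints() suggestion (my sketch, not from the thread): smooth each series with a short moving average and count turning points; an ideal rise-then-fall shape keeps at most one interior peak after smoothing.

looks.unimodal <- function(hits, span = 5) {
    s <- stats::filter(hits, rep(1 / span, span))   # centred moving average
    s <- s[!is.na(s)]
    turns <- sum(diff(sign(diff(s))) != 0)          # slope sign changes
    turns <= 1
}
good <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
bad  <- c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4)
looks.unimodal(good)   # TRUE
looks.unimodal(bad)    # FALSE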
Re: [R] reading in only one column from text file
mark salsburg wrote: How do I manipulate the read.table function to read in only the 2nd column?

See the colClasses argument of read.table(). -- Kjetil
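For example (a sketch, assuming a whitespace-separated file "myfile.txt" with four columns of which the second is numeric): "NULL" entries in colClasses make read.table skip those columns entirely.

dat <- read.table("myfile.txt",
                  colClasses = c("NULL", "numeric", "NULL", "NULL"))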
Re: [R] what is scale function? Is it for variable transformation?
Kum-Hoe Hwang wrote: Howdy. I have read R books about the scale function for variable transformation. According to one book the scale function leads to better regression results. Or am I wrong? I hope somebody can tell me about the scale function. Is it for variable transformation?

Did you try to read ?scale? By default it centers each variable and rescales it to unit standard deviation. Look at:

x <- rnorm(100, 5, 10)
mean(x)
[1] 4.304616
sd(x)
[1] 9.926883
x <- scale(x)
mean(x)
[1] 5.39499e-17
sd(x)
[1] 1

Kjetil

-- Kum-Hoe Hwang
Re: [R] error function
Nongluck Klibbua wrote: Hi all, does anyone know which command to use for the error function (erf)? Thanks.

The function erf is in the package NORMT3 (on CRAN), as help.search("error function") could have told you. Kjetil
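The error function can also be written in one line from pnorm(), using the standard identity erf(x) = 2*Phi(x*sqrt(2)) - 1, so no extra package is strictly needed:

erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erf(1)   # 0.8427008, matching tabulated values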
Re: [R] Npmc for doing post-hoc after Kruskal
Farrel Buchinsky wrote: I followed the threads that enquired about doing post-hoc tests after Kruskal testing. It took me to npmc, but npmc is giving an output I do not understand. I noticed a thread entitled "npmc function: 'x' must be atomic", but there never appeared to be a resolution.

npmc(npmcinput)
Error in sort(unique.default(x), na.last = TRUE) : 'x' must be atomic

Here is the data frame I created and called npmcinput. To prove that I have the data set up correctly, I ran kruskal.test(OtoMax ~ grp):

Kruskal-Wallis rank sum test
data: OtoMax by grp
Kruskal-Wallis chi-squared = 59.4028, df = 3, p-value = 7.885e-13

Rows of (grp, OtoMax): 1 1 0; 2 1 1; 3 1 0; 4 1 1; 5 1 1; 6 1 1; 7 1 1; 8 1 1; 9 1 1; 10 1 2; 11 1 1; 12 1 1; 13 1 1; 14 1 1; 15 1 2; 16 1 1; 17 1 1; 18 1 1; 19 1 2; 20 1 2; 21 1 0; 22 1 1; 23 1 1; 24 1 2; 25 1 1; 26 1 1; 27 1 1; 28 1 2; 29 3 4; 30 3 4; 31 3 2; 32 3 2; 33 3 2; 34 3 3; 35 3 2; 36 3 2; 37 3 2; 38 3 3; 39 3 2; 40 3 2; 41 3 4; 42 3 3; 43 3 2; 44 3 4; 45 3 3; 46 3 4; 47 3 2; 48 3 2; 49 3 2; 50 3 3; 51 3 2; 52 3 3; 53 3 0; 54 3 4; 55 3 2; 56 3 2; 57 2 2; 58 2 2; 59 2 2; 60 2 2; 61 2 2; 62 2 2; 63 2 3; 64 2 2; 65 2 2; 66 2 3; 67 2 2; 68 2 2; 69 2 2; 70 2 2; 71 2 2; 72 2 2; 73 2 3; 74 2 2; 75 2 3; 76 2 3; 77 2 3; 78 2 2; 79 2 3; 80 2 4; 81 2 2; 82 2 2; 83 2 2; 84 2 2; 85 0 1; 86 0 1; 87 0 0; 88 0 1; 89 0 0; 90 0 0; 91 0 0; 92 0 0; 93 0 1; 94 0 0

Subsequently I renamed grp to be "class" and OtoMax to be "var", and everything seemed to work. Is that really the way one has to do it? As a general pointer to my overall R capabilities: does the documentation say that one has to name them class and var?

Yes, the documentation says so! It is easier to help if you can post your example data in a way which can be cut-and-pasted into R, for instance by using dput() and posting the result. -- Kjetil

Has my e-mail transgressed R etiquette? -- Don't think so, except for the missing output from dput()!

Farrel Buchinsky, MD, Pediatric Otolaryngologist, Allegheny General Hospital, Pittsburgh, PA
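The renaming step as code (a sketch; as the reply says, npmc() requires the data frame's columns to be named literally "class" and "var"):

npmcinput <- data.frame(class = grp, var = OtoMax)   # grp, OtoMax as posted
library(npmc)
npmc(npmcinput)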
Re: [R] CCF and Lag questions
Pallavi Naresh wrote: I am new to R and new to time series modeling. I have a set of variables (var1, var2, var3, var4, var5) for which I have historical yearly data, and I am trying to use this data to produce a prediction of var1, 3 years into the future. I have a few basic questions:

1) I am able to read in my data and convert it to a time series format using 'ts':

data_ts <- ts(data, start = 1988, end = 2005, frequency = 1, deltat = 1)

However, I am not able to refer to the individual columns or variables of my new time series object. For example, I am able to reference 'var1' by typing data$var1, but I cannot do the same with data_ts$var1. I don't see how I can use data_ts without being able to reference the individual columns in my dataset.

So you have a multiple time series; type class(data_ts) to see. This is a matrix, not a data.frame, so you need data_ts[, "var1"] in place of data_ts$var1.

2) Since I'm trying to build a multivariate time series model, I want to find the correlations of var1 with my other variables (var2, ... etc.) and their lagged values. But since I'm trying to produce a forecast 3 years into the future, I want to find the ccf between var1 and my other variables lagged 3 years. I tried doing ccf(var1, lag(var2, 3)), but I get the following error: Error in na.fail.default(ts.union(as.ts(x), as.ts(y))) : missing values in object

Try ccf(data_ts[, "var1"], data_ts[, "var2"]). If you have missing values in the object, try ccf(data_ts[, "var1"], data_ts[, "var2"], na.action = na.pass). Does anyone know how to use the lag function and ccf together? Is it necessary? -- Kjetil

3) Suppose var1 and var2 are both of length 20. I would expect the correlation at the fourth lag of ccf(var1, var2) to be the same as lag zero of ccf(var1[1:17], var2[4:20]), but they are not. Can someone explain why not?

4) How do I interpret the negative lags produced by the ccf function?
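On questions 2 and 4 (a sketch using the data_ts object from the question): ccf() already reports the cross-correlation at every lag, so an explicit lag() call is rarely needed; by ?ccf's convention, the value at lag k estimates cor(x[t+k], y[t]). The small discrepancy in question 3 arises because ccf() standardizes with the full-series means and variances rather than those of the trimmed subseries.

r <- ccf(data_ts[, "var1"], data_ts[, "var2"],
         na.action = na.pass, plot = FALSE)
r$acf[r$lag == 3]   # estimates cor(var1[t+3], var2[t]): var2 leading by 3 years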
Re: [R] Increase Decimal Accuracy?
Ann Hess wrote: Is there any way to increase the decimal accuracy for probability distributions? For example (in R), 1 - pchisq(90, 5) gives [1] 0, but in Maple I find that the value is 0.67193e-17.

Look at this (computing the upper tail directly avoids the cancellation in 1 - p when p is within 1e-16 of 1):

1 - pchisq(90, 5)
[1] 0
pchisq(90, 5, lower = FALSE)
[1] 6.71932e-18

Kjetil

I need to compare some really small p-values... is there a way to increase the decimal-place accuracy beyond 1e-16 (which seems to be the limit)? Any help would be appreciated. Ann
Re: [R] linear lists, creation, insertion, deletion, traversal *someone?*
Christian Hoffmann wrote: Hi, in a second try I will ask this list to give me some useful pointers. Linear lists, as described e.g. by N. Wirth in "Algorithms and Data Structures", seem not to be implemented in S/R, although in Lisp we have cons, car, cdr.

This has been discussed on the list a (long) time ago, so you can search the archives. I remember L. Tierney opined they should be implemented as an (S4) class. -- Kjetil

Nevertheless I want to implement an algorithm using such linear lists, porting a trusted program to R (S). I need (from Modula/Oberon):

pSC* = POINTER TO SC;
SC* = RECORD Next*: pSC; ... END;

# generate and fill a linked list:
VAR p, pA, Kol: pSC;
NEW(Kol); pA := Kol;
while (...) {
    NEW(p); pA.Next := p; pA := p; ...
}

Mapping this list management to R's list() is possible, but may not be so transparent. Insertions into and deletions from a list may be more convoluted. Does anybody have experience with this? I could not find any reference in the R resources; the Blue Book is also mute here. Thank you, Christian
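A minimal sketch of a mutable singly linked list in base R (mine, not from the thread): environments have reference semantics, so they can play the role of Wirth's POINTER TO RECORD. ('next' is a reserved word in R, hence the field name 'nxt'.)

new.node <- function(value) {
    node <- new.env()
    node$value <- value
    node$nxt <- NULL
    node
}
insert.after <- function(node, value) {   # splice a new node in after 'node'
    n <- new.node(value)
    n$nxt <- node$nxt
    node$nxt <- n
    n
}
traverse <- function(front) {             # collect values front to back
    out <- list()
    p <- front
    while (!is.null(p)) {
        out <- c(out, list(p$value))
        p <- p$nxt
    }
    out
}
front <- new.node(1)
insert.after(front, 3)
insert.after(front, 2)    # list is now 1, 2, 3
traverse(front)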
Re: [R] How to do a proc summary in R?
Emilie Berthiaume wrote: Hi, I'm a SAS user trying to convert myself to R, but I still have problems with some pretty simple commands. First I wanted to add up the number of red-tailed hawks seen per day (julian day) per year, so I tried:

RTyrday <- tapply(RThr, list(year, julian), sum)

Try RT <- as.data.frame.table(RTyrday) -- Kjetil

And then I tried the following regression:

mod1 <- glm(RTyrday ~ julian + year, family = gaussian(link = identity), data = RT)

(here it is simpler to use lm() -- Kjetil)

which didn't work, since my vectors RTyrday and julian don't have the same length. My question is: how can I create a new data sheet with the output of my tapply call? Something I could have done in SAS by giving an "output out" to my proc summary. Thank you, Emilie

Emilie Berthiaume, Graduate Student, Biology Department, Sherbrooke University, Sherbrooke, Québec, CANADA
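The whole pipeline in one place (a sketch; RThr, year and julian are the poster's objects, and the responseName argument just names the value column):

RTyrday <- tapply(RThr, list(year, julian), sum)
RT <- as.data.frame.table(RTyrday, responseName = "RThr")
names(RT)[1:2] <- c("year", "julian")
RT$year   <- as.numeric(as.character(RT$year))     # factor -> numeric
RT$julian <- as.numeric(as.character(RT$julian))
mod1 <- lm(RThr ~ julian + year, data = RT)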
Re: [R] Compute a correlation matrix from an existing covariance matrix
Marc Bernard wrote: Dear all, I am wondering if there is an R function to convert a covariance matrix to a correlation matrix. I have a covariance matrix sigma, and I want to compute the corresponding correlation matrix R from sigma.

Does cov2cor() do what you want? -- Kjetil

Thank you very much, Bernard
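Example usage (cov2cor() ships with base R's stats package):

sigma <- matrix(c(4, 2, 2, 3), 2, 2)   # a toy covariance matrix
cov2cor(sigma)                          # unit diagonal, off-diagonal 2/sqrt(12)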
Re: [R] How to get around heteroscedasticity with non-linear least squares in R?
Quin Wills wrote: I am using nls to fit dose-response curves, but I am not sure how to approach more robust regression in R to get around the problem of my errors showing increased variance with increasing dose. My understanding is that rlm or lqs would not be a good idea here. I'm fairly new to regression work, so apologies if I'm missing something obvious.

Package sfsmisc has rnls() (robust nls), which might be of use. -- Kjetil
Re: [R] How to read more than 1 table?
Atte Tenkanen wrote: Question 1) I want to read many csv tables and run the same commands for all of them. Is there some simpler solution than the one below?

for (g in 1:6) {
    if (g == 1) { k = "kt1_0057" }
    if (g == 2) { k = "kt1_0101" }
    if (g == 3) { k = "kt1_0613" }
    if (g == 4) { k = "staten" }
    if (g == 5) { k = "tenpenny" }
    if (g == 6) { k = "fiddrunk" }
    TABLE = read.table(paste("/home/user/", k, ".csv", sep = ""),
                       sep = ",", na.strings = ".", header = FALSE, fill = TRUE)
    print(TABLE)
}

Put your filenames in a character vector filenames:

filenames <- paste(c("kt1_0057", ...), ".csv", sep = "")
tables <- lapply(filenames, function(x)
    read.table(file = x, na.strings = ".", ...))

and if you want to repeat the same task for your six tables, do it with lapply() or sapply().

Question 2) Is it possible to create new variables, for example with the assistance of a for loop, without initialising them beforehand?

This way: sapply(1:10, function(i) your.task(i)). sapply() will do the initialization for you! -- Kjetil
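Putting the pieces together (a sketch; the path and file stems are taken from the question):

stems <- c("kt1_0057", "kt1_0101", "kt1_0613", "staten", "tenpenny", "fiddrunk")
filenames <- paste("/home/user/", stems, ".csv", sep = "")
tables <- lapply(filenames, read.table,
                 sep = ",", na.strings = ".", header = FALSE, fill = TRUE)
names(tables) <- stems
lapply(tables, print)   # or any other per-table task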
Re: [R] function for prediting garch
oliver wee wrote: hello, for my time series data I was able to successfully fit an ARIMA model (Box-Jenkins) and a GARCH model and estimate their parameters. I was also able to forecast future values of the series from the fitted ARIMA model using the predict() function. However, I'm not sure what the correct function call is to forecast future values of my series using both the fitted ARIMA model and the fitted GARCH model. Using predict() didn't give me the result I was looking for, and I can't find any documentation using help.search.

You should have given reproducible code! In my understanding, (g)arch is applied to an uncorrelated series without autocorrelations, such as the residuals from a properly estimated ARIMA model. So to get predictions for the original series, you need to
1) predict with the ARIMA model,
2) fit a garch model to the residuals,
3) predict the residuals,
4) modify the prediction from 1) with the prediction from 3).
Kjetil

I think what I am looking for is akin to the garchsim and garchpred commands in Matlab. Any help is appreciated. Thanks!
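A rough sketch of steps 1 and 2 (mine; 'y' stands for the poster's series, and tseries::garch() is one available univariate GARCH fitter). Note that since the conditional mean forecast of a pure GARCH process is zero, step 4 typically changes the prediction intervals rather than the point forecasts:

library(tseries)
fit.arima <- arima(y, order = c(1, 0, 1))   # step 1: ARIMA fit
pred <- predict(fit.arima, n.ahead = 3)     # ARIMA point forecasts
fit.garch <- garch(residuals(fit.arima))    # step 2: GARCH(1,1) on residuals
pred$pred                                   # point forecasts stay the same;
                                            # the GARCH fit supplies the
                                            # time-varying forecast variances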
Re: [R] read.table
Diethelm Wuertz wrote: Thanks a lot, that works fine! Next problem: if I had my own package, and the file test.csv were located in the data directory, how could I use the function data() to get data(test), resulting in:

test
    %y-%m-%d VALUE
1 1999-01-01   100
2 2000-12-31   999

Again thanks in advance, Diethelm Wuertz

Also put in the data subdirectory a file test.R with the commands to read test.csv. -- Kjetil

Phil Spector wrote: Look at the check.names= argument of read.table -- you want to set it to FALSE. But remember that you'll have to put quotes around the name whenever you use it, as in x$'%y-%m-%d'. -- Phil Spector, Statistical Computing Facility, Department of Statistics, UC Berkeley

On Tue, 14 Feb 2006, Diethelm Wuertz wrote: I have a file named test.csv with the following 3 lines:

%y-%m-%d;VALUE
1999-01-01;100
2000-12-31;999

read.table("test.csv", header = TRUE, sep = ";") delivers:

   X.y..m..d VALUE
1 1999-01-01   100
2 2000-12-31   999

I would like to see the following:

    %y-%m-%d VALUE
1 1999-01-01   100
2 2000-12-31   999

Note, readLines("test.csv", 1) delivers [1] "%y-%m-%d;VALUE". Is this possible??? Thanks DW
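A sketch of such a data/test.R file: as I understand the data() mechanism, .R files in data/ are source()d with the working directory temporarily set to data/, so a relative path suffices.

# contents of data/test.R
test <- read.table("test.csv", header = TRUE, sep = ";",
                   check.names = FALSE)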
Re: [R] how I can perform Multivariate Garch analysis in R
Arun Kumar Saha wrote: Dear advisor, hope I am not disturbing you. Can you tell me how I can perform multivariate GARCH analysis in R? Also, it is my humble request, please let me know some resource materials on multivariate GARCH analysis itself.

You could try help.search("garch") or RSiteSearch("garch"). But for me this only leads to univariate garch (at least two implementations). -- Kjetil

Sincerely yours, Arun Kumar Saha, M.Sc. [C.U.], Statistician [Analyst], Transgraph Consulting [www.transgraph.com], Hyderabad, INDIA
Re: [R] LCA in e1071
Lisa Wang wrote: Hello there, in the e1071 package the 'lca' function only works with binary data for latent class analysis. Am I right? I would like to test the agreement amongst 6 raters for nominal data on a scale from 1-4, and conduct a latent class analysis. What package and what function may I use? Thank you very much. Lisa Wang, Princess Margaret Hospital

Package irr (on CRAN) might help. -- Kjetil
Re: [R] R: mean from list
[EMAIL PROTECTED] wrote: great!! thanks very much,

mean(unlist(lapply(listdata, function(z) z$c)))

works well.

WHY unlist(lapply(...)) when sapply(...) is simpler? -- Kjetil

And what about getting the average table $a (displaying the average elements across all 1000 matrices)? Could you please help me? I am struggling with this... thanks in advance, Roberto

mean(unlist(lapply(x, function(z) z$d))) should do the trick.

On Sun, 12 Feb 2006 20:06:12 +0000, [EMAIL PROTECTED] wrote: hi all, I have a simple problem that I am not able to solve. I have a list called datalist with the following structure:

[...]
[[10]]$a
    -1  0  1
-1  31  5  2
 0   6  7  5
 1   1  7 36

[[10]]$b
    -1  0  1
-1  31  5  2
 0   6  7  5
 1   1  7 36

[[10]]$c
[1] 0.855

[[10]]$d
[1] 0.855
[...]

with [[1]] ... [[100]]. How can I get the mean value of datalist[[x]]$d, where x represents all elements from 1 to 1000? thanks in advance, Roberto Furlan, University of Turin

Rajarshi Guha, http://jijo.cjb.net
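For the average of the $a tables (a sketch; names follow the post, and Reduce() needs R >= 2.6.0): sum them element-wise and divide by the count, which works because all the $a matrices share the same dimensions.

mean(sapply(datalist, function(z) z$c))    # mean of the scalar $c components
a.mean <- Reduce("+", lapply(datalist, function(z) z$a)) / length(datalist)
a.mean                                     # element-wise average table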
Re: [R] putting text in the corner
Fernando Mayer wrote: You can get the x and y position of any place in an open graphics device with the mouse cursor, using the function locator(), and even assign these values to an object, as in:

xy <- locator(1)

You will have a list with the x and y positions. Then you can use text(xy$x, xy$y, ...).

Or even text(locator(1), "test", ...) -- Kjetil

See ?locator. HTH, Fernando Mayer.

Thomas Steiner escreveu: I want to write some text in a corner of my plot. Is it possible to get the xlim and ylim of an open window? Or is there anything similar to legend(x = "bottomright", inset = 0.01, legend = ...) for text(x = 1, y = 2, "test")? Thomas
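A non-interactive alternative (my sketch): read the current plot limits from par("usr") and place the text just inside a corner.

plot(1:10)
usr <- par("usr")                                  # c(x1, x2, y1, y2)
text(usr[2], usr[4], "test", adj = c(1.1, 1.3))    # inside the top-right corner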
Re: [R] New user: Custom probability distribution
Jan Danielsson wrote: Hello, given a probability function p(x) = 12 / (25(x+1)), x = 0, 1, 2, 3, we generate the following values:

C1   C2
0  0.48
1  0.24
2  0.16
3  0.12

Now I'm supposed to create 50 random values using this table. In MiniTab, I simply selected Calc -> Random Data -> Discrete, selected the columns, and it created 50 random values in a new column. [1] How do I do the same thing in R?

sample(0:3, 50, replace = TRUE, prob = c(0.48, 0.24, 0.16, 0.12))

Kjetil

[1] You may be wondering why I'm telling you this. It's because if I were in your shoes, I'd think "Oh, someone wants me to solve his homework." I have already solved it using MiniTab, but I want to be able to use R instead (since I prefer NetBSD).
Re: [R] rob var/cov + LAD regression
Angelo Secchi wrote: Hi, after looking around I was not able to find info on these two questions:
1. Is there a function to get a jackknife-corrected var/cov estimate (as described in MacKinnon and White 1985) in a standard OLS regression?
2. Does R possess a LAD (Least Absolute Deviation) regression function?
Any help? Thanks

1. Not sure, but look at package sandwich (on CRAN). 2. This can be done with package quantreg (on CRAN). -- Kjetil
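Sketches for both pointers (mine; sandwich's "HC3" type is a jackknife-style correction in the MacKinnon-White (1985) spirit, and LAD regression is quantile regression at tau = 0.5):

library(sandwich)
ols <- lm(stack.loss ~ ., data = stackloss)   # built-in example data
vcovHC(ols, type = "HC3")                     # jackknife-corrected var/cov

library(quantreg)
lad <- rq(stack.loss ~ ., tau = 0.5, data = stackloss)
summary(lad)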
Re: [R] power and sample size for a GLM with poisson response variable
Craig A Faulhaber wrote: Hi all, I would like to estimate power and the necessary sample size for a GLM with a response variable that has a Poisson distribution. Do you have any suggestions for how I can do this in R? Thank you for your help. Sincerely, Craig

Package asypow (on CRAN), or simulation. -- Kjetil
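A small simulation sketch (mine; the effect size, link and covariate distribution are illustrative assumptions): simulate under the alternative, refit, and take the rejection rate as the power estimate.

power.sim <- function(n, beta = 0.3, nsim = 500, alpha = 0.05) {
    rej <- replicate(nsim, {
        x <- rnorm(n)
        y <- rpois(n, exp(0.5 + beta * x))
        fit <- glm(y ~ x, family = poisson)
        summary(fit)$coefficients["x", "Pr(>|z|)"] < alpha
    })
    mean(rej)   # estimated power at sample size n
}
power.sim(50)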
Re: [R] can I do this with read.table??
Douglas Grove wrote: Hi, I'm trying to figure out if there's an automated way to get read.table to read in my data and *not* convert the character columns into anything, just leave them alone.

Did you read the help page? What about the argument as.is=TRUE? See also the argument colClasses. -- Kjetil

What I'm referring to as 'character columns' are columns in the data that are quoted. For columns of alphabetic strings (that aren't TRUE or FALSE) I can suppress conversion to factor with as.is=TRUE, but what I'd like to stop is the conversion of quoted numbers of the form "01", "02", ..., into numeric form. By an 'automated way', I mean one that does not involve me having to know which columns in the data are the ones I want kept as they are. This doesn't seem like an unreasonable thing to want to do. After all, say I've got the data.frame:

A <- data.frame(a = 1:3, b = I(c("01", "02", "03")))

I can export this to a text file with the simple command

write.table(A, "A.txt", sep = "\t", row.names = FALSE, quote = TRUE)

but I cannot find an equally simple mechanism for reading this data back in from A.txt that allows me to reconstruct my data.frame A. Is this an unreasonable thing to expect? Thanks, Doug
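One way to keep the quoted numbers as strings on re-import (a sketch; it does require knowing the column types, which is the part the poster hoped to avoid):

A <- data.frame(a = 1:3, b = I(c("01", "02", "03")))
write.table(A, "A.txt", sep = "\t", row.names = FALSE, quote = TRUE)
A2 <- read.table("A.txt", header = TRUE, sep = "\t",
                 colClasses = c("integer", "character"))
A2$b   # "01" "02" "03", leading zeros preserved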
Re: [R] R vs. Excel (R-squared)
Cleber N. Borges wrote: I was quite interested in this thread (discussion), since I am a chemistry student and I work with mixture designs, which are models without an intercept. I paid close attention to the following affirmation:

'Thus SST, the corrected total sum of squares, should be used when you have a model with an intercept term but the uncorrected total sum of squares should be used when you do not have an intercept term.' (Douglas Bates)

I have as reference the book "Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data", second edition, by John A. Cornell (Professor of Statistics at the University of Florida). In this book, p. 42, item 2.7, THE ANALYSIS OF VARIANCE TABLE, I have the model below:

y(x) = 11.7 x1 + 9.4 x2 + 16.4 x3 + 19.0 x1x2 + 11.4 x1x3 - 9.6 x2x3

with the following ANOVA table (pred = predicted, exp = experimental):

Source of variation  D.F.  SS                               MS
Regression           p-1   SSR = \sum(y_pred - y_mean)^2    SSR/(p-1)
Residual             N-p   SSE = \sum(y_exp - y_pred)^2     SSE/(N-p)
Total                N-1   SST = \sum(y_exp - y_mean)^2

And in many other books I always see the ANOVA table of mixture systems with SST, the corrected total sum of squares (N-1 degrees of freedom). I would like to ask: 1) Which approach (point of view) is more adequate?

With a mixture model, although you do not have an intercept term directly in the model, it is there, hidden, as the sum of the design variables representing the mixture is 1! So it is correct to use the corrected sum of squares. -- Kjetil

2) Could someone indicate some reference on this subject? Thanks a lot. Regards, Cleber N. Borges

Dados:
x1   x2   x3   y
1    0    0    11
1    0    0    12.4
0.5  0.5  0    15
0.5  0.5  0    14.8
0.5  0.5  0    16.1
0    1    0    8.8
0    1    0    10
0    0.5  0.5  10
0    0.5  0.5  9.7
0    0.5  0.5  11.8
0    0    1    16.8
0    0    1    16
0.5  0    0.5  17.7
0.5  0    0.5  16.4
0.5  0    0.5  16.6

## Model
d.lm <- lm(y ~ -1 + x1*x2*x3 - x1:x2:x3, data = Dados)
### ANOVA like in the book
d.aov <- aov(y ~ x1*x2*x3 - x1:x2:x3, data = Dados)
SSR (fitted model) = 128.296

Douglas Bates wrote: On 1/24/06, Lance Westerhoff wrote: Hi- On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote: Lance Westerhoff writes: Hello All- I found an inconsistency between the R-squared reported in Excel vs. that in R, and I am wondering which (if any) may be correct and whether this is a known issue. While it certainly wouldn't surprise me if Excel is just flat out wrong, I just want to make sure, since the R-squared reported in R seems surprisingly high. Please let me know if this is the wrong list. Thanks!

[Peter Dalgaard:] Excel is flat out wrong. As the name implies, R-squared values cannot be less than zero (adjusted R-squared can, but I wouldn't think that is what Excel does).

[Lance Westerhoff:] I had thought the same thing, but then I came across the following site, which states: "Note that it is possible to get a negative R-square for equations that do not contain a constant term. If R-square is defined as the proportion of variance explained by the fit, and if the fit is actually worse than just fitting a horizontal line, then R-square is negative. In this case, R-square cannot be interpreted as the square of a correlation." Since R^2 = 1 - (SSE/SST), I guess you can have SSE > SST, which would result in an R^2 less than 0. However, it still seems very strange, which made me wonder what is going on in Excel, needless to say! http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/ch_fitt9.html

[Douglas Bates:] This seems to be a case of using the wrong formula.
R^2 should measure the amount of variation for which the given model accounts relative to the amount of variation for which the *appropriate* null model does not account. If you have a constant or intercept term in a linear model, then the null model for comparison is the one with the intercept only. If you have a linear model without an intercept term, then the appropriate null model for comparison is the model that predicts all the responses as zero. Thus SST, the corrected total sum of squares, should be used when you have a model with an intercept term, but the uncorrected total sum of squares should be used when you do not have an intercept term. It is disappointing to see the MathWorks propagating such an elementary misconception.
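Bates's point can be checked numerically (a sketch; summary.lm() itself switches the null model, using sum(y^2) in place of sum((y - mean(y))^2) when the intercept is absent):

x <- 1:10
y <- rnorm(10, mean = 20)          # nearly constant, far from the origin
f0 <- lm(y ~ x - 1)                # no-intercept fit
sse <- sum(residuals(f0)^2)
1 - sse / sum(y^2)                 # matches summary(f0)$r.squared
1 - sse / sum((y - mean(y))^2)     # Excel-style: strongly negative here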
Re: [R] finding index of maximum value in vector
Jean-Luc Fontaine wrote: I found:

max.col(matrix(c(1, 3, 2), nrow = 1))

Is there a more concise/elegant way?

which.max

Thanks, Jean-Luc Fontaine, http://jfontain.free.fr/
Re: [R] Age of an object?
Philippe Grosjean wrote: Martin Maechler wrote:

Trevor Hastie, on Tue, 13 Dec 2005 12:51:34 -0800, writes:
Trevor: It would be nice to have a date stamp on an object.

One way to do this with important objects is to use the comment() function (in package base):

comment(myobj) <- "made last sunday of 2005"

Kjetil

Trevor: In S/S-PLUS this was always available, because objects were files.

Are you sure about "always available"? In any case, objects have not been single files for a long time, at least for S+ on Windows, and AFAIK also on unixy versions recently. This topic has come up before. IIRC, the answer was that for many of us it doesn't make sense most of the time.

I remember it was discussed several times. I don't remember why it was considered too difficult to do.

If you work with *.R files ('scripts') in order to ensure reproducibility, you will rerun, often source(), these files, and the age of the script file is really more interesting. Also, I *always* use the equivalent of q(save = "no") and almost only use save() to particularly save the results of expensive computations (often simulations).

OK, now let me give examples where having such information would ease the work greatly: you have a (graphical) view of the content of an object (for instance, the one using the view button in R Commander), or you have a graphical object explorer that keeps a cache to speed up display of information about objects in a given workspace (for instance, the SciViews-R object explorer). What a wonderful feature it would be to tell whether an object has changed since the last query. In the view, one could have a visual clue whether it is up to date; in the object explorer, I could update information only for objects that have changed...

Trevor: I have looked around, but I presume this information is not available.

I assume you will get other answers, more useful to you, which will be based on a class of objects which carry a 'creation-time' attribute.

Yes, but that would work only for objects designed that way, and only if the methods that manipulate the object do the required housework to update the 'last-changed' attribute (the question was about last access to an object, not about its creation date, so 'last-changed' is a better attribute here). If you access the object directly with, let's say, [EMAIL PROTECTED] <- newvalue, that attribute is not updated, isn't it? Best, Philippe Grosjean

Martin Maechler, ETH Zurich
Re: [R] Age of an object?
Philippe Grosjean wrote: Martin Maechler wrote:

Trevor Hastie, on Tue, 13 Dec 2005 12:51:34 -0800, writes:
Trevor: It would be nice to have a date stamp on an object.

Following up on my post of a few minutes ago, I tried to write a timestamp function

timestamp <- function(obj, moretext) {
    comment(obj) <- paste(Sys.time(), moretext, sep = "\n")
}

but this doesn't work:

myobj <- 1:10
timestamp(myobj, "test")
Error in timestamp(myobj, "test") : object "obj" not found

Kjetil
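Whatever produced that particular error message, there is a more basic obstacle (my note, not from the thread): replacement functions such as comment<- operate on a local copy inside the function body, so the stamp is lost when the function returns. A version that returns the stamped object and is reassigned by the caller works; the name timestamp2 also avoids any clash with other functions called timestamp.

timestamp2 <- function(obj, moretext = "") {
    comment(obj) <- paste(Sys.time(), moretext, sep = "\n")
    obj                                 # return the stamped copy
}
myobj <- 1:10
myobj <- timestamp2(myobj, "test")      # reassign to keep the stamp
comment(myobj)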
Re: [R] Polytopic Vector Analysis (PVA)
Melanie Edwards wrote: I am curious whether anybody has developed, or is developing, this capability within R. I have not found any software yet that has this capability, and I am not sure whether it is too new a method and nobody is actually using it,

Googling around, I found that PVA is a variant of factor analysis restricted to finding non-negative factors. It is not a new method, although maybe the name is. This has been, and is, used for instance in air-quality monitoring to identify sources of pollution, and if you have some prior information about possible sources, that could maybe be used too. I guess this could be called 'source unmixing' or something similar, which indicates a similarity with independent component analysis. Maybe it is enough to restrict the optimization to non-negative values? help.search("factor analysis") shows that factor analysis is implemented multiple times for R, so maybe there is something already, and if not, maybe it is simple to adapt. -- Kjetil Halvorsen

or whether there are other means to get the same analysis that I do not know of. Any information regarding developments or use of this method would be helpful. Melanie Edwards
Re: [R] Generation of missiing values in a time serie...
Gabor Grothendieck wrote: Yes, this is the definition of a time series and therefore of a zoo object. A time series is a mathematical function, i.e. it assigns a single element of the range to each element of the domain. This data does not describe a time series. Since nobody else has mentiones it on this thread: Tha CRAN package pastecs has function `regul' to regularize irregular time series. maybe that is what the original poster want. Kjetil Also it has no underlying regularity as the warning message states. To use as.ts one wants a series with an underlying regularity that has gaps and then as.ts will fill in the gaps with NAs. If we don't have an underlying regularity the question is not well posed but its likely we want to discretize time. The zoo command itself is somewhat forgiving, at least in this case, i.e. it allows one to specify this illegal zoo object with non-unique times for purposes of discretization; however, such a zoo object should not be used other than to get a legal zoo object out. For example, in the following we round the times to one decimal place and then within each set of values at the same discretized time take the last one. (Alternately specify mean instead of tail, 1 if the average is prefered.) Then we convert that to a ts object: as.ts(aggregate(z, round(time(z), 1), tail, 1)) Time Series: Start = c(123, 2) End = c(123, 8) Frequency = 10 time flow seq ts x rtt size 123.1 123.12570 967 123.1257 13394 0.798205 1472 123.2 123.24110 969 123.2411 12680 0.796258 1472 123.3 NA NA NA NANA NA NA 123.4 NA NA NA NANA NA NA 123.5 123.47260 970 123.4726 12680 0.796258 1472 123.6 123.58860 971 123.5886 12680 0.796258 1472 123.7 123.70460 972 123.7046 12680 0.796258 1472 On 12/13/05, Alvaro Saurin [EMAIL PROTECTED] wrote: I think I have found the error. It appears when there are two entries with the same time. Using as input file: - CUT # Output format for PCKs: # TIME FLOW P [+-] SEQ TS X RTT SIZE # 123.125683 0 P + 967 123.125683 13394 0.798205 1472 123.241137 0 P + 968 123.241137 12680 0.796258 1472 123.241137 0 P + 969 123.241137 12680 0.796258 1472 123.472631 0 P + 970 123.472631 12680 0.796258 1472 123.588613 0 P + 971 123.588613 12680 0.796258 1472 123.704594 0 P + 972 123.704594 12680 0.796258 1472 - CUT I run fhe following code: - CUT h_types - list (0, 0, NULL, NULL, 0, 0, 0, 0, 0) h_names - list (time, flow, seq, ts, x, rtt, size) pcks_file- pipe (grep ' P ' data, r) pcks - scan (pcks_file, what = h_types, comment.char = '#', fill = TRUE) mat_df - data.frame (pcks[1:2], pcks[5:9]) mat - as.matrix (mat_df) colnames (mat) - h_names z - zoo (mat, mat [,time]) - CUT The dput of 'z' shows: - CUT structure(c(123.125683, 123.241137, 123.241137, 123.472631, 123.588613, 123.704594, 0, 0, 0, 0, 0, 0, 967, 968, 969, 970, 971, 972, 123.125683, 123.241137, 123.241137, 123.472631, 123.588613, 123.704594, 13394, 12680, 12680, 12680, 12680, 12680, 0.798205, 0.796258, 0.796258, 0.796258, 0.796258, 0.796258, 1472, 1472, 1472, 1472, 1472, 1472 ), .Dim = c(6, 7), .Dimnames = list(c(1, 2, 3, 4, 5, 6), c(time, flow, seq, ts, x, rtt, size)), index = structure(c(123.125683, 123.241137, 123.241137, 123.472631, 123.588613, 123.704594), .Names = c(1, 2, 3, 4, 5, 6)), class = zoo) - CUT If I try a 'as.ts(z)', it fails. If I remove the duplicate entry, I can convert it to a TS with no problem. Is this made intentionally? Because then I have to filter the input matrix... 
But, anyway, the output matrix, after filtering, doesn't seem regular:

- CUT
as.ts(z)
Time Series: Start = 1, End = 5, Frequency = 1
      time  flow seq       ts     x      rtt size
1 123.1257     0 967 123.1257 13394 0.798205 1472
2 123.2411     0 969 123.2411 12680 0.796258 1472
3 123.4726     0 970 123.4726 12680 0.796258 1472
4 123.5886     0 971 123.5886 12680 0.796258 1472
5 123.7046     0 972 123.7046 12680 0.796258 1472
Warning message: 'x' does not have an underlying regularity in: as.ts.zoo(z)
- CUT

Weird...

On 13 Dec 2005, at 16:33, Gabor Grothendieck wrote: Please provide a reproducible example. Note that dput(x) will output an R object in a way that can be copied and pasted into another session.

On 12/13/05, Alvaro Saurin wrote: On 13 Dec 2005, at 13:08, Gabor Grothendieck wrote: Your variable mat is not a matrix; it's a data frame. Check it with class(mat). Here is an example:

x <- cbind(A = 1:4, B = 5:8)
tt <- c(1, 3:4, 6)
library(zoo)
x.zoo <- zoo(x, tt)
x.ts <- as.ts(x.zoo)

Fixed, but anyway it fails:

h_types <- list(0, 0, NULL, NULL, 0, 0, 0, 0, 0)
h_names <- list("time", "flow", "seq", "ts", "x",
Re: [R] store and retrieve object names in a vector
zhihua li wrote: hi netters, suppose I have a series of objects X1, X2, B1, C1, ... They all have the same dimensions. I want to combine them into one by using cbind:

y <- cbind(X1, X2, B1, C1, ...)

but I don't want to type the names of these objects one by one. Instead, I've put their names into a vector:

x <- c("X1", "X2", "B1", "C1", ...)

I used y <- cbind(x), but what I got is a matrix of the names, not a combination of matrices. Anybody know how to handle this? Thanks a lot!

Something like

do.call(cbind, lapply(x, get))   # not tested

should work! -- Kjetil
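A quick check that the idiom works (a sketch):

X1 <- matrix(1:4, 2)
X2 <- matrix(5:8, 2)
x <- c("X1", "X2")
y <- do.call(cbind, lapply(x, get))
y   # a 2 x 4 matrix: the columns of X1 followed by those of X2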
Re: [R] Help: chisq.test
Thanjavur Bragadeesh wrote: I have two groups of patients (improved or not improved), named the X and Y groups respectively, after being treated with 5 different drugs:

X <- c(43, 52, 25, 48, 57)
Y <- c(237, 198, 245, 212, 233)

When I run chisq.test(cbind(X, Y))

This takes cbind(X, Y) as a contingency table, which is what you want.

I get a p-value of 0.0024, but if I run chisq.test(X, Y) I get a p-value of 0.22, not significant at 5%.

This is the same as chisq.test(table(X, Y)), which is the test on the contingency table table(X, Y):

     Y
X     198 212 233 237 245
  25    0   0   0   0   1
  43    0   0   0   1   0
  48    0   1   0   0   0
  52    1   0   0   0   0
  57    0   0   1   0   0

which is not what you want. -- Kjetil

What is the difference between the two? thanks, bragadeesh
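The two calls side by side, runnable as-is:

X <- c(43, 52, 25, 48, 57)
Y <- c(237, 198, 245, 212, 233)
chisq.test(cbind(X, Y))   # 5 x 2 contingency table of counts: the right test
chisq.test(X, Y)          # tests independence of table(X, Y): not wanted here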
Re: [R] Broken links on CRAN
Doran, Harold wrote: Sorry, didn't think about that. The mirror I used was http://lib.stat.cmu.edu/R/CRAN/. I checked other mirrors and they did work fine.

I just tried what you did, but with Firefox, and there were no problems. -- Kjetil

-----Original Message----- From: Peter Dalgaard, Sent: Monday, December 05, 2005 10:11 AM, Subject: Re: [R] Broken links on CRAN

Doran, Harold writes: Dear list, when I click on the link to download a reference manual for a package on CRAN, I get an error message that the file is damaged and could not be repaired. I randomly chose various packages, and the same error message appears. Are the links actually broken? I have also restarted my machine and closed and re-opened Acrobat. I am using Windows XP, Acrobat Professional 6.0.0.5, and Explorer 6.0.2900.2180. Harold

One mirror or all of them? If one, which one? Two randomly chosen package PDFs on the CRAN master (Vienna) gave no problems for me.
Re: [R] plotting question
Ed Wang wrote: Yes, I have gone through the manual. My best reference for plotting has been examples, either from the list archive or from searches on the internet. Nothing in the introductory manual could get me to what I have been able to do so far, but that is limited to plotting components of the time series returned from an STL call. This is why I am asking for examples or references to examples from anyone willing to share them. For some of us not very familiar with S+ etc., the documentation with R is not enough. While I can plot two time series one above the other using par(mfrow = ...), I'd prefer to put two time series in one plot, in different colours and using two different symbols, which I cannot do using calls to plot().

What about making the two time series into an mts (multiple time series) object, with

my.mts <- cbind(ts.1, ts.2)

or maybe

my.ts <- ts.union(ts.1, ts.2)   # this latter does not assume a common time base

Then:

plot(my.ts, plot.type = "single", col = c("red", "blue"))

Kjetil

Thanks. "A man is not old until regrets take the place of dreams." -- John Barrymore

From: Berton Gunter: ?lines, ?points. "An Introduction to R" (and numerous other books on R) explains this. Have you read it? -- Bert Gunter, Genentech Non-Clinical Statistics, South San Francisco, CA. "The business of the statistician is to catalyze the scientific learning process." -- George E. P. Box
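A runnable demo with built-in monthly series (a sketch):

two <- ts.union(ldeaths, fdeaths)
plot(two, plot.type = "single", col = c("red", "blue"), lty = 1:2)
legend("topright", c("ldeaths", "fdeaths"),
       col = c("red", "blue"), lty = 1:2)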
Re: [R] squared coherency and cross-spectrum
Spencer Graves wrote: I haven't seen a reply, so I will comment even though I've never used coherency/coherence nor spectrum. RSiteSearch("coherence") produced 13 hits, the third of which looked like it might be relevant to your question (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/37640.html). RSiteSearch("coherency") produced 12 hits, at least some of which look like they might help you. In my cursory review, it looked like at least one of the coherence/coherency hits also mentioned the co-spectrum. Whether that's true or not, the examples with ?spectrum include the statement "for multivariate examples see the help for spec.pgram". If you still have a question for this list after reviewing these references, PLEASE do read the posting guide, www.R-project.org/posting-guide.html. I believe that people who follow that posting guide more closely tend to receive quicker, more useful answers than those who don't. I hope you won't mind if I now ask you a question: what can you get from coherency and the co-spectrum that you can't get as easily from the autocorrelation and partial autocorrelation functions, including the cross-correlations?

Although the question is not addressed to me, I'll try to answer, as I am planning to use these techniques myself. What I hope to get from them is on what time scale two time series are correlated; more succinctly, the high-frequency variation we can see in both series (ground and satellite measurements of the same phenomenon): is it correlated? -- Kjetil

hope this helps. spencer graves

Ling Jin wrote: Hi all, I have two time series, each of length 354. I tried to calculate the coherency^2 between them, but the value I get is always 1. On a website it says: "Note that if the ensemble averaging were to be omitted, the coherency (squared) would be 1, independent of the data." Does anyone know how to specify this properly in R in order to get a more useful coherency? The examples in the help do give coherencies that are not 1, but I did not notice any special specification. The next question is on the co-spectrum. When I supply the spectrum function with multiple time series, it only gives me the spectrum (smoothed periodogram) of the individual series. Is there any way I can get the cross-spectrum? I believe R has calculated it, but I could not find it in the returned values. Attached is the smoothed periodogram of the two time series. Thanks a lot! Ling
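On the mechanics (my sketch, with built-in monthly series standing in for the poster's data): for a multivariate input, spec.pgram() returns the squared coherency and the cross-spectrum phase, and the spans argument provides the smoothing ("ensemble averaging") without which the raw coherency is identically 1.

sp <- spec.pgram(ts.union(mdeaths, fdeaths), spans = c(5, 7), plot = FALSE)
str(sp$coh)     # squared coherency between the two series
str(sp$phase)   # cross-spectrum phase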
Re: [R] Impaired boxplot functionality - mean instead of median
P Ehlers wrote: I'd like to add two comments to Martin's sensible response. 1. I've seen several intro-stats textbooks that define a boxplot to have whiskers to the extreme data values and then define Tukey's boxplot as a modified boxplot. I wish authors wouldn't do that. 2. I've also seen boxplots used for sample sizes as small as -- are you ready for it? -- n = 2!! (Admittedly, only in plots comparing several groups.) The help page for stripchart() points out that stripcharts are a good alternative to boxplots when sample sizes are small. My own rule-of-thumb: n > 20 for single boxplots, n > 12 for multiple boxplots. Would it make sense to have an option to replace boxes with dotplots for only those groups with number of observations less than nmin=20 (say)? Kjetil Peter Ehlers Martin Maechler wrote: Boxplots were invented by John W. Tukey and I think should be counted among the top small but smart achievements from the 20th century. Very wisely he did *not* use mean and standard deviations. Even though it's possible to draw boxplots that are not boxplots (and people only recently explained how to do this with R on this mailing list), I'm arguing very strongly against this. If I see a boxplot - I'd want it to be a boxplot and not have the silly (please excuse) 10%/90% whiskers which declare 20% of the points as outliers {in the boxplot sense}. If you want the mean +/- sd plot, do *not* misuse boxplots for them, please! Martin Maechler, ETH Zurich Evgeniy == Evgeniy Kachalin [EMAIL PROTECTED] on Thu, 01 Dec 2005 19:04:47 +0300 writes: Evgeniy Hello to all users and wizards. Evgeniy I am regularly using the 'boxplot' function or its analogue - 'bwplot' from Evgeniy the 'lattice' library. [there's the lattice *package* !] Evgeniy But they are, as far as I understand, totally Evgeniy flawed in functionality: they miss the ability to select what they would Evgeniy draw 'in the middle' - median, mean. What the box means - standard Evgeniy error, 90% or something else. What the whiskers mean - 100%, 99% or Evgeniy something else. Evgeniy Is there any way to realize it? Or is there any other good data Evgeniy visualization function for comparing means of various data groups? Evgeniy Ideally I would like to have a bit more customised function for doing Evgeniy that. For example, 'boxplot(a~b,data=d,mid='mean'). Evgeniy -- Evgeniy Evgeniy, ICQ 38317310. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
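A rough sketch of Kjetil's nmin idea above (the data and nmin are made up; boxes for the large groups, plain jittered points for the small one):

set.seed(1)
x <- c(rnorm(30), rnorm(25, 1), rnorm(5, 2))
g <- factor(rep(c("a", "b", "c"), c(30, 25, 5)))
nmin <- 20
n <- table(g)
plot(NA, xlim = c(0.5, nlevels(g) + 0.5), ylim = range(x),
     xaxt = "n", xlab = "", ylab = "x")
axis(1, at = seq_len(nlevels(g)), labels = levels(g))
for (i in seq_len(nlevels(g))) {
  xi <- x[as.integer(g) == i]
  if (n[i] >= nmin) boxplot(xi, at = i, add = TRUE, axes = FALSE)
  else points(jitter(rep(i, length(xi)), amount = 0.1), xi)
}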
Re: [R] help with R
Ed Wang wrote: Morning, I've downloaded the precompiled R 2.1.1 version and am using Windows XP on my office workstation. As mentioned previously, I've resorted to batch jobs to avoid the hanging that occurs when I try to plot the 3690 length vector of data. I can do a build from the source and change specific parameters in the makefile if people feel it is warranted. Based on Berton's suggestion to look at the range of packages available I think stl() might be as appropriate a function to use to identify all three components of the time series data I have: underlying trend, seasonality over a full year period (periodicity of one year, or 246 days in my case), and residual (which I have no expectation that it will necessarily be ~N(0,\sigma^2)). For the following dataset (15 years, 246 days/year = 3690 days of data) what reasonable parameters for running stl() would folks suggest? I've not had any luck with getting stl() to return any useful information. It continues to stop with the statement 'series is not periodic or has less than two periods' using stl(zHO, s.window=1, s.degree=2, l.window=246) or the obvious ways I might try running stl() (i.e. plot(stl(zHO))). It's possible I've not properly specified the length of expected periodicity as a parameter (246 days in my case). In creating the time series (ts) object you need myts <- ts(mydata, frequency=246) Kjetil All suggestions are welcome! I'm trying to avoid going back and fitting a linear model with 245 dummy variables. Thanks. Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
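A sketch of the ts()/stl() combination (a simulated placeholder stands in for the real 3690 daily values; s.window = "periodic" is the simplest choice and other settings may well be better):

mydata <- as.numeric(arima.sim(list(ar = 0.8), 3690))  # placeholder series
zHO <- ts(mydata, frequency = 246)         # 15 "years" of 246 trading days
fit <- stl(zHO, s.window = "periodic")     # needs at least two full periods
plot(fit)                                  # trend, seasonal, remainder panels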
[R] Question on KalmanSmooth
I am trying to use KalmanSmooth to smooth a time series fitted by arima (and with missing values), but the $smooth component of the output baffles me. Look at the following example:

testts <- arima.sim(list(ar=0.9), n=100)
testts[6:14] <- NA
testmod <- arima(testts, c(1,0,0))
testsmooth <- KalmanSmooth(testts, testmod$model)
par(mfrow=c(2,1))
plot(testsmooth$smooth, type="l")
plot(testsmooth$var, type="l")

Look at the lower panel plot: the uncertainty of the smoothed values is first lowered, then is highest at the end (of the smoothed part, indexes 6:14). Can anybody explain this, or is this an error? Kjetil Halvorsen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] coherency-Time Series
[EMAIL PROTECTED] wrote: hello! My name is Stefanos, from Athens. I'm a new user of R and I'm studying multivariate time series. I can't find in the help menu how to calculate the cross spectrum and the coherency of 2 time series. Would you like to help me? Thanks __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html See ?spectrum, and especially the component $coh of the output. Kjetil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] multilevel models and sample size
ronggui wrote: It is not a pure R question, but I hope someone can give me advice. I want to analyse my data with a multilevel model. The data has 2 levels; the second level has 52 units and each second-level unit has 19-23 units. I think the sample size is quite small, but just now I can't make the sample size much bigger. So I want to ask: if I use the multilevel model to analyse the data set, will it be acceptable? Or unacceptable because of the small sample size? This kind of question I usually try to answer by simulation, which is very easy in R. Kjetil Thank you very much! ronggui 2005-11-28 -- Deparment of Sociology Fudan University My new mail address is [EMAIL PROTECTED] Blog:http://sociology.yculblog.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
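For what it is worth, a bare-bones simulation of the kind Kjetil describes (every variance component and effect size below is invented; replace them with values plausible for the real study):

library(nlme)
set.seed(42)
ngrp <- 52; m <- 20                       # 52 level-2 units, ~20 level-1 units each
sim1 <- function() {
  id <- factor(rep(1:ngrp, each = m))
  u  <- rnorm(ngrp, sd = 0.5)             # level-2 random intercepts (assumed sd)
  x  <- rnorm(ngrp * m)
  y  <- 1 + 0.3 * x + u[id] + rnorm(ngrp * m)
  dat <- data.frame(y, x, id)
  fit <- lme(y ~ x, random = ~ 1 | id, data = dat)
  intervals(fit, which = "fixed")$fixed["x", ]
}
res <- replicate(200, sim1())
mean(res["lower", ] < 0.3 & res["upper", ] > 0.3)  # coverage of the 95% CI for the slope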
Re: [R] 'For each file in folder F do....'
Ron Piccinini wrote: Hello, I have 2700 text files in a folder and need to apply the same program/procedure to each individually. I'm trying to find how to code something like: For each file in Folder do {Procedure} is there an easy way to do this? other suggestions?

files <- list.files()
results <- lapply(files, yourprocessing)

where yourprocessing is a function taking as argument a file name and returning whatever you want. Kjetil I have tried to list all the file names in a vector e.g. listfiles[1:10,1] 1 H:/Rtest/AXP.txt 2 H:/Rtest/BA.txt 3 H:/Rtest/C.txt 4 H:/Rtest/CAT.txt 5 H:/Rtest/DD.txt 6 H:/Rtest/DIS.txt 7 H:/Rtest/EK.txt 8 H:/Rtest/GE.txt 9 H:/Rtest/GM.txt 10 H:/Rtest/HD.txt but R doesn't like statements of type read.table(file=listfiles[1,1]) since 'file' must be a character string or connection... Any thoughts? Many thanks in advance, Ron Piccinini. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R: pp plot
Dr. Herwig Meschke wrote: To construct a nonparametric (1-alpha) confidence set for an arbitrary CDF F, you can use the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality (e.g., see Wasserman, L. (2005). All of Statistics. 2nd, corr. printing. NY: Springer, p. 99). With n=sample size and eps=sqrt(log(2/alpha)/(2*n)), the lower and upper limits are pmax(F-eps ,0) and pmin(F+eps, 1). Disadvantage: the sample size must be large. Herwig There is also ecdf.ksCI in CRAN package sfsmisc. Kjetil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
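A small sketch of the DKW band around an empirical CDF (simulated data; alpha = 0.05):

set.seed(1)
x <- rnorm(100)
n <- length(x); alpha <- 0.05
eps <- sqrt(log(2 / alpha) / (2 * n))
Fhat <- ecdf(x)
plot(Fhat, main = "ecdf with DKW 95% band")
s <- knots(Fhat)
lines(s, pmin(Fhat(s) + eps, 1), type = "s", lty = 2)  # upper limit
lines(s, pmax(Fhat(s) - eps, 0), type = "s", lty = 2)  # lower limit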
Re: [R] Can't figure out warning message
Ravi Varadhan wrote: Hi, I apologize for the previous posting, where the message was not formatted properly. Here is a better version: I have written the following function to check whether a vector has elements satisfying monotonicity.

is.monotone <- function(vec, increase=T){
  ans <- TRUE
  vec.nomis <- vec[!is.na(vec)]
  if (increase & any(diff(vec.nomis, 1) < 0, na.rm=T)) ans <- FALSE
  if (!increase & any(diff(vec.nomis, 1) > 0, na.rm=T)) ans <- FALSE
  ans
}

This works correctly, but I get this error message as below.

x <- 2:10
is.monotone(x)
[1] TRUE
Warning messages:
1: the condition has length > 1 and only the first element will be used in: if (increase & any(diff(vec.nomis, 1) < 0, na.rm = T)) ans <- FALSE
2: the condition has length > 1 and only the first element will be used in: if (!increase & any(diff(vec.nomis, 1) > 0, na.rm = T)) ans <- FALSE

Try to double the &: use && in place of &. Kjetil

I am unable to see why the condition should have a length greater than 1, since any should give me a single logical value. Can any one tell me what is going on here? (I am using version 2.1.1 on Windows). Thanks very much, Ravi. -- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: mailto:[EMAIL PROTECTED] [EMAIL PROTECTED] -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Use of paste with apply()
I was surprised by:

test <- matrix(as.character(1:4), 2)
test
     [,1] [,2]
[1,] "1"  "3"
[2,] "2"  "4"
apply(test, 1, paste, sep="+")
     [,1] [,2]
[1,] "1"  "2"
[2,] "3"  "4"
apply(test, 1, paste, sep="*")
     [,1] [,2]
[1,] "1"  "2"
[2,] "3"  "4"
te <- matrix(1:4, 2)
te
     [,1] [,2]
[1,]    1    3
[2,]    2    4
apply(te, 1, sum)
[1] 4 6

Why doesn't paste behave in apply as sum? Kjetil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
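The difference is that paste()'s sep= only matters when two or more vectors are supplied; joining the elements of a single vector, which is what each row becomes inside apply(), needs collapse=. A quick check:

paste(c("1", "3"), sep = "+")       # unchanged: only one vector given
paste(c("1", "3"), collapse = "+")  # "1+3"
test <- matrix(as.character(1:4), 2)
apply(test, 1, paste, collapse = "+")   # "1+3" "2+4"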
Re: [R] OLS variables
John Fox wrote: Dear Leaf, I assume that you're using lm() to fit the model, and that you don't really want *all* of the interactions among 20 predictors: You'd need quite a lot of data to fit a model with 2^20 terms in it, and might have trouble interpreting the results. If you know which interactions you're looking for, then why not specify them directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the other hand, if you want to include all interactions, say, up to three-way, and you've put the variables in a data frame, then lm(y ~ .^3, data=DataFrame) will do it. This is nice with factors, but with continuous variables, and a need for a response-surface type of model, it will not do. For instance, with variables x, y, z in data frame dat lm( y ~ (x+z)^2, data=dat ) gives a model with the terms x, z and x:z, not the square terms. There is a need for a semi-automatic way to get these, for instance, use poly() or polym() as in: lm(y ~ polym(x,z,degree=2), data=dat) Kjetil There are many terms in this model, however, if not quite 2^20. The introductory manual that comes with R has information on model formulas in Section 11. I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leaf Sun Sent: Sunday, November 06, 2005 3:11 AM To: r-help@stat.math.ethz.ch Subject: [R] OLS variables Dear all, Is there any simple way in R that I can put all the interactions of the variables in the OLS model? e.g. I have a bunch of variables, x1, x2, ..., x20. I expect them to have interactions (e.g. x1*x2, x3*x4*x5...) with some combinations (2-way or higher dimensions). Is there any way that I can write the model simpler? Thanks! Leaf __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
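A small sketch contrasting the two formulas (made-up data):

set.seed(1)
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + dat$x + 0.5 * dat$z + 0.25 * dat$x^2 + rnorm(50, sd = 0.2)
coef(lm(y ~ (x + z)^2, data = dat))                # x, z and x:z only
coef(lm(y ~ polym(x, z, degree = 2), data = dat))  # adds x^2, z^2 as well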
Re: [R] FDR analyses: minimum number of features
Spencer Graves wrote: Two thoughts on this: 1. Your FDR (Not Franklin Delano Roosevelt) sounds like another name for Type I error rate. It is certainly not the same as type I error rate. Type I error rate is the proportion of true nulls which are rejected, while the FDR is the proportion of rejected null hypothesis which really are true nulls! To me FDR seems like a more promising avenue to multiple testing than the old familywise error rate. Who knows what is a family? Kjetil The definition of reasonably reliable FDRs would seem to relate to the status of the literature on this issue among researchers in genotyping. As more reports of FRDs in genotyping are published, I would expect that methodology for estimation and the standard for accuracy would similarly evolve. 2. Have you tried the Bioconductor (www.bioconductor.org/) listserve? They might be able to say something more useful than a general list like this. spencer graves Dupont, William wrote: Dear List, We are planning a genotyping study to be analyzed using false discovery rates (FDRs) (See Storey and Tibshirani PNAS 2003; 100:9440-5). I am interested in learning if there is any consensus as to how many features (ie. how many P values) need to be studied before reasonably reliable FDRs can be derived. Does anyone know of a citation where this is discussed? Bill Dupont William D. Dupont phone: 615-343-4100 URL http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/WilliamDupont __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
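To make the FDR-versus-Type-I distinction concrete, here is a toy comparison of a per-test cutoff and Benjamini-Hochberg FDR control with p.adjust() (the p-value mix is invented):

set.seed(1)
p <- c(runif(950), rbeta(50, 0.2, 20))     # 950 true nulls, 50 shifted toward 0
sum(p < 0.05)                              # naive per-test cutoff
sum(p.adjust(p, method = "BH") < 0.05)     # discoveries with FDR held at 5%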
Re: [R] significance of spectal peak with spectrum()
Sebastian Leuzinger wrote: thanks a lot. I am interested in the more complex case where the interest is about a specific frequency being significant, not at least one frequency being significantly different from the backgrond white noise. I have discussed this issue with very knowledgable people in the field who could not help me either. I would be interested in any references as well. http://bayes.wustl.edu/ where you can download Bretthorst's book Bayesian Spectrum Analysis and Parameter Estimation Kjetil On Friday 16 September 2005 10:36, you wrote: Sebastian Leuzinger wrote: the null hypothesis would be: one particular frequency peak is not significantly different from the background noise. So you want to know, e.g., whether there is something going on at 1000 Hz? This is difficult: If you are considering the periodogram to be a density, then you do not know the distribution of the value of a single frequency, because it depends on the stuff going on at other frequencies. Second point is (and already asked): Kind of [background] noise? The only really easy test is for the Null signal is white noise, hence H1 is at least one non-white-noisy frequency. [If somebody knows a really good book or papers that cover other cases than the trivial one mentioned above, I am very interested to hear about them, BTW.] If you have another kind of noise (such as blue or pink noise), things become even worse. Uwe Ligges On Friday 16 September 2005 09:28, you wrote: Sebastian Leuzinger wrote: Hello, has anybody got a simple recepie to test the significance level of the peaks after using spectrum() ? What is you null hypothesis? - Kind of noise? - One particular frequency is noisy or all noisy? - ... Uwe Ligges (R-version 2.0.1, linux SuSE9.3) -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Multivariate Skew Normal distribution
(Ted Harding) wrote: On 01-Sep-05 Caio Lucidius Naberezny Azevedo wrote: Hi all, Could anyone tell me if there is any package (or function) that generates values from a multivariate skew normal distribution? Thanks all, Caio Hello, Caio Please tell us what you mean by skew normal distribution. Since normal (i.e. gaussian) distributions are not skew, you Well, but then somebody (Azzalini?) coined the term skew-normal, which you can read about at http://azzalini.stat.unipd.it//SN or simply do library(help=sn) # after installing from CRAN. This family is obtained by skewing a normal family, hence the name. You can also skew a t-family or whatever other symmetric family you like. I found this useful for modelling. Kjetil presumably mean something different from what you said, so unless we understand this more clearly it is unlikely that anyone can make a suggestion which would meet your needs. With best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Sep-05 Time: 15:02:37 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
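A minimal sketch, assuming the sn package's rmsn() sampler for the multivariate skew-normal (argument names as I remember the package documentation; check ?rmsn after installing):

library(sn)                              # install.packages("sn") first
Omega <- matrix(c(1, 0.5, 0.5, 1), 2)    # scale matrix
alpha <- c(4, -1)                        # skewness parameters
x <- rmsn(n = 500, xi = c(0, 0), Omega = Omega, alpha = alpha)
plot(x)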
Re: [R] coercing created variables into a new data frame using na.omit()
Prof Brian Ripley wrote: I don't know if you can read your message, but I find it exceedingly difficult and there seem to be several typos. Please use the space and return keys ... and only send a message once. Your problem is perhaps that you are not looking at the data frame, but at the variable in the workspace. attach()ing data frames is convenient but error-prone (as you have found). rm(new.variable) should solve this, but it is better to cultivate a different style. For example

with(data.frame1, {
  # commands to create value
  data.frame1$new.variable <- value
})
data.frame3 <- na.omit(data.frame1)

That cannot possibly work, as assignment within with() is local to with's environment. I have used superassignment for this (<<-), but that cannot possibly be good style? Look at the following:

test <- data.frame(a=1:5, b=1:5)
test
  a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
with(test, test$c <- 1:5)
test
  a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
with(test, test$c <<- 1:5)
test
  a b c
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5

So what is the best style here? Kjetil I think too that the creation of the value can be vectorized simply, generalizing something like value <- (age1 - 7)*(weight - mw) On Fri, 12 Aug 2005, sp219 wrote: Hi, I am an R newbie and one thing I am having trouble with is binding variables that I have created within one data frame into a new data frame when using na.omit(). To illustrate this problem I will give the example I am working on and the approach I have been using:-

data.frame1 <- filepath
attach(data.frame1)
# create a new variable using a function
new.variable <- rep(1, length(weight3))
for (x in 1:length(new.variable))
{f <- (((age1[x]-7)*(weight[x]-mw)) + ((age2[x]-7)*(weight2[x]-mw)) + ((age3[x]-7)*(weight3[x]-mw))) / (((age1[x]-7)^2) + ((age2[x]-7)^2) + ((age3[x]-7)^2));
new.variable[x] <- f}
# then bind it into the existing old data frame
data.frame2 <- cbind(data.frame1, new.variable)
rm(data.frame1)
attach(data.frame2)
# everything o.k. so far but now the problem part... I basically want to remove all the rows with NA in the new data frame including corresponding rows in the new variable
data.frame3 <- na.omit(data.frame2)
rm(data.frame2)
attach(data.frame3)

length of new.variable has not changed but the length of all the other variables in data.frame2 has? Could someone please provide an explanation or an alternative route if possible? Any suggestions much appreciated, Thankyou, Simon Pickett Simon Pickett Centre for Ecology and Conservation Biology University of Exeter in Cornwall Tremough Campus Penryn Cornwall TR10 9EZ UK Tel: 01326371852 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] computationally singular
Weiwei Shi wrote: I think the problem might be caused by two variables being very correlated. Should I check the cov matrix and try to delete some? But I am just not quite sure of your reply. Could you detail it with some steps? thanks, Why not do principal component analysis? To identify the zero variance linear combination(s) look at the near-zero eigenvalues. Also, it *might* make sense to calculate a mahalanobis distance replacing the matrix inverse with a pseudoinverse. Kjetil weiwei On 8/8/05, Christian Hennig [EMAIL PROTECTED] wrote: Once I had a situation where the reason was that the variables were scaled to extremely different magnitudes. 1e-25 is a *very* small number but still there is some probability that it may help to look up standard deviations and to multiply the variable with the smallest st.dev. by 1e20 or something. Best, Christian On Mon, 8 Aug 2005, Weiwei Shi wrote: Hi, I have a dataset which has around 138 variables and 30,000 cases. I am trying to calculate a mahalanobis distance matrix for them and my procedure is like this: Suppose my data is stored in mymatrix

S <- cov(mymatrix)  # this is fine
D <- sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S))
Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 1.09501e-25

I understand the error message but I don't know how to trace down which variables caused this so that I can sacrifice them if there are not a lot. Again, not sure if it is due to some variables and not sure if dropping variables is a good idea either. Thanks for help, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html *** NEW ADDRESS! *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
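A sketch of both suggestions on a small stand-in for mymatrix (the 1e-10 tolerance is arbitrary; for brevity the distances below are to the mean, but the row-by-row sapply() works the same way with inverted = TRUE):

library(MASS)                          # for ginv()
set.seed(1)
mymatrix <- cbind(a = rnorm(100), b = rnorm(100))
mymatrix <- cbind(mymatrix, c = mymatrix[, "a"] + mymatrix[, "b"])  # exact dependence
S <- cov(mymatrix)
e <- eigen(S, symmetric = TRUE)
which(e$values / max(e$values) < 1e-10)          # near-zero eigenvalues
zapsmall(e$vectors[, e$values / max(e$values) < 1e-10])  # the degenerate combination
Sinv <- ginv(S)                        # pseudoinverse instead of solve()
D2 <- mahalanobis(mymatrix, colMeans(mymatrix), Sinv, inverted = TRUE)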
Re: [R] use of NA's
Robert Kinley wrote: with NAs it's always safest to use the construction if(is.na(foo)) rather than if(foo==NA) cheers Bob Kinley And if it is not clear why, try: NA == NA [1] NA Kjetil tom wright [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 05/08/2005 12:30 To r-help@stat.math.ethz.ch cc Subject [R] use of NA's Can someone please explain why this works:

d <- c(0,2,3,2,0,3,4,0,0,0,0,0)
d.mat <- matrix(data=d, nrow=4, ncol=3, byrow=TRUE)
for(i in 1:length(d.mat[1,])){
+ d.mat[,i][d.mat[,i]==0] <- mean(d.mat[,i][d.mat[,i] > 0])
+ }

Whereas:

d <- c(0,2,3,2,0,3,4,0,0,0,0,0)
d.mat <- matrix(data=d, nrow=4, ncol=3, byrow=TRUE)
d.mat[d.mat==0] <- NA
for(i in 1:length(d.mat[1,])){
+ d.mat[,i][d.mat[,i]==NA] <- mean(d.mat[,i], na.rm=TRUE)
+ }

doesn't. Thanks Tom __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Binary outcome with non-absorbing outcome state
John Sorkin wrote: I am trying to model data in which subjects are followed through time to determine if they fall, or do not fall. Some of the subjects fall once, some fall several times. Follow-up time varies from subject to subject. I know how to model time to the first fall (e.g. Cox Proportional Hazards, Kaplan-Meir analyses, etc.) but I am not sure how I can model the data if I include the data for those subjects who fall more than once. I would appreciate suggestions about a models that I could use, how I would quantify the follow-up time, how I account for the imbalance in the data (some subjects would contribute one outcome measure, others multiple measures), etc. Many thanks, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC and University of Maryland School of Medicine Claude Pepper OAIC University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 410-605-7119 -- NOTE NEW EMAIL ADDRESS: [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html help.search(recurrent) leads to CRAN pakage survrec. You couls also have a look at CRAN package eha and at Lindsey's package event -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] comparing strength of association instead of strength of evidence?
Weiwei Shi wrote: Dear all: I still need some further help since I think the question itself might be very interesting (i hope so:) : the question is on chisq.test, my concern is which criteria should be used here to evaluate the independence. The reason i use this old subject of the email is, b/c I think the problem here is about how to look at p.value, which evaluate the strength of evidence instead of association. If i am wrong, please correct me. the result looks like this: index word.comb id in.class0 in.class1 p.value odds.ratio 1 1 TOTAL|LAID 54|241 2 4 0.0004997501 0.00736433 2 2 THEFT|RECOV 52|53 40751 146 0.0004997501 4.17127643 3 3 BOLL|ACCID 10|21 36825 1202 0.0004997501 0.44178546 4 4 LAB|VANDAL 8|55 24192 429 0.0004997501 0.82876099 5 5 VANDAL|CAUS 55|59 80164 0.0004997501 0.18405918 6 6AI|TOTAL 9|54 194945 0.0034982509 0.63766766 7 7AI|RECOV 9|53 238561 0.0004997501 0.57547012 8 8 THEFT|TOTAL 52|54 33651 110 0.0004997501 4.56174408 the target is to look for best subset of word.comb to differentiate between class0 and class1. p.value is obtained via p.chisq.sim[i] - as.double(chisq.test(tab, sim=TRUE, B=myB)$p.value) and B=2 (I increased B and it won't help. the margin here is class0=2162792 class1=31859 ) So, in conclusion, which one I should use first, odds.ratio or p.value to find the best subset. If your goal is to discriminate between two different classes, why not calculate directly a measure of discriminative ability, such as probability of correct classification? Kjetil I read some on feature selection in text categorization (A comparative study on feature selection in text categorization might be a good ref.). Anyone here has other suggestions? thanks, weiwei On 6/24/05, Gabor Grothendieck [EMAIL PROTECTED] wrote: On 6/24/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote: Weiwei Shi wrote: Hi, I asked this question before, which was hidden in a bunch of questions. I repharse it here and hope I can get some help this time: I have 2 contingency tables which have the same group variable Y. I want to compare the strength of association between X1/Y and X2/Y. I am not sure if comparing p-values IS the way even though the probability of seeing such weird observation under H0 defines p-value and it might relate to the strength of association somehow. But I read the following statement from Alan Agresti's An Introduction to Categorical Data Analysis : Chi-squared tests simply indicate the degree of EVIDENCE for an associationIt is sensible to decompose chi-squared into components, study residuals, and estimate parameters such as odds ratios that describe the STRENGTH OF ASSOCIATION. Here are some things you can do: tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) library(epitools) # on CRAN ?odds.ratio Help for 'odds.ratio' is shown in the browser library(help=epitools) # on CRAN tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 odds.ratio(11266, 125, 2151526, 31734) Error in fisher.test(tab) : FEXACT error 40. Out of workspace. # so this are evidently for tables with smaller counts library(vcd) # on CRAN ?oddsratio Help for 'oddsratio' is shown in the browser oddsratio( tab1) # really is logodds ratio [1] 0.2807548 plot(oddsratio( tab1) ) library(help=vcd) # on CRAN Read this for many nice functions. fourfoldplot(tab1) mosaicplot(tab1) # not really usefull for this table Also has a look at function Crosstable in package gmodels. 
To decompose the chisqure you can program yourselves: decomp.chi - function(tab) { rows - rowSums(tab) cols - colSums(tab) N - sum(rows) E - rows %o% cols / N contrib - (tab-E)^2/E contrib } decomp.chi(tab1) [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 So you can easily see what cell contributes most to the overall chisquared. Kjetil Can I do this decomposition in R for the following example including 2 contingency tables? tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) tab2 [,1][,2] [1,] 43571 2119221 [2,]52 31807 Here are a few more ways of doing this using chisq.test, glm and assocplot: ## chisq.test ### tab1.chisq - chisq.test(tab1) # decomposition of chisq resid(tab1.chisq)^2 [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 # same with(tab1.chisq, (observed - expected)^2/expected) [,1] [,2] [1,] 0.1451026
Re: [R] What method I should to use for these data?
luan_sheng wrote: -Original Message- From: luan_sheng [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 07, 2005 9:46 PM To: (r-help@stat.math.ethz.ch) Cc: ([EMAIL PROTECTED]) Subject: What method I should to use for these data? Dear R user: I am studying the allele data of two populations. The following is the data:

      a1     a2     a3     a4     a5     a6     a7     a8     a9     a10    a11    a12    a13    a14    a15    a16    a17
pop1  0.0217 0.0000 0.0109 0.0435 0.0435 0.0000 0.0109 0.0543 0.1739 0.0761 0.1413 0.1522 0.1087 0.0870 0.0435 0.0217 0.0109
pop2  0.0213 0.0213 0.0000 0.0000 0.0000 0.0426 0.1702 0.2128 0.1596 0.1809 0.0957 0.0745 0.0106 0.0106 0.0000 0.0000 0.0000

a1, a2, a3, ..., a17 is the frequency of 17 alleles; the sum is 1. I want to test the significance of the distribution of the 17 alleles between the two populations. How can I do this? I want to use chisquare, is it right for these data? If you want to use chisquare, you need the counts and not only the proportions. Whether that is right can be answered only if we know your hypothesis. Kjetil can anyone help me ? Thanks!! luan Yellow Sea Fisheries Research Institute , Chinese Academy of Fishery Sciences , Qingdao , 266071 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
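If the counts are available, the test could look like this (the sample sizes n1 and n2 below are hypothetical; with this many near-zero cells the simulated p-value is safer than the asymptotic one):

p1 <- c(0.0217, 0.0000, 0.0109, 0.0435, 0.0435, 0.0000, 0.0109, 0.0543, 0.1739,
        0.0761, 0.1413, 0.1522, 0.1087, 0.0870, 0.0435, 0.0217, 0.0109)
p2 <- c(0.0213, 0.0213, 0.0000, 0.0000, 0.0000, 0.0426, 0.1702, 0.2128, 0.1596,
        0.1809, 0.0957, 0.0745, 0.0106, 0.0106, 0.0000, 0.0000, 0.0000)
n1 <- 92; n2 <- 94                       # hypothetical numbers of sampled alleles
tab <- rbind(pop1 = round(p1 * n1), pop2 = round(p2 * n2))
chisq.test(tab, simulate.p.value = TRUE, B = 10000)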
Re: [R] rlm/M/MM-estimator questions
Christian Hennig wrote: Hi list, 1) How can the MM-estimator method="MM" in function rlm be tuned to 85% efficiency? It seems that there is a default tuning to 95%. I presume, but am not sure, that the MM-estimator uses phi=phi.bisquare as default and the tuning constant could be set by adding a parameter c=... Is this true? Which value to use for 85%? (In principle I should be able to figure that out theoretically, but it would be much easier if somebody already knew the constant or a straightforward way to compute it.) I have done this once, but cannot find the code or remember the constant. But given the constant, it is easy to do this in R. rlm has an argument psi with default psi.huber:

psi.huber
function (u, k = 1.345, deriv = 0)
{
    if (!deriv) return(pmin(1, k/abs(u)))
    abs(u) <= k
}
<environment: namespace:MASS>

Use this argument with psi=function(u, k=your.value, deriv=0) psi.huber(u, k, deriv) 2) The M-estimator with bisquare uses rescaled MAD of the residuals as scale estimator according to the rlm help page. Does this mean that a scale estimator is used which is computed from least squares residuals? Are M-estimator and residual scale estimator iterated until convergence of them both? (Does this converge?) Not sure about this at the moment. Or what else? What does rescaled mean? rescaled means multiplied by the constant which makes it a consistent estimator under the normal model, the default in the R mad function Kjetil Thank you, Christian *** NEW ADDRESS! *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
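If I remember the efficiency tables correctly, 85% efficiency for the bisquare corresponds to a tuning constant near c = 3.44 (the default c = 4.685 gives 95%). Assuming rlm() passes extra arguments through to the psi function, as its help page suggests, the call would be something like:

library(MASS)
fit <- rlm(stack.loss ~ ., data = stackloss, method = "MM", c = 3.44)
summary(fit)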
Re: [R] comparing strength of association instead of strength of evidence?
Weiwei Shi wrote: Hi, I asked this question before, which was hidden in a bunch of questions. I repharse it here and hope I can get some help this time: I have 2 contingency tables which have the same group variable Y. I want to compare the strength of association between X1/Y and X2/Y. I am not sure if comparing p-values IS the way even though the probability of seeing such weird observation under H0 defines p-value and it might relate to the strength of association somehow. But I read the following statement from Alan Agresti's An Introduction to Categorical Data Analysis : Chi-squared tests simply indicate the degree of EVIDENCE for an associationIt is sensible to decompose chi-squared into components, study residuals, and estimate parameters such as odds ratios that describe the STRENGTH OF ASSOCIATION. Here are some things you can do: tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) library(epitools) # on CRAN ?odds.ratio Help for 'odds.ratio' is shown in the browser library(help=epitools) # on CRAN tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 odds.ratio(11266, 125, 2151526, 31734) Error in fisher.test(tab) : FEXACT error 40. Out of workspace. # so this are evidently for tables with smaller counts library(vcd) # on CRAN ?oddsratio Help for 'oddsratio' is shown in the browser oddsratio( tab1) # really is logodds ratio [1] 0.2807548 plot(oddsratio( tab1) ) library(help=vcd) # on CRAN Read this for many nice functions. fourfoldplot(tab1) mosaicplot(tab1) # not really usefull for this table Also has a look at function Crosstable in package gmodels. To decompose the chisqure you can program yourselves: decomp.chi - function(tab) { rows - rowSums(tab) cols - colSums(tab) N - sum(rows) E - rows %o% cols / N contrib - (tab-E)^2/E contrib } decomp.chi(tab1) [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 So you can easily see what cell contributes most to the overall chisquared. Kjetil Can I do this decomposition in R for the following example including 2 contingency tables? tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) tab2 [,1][,2] [1,] 43571 2119221 [2,]52 31807 BTW, is there some good forum on the theory of statistics? r-help is a good one but I don't want to bother people by asking some questions weakly associated with R here. Thanks, -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] chisq test and fisher exact test
Weiwei Shi wrote: Hi, I have a text mining project and currently I am working on feature generation/selection part. My plan is selecting a set of words or word combinations which have better discriminant capability than other words in telling the group id's (2 classes in this case) for a dataset which has 2,000,000 documents. One approach is using contrast-set association rule mining while the other is using chisqr or fisher exact test. An example which has 3 contingency tables for 3 words as followed (word coded by number): tab[,,1:3] , , 1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 , , 2 [,1][,2] [1,] 43571 2119221 [2,]52 31807 , , 3 [,1][,2] [1,] 427 2162365 [2,]5 31854 I have some questions on this: 1. What's the thumb of rule to use chisq test instead of Fisher exact test. I have a vague memory which said for each cell, the count needs to be over 50 if chisq instead of fisher exact test is going to be used. In the case of word 3, I think I should use fisher test. However, running chisq like below is fine: tab[,,3] [,1][,2] [1,] 427 2162365 [2,]5 31854 chisq.test(tab[,,3]) Pearson's Chi-squared test with Yates' continuity correction data: tab[, , 3] X-squared = 0.0963, df = 1, p-value = 0.7564 but running on the whole set of words (including 14240 words) has the following warnings: p.chisq-as.double(lapply(1:N, function(i) chisq.test(tab[,,i])$p.value)) There were 50 or more warnings (use warnings() to see the first 50) warnings() Warning messages: 1: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 2: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 3: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 4: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i]) 2. So, my second question is, is this warning b/c I am against the assumption of using chisq. But why Word 3 is fine? How to trace the warning to see which word caused this warning? 3. My result looks like this (after some mapping treating from number id to word and some words are stemmed here, like ACCID is accident): of[1:50,] map...2. p.fisher 21 ACCID 0.00e+00 30 CD 0.00e+00 67ROCK 0.00e+00 104 CRACK 0.00e+00 111 CHIP 0.00e+00 179 GLASS 0.00e+00 84BACK 4.199878e-291 395 DRIVEABL 5.335989e-287 60 CAP 9.405235e-285 262 WINDSHIELD 2.691641e-254 13 IV 3.905186e-245 110 HZ 2.819713e-210 11CAMP 9.086768e-207 2 SHATTER 5.273994e-202 297ALP 1.678521e-177 162BED 1.822031e-173 249BCD 1.398391e-160 493 RACK 4.178617e-156 59CAUS 7.539031e-147 3.1 question: Should I use two-sided test instead of one-sided for fisher test? I read some material which suggests using two-sided. 3.2 A big question: Even though the result looks very promising since this is case of classiying fraud cases and the words selected by this approach make sense. However, I think p-values here just indicate the strength to reject null hypothesis, not the strength of association between word and class of document. So, what kind of statistics I should use here to evaluate the strength of association? odds ratio? Any suggestions are welcome! Thanks! You can use chisq.test with sim=TRUE, or call it as usual first, see if there is a warning, and then recall with sim=TRUE. Kjetil -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
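(sim= works by partial matching; the full argument name is simulate.p.value.) A sketch of the call-first-then-retry idea, catching the warning programmatically; p.safe is just an illustrative name:

p.safe <- function(t, B = 2000) {
  warned <- FALSE
  p <- withCallingHandlers(
    chisq.test(t)$p.value,
    warning = function(w) { warned <<- TRUE; invokeRestart("muffleWarning") }
  )
  if (warned) chisq.test(t, simulate.p.value = TRUE, B = B)$p.value else p
}
p.safe(matrix(c(427, 5, 2162365, 31854), 2))   # word 3 from the post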
Re: [R] glm with variance = mu+theta*mu^2?
Spencer Graves wrote: How might you fit a generalized linear model (glm) with variance = mu+theta*mu^2 (where mu = mean of the exponential family random variable and theta is a parameter to be estimated)? This appears in Table 2.7 of Fahrmeir and Tutz (2001) Multivariate Statisticial Modeling Based on Generalized Linear Models, 2nd ed. (Springer, p. 60), where they compare log-linear model fits to cellular differentiation data based on quasi-likelihoods between variance = phi*mu (quasi-Poisson), variance = phi*mu^2 (quasi-exponential), and variance = mu+theta*mu^2. The quasi function accepted for the family argument in glm generates functions variance, validmu, and dev.resids. I can probably write functions to mimic the quasi function. However, I have two questions in regard to this: (1) I don't know what to use for dev.resids. This may not matter for fitting. I can try a couple of different things to see if it matters. (2) Might someone else suggest something different, e.g., using something like optim to solve an appropriate quasi-score function? Thanks, spencer graves Since nobody has answerd this I will try. The variance function mu+theta*mu^2 is the variance function of the negative binomial family. If this variance function is used to construct a quasi-likelihood, the resulting quasi- likelihood is identical to the negative binomial likelihood, so for fitting we can simly use glm.nb from MASS, which will give the correct estimated values. However, in a quasi-likelihood setting the (co)varince estimation from glm.nb is not appropriate, and from the book (fahrmeir ..) it seems that the estimation method used is a sandwich estimator, so we can try the sandwich package. This works but the numerical results are somewhat different from the book. Any comments on this? my code follows: library(Fahrmeir) library(help=Fahrmeir) library(MASS) cells.negbin - glm(y~TNF+IFN+TNF:IFN, data=cells, family=negative.binomial(1/0.215)) summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 confint(cells.negbin) Waiting for profiling to be done... 
2.5 % 97.5 % (Intercept) 3.0383197319 3.7890206510 TNF 0.0091335087 0.0238915483 IFN 0.0023292566 0.0170195707 TNF:IFN -0.0001996824 0.960427 library(sandwich) Loading required package: zoo vcovHC( cells.negbin ) (Intercept) TNF IFN TNF:IFN (Intercept) 0.01176249372 -0.0001279740135 -0.0001488223001 0.0212541999 TNF -0.00012797401 0.039017282 0.021242875 -0.0019793137 IFN -0.00014882230 0.021242875 0.054314079 -0.0013277626 TNF:IFN 0.0212542 -0.001979314 -0.001327763 0.0002370104 cov2cor(vcovHC( cells.negbin )) (Intercept)TNFIFNTNF:IFN (Intercept) 1.000 -0.5973702 -0.5887923 0.1272950 TNF -0.5973702 1.000 0.4614542 -0.6508822 IFN -0.5887923 0.4614542 1.000 -0.3700671 TNF:IFN 0.1272950 -0.6508822 -0.3700671 1.000 cells.negbin2 - glm.nb( y~TNF+IFN+TNF:IFN, data=cells) summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 confint( cells.negbin2 ) Waiting for profiling to be done... 2.5 %97.5 % (Intercept) 3.0864669072
Re: [R] Replacing for loop with tapply!?
Sander Oom wrote: Dear all, We have a large data set with temperature data for weather stations across the globe (15000 stations). For each station, we need to calculate the number of days a certain temperature is exceeded. So far we used the following S code, where mat88 is a matrix containing rows of 365 daily temperatures for each of 15000 weather stations:

m <- 37
n <- 2
outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
for(i in 1:nrow(mat88)) {
  # i <- 3
  row1 <- as.data.frame(df88[i, ])
  temprow37 <- select.rows(row1, row1 > m)
  temprow39 <- select.rows(row1, row1 > m + n)
  temprow41 <- select.rows(row1, row1 > m + 2 * n)
  outmat88[i, 1] <- max(row1, na.rm = T)
  outmat88[i, 2] <- count.rows(temprow37)
  outmat88[i, 3] <- count.rows(temprow39)
  outmat88[i, 4] <- count.rows(temprow41)
}
outmat88

What you need is not tapply but apply. Something like apply(mat88, 1, function(x) sum(x > 30)) where your threshold should replace 30 and the `1' refers to rows. For multiple thresholds: apply(mat88, 1, function(x) c(sum(x > 20), sum(x > 25), sum(x > 30))) Kjetil We have transferred the data to a more potent Linux box running R, but still hope to speed up the code. I know a for loop should be avoided when looking for speed. I also know the answer is in something like tapply, but my understanding of these commands is still too limited to see the solution. Could someone show me the way!? Thanks in advance, Sander. -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Random seed problem in MCMC coupling of chains
Gorjanc Gregor wrote: Thanks to Paul and Gabor for additional tips/examples. Actually, I find Pauls suggestion with setRNG also nice and is exactly what I wanted. Paul, if I understand this correctly, your suggestion with setRNG does not alter RNG flow, it just takes care that chains really have equal seeds. I remember that I have read somewhere that destroying RNG flow over and over to get real randomness is not a good idea. Can someone confirm this? That's in Brian Ripley's Simulation book, and certainly in other places. Kjetil niter - 3 nchain - 2 for (i in 1:niter) { # iterations tmpSeed - setRNG() for (j in 1:nchain) { # chains setRNG(tmpSeed) a - runif(1) cat(iter:, i, chain:, j, runif:, a, \n) } } iter: 1 chain: 1 runif: 0.8160078 iter: 1 chain: 2 runif: 0.8160078 iter: 2 chain: 1 runif: 0.4909793 iter: 2 chain: 2 runif: 0.4909793 iter: 3 chain: 1 runif: 0.4425924 iter: 3 chain: 2 runif: 0.4425924 [... removed other stuff ...] Lep pozdrav / With regards, Gregor Gorjanc -- University of Ljubljana Biotechnical FacultyURI: http://www.bfro.uni-lj.si/MR/ggorjan Zootechnical Department mail: gregor.gorjanc at bfro.uni-lj.si Groblje 3 tel: +386 (0)1 72 17 861 SI-1230 Domzale fax: +386 (0)1 72 17 888 Slovenia, Europe -- One must learn by doing the thing; for though you think you know it, you have no certainty until you try. Sophocles ~ 450 B.C. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R and MLE
Ajay Narottam Shah wrote: I learned R MLE in the last few days. It is great! I wrote up my explorations as http://www.mayin.org/ajayshah/KB/R/mle/mle.html I will be most happy if R gurus will look at this and comment on how it can be improved. I have a few specific questions: * Should one use optim() or should one use stats4::mle()? I felt that mle() wasn't adding much value compared with optim, and in addition, I wasn't able to marry my likelihood functions to it. * One very nice feature of mle() is that you can specify a few parameters which should be fixed in the estimation. How can one persuade optim() to behave like that? give optim() a function to optimize which do not depend on those parameters ... * Can one use deriv() and friends to get analytical derivatives of these likelihood functions? I found I wasn't able to make headway when I was using vector/matrix notation. I think the greatness of R lies in a lovely vector/matrix notation, and it seems like a shame to have to not use that when trying to do deriv(). * For iid problems, the computation of the likelihood function and it's gradient vector are inherently parallelisable. How would one go about doing this within R? Kjetil -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
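A tiny sketch of the closure trick Kjetil mentions for holding a parameter fixed in optim() (normal negative log-likelihood; sigma pinned at 1):

set.seed(1)
x <- rnorm(100, mean = 2, sd = 1)
nll <- function(mu, sigma) -sum(dnorm(x, mu, sigma, log = TRUE))
# the function handed to optim() simply does not depend on sigma:
fit <- optim(par = 0, fn = function(mu) nll(mu, sigma = 1), method = "BFGS")
fit$par            # close to 2; sigma never enters the optimization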
Re: [R] function and apply
jose silva wrote: Dear all Its very difficult to read your post due to all the blank lines, so I cannot really read it very well, but what about writing your function as (not tried) fun - function(x) c(x[1], or if that doesn't work look into ?mapply But you should really look into the settings of your mail program! Kjetil I think my problem is not complicated but I'm having difficulties to solve it. v is a vector: v=c(p1 , p2 , p3 , p4), and f is a function: f : v - w , where w=c(p1 , p2*(1-p1) , p3*(1-p2)*(1-p1) , p4*(1-p3)*(1-p2)*(1-p1)) I write the function f as: f- function(w,x,y,z) {c(w,x*(1-w),y*(1-x)*(1-w),z*(1-y)*(1-x)*(1-w))} f(a,b,c,d) it works well. But now I want to apply f to each row of a data frame with 4 columns: d-data.frame(a=seq(1,10), b=seq(11,20), c=seq(21,30),d=seq(31,40)) t(apply(d,1,f)) is not working? I think each element of each row is not corresponding to w,x,y,z and now I'm lost? Can someone help me? thks J. Silva [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
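One way to finish the truncated suggestion is to rewrite f so it takes the whole row as one vector; the cumprod() form below (fv is an illustrative name, not from the original post) reproduces w exactly, and mapply() gives the column-wise alternative:

fv <- function(x) x * c(1, cumprod(1 - x)[-length(x)])
d <- data.frame(a = seq(1, 10), b = seq(11, 20), c = seq(21, 30), d = seq(31, 40))
t(apply(d, 1, fv))
# mapply works too, with one argument per column:
mapply(function(w, x, y, z) c(w, x*(1-w), y*(1-x)*(1-w), z*(1-y)*(1-x)*(1-w)),
       d$a, d$b, d$c, d$d)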
Re: [R] geometric mean regression
Poizot Emmanuel wrote: Hi, is it possible to perform a geometric mean regression with R ? Thanks. As has been said on this list before, "This is R, there is no if, only how", but if you actually wanted to ask how it is possible, it would help if you explained what geometric mean regression is. Kjetil Emmanuel Poizot Cnam/Intechmer B.P. 324 50103 Cherbourg Cedex Phone (Direct) : (00 33)(0)233887342 Fax : (00 33)(0)233887339 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
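In case the question was "how": geometric mean regression (also known as reduced major axis regression) has slope sign(r)*sd(y)/sd(x), so a bare-bones version is easy to write (gmr is an illustrative name):

gmr <- function(x, y) {
  b <- sign(cor(x, y)) * sd(y) / sd(x)   # RMA slope
  a <- mean(y) - b * mean(x)             # line through the means
  c(intercept = a, slope = b)
}
set.seed(1)
x <- rnorm(30); y <- 2 + 0.5 * x + rnorm(30, sd = 0.3)
gmr(x, y)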
Re: [R] Dynamic Dictionary Data Type?
Prof Brian Ripley wrote: On Thu, 2 Jun 2005, hadley wickham wrote: An environment is a hash table, and copying occurs only on modification, when any language would have to copy in this context. Caveat: the default is new.env(hash=FALSE), so an environment is a hash table in one sense but not necessarily so in the strict sense. Yes, I'm aware that copying only occurs on modification. However, it was my understanding that a <- list(a=1); a$b <- 4 would create a new copy of a, Thomas was talking about environments, not lists (and $ works differently for them). whereas in Java say HashMap a = new HashMap(); a.put("a", 1); a.put("b", 2); wouldn't create a copy of a. (please bear in mind my java syntax is very rusty!) Caching data implies updating at each step, thus possibly creating n copies of the list. Is that wrong? It depends what you mean by `copy'. If you expand a hash table you at some point need to re-arrange the hashing of the entries. That's what will happen with an R environment or an R list too. The possible difference is that you might expect the table to have some room for expansion, and in your list example you did not give any. R's operations on lists make more copies than are strictly necessary, but it is not clear that this is more costly than the housekeeping that would otherwise be necessary. In a$b <- 4, the wrapper VECSXP is recreated and the pointers copied across, but the list elements are not copied. For a$a <- 4 it is probable that no copying is done (although if a2 <- a had been done previously, the pending recursive copy would then be done). (It is also possible that I have overlooked something in the rather complex code used to do these subassignments.) The original poster asked for caching of results. Here is an example using new.env():

memo <- function(fun) {
  mem <- new.env()
  function(x) {
    if (exists(as.character(x), envir=mem))
      get(as.character(x), envir=mem, inherits=FALSE)
    else {
      val <- fun(x)
      assign(as.character(x), value=val, envir=mem)
      val
    }
  }
}

fib <- memo(function(n) if(n <= 1) 1 else fib(n-1) + fib(n-2))
system.time( fib(300) )
[1] 0.01 0.00 0.02 NA NA
ls(get("mem", env=environment(fib)))
*output suppressed*

To compare:

system.time( {fib2 <- function(n) if(n <= 1) 1 else fib2(n-1) + fib2(n-2); fib2(30)} )
[1] 8.07 0.08 12.75 NA NA

(there is (at least) one problem with this solution: if the global workspace contains a variable `6`, it gives an error. Why?) Kjetil -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
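Regarding the `6` puzzle: presumably it is because exists() defaults to inherits = TRUE, so it finds a `6` in the global environment (a parent of mem), after which get(..., inherits = FALSE) fails since mem itself has no `6` yet. Restricting exists() to mem should fix it (memo2 below is an illustrative variant, not from the original post):

memo2 <- function(fun) {
  mem <- new.env()
  function(x) {
    key <- as.character(x)
    if (exists(key, envir = mem, inherits = FALSE))  # inherits=FALSE is the fix
      get(key, envir = mem, inherits = FALSE)
    else {
      val <- fun(x)
      assign(key, val, envir = mem)
      val
    }
  }
}
`6` <- 42                                  # reproduces the clash
fib3 <- memo2(function(n) if (n <= 1) 1 else fib3(n-1) + fib3(n-2))
fib3(10)                                   # now unaffected by the global `6`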
Re: [R] Dynamic Dictionary Data Type?
Duncan Murdoch wrote: Gabor Grothendieck wrote: On 6/2/05, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Thu, 2 Jun 2005, hadley wickham wrote:

An environment is a hash table, and copying occurs only on modification, when any language would have to copy in this context. Caveat: the default is new.env(hash=FALSE), so an environment is a hash table in one sense but not necessarily so in the strict sense.

Can you expand on this? When would one use hash = TRUE vs. the default?

It's an optimization question. hash = TRUE uses more memory and takes longer to set up, but will give faster searches in big environments. The current default assumes that environments are small, so the effort of building the hash table is not worthwhile, and it does linear searches. I suspect that we might have the default wrong (or perhaps should make the transition from linear to hash search automatically at a certain threshold size), but I haven't done the testing necessary to show this. Duncan Murdoch

An example, following my example in a post some minutes ago:

memo <- function(fun, hash) {
  mem <- new.env(hash = hash)
  function(x) {
    if (exists(as.character(x), envir = mem))
      get(as.character(x), envir = mem, inherits = FALSE)
    else {
      val <- fun(x)
      assign(as.character(x), value = val, envir = mem)
      val
    }
  }
}
fib1 <- memo(function(n) if (n <= 1) 1 else fib1(n - 1) + fib1(n - 2), TRUE)
fib2 <- memo(function(n) if (n <= 1) 1 else fib2(n - 1) + fib2(n - 2), FALSE)
system.time(fib1(400))
[1] 0.06 0.00 1.06 NA NA
system.time(fib2(400))
[1] 0.00 0.00 1.44 NA NA

Kjetil
[R] Incompatibility with VGAM
I just discovered that when the VGAM package (not on CRAN) is loaded, glm() doesn't work. This is because VGAM defines a function family() which gets found by glm() in place of the family() function from stats. VGAM's family() then returns an object which doesn't have a $family component (it has a component $vfamily). I thought namespaces should protect us from this happening?

Kjetil
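Two hedged workarounds until the masking is sorted out; whether the first suffices depends on exactly where glm() performs the name lookup, so treat it as a guess (the formula is purely illustrative), while the second certainly restores the stats version:

fit <- glm(y ~ x, family = stats::poisson())  # pass a fully built family object
## or
detach("package:VGAM")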
Re: [R] how to make legends on histograms
Andreas Zankl wrote: I have a plot with histograms for several datasets superimposed. How can I add a legend that indicates which dataset uses which line type? Thanks, Andreas

?legend
?locator

Kjetil
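A minimal sketch of combining the two help pages (the data are made up; locator(1) lets you click where the legend should go):

set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100, mean = 2)
hist(x1, border = "black", main = "Two histograms")
hist(x2, border = "red", add = TRUE)
legend(locator(1), legend = c("dataset 1", "dataset 2"),
       lty = 1, col = c("black", "red"))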
Re: [R] comparing glm models - lower AIC but insignificant coefficients
Constantinos Antoniou wrote: Hello, I am a new R user and I am trying to estimate some generalized linear models (glm). I am trying to compare a model with a gaussian distribution and an identity link function, and a poisson model with a log link function. My problem is that while the gaussian model has a significantly lower (i.e. better) AIC (Akaike Information Criterion), most of the coefficients are not significant. On the other hand, the poisson model has a higher (i.e. worse) AIC, but almost all the coefficients are extremely significant (except for one that still has p=0.07). Summary output of the two models follows... [sorry for the large number of independent variables, but the issue is less pronounced with fewer covariates]. My question is two-fold:

- AIC supposedly can be used to compare non-nested models (although there are concerns, and I have also seen a couple in this list's archives). Is this a case where AIC is not a good measure to compare the two models? If so, is there another measure (besides choosing the model with the significant coefficients)? [These are time-series data, so I am also looking at acf/pacf of the residuals.]

The topic of using AIC to compare non-nested models has been discussed on the list, please search. But even if AIC can be used to compare non-nested models, the AIC as calculated by R is not suited. The AIC includes an arbitrary additive constant, as the log-likelihood does. This additive constant usually depends on constants in the density which are inconsequential for AIC and may be omitted. And even if they were included, it seems doubtful to me that this would help for a comparison of Poisson and normal models, since the underlying measure is different! The experts can comment on that. That said, I would tend to use Poisson if I had count data and a Poisson model looked remotely sensible. That will give a more interpretable model, which seems more important than purely data-analytic considerations. And lastly, if the Poisson assumptions seem reasonable, there will be non-constant variance, and if you use a normal model you should use weighted least squares or transform the response (square root). If you try that, maybe you will see that the normal model gives lower p-values for the coefficients. Also make a plot of residuals versus fitted values!

- Could the very high significance of the coefficients in the poisson model hint at some issue?

Maybe that the model fits better than the normal?

Kjetil

Thanking you in advance, Costas

+++ POISSON - LOG LINK +++

Call:
glm(formula = TotalDeadInjured[3:48] ~ -1 + Month[3:48] + sin(pi * Month[3:48]/6) +
    cos(pi * Month[3:48]/6) + sin(pi * Month[3:48]/12) + cos(pi * Month[3:48]/12) +
    ThousandCars[3:48] + monthcycle[3:48] + TotalDeadInjured[1:46] +
    I((TotalDeadInjured[1:46])^2) + I((TotalDeadInjured[1:46])^3),
    family = poisson(link = log))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.6900  -1.1901  -0.1847   0.9477   4.3967

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)
Month[3:48]                   -7.712e-02  5.530e-03 -13.947  < 2e-16 ***
sin(pi * Month[3:48]/6)       -1.419e-01  2.759e-02  -5.144 2.68e-07 ***
cos(pi * Month[3:48]/6)       -8.407e-02  1.799e-02  -4.672 2.99e-06 ***
sin(pi * Month[3:48]/12)      -2.776e-02  1.558e-02  -1.782 0.074702 .
cos(pi * Month[3:48]/12)       5.195e-02  1.608e-02   3.232 0.001231 **
ThousandCars[3:48]             2.733e-02  2.255e-03  12.118  < 2e-16 ***
monthcycle[3:48]               6.307e-02  6.546e-03   9.635  < 2e-16 ***
TotalDeadInjured[1:46]        -2.925e-02  8.460e-03  -3.457 0.000546 ***
I((TotalDeadInjured[1:46])^2)  1.218e-04  3.613e-05   3.370 0.000750 ***
I((TotalDeadInjured[1:46])^3) -1.640e-07  4.961e-08  -3.306 0.000946 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 78694.70 on 46 degrees of freedom
Residual deviance: 130.03 on 36 degrees of freedom
AIC: 476.08

Number of Fisher Scoring iterations: 4

+ GAUSSIAN ++

Call:
glm(formula = TotalDeadInjured[3:48] ~ -1 + Month[3:48] + sin(pi * Month[3:48]/6) +
    cos(pi * Month[3:48]/6) + sin(pi * Month[3:48]/12) + cos(pi * Month[3:48]/12) +
    ThousandCars[3:48] + monthcycle[3:48] + TotalDeadInjured[1:46] +
    I((TotalDeadInjured[1:46])^2) + I((TotalDeadInjured[1:46])^3),
    family = gaussian(link = identity))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-61.326  -12.012   -1.756   14.204   78.991

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
Month[3:48]                   -8.111e+00  2.115e+00  -3.835 0.000487 ***
sin(pi * Month[3:48]/6)       -2.639e+01  1.095e+01  -2.409 0.021246 *
cos(pi * Month[3:48]/6)
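To make the advice above concrete, a hedged sketch; the names only loosely follow the posted model (y is the count response, x1 and x2 stand in for the covariates):

fit.pois <- glm(y ~ x1 + x2, family = poisson)
fit.sqrt <- lm(sqrt(y) ~ x1 + x2)   # square root is variance-stabilizing for counts
plot(fitted(fit.pois), residuals(fit.pois, type = "deviance"))
plot(fitted(fit.sqrt), residuals(fit.sqrt))

If the two residual plots look comparably flat, the models are at least both defensible descriptions of the mean, and the interpretability argument for the Poisson carries more weight.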
Re: [R] Suppressing warning messages
Tolga Uzuner wrote: How do I suppress the following?

Warning messages:
1: the condition has length > 1 and only the first element will be used in: if (strike == forward) atmvol(forward, t, alpha, beta, rho, upsilon) else {
2: the condition has length > 1 and only the first element will be used in: if (x(z) == 0) 1 else z/x(z)

Maybe better to understand what generates the warning! To make sure you stay uninformed, say options(warn = -1).

Kjetil
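If the warnings really are harmless, it is safer to silence them locally than globally; two sketches:

op <- options(warn = -1)   # switch warnings off ...
## ... the offending calls ...
options(op)                # ... and restore the old setting

suppressWarnings(as.numeric("abc"))   # or wrap just the one call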
Re: [R] eigenvalues of a circulant matrix
Globe Trotter wrote: Good point: the Bellman reference is a book: Introduction to Matrix Analysis by Bellman (1960), McGraw-Hill Series in Matrix Theory, republished much later by SIAM.

Kjetil

--- Robin Hankin [EMAIL PROTECTED] wrote: Hi everyone. The following webpage gives a definition of circulant matrix which agrees with the definition given in the magic package: http://mathworld.wolfram.com/CirculantMatrix.html best wishes rksh

On May 3, 2005, at 08:06 am, Mulholland, Tom wrote: Well, since I know nothing about this topic I have lurked so far, but here's my two bob's worth. Firstly, I tried to make sense of Brian's initial reply. I have got no idea who Bellman is, and you have not referenced (his/her) work in a way I can access the issues you refer to. So I assumed that's exactly what Brian was talking about. Secondly:

toeplitz(1:4)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    1    2    3
[3,]    3    2    1    2
[4,]    4    3    2    1
require(magic)
circulant(4)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    4    1    2    3
[3,]    3    4    1    2
[4,]    2    3    4    1

So they are obviously two different things. Although I think you may have implied (not stated) that the particular combination you were using resulted in both being exactly the same. It does appear as if in this case the (X) matrix is circulant. But then I'm no expert in even such simple things. Then I had no idea where I was going. So I tried the variations in eigen. I ran your code

x <- scan("h:/t.txt")
y <- x[c(109:216, 1:108)]
X <- toeplitz(y)

and then

X[is.na(X)]
numeric(0)

So I didn't get any NAs.

t1 <- eigen(X)$vectors
t2 <- eigen(X, symmetric = TRUE)$vectors
identical(t1, t2)
[1] TRUE

Then

t2 <- eigen(X, symmetric = TRUE, EISPACK = TRUE)$vectors
identical(t1, t2)
[1] FALSE

So there's obviously more than one way of getting the vectors. Does the second one make more sense to you? I also noticed in the eigen help that there are references to issues such as IEEE 754 arithmetic ("They may also differ between methods and between platforms.") and "or Hermitian if complex". All of these are out of my competence, but they do signal to me that there are issues which may relate to hardware, digital arithmetic and other things of that ilk. I added the comment about complex numbers because I have a vague idea that they are related to the imaginary parts that you refer to. So, not coming to any conclusion that makes sense to me, and given that there are often threads about supposed inaccuracies that have answers such as "the digits you see are not always what are held by the machine", I set my options(digits = 22) and noticed that some of the numbers are still going at the 22nd decimal place, suggesting that the machine might be incapable of producing perfectly accurate results using digital arithmetic. My other big sphere of ignorance is complex numbers.
So I tried

X <- toeplitz(complex(real = y))
t1 <- eigen(X)$vectors
t1[1:20]
 [1]  0.068041577278880341+0i -0.068041577140546913+0i  0.068041576864811659+0i -0.068041576452430155+0i
 [5]  0.068041575907139579+0i -0.068041575231135451+0i  0.068041574435267163+0i -0.068041573525828514+0i
 [9]  0.068041572538722991+0i -0.068041571498323253+0i  0.068041570619888622+0i -0.068041570256170081+0i
[13]  0.068041568759931989+0i -0.068041566476633147+0i  0.068041563560502477+0i -0.06804156305007+0i
[17]  0.06804138765813+0i -0.068041549792984865+0i  0.068041544123969511+0i -0.068041537810956801+0i
t2[1:20]
 [1]  0.068041381743976906 -0.068041381743976850  0.068041381743976781 -0.068041381743976753  0.068041381743976587
 [6] -0.068041381743976725  0.068041381743976920 -0.068041381743976836  0.068041381743976892 -0.068041381743976781
[11]  0.068041381743976781 -0.068041381743977392  0.068041381743976725 -0.068041381743976753  0.068041381743976753
[16] -0.068041381743976698  0.068041381743976587 -0.068041381743976642  0.068041381743976698 -0.068041381743976490

Which is again different. I have no idea what I'm doing, but you do seem to get slightly different answers depending upon which method you use. I do not know if one is superior to the others or where one draws the line in terms of accuracy. Tom

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Globe Trotter
Sent: Tuesday, 3 May 2005 10:51 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] eigenvalues of a circulant matrix

OK, here we go: I am submitting two attachments. The first is the datafile called kinv used to create my circulant matrix, using the following commands:

x <- scan("kinv")
y <- x[c(109:1, 0:108)]
X <- toeplitz(y)
eigen(X)
write(X, ncol = 216, file = "test.dat")

reports the following columns full of NaN's: 18, 58, 194, 200. (Note that eigen(X, symmetric = TRUE) makes no difference and I get the same as above). The second attachment contains only the eigenvectors obtained
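A note that may help anyone landing on this thread: the eigenvalues of a genuine circulant matrix are available in closed form as the discrete Fourier transform of its first row, so fft() computes them directly and sidesteps the numerical issues of a general eigensolver. A small self-contained check (conventions for which row or column generates the circulant vary between packages, hence the comparison of sorted moduli):

y <- c(4, 1, 2, 3)
X <- rbind(c(4, 1, 2, 3),
           c(3, 4, 1, 2),
           c(2, 3, 4, 1),
           c(1, 2, 3, 4))     # circulant: each row is the previous one shifted right
sort(Mod(fft(y)))             # DFT of the first row
sort(Mod(eigen(X)$values))    # agrees with the eigenvalue moduli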
Re: [R] How to prove R as good (Was: (no subject))
rene.raupp wrote: Does anybody know of any work comparing R with other (commercial) statistical software (like Minitab, SPSS, SAS)? I work in a Brazilian government bureau and I intend to use R as our preferred statistical software, but I have to show it's as good as the others.

Sorry, that will be difficult. Couldn't it do to prove it is better?

Kjetil

I also intend to use Weka, and for this one I have the same problem. Can anyone help me? Thanks, René M. Raupp
Re: [R] Roots of quadratic system.
John Janmaat wrote: Hello, I have a system of quadratic equations (the result of a Hamiltonian optimization) for which I need to find the roots. Is there a package and/or function which will find the roots of a quadratic system?

Certainly you could use solve, see ?solve. Alternatively you could go for a computer algebra system with an implementation of Gröbner bases and use a symbolic method.

Kjetil

Note that I am not optimizing, but rather solving the first-order conditions which come from a Hamiltonian. I am basically looking for something in R that will do the same thing as fsolve in Matlab. Thanks, John.

Dr. John Janmaat, Department of Economics, Acadia University. Tel: 902-585-1461
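In the absence of a direct fsolve equivalent, one workable sketch is to code the first-order conditions as equation values and minimize their sum of squares with optim(); a root of the system is a global minimum with value 0 (the two-equation system below is purely illustrative):

## Hypothetical system: x^2 + y^2 - 1 = 0 and x - y = 0
f <- function(p) {
  eq1 <- p[1]^2 + p[2]^2 - 1
  eq2 <- p[1] - p[2]
  eq1^2 + eq2^2
}
sol <- optim(c(0.5, 0.5), f)
sol$par    # close to c(1, 1)/sqrt(2); check that sol$value is near 0

As with any local method, several starting points are needed to find all roots of a polynomial system; the Gröbner basis route finds them all at once but requires outside software.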
[R] Warning from Rcmd check - data could not find data set
This is rw2010 from CRAN. When running Rcmd check on a package I get:

Warning in utils::data(list = al, envir = data_env) : data set 'vowel.test' not found
Warning in utils::data(list = al, envir = data_env) : data set 'vowel.train' not found
Warning in utils::data(list = al, envir = data_env) : data set 'waveform.test' not found
Warning in utils::data(list = al, envir = data_env) : data set 'waveform.train' not found

However, I have no problem with this when using the package. These datasets are loaded, multiple datasets at a time, under another name: data(vowel) loads the first two in the list above. Could it be this (which should be allowed; it is mentioned in Writing R Extensions), or is it something else, or a bug?

Kjetil
Re: [R] can test the if relationship is significant in cancor?
ronggui wrote: I have tried hard to find the answer by Google, but I cannot find any solution, so I want to ask:

1. Can we test whether the canonical relationship is significant after using cancor?

One reference is T. W. Anderson, An Introduction to Multivariate Statistical Analysis, second edition, pages 497-498.

2. If it can, how?

Following the reference above:

cancor.test <- function(obj, N) {
  ## obj is the object returned from cancor
  ## N is the sample size, which is not contained in the cancor object!
  p1 <- NROW(obj$xcoef)
  p2 <- NROW(obj$ycoef)
  p <- p1 + p2
  r <- length(obj$cor)
  ## Bartlett's modification of minus twice the log-likelihood,
  bartlett <- -(N - 0.5 * (p + 3)) * sum(log(1 - obj$cor^2))
  ## which is approximately chi-squared with p1*p2 degrees of freedom:
  list(bartlett = bartlett,
       p.value = pchisq(bartlett, df = p1 * p2, lower.tail = FALSE))
}

This tests if ALL the canonical correlations are zero. Does anybody know how good this approximation is, and how dependent it is on multivariate normality?

Kjetil

3. If not, is it under-developed, or is there no need to do it, or is there no good way to do it? I hope my question is not too silly.
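A quick usage sketch on simulated independent data, where the test should (usually) not reject; note that the sample size N must be supplied by hand:

set.seed(1)
N <- 100
x <- matrix(rnorm(N * 3), N, 3)
y <- matrix(rnorm(N * 2), N, 2)
cc <- cancor(x, y)
cancor.test(cc, N)   # expect a large p-value here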
Re: [R] Data set for loglinear analysis
Warfield Jr., Joseph D. wrote: Dear users, I need to perform a loglinear analysis of a real data set for a course project. I need a real data set with contingency tables in at least 3 dimensions, each with more than 2 levels. Thanks, Joe Warfield

Do data(package = "datasets") and look. Maybe data(UCBAdmissions)?

Kjetil
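And once a table is picked, a minimal starting point for the loglinear analysis itself; UCBAdmissions is 2 x 2 x 6 and Titanic is 4 x 2 x 2 x 2, so neither has more than 2 levels on every margin, but they illustrate the mechanics:

library(MASS)
fit <- loglm(~ Admit + Gender + Dept, data = UCBAdmissions)  # mutual independence
fit    # likelihood-ratio and Pearson tests of fit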
Re: [R] custom loss function + nonlinear models
Christian Mora wrote: Hi all; I'm trying to fit a reparameterization of the asymptotic regression model as shown in Ratkowsky (1990), page 96:

Y ~ y1 + (y2-y1)*(1 - ((y2-y3)/(y3-y1))^(2*(X-x1)/(x2-x1))) / (1 - ((y2-y3)/(y3-y1))^2)

where y1, y2, y3 are the expected values for X=x1, X=x2, and X=average(x1,x2), respectively. I tried first with Statistica v7 by LS and the Gauss-Newton algorithm without success (no convergence: predictors are redundant). Then I tried with the option CUSTOM LOSS FUNCTION and several algorithms like Quasi-Newton, Simplex, Hooke-Jeeves, among others. In all these cases the model converged to some values for the parameters in it. My question (after searching the help pages) is: is there such a thing implemented in R, or can it be easily implemented? In other words, is it possible to define which loss function to use and the algorithm to find the parameter estimates? Thanks, Christian

Try directly with optim().

Kjetil
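To spell out the optim() route (a sketch; the loss function, starting values and data names are all placeholders to be adapted):

model <- function(p, X, x1, x2) {
  y1 <- p[1]; y2 <- p[2]; y3 <- p[3]
  r <- (y2 - y3)/(y3 - y1)
  y1 + (y2 - y1) * (1 - r^(2*(X - x1)/(x2 - x1))) / (1 - r^2)
}
loss <- function(p, X, Y, x1, x2) sum(abs(Y - model(p, X, x1, x2)))  # e.g. L1 loss
## fit <- optim(c(y1 = 1, y2 = 10, y3 = 6), loss, X = X, Y = Y, x1 = 0, x2 = 5)

Any loss can be plugged in here, and optim's method argument selects the algorithm (Nelder-Mead, BFGS, simulated annealing, ...), which is exactly the "custom loss plus choice of algorithm" combination asked about.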
Re: [R] 'skewing' a normal random variable
Ashraf Chaudhary wrote: Hi All; The following question is directed more to the statisticians on the list. Suppose I have a normal random variable. Is there a transformation that I can apply so that the new variable is slightly positively skewed? I greatly appreciate your help. Ashraf

Look at the sn package on CRAN (skew-normal distributions) and the web page of its author.

Kjetil
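Two sketches of the idea, one with sn (if installed; the argument names follow recent versions of the package, older ones used location/scale/shape) and one bare-hands transform; the parameter values are arbitrary:

library(sn)
z <- rsn(1000, xi = 0, omega = 1, alpha = 3)   # alpha > 0 gives positive skew

x <- rnorm(1000)
y <- x + 0.1 * x^2    # a mild quadratic transform also skews to the right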
Re: [R] Where can I found the package ordinal ?
Peter Dalgaard wrote: [EMAIL PROTECTED] writes: Hello, dear all: I want to install the package ordinal, but I don't see the package listed under package sources. I tried to search for it with Google, and found this: http://euridice.tue.nl/~plindsey/rlibs.html but the link does not work. Where can I find the package ordinal? Is it still available?

Try http://fdg-popgen146.unimaas.nl/~plindsey/rlibs.html (I really don't know why I should even take the time Googling for Jim & Pat's current whereabouts. They clearly believe in neither version control, CRAN, nor following standard terminology.)

By the way, ordinal does not compile under rw2010dev.

Kjetil
Re: [R] Bivariate lognormal distribution
Vicky Landsman wrote: Thanks to Prof. Ripley, Kjetil and Spencer Graves for the help. I will be more specific. I have to simulate a bivariate lognormal pair (Y1, Y0) where E(Y1)=X'b, E(Y0)=X'd, Var(Y1)=c1, Var(Y0)=c0, X is a data matrix, and b and d are vectors of parameters. Vicky.

You didn't specify the dependence!

Kjetil

----- Original Message ----- From: Spencer Graves. To: Prof Brian Ripley. Cc: Vicky Landsman; R-help list. Sent: Friday, March 25, 2005 4:40 PM. Subject: Re: [R] Bivariate lognormal distribution

I hope Professor Ripley will correct me if I'm mistaken, but the documentation for mvrnorm in library(MASS) says it will "Simulate from a Multivariate Normal Distribution". If you want the density function or probabilities or quantiles, you can get those from library(mvtnorm). Just for completeness, to use normal routines for a lognormal, you need to take the logarithms of your numbers (which must be all positive; zeros and negative numbers become NA), then compute the mean vector and variance matrix of the logs, compute probabilities on the log scale, then back-transform by exponentiating to get the results back onto the original scale. Hope this helps. Spencer Graves

Prof Brian Ripley wrote: On Thu, 24 Mar 2005, Vicky Landsman wrote: Is there a package that enables one to create bivariate log-normal variables? Just exponentiate each of a bivariate normal pair. You can get the latter from mvrnorm in package MASS.
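A sketch of the exponentiation route for this setup, assuming the stated means and variances refer to the log scale and that the missing dependence is supplied as a log-scale correlation rho (all names are illustrative):

library(MASS)
lmu <- cbind(X %*% b, X %*% d)           # log-scale means, one row per observation
Sigma <- matrix(c(c1, rho * sqrt(c1 * c0),
                  rho * sqrt(c1 * c0), c0), 2, 2)
Z <- mvrnorm(nrow(X), c(0, 0), Sigma)    # correlated log-scale errors
Y <- exp(lmu + Z)                        # columns are (Y1, Y0)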
Re: [R] mixtures as outcome variables
Kjetil Brinchmann Halvorsen wrote: Jason W. Martinez wrote: Dear R-users, I have an outcome variable and I'm unsure about how to treat it. Any advice?

If you only concentrate on the relative proportions, these are called compositional data. If your data are in mydata (n x 4), you obtain the compositions by

sweep(mydata, 1, apply(mydata, 1, sum), "/")

There are not (AFAIK) specific functions/packages in R for compositional data, but you can try googling. Aitchison has a monograph (Chapman & Hall) and a paper in JRSS B. One way to start might be lm's or anova on the symmetric logratio transform of the compositions. The R function lm can take a multivariate response, but some extra programming will be needed for interpretation. With simulated data:

slr <- function(y) {
  ## y should sum to 1
  v <- log(y)
  return(v - mean(v))
}
testdata <- matrix(rgamma(120, 2, 3), 30, 4)
str(testdata)
num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
comp <- sweep(testdata, 1, apply(testdata, 1, sum), "/")
## To get the symmetric logratio transform:
comp <- t(apply(comp, 1, slr))
## Observe:
apply(cov(comp), 1, sum)
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
lm(comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
               [,1]     [,2]     [,3]     [,4]
(Intercept)  0.17606  0.06165 -0.03783 -0.19988

Followup:

mmod <- manova(comp ~ x)
summary(mmod)
Error in summary.manova(mmod) : residuals have rank 3 < 4

So the manova() function cannot be used directly. I guess MANOVA for compositional data should be a straight extension, but it must be programmed; standard manova cannot be used.

Kjetil
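A hedged note on the rank error: the slr-transformed rows sum to zero by construction, so the four response columns are linearly dependent and the residual matrix has rank at most 3. Dropping one column makes the response full rank, and since the MANOVA tests are invariant under nonsingular linear transformations of the response, the result should not depend on which column is dropped:

mmod <- manova(comp[, -4] ~ x)   # x being the (hypothetical) grouping variable
summary(mmod)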
Re: [R] Bivariate lognormal distribution
Vicky Landsman wrote: Dear experts! Is there a package that enables one to create bivariate log-normal variables?

What do you mean by "create"? If you mean simulate, why not use mvrnorm from MASS, and then exponentiate?

Kjetil

Thanks a lot, Vicky Landsman.
Re: [R] mixtures as outcome variables
Jason W. Martinez wrote: Dear R-users, I have an outcome variable and I'm unsure about how to treat it. Any advice? I have spending data for each county in the state of California (N=58). Each county has been allocated money to spend on any one of the following four categories: A, B, C, and D. Each county may spend the money in any way they see fit. This also means that the county need not spend all the money that was allocated to it. The data structure looks something like the one below:

COUNTY     A        B       C        D        Total
alameda    2534221  192     2835475  3063249  9988537
alpine     3174     850004555855232
amador     0        0       0        0        0

The goal is to explain variation in spending patterns, which are presumably the result of characteristics of each county. I might treat the problem like a simple linear regression problem for each category, but by definition, money spent in one category takes away from the amount of money that can be spent in any other category, and each county is not allocated the same amount of money to spend. I have constructed proportions of the amount spent on each category and have conducted quasibinomial regression on each dependent outcome, but that does not seem very convincing to me. Would anyone have any advice about how to treat an outcome variable of this sort? Thanks for any hints! Jason

If you only concentrate on the relative proportions, these are called compositional data. If your data are in mydata (n x 4), you obtain the compositions by

sweep(mydata, 1, apply(mydata, 1, sum), "/")

There are not (AFAIK) specific functions/packages in R for compositional data, but you can try googling. Aitchison has a monograph (Chapman & Hall) and a paper in JRSS B. One way to start might be lm's or anova on the symmetric logratio transform of the compositions. The R function lm can take a multivariate response, but some extra programming will be needed for interpretation. With simulated data:

slr <- function(y) {
  ## y should sum to 1
  v <- log(y)
  return(v - mean(v))
}
testdata <- matrix(rgamma(120, 2, 3), 30, 4)
str(testdata)
num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
comp <- sweep(testdata, 1, apply(testdata, 1, sum), "/")
## To get the symmetric logratio transform:
comp <- t(apply(comp, 1, slr))
## Observe:
apply(cov(comp), 1, sum)
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
lm(comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
               [,1]     [,2]     [,3]     [,4]
(Intercept)  0.17606  0.06165 -0.03783 -0.19988

summary(lm(comp ~ 1))

Response Y1 :
Call: lm(formula = Y1 ~ 1)
Residuals:
     Min       1Q   Median       3Q      Max
-1.29004 -0.46725 -0.07657  0.55834  1.20551
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]   0.1761     0.1265   1.391    0.175
Residual standard error: 0.6931 on 29 degrees of freedom

Response Y2 :
Call: lm(formula = Y2 ~ 1)
Residuals:
    Min      1Q  Median      3Q     Max
-1.2982 -0.5711 -0.1355  0.5424  1.6598
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  0.06165    0.15049    0.41    0.685
Residual standard error: 0.8242 on 29 degrees of freedom

Response Y3 :
Call: lm(formula = Y3 ~ 1)
Residuals:
     Min       1Q   Median       3Q      Max
-1.97529 -0.41115  0.03666  0.42785  0.88567
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,] -0.03783    0.11623  -0.325    0.747
Residual standard error: 0.6366 on 29 degrees of freedom

Response Y4 :
Call: lm(formula = Y4 ~ 1)
Residuals:
    Min      1Q  Median      3Q     Max
-2.8513 -0.3955  0.2815  0.5939  1.2475
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  -0.1999     0.1620  -1.234    0.227
Residual standard error: 0.8872 on 29 degrees of freedom

Sorry for not being of more help!

Kjetil
Re: [R] Question on statistics
Roy Werkman wrote: Yes, it is discrete, but the underlying distribution is Gaussian.

I guess you mean what some people call the superpopulation distribution.

Kjetil

Just got the following from a colleague:

Var(mean of finite population) = ((N - n)/(N - 1)) * var(population) / n

This should be it... Greetings, Roy

-----Original Message----- From: Liaw, Andy. Sent: Wednesday, March 23, 2005 2:17 PM. Subject: RE: [R] Question on statistics

If the sample is drawn with replacement from the finite population, then the usual formula applies (assuming iid samples); i.e., var(sample mean) = var(population) / n. There's some problem in your description: a finite population, I believe, is necessarily discrete (since there are only N possible values), so it cannot be Gaussian (i.e., normal). Andy

From: Roy Werkman. Ehh, by limited distribution, I meant to say a population of N points. ... Hi, can anyone help me with the following (although not directly related to R functionality)? I have been looking on the internet but cannot find the answer. My question: what is the variance of the mean of a limited distribution (total N points, normally distributed), when I have a small sample of that distribution (n < N)? Your help would be very welcome. Thanks, Roy
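A quick simulation check of the finite-population formula (the sizes are arbitrary; note that var() uses the n-1 divisor, so the N-divisor population variance needs a (N-1)/N correction):

set.seed(1)
N <- 1000; n <- 30
pop <- rnorm(N)
sigma2 <- var(pop) * (N - 1)/N                    # population variance, N divisor
theory <- (N - n)/(N - 1) * sigma2 / n
means <- replicate(10000, mean(sample(pop, n)))   # sampling without replacement
c(theory = theory, simulated = var(means))        # should agree closely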
Re: [R] Generating Interaction Factors (combinations of Data Frame columns)
Thomas Hopper wrote: I'm starting to do a fair amount of DOE in my day job and need to generate full- and fractional-factorial designs. One of the things I'd like to do is generate all possible interaction effects, given the main effects. I've been searching through the documentation, packages and mailing list archives, but the closest I can find are combn() in package combinat and combine() and combinations() in gregmisc, none of which actually produces the results I want. Given a data frame with columns labeled A, B, C and D, I would like to generate a data frame with columns that are the combinations of the columns in the original data frame. The output columns would be A*B, A*C, A*D, A*B*C, A*B*D, ..., A*B*C*D. Alternatively, I'd want to generate the interactions for a given level (2-factor or 3-factor). If such a function already exists, I'd be more than happy to use it. If it doesn't, I can write it, but I would appreciate a little help with the algorithm for generating the combinations... how do I loop through the given factors to generate all possible combinations? Thanks, Tom

library(BHH2)  # on CRAN
?ffDesMatrix
?ffFullMatrix

Kjetil
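Another route that stays in base R: model.matrix() expands all interactions up to a chosen order from a one-line formula, with no explicit looping over combinations (a sketch with made-up numeric columns; the same works for factors):

df <- data.frame(A = rnorm(8), B = rnorm(8), C = rnorm(8), D = rnorm(8))
m2 <- model.matrix(~ (A + B + C + D)^2, df)   # main effects + all 2-factor interactions
m4 <- model.matrix(~ (A + B + C + D)^4, df)   # everything up to A:B:C:D
colnames(m4)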
Re: [R] Hazard function or cumulative Hazard function in R
yassir rabhi wrote: Hi, I'm a student from Canada, and I work in survival analysis. I want to know whether there is a hazard function or cumulative hazard function in R or not. I know how to program it, but it is easier if they already exist in R. Thanks, Yassir

library(survival)

Kjetil
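To point at the specific pieces (a sketch using the lung data shipped with survival): survfit() estimates the survival curve, from which a simple cumulative hazard estimate is -log(S(t)), and basehaz() extracts the cumulative baseline hazard from a Cox fit:

library(survival)
fit <- survfit(Surv(time, status) ~ 1, data = lung)
H <- -log(fit$surv)            # cumulative hazard from the survival estimate
plot(fit$time, H, type = "s")
cox <- coxph(Surv(time, status) ~ age, data = lung)
head(basehaz(cox))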
Re: [R] Basic questions about RMySQL
De la Vega Góngora Jorge wrote: Hello, please forgive me if I am asking something that is well documented. I have read the documentation, but there are points that are not clear to me. I am not an expert in R or in databases, but if someone directs me to a tutorial, I will appreciate it.

1. In my understanding, I can install and use RMySQL without having to install MySQL on my PC, to have access to and to create new tables. Is this right?

I doubt very much that RMySQL will be of any use without having MySQL installed!

Kjetil

2. I have created a c:\my.cnf file to access a database I have, but without installing the server, where can I define the user, password and host to establish a connection? Thanks in advance. Jorge de la Vega Gongora, Banco de Mexico, web: http://www.stat.umn.edu/~jvega
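For the record, once a MySQL server (local or remote) is reachable, the usual connection sketch looks like this; host, user and password are placeholders, and connection details can also live in a my.cnf group named through the group argument:

library(RMySQL)
con <- dbConnect(MySQL(), user = "me", password = "secret",
                 dbname = "mydb", host = "some.host")
dbListTables(con)
dbDisconnect(con)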
Re: leap years (Was: [R] lm and time series)
A followup: how do people treat daily time series when there is a yearly cycle? For the moment I just left out the February 29ths, but that doesn't look really good. Is the only way to include them to use irregular time series?

Kjetil

Gabor Grothendieck wrote: From: Matthieu Cornec. I create a multivariate time series containing NA values (that could come directly from an imported file). I want to compute a linear regression and obtain a time series for both residuals and fitted values. I have tried the trick ts.intersect, without success. Could you help me out of this? Example:

y <- ts(1:10 + rnorm(10))
x <- ts(1:10)
datats <- cbind(y, lagx = lag(x))

Notice that datats could come directly from an imported file; that is why I did not use ts.intersect(y, lagx = lag(x)).

fit <- lm(y ~ lagx, data = datats, na.action = na.omit)

but how do I get a time series of residuals instead of a vector residuals(fit)? Matthieu Cornec

ts is used for regular time series. Removing NAs, other than at the beginning or end, means it's probably best to model it as an irregular time series and so to use an irregular time series package. Below it is done in zoo. Also review the comments in my post to your previous question along these lines and, in particular, be sure you read the zoo vignette referenced there, which has 15 pages of examples of time series manipulations.

library(zoo)
## set up test data with NAs
set.seed(1)
x <- zoo(1:10)
y <- x + rnorm(10)
y[5] <- x[2] <- NA
## create a multivariate zoo series without NAs
## Note: if you want to fill in NAs rather than omit them see ?na.locf
z <- na.omit(merge(y, lagx = lag(x, -1)))
## run lm
## (This also works: z.lm <- lm(I(y ~ lagx), z), but the syntax is experimental.)
z.lm <- lm(y ~ lagx, as.data.frame(z))
## get fitted and resid using the fact that their time base is that of z
z.fit <- z.resid <- z[, 1]
z.fit[] <- fitted(z.lm)
z.resid[] <- resid(z.lm)
## We can just use the zoo series already created. It's not really
## necessary to convert it to ts, but if for some reason we want a
## ts series the following creates one.
## (This uses the facts that we know y starts at 1 and is regularly spaced
## and the other series have a subset of the time base of y.)
ts(coredata(merge(y, x, z.fit, z.resid)))
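One answer to the leap-year question at the top, sketched with zoo: give the series a calendar-date index, and February 29 becomes an ordinary observation rather than a hole in a period-365 grid; the yearly cycle can then enter a model through the day of year:

library(zoo)
d <- seq(as.Date("2004-01-01"), as.Date("2005-12-31"), by = "day")
z <- zoo(rnorm(length(d)), d)               # made-up daily data, includes 2004-02-29
doy <- as.numeric(format(index(z), "%j"))   # day of year
fit <- lm(coredata(z) ~ sin(2*pi*doy/365.25) + cos(2*pi*doy/365.25))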
Re: [R] Rconsole wishlist
Duncan Mackay wrote: Sorry, yes, Rgui under WinXP (SP2). But while Windows date-stamps the history file

file.info(".Rhistory")
          size isdir mode               mtime               ctime
.Rhistory 5377 FALSE  666 2005-03-04 10:37:52 2005-03-04 10:37:52
                        atime
.Rhistory 2005-03-04 13:54:11

the problem is that there can be multiple sessions stored in .Rhistory, and the session dates aren't stored there. Moreover, it seems to me that the history buffer can also overflow without warning after long sessions or many repeated sessions, so that you can inadvertently lose parts of your command log.

Yes. But you can define an environment variable R_HISTSIZE (or some similar name; do an R site search to find it, it is not defined on my machine now, and it should really be in the help file for savehistory()) to avoid the problem. (Is this right, anyone?)

Perhaps it would be preferable for R to save each session's command history in a separate history file, along the lines of

No, it is better to have just one file, as now.

Kjetil

.Last <- function() {
  savefilename <- paste("Rhistory", date())
  savefilename <- gsub(" ", "_", savefilename)
  savefilename <- gsub(":", ".", savefilename)
  savefilename <- paste(savefilename, ".txt", sep = "")
  if (interactive()) try(savehistory(savefilename))
  cat("Current history saved in file:", savefilename, "\n")
}

but this doesn't address any overflow issues. Duncan

-----Original Message----- From: Liaw, Andy. Sent: Friday, 4 March 2005 11:14 AM. Subject: RE: [R] Rconsole wishlist

I'm guessing you're talking about Rgui on Windows, but please don't leave us guessing. If you run R under ESS/(X)Emacs, you have the entire session, which can be saved in a (transcript) file. Does your OS not put date stamps on files?

file.info(".Rhistory")
          size isdir mode               mtime               ctime
.Rhistory 1025 FALSE  666 2005-03-03 19:27:31 2004-08-13 10:45:09
                        atime
.Rhistory 2005-03-03 19:27:31

Andy

From: Duncan Mackay. Hi all, wouldn't it be nice (??!!) if R automatically issued a warning message when the R console buffer was about to fill, so that you could save all your output into a text file? (I know about sink(), but I think it would be good to have an easier mechanism to save a complete record of messages and function output.) And in a similar vein, wouldn't it also be nice if R automatically entered a date stamp into the history file?

Dr. Duncan Mackay, School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide S.A. 5001, AUSTRALIA. Ph (08) 8201 2627, Fax (08) 8201 3015, http://www.scieng.flinders.edu.au/biology/people/mackay_d/index.html
Re: [R] draw random samples from empirical distribution
bogdan romocea wrote: Dear useRs, I have an empirical distribution (not normal etc.) and I want to draw random samples from it. One solution I can think of is to compute, let's say, 100 quantiles, then use runif() to draw a random number Q between 1 and 100, and finally run runif() again to pull a random value from the quantile Q. Is there perhaps a better/more elegant way of doing this? Thank you, b.

sample(mysample, n, replace = TRUE)

Kjetil
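If something smoother than pure resampling is wanted, two further sketches (x being the observed sample): the inverse-CDF idea from the question collapses to a single quantile() call, and resampling plus normal noise amounts to drawing from a kernel density estimate:

x.q   <- quantile(x, runif(1000), type = 8, names = FALSE)  # interpolated inverse CDF
x.kde <- sample(x, 1000, replace = TRUE) + rnorm(1000, 0, bw.nrd0(x))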
Re: [R] Bayesian stepwise
Spencer Graves wrote: Does anyone know of research on fully Bayesian stepwise procedures, assuming that models not considered by the stepwise search would essentially have zero posterior probability? I need to analyze the results of ad hoc experiments run in manufacturing with crazy confounding and possible supersaturation (i.e., more potentially explanatory variables than runs), when each run is very expensive in both time and money. There have to be ways to summarize concisely and intelligently what the data can tell us and what remains uncertain,

What about chapter 8, section 5, page 363 in Wu & Hamada, Experiments: Planning, Analysis, and Parameter Design Optimization, titled "A Bayesian Variable Selection Strategy for Designs with Complex Aliasing"?

Kjetil

including the level of partial confounding between alternative explanations. I think I've gotten reasonable results with my own modification of Venables & Ripley's stepAIC to compute an approximate posterior over tested models using the AICc criterion described, e.g., by Burnham and Anderson (2002), Model Selection and Multi-Model Inference (Springer). Preliminary simulations showed that when I used the naive prior (that all models are equally likely, including the null model), the null model is usually rejected when true. What a surprise! I think I can fix that using a more intelligent prior. I also think I can evaluate the partial confounding between alternative models by studying the correlation matrix between the predictions of alternative models. Comments? Thanks, Spencer Graves

Frank E Harrell Jr wrote: Smit, Robin wrote: I am hoping to get some advice on the following: I am looking for an automatic variable selection procedure to reduce the number of potential predictor variables (~50) in a multiple regression model. I would be interested in using forward stepwise regression with the partial F test. I have looked into possible R functions but could not find this particular approach. There is a function (stepAIC) that uses the Akaike criterion or Mallows' Cp criterion. In addition, the drop1 and add1 functions came closest to what I want, but with them I cannot perform the required procedure. Do you have any ideas? Kind regards, Robin Smit, Business Unit TNO Automotive, Environmental Studies Testing, PO Box 6033, 2600 JA Delft, THE NETHERLANDS

Robin, if you are looking for a method that does not offer the best predictive accuracy and that violates every aspect of statistical inference, you are on the right track. See http://www.stata.com/support/faqs/stat/stepwise.html for details.
Re: [R] Warnings by functions mean(), median()
[EMAIL PROTECTED] wrote: Hello, the following functions don't work correctly with my data: median(), geo.mean(). My data files contain more than 10,000 lines and six columns from a flow-cytometer measurement. I need the arithmetic and geometric means and the median. For the calculation of the geometric mean I wrote the following function:

geo.mean <- function(x)
{
  n <- length(x)
  gm <- prod(x)^(1/n)

This is probably what gives the NaN below. exp(mean(log(x))) would be more to the point.

Kjetil

  return(gm)
}

The median() error tells me "need numeric data". The data are numeric. The function geo.mean() gave out [1] NaN. What are the reasons and what are the solutions? I am a newbie and urgently need information. Thanks. Here is a short output with the results:

9997   385.42  68.54   9.82  124.09  23.93  138.24
9998   342.89  73.65 133.35 1134.19 345.99 1876.88
9999   316.23  76.35  48.26  421.70 129.80  873.79
10000  291.64 103.66   6.85  107.46  26.42  189.38
10001    0.00   0.00   0.00    0.00   0.00    0.00

mean(data)
      FSC       SSC       FL1       FL2      FL32       FL4
375.94880  73.76219  50.73413 434.42837 110.06393 637.34980
geo.mean(data)
[1] NaN
median(data)
Error in median(data) : need numeric data
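For the archives, the log-scale version suggested above in full: prod(x) over 10,000 values overflows to Inf or underflows to 0 (and Inf times a zero entry is NaN), while the log-scale form is stable; note also that with zeros in the data the geometric mean is 0, and median() works on a single column, not a whole data frame:

geo.mean <- function(x) exp(mean(log(x)))   # 0 if any x == 0, NaN if any x < 0
apply(data, 2, geo.mean)    # column-wise geometric means
apply(data, 2, median)      # column-wise medians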
Re: [R] (arbitrary) precision
Gabor Grothendieck wrote: Michael Grottke (Michael.Grottke at duke.edu) writes: I am currently using R for fitting a model to various data sets (minimizing the negative log-likelihood) and calculating a number of metrics based on the parameter estimates. Within these calculations, I have steps of the form

log(log(1 + x)),

where x can be very small (e.g., around 1e-16). Unfortunately, the precision of doubles does not suffice for coping with the difference in the orders of magnitude of 1 and x: 1 + x is rounded to 1. One way to solve this problem seems to be to use an arbitrary-precision library implemented in C and call the respective routines for calculating the logarithm(s) from within R. My questions are as follows:
1. Is there any better/more direct way to solve the problem?
2. Is there any arbitrary-precision library you can suggest in particular?

The approximation log(1+x) = x would be accurate to several decimal places in your case, so your expression would reduce to log(log(1+x)) = log(x).

see also ?log1p

Kjetil
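A quick illustration of the difference at this magnitude (1e-16 is below the spacing of doubles around 1, so 1 + x rounds to exactly 1):

x <- 1e-16
log(1 + x)      # 0, since 1 + x == 1 in double precision
log1p(x)        # 1e-16, accurate
log(log1p(x))   # usable where log(log(1 + x)) gives -Inf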
Re: [R] Profiles graphics in a contingency table
Campo Elías PARDO wrote: Dear users, how can I obtain a profiles graphic for a contingency table, similar to Excel? Campo Elias PARDO

If you mean profiles as in correspondence analysis, just do (assuming your table is mytab):

N <- sum(mytab)
Nr <- rowSums(mytab)
Nc <- colSums(mytab)
perfRow <- mytab
for (i in 1:NROW(mytab)) {
  perfRow[i, ] <- perfRow[i, ]/Nr[i]
}
perfRow

and then you can just plot them, maybe using lines(). But have a look at ?mosaicplot.

Kjetil
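The loop can also be replaced by a single prop.table() call, and matplot() draws one line per profile; a sketch on a built-in table, chosen only for illustration:

mytab <- margin.table(UCBAdmissions, c(3, 1))    # Dept x Admit, as an example
perfRow <- prop.table(mytab, 1)                  # row profiles in one call
matplot(as.matrix(perfRow), type = "b")          # profiles across the row categories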
Re: [R] Displaying a distribution -- was: Combining two histograms
Berton Gunter wrote: Rug plots cannot show replicated values (spikes at discrete values) in the data distribution. Quantile plots do.

But you can use rug(jitter(x)).

Kjetil

Bert Gunter, Genentech Non-Clinical Statistics, South San Francisco, CA. "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message----- From: Gabor Grothendieck. Sent: Wednesday, February 02, 2005 9:20 AM. Subject: Re: [R] Displaying a distribution -- was: Combining two histograms

Berton Gunter writes: May I take this off topic a little to seek collective wisdom (and so feel free to reply privately). The catalyst is Deepayan's remark: "Histograms were appropriate for drawing density estimates by hand in the good old days, but I can imagine very few situations where I would not prefer to use smoother density estimates when I have the computational power to do so. Deepayan" Generally, I agree; but the appearance and thus one's perception and interpretation of both histograms and density plots depend upon the parameters chosen for the display (bin boundaries for histograms; bandwidth and kernel for density plots).

If the problem is distrust of the procedure, one can simultaneously display the raw data with a rug plot; see example(rug).
Re: [R] Finding runs of TRUE in binary vector
Sean Davis wrote: I have a binary vector and I want to find all regions of that vector that are runs of TRUE (or FALSE).

a <- rnorm(10)
b <- a < 0.5
b
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

My function would return something like a list:

region[[1]]  1, 3
region[[2]]  5, 5
region[[3]]  7, 10

Any ideas besides looping and setting starts and ends directly? Thanks, Sean

?rle

Kjetil
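Spelling out the rle() hint (a sketch): cumulative sums of the run lengths give the run end points, start = end - length + 1, and the run values select the TRUE runs:

true.runs <- function(b) {
  r <- rle(b)
  end <- cumsum(r$lengths)
  start <- end - r$lengths + 1
  cbind(start, end)[r$values, , drop = FALSE]
}
b <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
true.runs(b)   # rows (1,3), (5,5), (7,10), as asked for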
Re: [R] Conflicts using Rcmdr, nlme and lme4
Douglas Bates wrote: I can guarantee that this is the last time that I will fundamentally redesign the computational methods and data structures used in fitting mixed-effects models. This is the fifth time I have done it from scratch, and after this one I am quitting the business.

I hope that doesn't mean you are quitting completely from statistical computing and R?

Kjetil
Re: [R] Data Simulation in R
Uwe Ligges wrote: Doran, Harold wrote: Thanks. But I think I am doing that. I use rm() and gc() as the code moves along. The datasets are stored as a list. Is there a way that I can save the list object and call each dataset within the list one at a time, or must the entire list be in memory at once?

The list is in memory, and must be, to access its elements. Either save the list elements to separate files, or even better, make use of a database. Uwe Ligges

Or, since the data are simulated, why can't you just (re-)simulate each dataset right before using it and then delete it, saving the random seed so you can re-simulate if needed?

Kjetil

Harold

-----Original Message----- From: Uwe Ligges. Sent: Wednesday, January 19, 2005 5:51 AM. Subject: Re: [R] Data Simulation in R

Doran, Harold wrote: Dear List: A few weeks ago I posted some questions regarding data simulation and received some very helpful comments, thank you. I have modified my code accordingly and have made some progress. However, I am now facing a new challenge along similar lines. I am attempting to simulate 250 datasets and then run the data through a linear model. I use rm() and gc() as I move along to clean up the workspace and preserve memory. However, my aim is to use sample sizes of 5,000 and 10,000. By any measure this is a huge task. My machine has 2GB RAM and a Pentium 4 2.8 GHz processor with Windows XP. I have the following in the target section of the Windows shortcut: --max-mem-size=1812M. With such large samples, R is unable to perform the analysis, at least with the code I have developed. It halts when it runs out of memory. A colleague subsequently constructed the simulation using another software program with a similar computer and, while it took overnight (and then some), the program produced the desired results. I am curious whether such large simulations are out of the grasp of R, or whether my code is not adequately organized to perform the simulation. I would appreciate any thoughts or advice.

Don't hold all datasets (and results, if they are big) in memory at the same time!!! Either generate them when you use them and delete them afterwards, or save them to disc and only load them one by one for further analyses. Also, you might want to call gc() after you have removed large objects...
Uwe Ligges

Harold

library(MASS)
library(nlme)
mu <- c(100, 150, 200, 250)
Sigma <- matrix(c(400, 80, 80, 80,
                  80, 400, 80, 80,
                  80, 80, 400, 80,
                  80, 80, 80, 400), 4, 4)
mu2 <- c(0, 0, 0)
Sigma2 <- diag(64, 3)
sample.size <- 5000
N <- 250  # Number of datasets

# Take a single draw from the VL distribution
vl.error <- mvrnorm(n = N, mu2, Sigma2)

# Step 1: Create Data
Data <- lapply(seq(N), function(x)
  as.data.frame(cbind(1:10, mvrnorm(n = sample.size, mu, Sigma))))

# Step 2: Add Vertical Linking Error
for (i in seq(along = Data)) {
  Data[[i]]$V6 <- Data[[i]]$V2
  Data[[i]]$V7 <- Data[[i]]$V3 + vl.error[i, 1]
  Data[[i]]$V8 <- Data[[i]]$V4 + vl.error[i, 2]
  Data[[i]]$V9 <- Data[[i]]$V5 + vl.error[i, 3]
}

# Step 3: Restructure for Longitudinal Analysis
long <- lapply(Data, function(x)
  reshape(x, idvar = Data[[i]]$V1,
          varying = list(c(names(Data[[i]])[2:5]), c(names(Data[[i]])[6:9])),
          v.names = c("score.1", "score.2"), direction = "long"))

# Clean up workspace
rm(Data, vl.error)
gc()

# Step 4: Run GLS
glsrun1 <- lapply(long, function(x)
  gls(score.1 ~ I(time - 1), data = x,
      correlation = corAR1(form = ~ 1 | V1), method = "ML"))

# Extract intercepts and slopes
int1 <- sapply(glsrun1, function(x) x$coefficient[1])
slo1 <- sapply(glsrun1, function(x) x$coefficient[2])

# Clean up workspace
rm(glsrun1)
gc()

glsrun2 <- lapply(long, function(x)
  gls(score.2 ~ I(time - 1), data = x,
      correlation = corAR1(form = ~ 1 | V1), method = "ML"))

# Extract intercepts and slopes
int2 <- sapply(glsrun2, function(x) x$coefficient[1])
slo2 <- sapply(glsrun2, function(x) x$coefficient[2])

# Clean up workspace
rm(glsrun2)
gc()

# Print Results
cat("Original Standard Errors", "\n",
    "Intercept", "\t", sd(int1), "\n", "Slope", "\t", "\t", sd(slo1), "\n")
cat("Modified Standard Errors", "\n",
    "Intercept", "\t", sd(int2), "\n", "Slope", "\t", "\t", sd(slo2), "\n")

rm(list = ls())
gc()
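A sketch of the re-simulate-rather-than-store suggestion from earlier in the thread: record the RNG state before each dataset is generated, keep only the fitted summaries, and regenerate any individual dataset on demand (object names follow the posted code):

set.seed(1)
seeds <- vector("list", N)
for (i in seq(N)) {
  seeds[[i]] <- .Random.seed        # snapshot of the RNG state
  dat <- as.data.frame(mvrnorm(n = sample.size, mu, Sigma))
  ## ... fit the model on dat, keep only the coefficients ...
  rm(dat); gc()
}
## Later, to reproduce dataset i exactly:
## .Random.seed <- seeds[[i]]
## dat.i <- as.data.frame(mvrnorm(n = sample.size, mu, Sigma))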
Re: [R] way off topic
Erin Hodgess wrote: Dear R People: Here is another off-topic question, please: does anyone know where to find some archaeological data (carbon dating), please?

http://lib.stat.cmu.edu/DASL/ has archaeology as a topic.

Kjetil

When I googled, I got research papers but no data. Thanks. Sincerely, Erin Hodgess, Associate Professor, Department of Computer and Mathematical Sciences, University of Houston - Downtown