Re: [R] roc and lattice
On 1/9/07, rhk [EMAIL PROTECTED] wrote:

  Hello, I am afraid I do not fully understand all the intricacies of
  programming lattice plots. In the code below I try to plot an ROC curve,
  following R News 4(1). When I condition on the variable 'group' I get the
  error message below; when I plot the curve for all data (i.e., y ~
  pred.prob), I get the plot I want. Can someone point out why conditioning
  gives that message? Thanks, Ruud

    plot.a <- xyplot(y ~ pred.prob | group, data = x.df,
                     xlim = c(0, 1), xlab = "1-specificiteit",
                     ylab = "sensitiviteit",
                     panel = function(x, y, subscripts, ...) {
                         DD <- table(-x, y)
                         sens <- cumsum(DD[, 2]) / sum(DD[, 2])
                         mspec <- cumsum(DD[, 1]) / sum(DD[, 1])
                         panel.xyplot(mspec, sens, type = "l", ...)
                         panel.abline(0, 1)
                     })
    print(plot.a)

    Error in panel(x = c(0.000265710002003069, 0.000345712857778025,
      0.000265710002003069, : subscript out of bounds

Since you haven't bothered to give us a reproducible example, all we can say
is that somewhere in your code a subscript is out of bounds. Since the only
place where you use subscripts is as DD[,2] etc., I would start by checking
whether DD indeed has the two columns and two rows you seem to expect.

Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
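A small sketch makes Deepayan's hint concrete (an illustration only, not from the thread): inside a conditioned panel, the subset of y handed to the panel function may contain only one outcome level, in which case table(-x, y) has a single column and DD[, 2] fails with exactly this error.

```r
x <- c(0.1, 0.2, 0.3)
y <- c(0, 0, 0)        # only one outcome level left in this panel's subset
DD <- table(-x, y)
dim(DD)                # 3 x 1: DD[, 2] would be "subscript out of bounds"
```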
[R] where is the NIR dataset?
I did just download the pls package, but the NIR dataset is not available:

  > require(pls)
  [1] TRUE
  > data(NIR)
  Warning message:
  data set 'NIR' not found in: data(NIR)

Is there another package with the dataset for the examples?

With regards
Carmen
Re: [R] where is the NIR dataset?
NIR seems not to be a dataset by itself. It is a member of the dataset yarn. Try:

  library(pls)
  data(yarn)
  str(yarn)

On 1/10/07, Carmen Meier [EMAIL PROTECTED] wrote:

  I did just download the pls package, but the NIR dataset is not available:

    > require(pls)
    [1] TRUE
    > data(NIR)
    Warning message:
    data set 'NIR' not found in: data(NIR)

  Is there another package with the dataset for the examples?

  With regards
  Carmen
Re: [R] where is the NIR dataset?
See the changelog of pls 2.0-0:

  - The 'NIR' and 'sensory' data sets have been renamed to 'yarn' and 'oliveoil'.

You can see it in the package sources on CRAN:
http://cran.r-project.org/src/contrib/Descriptions/pls.html

HTH

On 1/10/07, Carmen Meier [EMAIL PROTECTED] wrote:

  talepanda wrote:

    NIR seems not to be a dataset by itself. It is a member of the dataset
    yarn. Try: ...

  But it seems that it was a dataset NIR; have a look at a former question:
  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/47269.html

    > require(pls)
    > data(NIR)
    > class(NIR)
    [1] "data.frame"

  With regards
  Carmen
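For pls >= 2.0 the old examples can be adapted by pulling the NIR component out of yarn; a minimal sketch (assuming pls is installed):

```r
library(pls)
data(yarn)
str(yarn)          # data frame with components NIR, density, train
NIR <- yarn$NIR    # the former 'NIR' data, now a component of 'yarn'
```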
Re: [R] randomForest and missing data
On Tue, 9 Jan 2007, Bálint Czúcz wrote:

  There is an improved version of the original random forest algorithm
  available in the party package (you can find some additional information
  on the details here:
  http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper490.pdf ).
  I do not know whether it yields a solution to your problem with missing
  data, but maybe it is worth a check...

Yes, `cforest()' is able to deal with missing values. More specifically, the
implementation is based on conditional trees (`ctree()'), which are able to
set up surrogate splits.

Torsten

  Best regards:
  Bálint

  On 1/4/07, Darin A. England [EMAIL PROTECTED] wrote:

    Does anyone know a reason why, in principle, a call to randomForest
    cannot accept a data frame with missing predictor values? If each
    individual tree is built using CART, then it seems like this should be
    possible. (I understand that one may impute missing values using
    rfImpute or some other method, but I would like to avoid doing that.)
    If this functionality were available, then when the trees are being
    constructed, and when subsequent data are put through the forest, one
    would also specify an argument for the use of surrogate rules, just
    like in rpart. I realize this question is very specific to
    randomForest, as opposed to R in general, but any comments are
    appreciated. I suppose I am looking for someone to say either "It's not
    appropriate, and here's why ..." or "Good idea. Please implement and
    post your code."

    Thanks,
    Darin England, Senior Scientist
    Ingenix
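The surrogate-split behaviour Darin asks about can be seen in rpart itself, which ships with R and routes observations with missing predictors down surrogate splits by default; a small sketch using the bundled kyphosis data, with one NA deliberately introduced:

```r
library(rpart)                 # recommended package, included with R
kyph <- kyphosis
kyph$Start[1] <- NA            # a missing predictor value, no imputation
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyph)
predict(fit, kyph[1, ])        # the NA row is still routed down the tree
```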
Re: [R] posthoc tests with ANCOVA
On Tue, 9 Jan 2007, Walter Durka wrote:

  dear all,

Walter, _please_ cc questions on contributed packages to the maintainer,
since not everybody follows r-help closely...

  I want to perform a posthoc test for my ANCOVA:

    a1 <- aov(seeds ~ treatment * length)

  With

    summary(glht(a1, linfct = mcp(treatment = "Tukey")))

  R tells me:

    covariate interactions found -- please choose appropriate contrast

One needs to specify a certain value for `length', which, I assume, is a
numeric covariate, right? The current interface doesn't support this (it is
on my to-do list); however, you can set up the matrix of linear functions
yourself (contact me privately if you have problems doing that).

  How do I build these contrasts? Ideally, I would like to have the posthoc
  test for the ANCOVA including a block effect:

    a2 <- aov(seeds ~ treatment * length + Error(site))

  How do I make a posthoc test here?

It's on the to-do list as well :-(

Torsten

  Thanks for any comments
  Walter

  --
  Dr. Walter Durka
  Department of Community Ecology
  Helmholtz Centre for Environmental Research - UFZ
  Theodor-Lieser-Str. 4 / 06120 Halle / Germany
  [EMAIL PROTECTED] / http://www.ufz.de/index.php?en=798
  phone +49 345 558 5314 / fax +49 345 558 5329
  (The UFZ has a new name: Helmholtz Centre for Environmental Research - UFZ)
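Torsten's suggestion of building the matrix of linear functions by hand can be sketched as follows. This is an illustration, not code from the thread: it assumes a 3-level treatment (levels A, B, C), R's default treatment contrasts, and a covariate value x0 chosen by the analyst.

```r
## coefficients of aov(seeds ~ treatment * length) appear in the order:
## (Intercept), treatmentB, treatmentC, length,
## treatmentB:length, treatmentC:length
x0 <- 5                                    # hypothetical value of 'length'
K <- rbind("B - A" = c(0,  1, 0, 0,  x0,  0),
           "C - A" = c(0,  0, 1, 0,   0, x0),
           "C - B" = c(0, -1, 1, 0, -x0, x0))
## then: summary(glht(a1, linfct = K))     # glht() from multcomp
```

Each row is the difference of two group regression lines evaluated at length = x0; note that the "C - B" row is exactly "C - A" minus "B - A".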
[R] select subsets in data frame
Dear WizaRds!

A trivial question indeed on selecting subsets in data frames. I am sorry.
Unfortunately, I did not find any helpful information in the introduction,
the help archive, or the introductory books I read. Please help: I want to
select on column KB of a data frame d, which is read via read.csv2. I
checked that d is indeed a data.frame object and included the correct header
information in line 1. For example purposes, look at this small object:

  d <- data.frame(A = 1:3,
                  Date = c("01.01.07", "02.01.07", "03.01.07"),
                  KB = c("Eenie", "Meenie", "Miney"))
  d["KB" == "Eenie", ]   # gives
  [1] A    Date KB
  0 rows (or 0-length row.names)

If I follow Venables/Ripley in Modern Applied Statistics with S, it should
look like this:

  library(MASS)
  attach(painters)
  painters[Colour >= 17, ]

which gives the correct subset. But

  d[KB == "Eenie", ]     # gives
  Error in `[.data.frame`(d, KB == "Eenie", ) : object "KB" not found

I need every KB named Eenie. What did I do wrong? The alternative I found
seems to be quite complicated:

  d[which(d[, "KB"] == "Eenie"), ]
    A     Date    KB
  1 1 01.01.07 Eenie

Thank you so much for your help.

cheers
mark

I believe I found the missing link between animal and civilized man. It's
us. -- Konrad Lorenz
Re: [R] select subsets in data frame
Mark Hempelmann wrote:

  d <- data.frame(A = 1:3,
                  Date = c("01.01.07", "02.01.07", "03.01.07"),
                  KB = c("Eenie", "Meenie", "Miney"))
  d["KB" == "Eenie", ]   # gives
  [1] A    Date KB
  0 rows (or 0-length row.names)

Try this instead:

  > subset(d, KB == "Eenie")
    A     Date    KB
  1 1 01.01.07 Eenie

  ?subset

  But

    d[KB == "Eenie", ]   # gives
    Error in `[.data.frame`(d, KB == "Eenie", ) : object "KB" not found

Works for me if I attach the data frame first:

  > attach(d)
  > d[KB == "Eenie", ]
    A     Date    KB
  1 1 01.01.07 Eenie

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Re: [R] dimensions of all objects
Generally it is difficult to get an overview of what's there, but the
following function, which I acquired from (???) ages ago, does a nice job:

  lls <- function (pos = 1, pat = "") {
      dimx <- function(dd) if (is.null(dim(dd))) length(dd) else dim(dd)
      lll <- ls(pos = pos, pat = pat)
      cat(formatC("mode", 1, 15), formatC("class", 1, 18),
          formatC("name", 1, max(nchar(lll)) + 1), "size\n---\n")
      if (length(lll) > 0) {
          for (i in 1:length(lll)) {
              cat(formatC(eval(parse(t = paste("mode(", lll[i], ")"))), 1, 15),
                  formatC(paste(eval(parse(t = paste("class(", lll[i], ")"))),
                                collapse = " "), 1, 18),
                  formatC(lll[i], 1, max(nchar(lll)) + 1), " ",
                  eval(parse(t = paste("dimx(", lll[i], ")"))), "\n")
          }
      }
  }

Just say lls() and you get a reasonable listing of objects.

Best,
Bendix

__
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct) / +45 30 75 87 38 (mobile) / +45 44 43 73 13 (fax)
[EMAIL PROTECTED]
http://www.biostat.ku.dk/~bxc

-----Original Message-----
From: Farrel Buchinsky [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 09, 2007 3:30 PM
To: r-help@stat.math.ethz.ch
Subject: [R] dimensions of all objects

Why will the following command not work?

  sapply(objects(), dim)

What does it say about the objects list? What does it say about the dim
command? Likewise, the following also does not work:

  all <- ls()
  for (f in all) print(dim(f))

--
Farrel Buchinsky
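As to why Farrel's one-liner fails (a note on the mechanism, not from the original reply): objects() returns the *names* of objects as character strings, and dim() of a string is NULL, so sapply(objects(), dim) can only report NULLs. Fetch the objects themselves with mget() first:

```r
x <- matrix(1:6, 2)
v <- 1:5
sapply(mget(c("x", "v")), dim)   # dims of the actual objects, not their names
```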
Re: [R] scripts with littler
John Lawrence Aspden wrote:

  I've got a library (brainwaver), installed locally in ~/R/library, and
  this information is recorded in the ~/.Renviron file. In my script I load
  the library, but if I call it using #!/usr/bin/r --vanilla, this stops
  working.

(Various private e-mails exchanged. Again, thanks Dirk!)

Just in case anyone else is trying to do this: it turns out that if you can
persuade your end users to install the library to ~/R/library, then you can
say

  #!/usr/bin/r --vanilla
  library(brainwaver, lib.loc='~/R/library')

although in my case brainwaver depends on another library, which it now
can't find, so actually I have to load them in order:

  #!/usr/bin/r --vanilla
  library(waveslim, lib.loc='~/R/library')
  library(brainwaver, lib.loc='~/R/library')

Alternatively,

  #!/usr/bin/r --vanilla
  .libPaths('~/R/library')
  library(brainwaver)

works, although be careful: I've noticed that it seems to behave a bit
strangely on my Debian setup. E.g.,

  #!/usr/bin/r --vanilla
  cat(.Library, '*', .libPaths(), "\n")
  .libPaths('~/R/library')
  cat(.Library, '*', .libPaths(), "\n")

gives the output

  /usr/lib/R/library * /usr/local/lib/R/site-library /usr/lib/R/site-library /usr/lib/R/library
  /usr/lib/R/library * ~/R/library /usr/lib/R/library

that is, it seems to have removed /usr/local/lib/R/site-library and
/usr/lib/R/site-library as well as added ~/R/library.

Cheers, John.

--
Contractor in Cambridge UK -- http://www.aspden.com
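The "strange" behaviour John observes is in fact documented: .libPaths(new) resets the search path to the existing directories in unique(c(new, .Library)), so site-library directories are dropped unless re-included. A sketch of the safer idiom:

```r
## keep the existing search path and prepend the private library
.libPaths(c('~/R/library', .libPaths()))
```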
[R] problems with optim, for-loops and machine precision
Dear R experts,

I have been encountering problems with the optim routine using for loops. I
am determining the optimal parameters of several nested models by minimizing
the negative log-likelihood (NLL) of a dataset. The aim is to find the model
which best describes the data. To this end, I am simulating artificial data
sets based on the model with the least number of parameters (6) and the
parameters determined with the field data. For each artificial set I
estimate the parameters of the model with 6 parameters and of the next more
complex model with 7 parameters (two of these parameters are equal in the
6-parameter model) by minimizing the corresponding NLL with optim. In theory
the 7-parameter model should fit the data equally well or better than the
6-parameter model, so the difference of the minimal NLLs should be 0 or
larger. For 500 data sets I use the following code:

  require(odesolve)

  res <- matrix(nrow = 500, ncol = 18)
  colnames(res) <- c("a_23", "beta_23", "mu_23", "d_23", "I_23", "M_23",
                     "NLL_23", "a_21", "beta_21", "mu_21", "e_21", "d_21",
                     "I_21", "M_21", "NLL_21", "NLL23_min_21",
                     "conv23", "conv21")

  for (s in 1:500) {
      ## read one artificial data set
      assign("data",
             read.table(paste("populations/TE23simset_", s, ".txt", sep = "")),
             env = MyEnv)
      M23 <- optim(rep(0.1, 6), NLL23, method = "L-BFGS-B", lower = 0,
                   upper = c(Inf, Inf, Inf, Inf, 1, 1),
                   control = c(maxit = 150))
      if (M23$convergence == 0) {
          M21 <- optim(rep(0.1, 7), NLL21, method = "L-BFGS-B", lower = 0,
                       upper = c(Inf, Inf, Inf, Inf, Inf, 1, 1),
                       control = c(maxit = 150))
      }
      res[s, 1:6]  <- M23$par
      res[s, 7]    <- M23$value
      res[s, 8:14] <- M21$par
      res[s, 15]   <- M21$value
      res[s, 16]   <- M23$value - M21$value
      res[s, 17]   <- M23$convergence
      res[s, 18]   <- M21$convergence
      write.table(res, "compare23_21TE061221.txt")
      rm(M23, M21)
  }

For some strange reason the results do not correspond to what I expect:
about 10% of the solutions have an NLL difference smaller than 0. I have
verified the optimisation of these results manually and found that a minimal
NLL was ignored and a higher NLL was returned at convergence. To check what
was happening, I inserted a printing line in the NLL function to print all
parameters and the NLL as the procedure goes on. To my surprise, optim then
stopped at the minimal NLL which had been ignored before. I then reduced the
machine precision with

  .Machine$double.digits <- 8

thinking that the printing was slowing down the procedure, and that reducing
the machine precision would speed up the calculations. For an individual
calculation this solved my problem. However, when I implemented the same
procedure in the loop above, the same impossible results occurred again. Can
anyone tell me where I should be looking for the problem, or what it is and
how I could solve it?

Thanks a lot for your help.

Sincerely,
Simon

Simon Ruegg
Dr.med.vet., PhD candidate
Institute for Parasitology
Winterthurstr. 266a
8057 Zurich
Switzerland
phone: +41 44 635 85 93
fax: +41 44 635 89 07
e-mail: [EMAIL PROTECTED]
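One detail of the precision experiment is worth checking with a sketch (my note, not a reply from the thread): assigning to .Machine only creates a modified copy of that list in the workspace. It does not change the precision of floating-point arithmetic, so it cannot have sped up or altered what optim computes.

```r
eps <- .Machine$double.eps      # spacing of doubles near 1 (IEEE-754: 2^-52)
.Machine$double.digits <- 8     # only modifies a copy in the workspace
.Machine$double.eps == eps      # TRUE: the arithmetic itself is unchanged
```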
[R] Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi,

I listened to all your advice and ran my data on a computer with a 64-bit
processor, but I still get the same error saying it cannot allocate a vector
of that size (1240 Kb). I don't want to cut my data into smaller pieces
because we are looking at interaction. So are there any other options for me
to try out, or should I wait for the development of more advanced computers?

Thanks,
Iris

----- Forwarded Message
From: Iris Kolder [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Thursday, December 21, 2006 2:07:08 PM
Subject: Re: [R] Memory problem on a linux cluster using a large data set [Broadcast]

Thank you all for your help! So with all your suggestions we will try to run
it on a computer with a 64-bit processor. But I've been told that the new R
versions all work on a 32-bit processor. I read in other posts that only the
old R versions were capable of larger data sets and were running under
64-bit processors. I also read that they are adapting the new R version for
64-bit processors again, so does anyone know if there is a version available
that we could use?

Iris Kolder

----- Original Message
From: Liaw, Andy [EMAIL PROTECTED]
To: Martin Morgan [EMAIL PROTECTED]; Iris Kolder [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch; N.C. Onland-moret [EMAIL PROTECTED]
Sent: Monday, December 18, 2006 7:48:23 PM
Subject: RE: [R] Memory problem on a linux cluster using a large data set [Broadcast]

In addition to my off-list reply to Iris (pointing her to an old post of
mine that detailed the memory requirements of RF in R), she might consider
the following:

- Use a larger nodesize
- Use sampsize to control the size of the bootstrap samples

Both of these have the effect of reducing the sizes of the trees grown. For
a data set that large, it may not matter to grow smaller trees. Still, with
data of that size, I'd say 64-bit is the better solution.

Cheers,
Andy

From: Martin Morgan

Iris -- I hope the following helps; I think you have too much data for a
32-bit machine.
Martin

Iris Kolder [EMAIL PROTECTED] writes:

  Hello, I have a large data set: 320,000 rows and 1000 columns. All the
  data has the values 0, 1, 2.

It seems like a single copy of this data set will be at least a couple of
gigabytes; I think you'll have access to only 4 GB on a 32-bit machine (see
section 8 of the R Installation and Administration guide), and R will
probably end up, even in the best of situations, making at least a couple of
copies of your data. Probably you'll need a 64-bit machine, or figure out
algorithms that work on chunks of data.

  on a linux cluster with R version R 2.1.0, which operates on a 32-bit ...

This is quite old, and in general it seems like R has become more sensitive
to big-data issues and tracking down unnecessary memory copying.

  cannot allocate vector size 1240 kb. I've searched through ...

Use traceback() or options(error=recover) to figure out where this is
actually occurring.

  SNP <- read.table("file.txt", header = FALSE, sep = "")  # read in data file

This makes a data.frame, and data frames have several aspects (e.g.,
automatic creation of row names on sub-setting) that can be problematic in
terms of memory use. Probably better to use a matrix, for which:

  'read.table' is not the right tool for reading large matrices, especially
  those with many columns: it is designed to read _data frames_ which may
  have columns of very different classes. Use 'scan' instead.

(from the help page for read.table). I'm not sure of the details of the
algorithms you'll invoke, but it might be a false economy to try to get scan
to read in 'small' versions (e.g., integer rather than numeric) of the data
-- the algorithms might insist on numeric data, and then make a copy during
coercion from your small version to numeric.

  SNP$total.NAs <- rowSums(is.na(SNP))  # adds a column with the total NAs per row

This adds a column to the data.frame or matrix, probably causing at least
one copy of the entire data. Create a separate vector instead, even though
this unties the coordination between columns that a data frame provides.

  SNP <- t(as.matrix(SNP))  # transpose rows and columns

This will also probably trigger a copy;

  snp.na <- SNP

R might be clever enough to figure out that this simple assignment does not
trigger a copy. But it probably means that any subsequent modification of
snp.na or SNP *will* trigger a copy, so avoid the assignment if possible.

  snp.roughfix <- na.roughfix(snp.na)
  fSNP <- factor(snp.roughfix[, 1])  # assigns factor to case/control status
  snp.narf <- randomForest(snp.roughfix[, -1], fSNP, na.action = na.roughfix,
                           ntree = 500, mtry = 10, importance = TRUE,
                           keep.forest = FALSE, do.trace = 100)

Now you're entirely in the hands of the randomForest.
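Martin's scan() suggestion can be sketched like this (the file name and the dimensions are placeholders taken from the poster's description, not working values):

```r
nr <- 320000; nc <- 1000                     # the poster's dimensions
SNP <- matrix(scan("file.txt", what = integer(), n = nr * nc),
              nrow = nr, byrow = TRUE)       # a plain integer matrix,
                                             # not a data frame
```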
[R] prime in expression in plot
I want to write something like (LaTeX style)

  b_{norm} = \frac{F\prime(0)}{R\prime(0)}

How do I add the prime (first derivative) to an R plot? The help for
plotmath just talks about partialdiff. Can you complete this command?

  text(30, 0.05, labels = expression(b[plain(norm)] == frac(F(0), R(0))))

Thanks, Thomas
Re: [R] select subsets in data frame
Try:

  d[d[, "KB"] == "Eenie", ]

or

  d[d$KB == "Eenie", ]

What you were comparing was just two character strings ("KB" == "Eenie"),
which of course is FALSE and won't select anything. You have to explicitly
compare values from the KB column of your data.

On 1/10/07, Mark Hempelmann [EMAIL PROTECTED] wrote:

  d <- data.frame(A = 1:3,
                  Date = c("01.01.07", "02.01.07", "03.01.07"),
                  KB = c("Eenie", "Meenie", "Miney"))
  d["KB" == "Eenie", ]   # gives
  0 rows (or 0-length row.names)
  ...
  I need every KB named Eenie. What did I do wrong?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
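The three working idioms from this thread, collected in one self-contained snippet:

```r
d <- data.frame(A = 1:3,
                Date = c("01.01.07", "02.01.07", "03.01.07"),
                KB = c("Eenie", "Meenie", "Miney"))
r1 <- d[d$KB == "Eenie", ]            # index with the column's values
r2 <- d[d[, "KB"] == "Eenie", ]       # same, via character column lookup
r3 <- subset(d, KB == "Eenie")        # subset() evaluates KB inside d
```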
Re: [R] prime in expression in plot
  how do I add the prime (first derivative) to an R plot?

Sorry for the noise, I found it myself:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/20984.html

I now use (works fine):

  plot(1, 1, xlab = expression(frac(F * "'" * (0), R * "'" * (0))), xaxt = "n")

Thomas
[R] correlation value and map
Dear R-help community,

I have 2 different arrays of precipitation data, each of the same dimensions
[longitude, latitude, time], dim = [30, 32, 43], called array1 and array2. I
need to correlate them. This is the code I used to get one overall
correlation value for the whole of the area of interest:

  result <- cor(array1, array2, use = "complete.obs")
  result

This gives me a single value, but I'm not convinced it is actually a
correlation value for the total area over the total time period of 43
years... can anybody tell me if I am indeed wrong in my coding and/or in my
limited knowledge of the statistics of correlation?

Also, I wanted to produce a correlation map over the 43 years. Could you
advise me whether this is correct? I am more confident of this than of the
code above:

  result <- array(NA, c(30, 32))
  for (i in 1:30) {
      for (j in 1:32) {
          array1.ts <- array1[i, j, ]
          array2.ts <- array2[i, j, ]
          result[i, j] <- cor(array1.ts, array2.ts, use = "complete.obs")
      }
  }

I appreciate your time very much. If I don't iron out this problem now, the
ground-work for my entire PhD will not be stable at all.

Many thanks for reading my problem, happy 2007 :-)

Jenny Barnes

Jennifer Barnes
PhD student - long range drought prediction
Climate Extremes
Department of Space and Climate Physics
University College London
Holmbury St Mary, Dorking, Surrey RH5 6NT
01483 204149
Web: http://climate.mssl.ucl.ac.uk
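The per-cell map in the double loop can also be written loop-free; a sketch with synthetic stand-ins for the precipitation arrays (dimensions follow the post: 30 x 32 cells, 43 years). The single number returned by the first call, by contrast, pools all cells and years rather than averaging per-cell correlations.

```r
set.seed(1)
array1 <- array(rnorm(30 * 32 * 43), c(30, 32, 43))   # stand-ins for the
array2 <- array(rnorm(30 * 32 * 43), c(30, 32, 43))   # precipitation data
result <- sapply(1:32, function(j)
    sapply(1:30, function(i)
        cor(array1[i, j, ], array2[i, j, ], use = "complete.obs")))
dim(result)   # 30 x 32: one correlation per grid cell, over the 43 years
```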
[R] Fractional brownian motion
Dear All,

I have used fbmSim to simulate an fBm sequence; however, when I tried to
estimate the Hurst exponent, none of the nine procedures gave me an answer
close enough to the real value, which is 0.5 (n = 1000). So, would you
please advise:

1. which is the best method to estimate H among the 9 methods: R/S, Higuchi,
   or Whittle?
2. how to choose the levels (default = 50), minnpts, and cutoff values, and
   whether there are any other issues to consider? Would you please send
   your code if you can get the right estimate for the attached dataset?
3. I also simulated multiple sequences, but some of the estimated results
   have H > 1; how would you explain this, and how can it be corrected?
4. if I have a sample size of 30, what are your suggestions for estimating
   the Hurst exponent?

I really appreciate your help, and thanks in advance.

Sincerely,
Qing

list(c(0, 0.0602883614054335, 0.0792551804556156, 0.109563140496959,
0.125726475390261, 0.101877497256596, 0.0571926667502657,
0.00262554012667387, -0.0651825041744684, -0.0696503406796901,
-0.0688337926741136, -0.0361989921035026, -0.0496524166395568,
-0.0718218507093852, -0.109583486993751, -0.0451891749832022,
-0.0260755094163026, -0.0201228029445018, -0.08818052945358,
-0.0285680272562049, -0.032752629105123, -0.00199553309054604,
0.00508312076775159, -0.0382257524810562, -0.0398084284989033,
-0.0502088451482971, -0.0502183047870017, -0.0174725900800023,
-0.031457396692228, -0.0497298422601627, -0.0499422794745919,
-0.0361646489128442, 0.0266653302426084, -0.00219111075589404,
0.0200305010061950, -0.054431205284715, -0.00789973517059908,
-0.0085425214867059, -0.0541809440614641, -0.110006586793787,
-0.127781398474619, -0.149159749479771, -0.0888724748560444,
-0.0768016585462973, -0.0427333724364433, -0.0562443902614682,
-0.062866380084345, -0.0883889349587378, -0.08412713827269,
-0.124694662643563, -0.0692092347299578, -0.062046981143375,
-0.0853376756560203, -0.0994827776216755, -0.0599248702862878,
-0.086369926350214, -0.0890587510305954,
-0.0599052704934926, -0.0393644348210234, -0.0364342257334110, -0.0352152731793845, 0.00857219866032972, 0.0180050537574112, -0.0356947245542089, -0.0615077884110117, -0.0650043906408892, -0.0357502728853327, 0.00485292461594026, -0.0458490824792766, -0.0735087346178546, -0.0820482814570969, -0.124917946967105, -0.102342757999288, -0.154497291254316, -0.181381845540350, - 0.166583783090691, -0.130709738638527, -0.124693013403996, -0.120174007595549, - 0.204900741024172, -0.187508161990262, -0.200032164010717, -0.224311714556569, - 0.207979212824500, -0.232450923263701, -0.198989498760939, -0.215501785889735, - 0.226880158415004, -0.203545247216445, -0.182379290331033, -0.197038481353429, - 0.183796217531869, -0.215299419717715, -0.226095790998872, -0.201080854888686, - 0.261516787028393, -0.228388047078831, -0.233878235687051, -0.250375640774054, - 0.245746001611628, -0.232938343176222, -0.204702689747725, -0.259350393091606, - 0.266266298423241, -0.261596784773769, -0.259875864886204, -0.288719457448324, - 0.318140871065267, -0.304593775001704, -0.263467947228369, -0.323808258544402, - 0.368149351192211, -0.347335883864181, -0.400024671714471, -0.371544405921674, - 0.34497157633729, -0.418657437186672, -0.446912741927435, -0.443280747761798, - 0.384739689692638, -0.408072099289829, -0.44994167683647, -0.495273953054503, - 0.495799924479583, -0.43697383254148, -0.440650597876129, -0.458929575025839, - 0.464585181026337, -0.455843249117235, -0.505571057794026, -0.528316203388037, - 0.58236723395483, -0.519865988395428, -0.555187599676786, -0.532496699903787, - 0.522718022822901, -0.559598211031558, -0.54708053308, -0.554634133071482, - 0.541090495738355, -0.558915651522986, -0.541917600099341, -0.578140869146858, - 0.53827826166188, -0.52995892714492, -0.502706986562639, -0.497002923985933, - 0.488096453915811, -0.51306171001298, -0.536938942207764, -0.515525690037527, - 0.527192086185579, -0.511195825348352, -0.531104487863626, -0.521038783472886, - 
0.519956505596612, -0.510535242028159, -0.531234776730198, -0.526733559471536, - 0.583555032968273, -0.551226199377139, -0.489093347568197, -0.505836247919194, - 0.485420121220644, -0.511258369655571, -0.505044073824655, -0.496646898991666, - 0.49767793016268, -0.496864531091594, -0.545965735276333, -0.588151447068976, - 0.594481697088302, -0.633356216248837, -0.649947987535778, -0.698052359479746, - 0.651147537552288, -0.678918599568793, -0.707817506128165, -0.69962378236799, - 0.664061266573256, -0.686809968963275, -0.70940931290113, -0.710042179498104, - 0.700294406700751, -0.69089796594532, -0.687411073827526, -0.660740247977237, -0.61857834066703, -0.611492191585365, -0.617870447722936, -0.590305790642949, - 0.616713901938279, -0.619126236382548, -0.618591413756597, -0.604886044395737, - 0.598434945889939, -0.481524771809731, -0.424388823122359, -0.490492676413456, - 0.506402791875861, -0.480172973435785, -0.517922095978117, -0.502318428525046, - 0.496731810144271, -0.531408048623419,
[R] Column names in Zoo object
Hi, I am downloading Bloomberg data from R. The data is stored in a zoo object by default. The command is

dat <- blpGetData(con, c("NOK1V FH Equity", "AUA AV Equity"), "PX_OPEN", start=as.chron(as.Date("12/1/2006", "%m/%d/%Y")), end=as.chron(as.Date("12/28/2006", "%m/%d/%Y")))

Here I am downloading the data for two tickers ("NOK1V FH Equity" and "AUA AV Equity") simultaneously. The column names of my zoo object are then NOK1V.FH.Equity and AUA.AV.Equity respectively, i.e. the ticker names themselves. But if I download only one ticker with

dat <- blpGetData(con, c("NOK1V FH Equity"), "PX_OPEN", start=as.chron(as.Date("12/1/2006", "%m/%d/%Y")), end=as.chron(as.Date("12/28/2006", "%m/%d/%Y")))

then the column name of my zoo object is PX_OPEN, the field name, instead of the ticker name NOK1V FH Equity. I need the second call to behave like the first one: if only one ticker is used instead of two or more, the column name taken by my zoo object should still be the ticker name and not the field name. Could somebody help me with this? Thanks in advance.
Re: [R] scripts with littler
[By now way off the subject line. Something like 'how to set the libraries inside an R session'.] On Wed, 10 Jan 2007, John Lawrence Aspden wrote: John Lawrence Aspden wrote: I've got a library (brainwaver), installed locally in ~/R/library, and this information is recorded in the ~/.Renviron file. In my script I load the library, but if I call it using #!/usr/bin/r --vanilla, this stops working. (Various private e-mails exchanged. Again, thanks Dirk!) Just in case anyone else is trying to do this, it turns out that if you can persuade your end users to install the library to ~/R/library, then you can say: #!/usr/bin/r --vanilla library(brainwaver, lib.loc='~/R/library') although in my case, brainwaver depends on another library, which it now can't find, so actually I have to load them in order: #!/usr/bin/r --vanilla library(waveslim, lib.loc='~/R/library') library(brainwaver, lib.loc='~/R/library') Alternatively, #!/usr/bin/r --vanilla .libPaths('~/R/library') library(brainwaver) works, although be careful, I've noticed that it seems to behave a bit strangely on my Debian setup. 'It' (R) is behaving as you asked it to. Most likely you intended to ask for .libPaths(c('~/R/library', .libPaths())) e.g. #!/usr/bin/r --vanilla cat(.Library, '*', .libPaths(), '\n') .libPaths('~/R/library') cat(.Library, '*', .libPaths(), '\n') gives output /usr/lib/R/library * /usr/local/lib/R/site-library /usr/lib/R/site-library /usr/lib/R/library /usr/lib/R/library * ~/R/library /usr/lib/R/library that is, it seems to have removed /usr/local/lib/R/site-library and /usr/lib/R/site-library as well as added ~/R/library Exactly as documented. The argument is named 'new' and not 'add', BTW. Please 'be careful' in what you say about the work of others. Cheers, John. -- Brian D.
Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
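A minimal, self-contained illustration of the prepend idiom Prof. Ripley describes, using a throwaway temporary directory in place of ~/R/library so the snippet runs anywhere:

```r
# .libPaths(new) REPLACES the user-settable part of the search path,
# so include the current value to keep the site libraries.
lib <- tempfile("Rlib")
dir.create(lib)
old <- .libPaths()
.libPaths(c(lib, old))   # prepend, don't replace
# the old entries survive, and the new one is in front
print(.libPaths())
```

Note that .libPaths() silently drops directories that do not exist, which is another reason '~/R/library' may not appear if the tilde does not expand to a real path.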
Re: [R] Column names in Zoo object/BlpGetata problem
I think the problem is in the blpGetData function rather than in the zoo object: even when the data is put into a data frame instead of a zoo object, the problem persists. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Shubha Vishwanath Karanth Sent: Wednesday, January 10, 2007 7:29 PM To: r-help@stat.math.ethz.ch Subject: [R] Column names in Zoo object [snip]
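Since blpGetData needs a live Bloomberg connection, here is a self-contained sketch of one workaround: rename the columns yourself after the download, so a single-ticker result carries the ticker name rather than the field name. fix_names is a hypothetical helper, and the matrix below merely stands in for the downloaded object:

```r
# hypothetical post-processing step: overwrite whatever column names the
# download produced with (syntactically valid) versions of the ticker names
fix_names <- function(dat, tickers) {
  if (NCOL(dat) == length(tickers)) colnames(dat) <- make.names(tickers)
  dat
}

# stand-in for a one-ticker download, which comes back named "PX_OPEN"
dat <- matrix(c(15.2, 15.4, 15.1), ncol = 1,
              dimnames = list(NULL, "PX_OPEN"))
dat <- fix_names(dat, "NOK1V FH Equity")
colnames(dat)  # "NOK1V.FH.Equity", matching the multi-ticker behaviour
```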
Re: [R] Fractional brownian motion
You have applied the Hurst parameter estimating function to the fBm, not to the fGn, as described in the help page for Long Range Dependence Modelling in fSeries: "The function aggvarFit computes the Hurst exponent from the variance of an aggregated FGN or FARIMA time series process." Trying perFit(diff(x)) (where x is your fBm series) gives an estimate of the Hurst exponent of 0.53527922. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of ED Sent: Wednesday, January 10, 2007 8:53 AM To: r-help@stat.math.ethz.ch Subject: [R] Fractional brownian motion [snip]
Re: [R] prime in expression in plot
Two possibilities are ' (which is the usual way to write \prime in TeX) and minute (which is the nearest character in the Adobe symbol font that plotmath uses). TeX has its own symbol fonts, and most R devices are based rather on the Adobe one. On Wed, 10 Jan 2007, Thomas Steiner wrote: I want to write something like (LaTeX style) b_{norm}=\frac{F\prime(0)}{R\prime(0)} How do I add the prime (first derivative) to an R plot? The help page for plotmath just talks about partialdiff. Can you complete this command? text( 30,0.05,labels=expression(b[plain(norm)]==frac(F(0),R(0))) ) Thanks, Thomas -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
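To make the suggestion concrete, here is one way to complete the command with plotmath's minute symbol; the juxtaposition via * is just one rendering choice, not the only possible one:

```r
# 'minute' is plotmath's closest equivalent of TeX's \prime;
# F * minute * (0) renders roughly as F'(0)
lab <- expression(b[plain(norm)] == frac(F * minute * (0), R * minute * (0)))
plot.new()
text(0.5, 0.5, labels = lab)
```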
Re: [R] Fw: Memory problem on a linux cluster using a large data set [Broadcast]
On Wed, 10 Jan 2007, Iris Kolder wrote: Hi, I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying it cannot allocate a vector of size 1240 Kb. I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers? Did you use a 64-bit build of R on that machine? If the message is the same, I strongly suspect not. 64-bit builds are not the default on most OSes. Thanks, Iris - Forwarded Message From: Iris Kolder [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Thursday, December 21, 2006 2:07:08 PM Subject: Re: [R] Memory problem on a linux cluster using a large data set [Broadcast] Thank you all for your help! So with all your suggestions we will try to run it on a computer with a 64-bit processor. But I've been told that the new R versions all work on a 32-bit processor. I read in other posts that only the old R versions were capable of handling larger data sets and were running on 64-bit processors. I also read that they are adapting the new R version for 64-bit processors again, so does anyone know if there is a version available that we could use? Iris Kolder - Original Message From: Liaw, Andy [EMAIL PROTECTED] To: Martin Morgan [EMAIL PROTECTED]; Iris Kolder [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch; N.C. Onland-moret [EMAIL PROTECTED] Sent: Monday, December 18, 2006 7:48:23 PM Subject: RE: [R] Memory problem on a linux cluster using a large data set [Broadcast] In addition to my off-list reply to Iris (pointing her to an old post of mine that detailed the memory requirement of RF in R), she might consider the following: - Use a larger nodesize - Use sampsize to control the size of bootstrap samples Both of these have the effect of reducing the sizes of the trees grown. For a data set that large, it may not matter to grow smaller trees.
Still, with data of that size, I'd say 64-bit is the better solution. Cheers, Andy From: Martin Morgan Iris -- I hope the following helps; I think you have too much data for a 32-bit machine. Martin Iris Kolder [EMAIL PROTECTED] writes: Hello, I have a large data set: 320,000 rows and 1000 columns. All the data has the values 0, 1, 2. It seems like a single copy of this data set will be at least a couple of gigabytes; I think you'll have access to only 4 GB on a 32-bit machine (see section 8 of the R Installation and Administration guide), and R will probably end up, even in the best of situations, making at least a couple of copies of your data. Probably you'll need a 64-bit machine, or figure out algorithms that work on chunks of data. on a linux cluster with R version 2.1.0, which operates on a 32 This is quite old, and in general it seems like R has become more sensitive to big-data issues and tracking down unnecessary memory copying. cannot allocate vector size 1240 kb. I've searched through use traceback() or options(error=recover) to figure out where this is actually occurring. SNP <- read.table("file.txt", header=FALSE, sep="") # read in data file This makes a data.frame, and data frames have several aspects (e.g., automatic creation of row names on sub-setting) that can be problematic in terms of memory use. Probably better to use a matrix, for which: 'read.table' is not the right tool for reading large matrices, especially those with many columns: it is designed to read _data frames_ which may have columns of very different classes. Use 'scan' instead. (from the help page for read.table). I'm not sure of the details of the algorithms you'll invoke, but it might be a false economy to try to get scan to read in 'small' versions (e.g., integer, rather than numeric) of the data -- the algorithms might insist on numeric data, and then make a copy during coercion from your small version to numeric.
SNP$total.NAs = rowSums(is.na(SNP)) # calculate the number of NAs per row and add a column with total NAs This adds a column to the data.frame or matrix, probably causing at least one copy of the entire data. Create a separate vector instead, even though this unties the coordination between columns that a data frame provides. SNP = t(as.matrix(SNP)) # transpose rows and columns This will also probably trigger a copy; snp.na <- SNP R might be clever enough to figure out that this simple assignment does not trigger a copy. But it probably means that any subsequent modification of snp.na or SNP *will* trigger a copy, so avoid the assignment if possible. snp.roughfix <- na.roughfix(snp.na) fSNP <- factor(snp.roughfix[, 1]) # Assigns factor to case control status snp.narf <- randomForest(snp.roughfix[,-1], fSNP, na.action=na.roughfix, ntree=500, mtry=10,
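A small illustration of the scan-instead-of-read.table advice for a genotype-style 0/1/2 matrix (the tiny file below is made up for the example):

```r
# write a tiny 0/1/2 matrix to a temporary file
tf <- tempfile()
writeLines(c("0 1 2 1", "2 0 1 0", "1 1 0 2"), tf)

# scan() returns one flat vector with no data.frame overhead;
# reshape it into a matrix ourselves (byrow matches the file layout)
m <- matrix(scan(tf, what = integer(), quiet = TRUE),
            ncol = 4, byrow = TRUE)
dim(m)  # 3 rows, 4 columns
```

Storing the values as integer halves the memory relative to double, but, as noted above, that saving may be lost if a downstream algorithm coerces the matrix to numeric.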
Re: [R] contingency table analysis; generalized linear model
Date: Tue, 9 Jan 2007 11:13:41 + (GMT) From: Mark Difford [EMAIL PROTECTED] Subject: Re: [R] contingency table analysis; generalized linear model Dear List, I would appreciate help on the following matter: I am aware that higher dimensional contingency tables can be analysed using either log-linear models or as a poisson regression using a generalized linear model: log-linear: loglm(~Age+Site, data=xtabs(~Age+Site, data=SSites.Rev, drop.unused.levels=T)) GLM: glm.table <- as.data.frame(xtabs(~Age+Site, data=SSites.Rev, drop.unused.levels=T)) glm(Freq ~ Age + Site, data=glm.table, family='poisson') where Site is a factor and Age is cast as a factor by xtabs() and treated as such. **Question**: Is it acceptable to step away from contingency table analysis by recasting Age as a numerical variable, and redoing the analysis as: glm(Freq ~ as.numeric(Age) + Site, data=glm.table, family='poisson') My reasons for wanting to do this are to be able to include non-linear terms in the model, using say restricted or natural cubic splines. Thank you in advance for your help. Regards, Mark Difford. --- Mark Difford Ph.D. candidate, Botany Department, Nelson Mandela Metropolitan University, Port Elizabeth, SA. Yes it is, and it is often the preferred way to view the analysis. In this case it looks like Freq is measuring something like species abundance, and it is natural to model this as a Poisson count via a log-link glm. As such you are free to include any reasonable functions of your predictors in modeling the mean. Log-linear models are typically presented as ways of analyzing dependence between categorical variables, when represented as multi-way tables. The appropriate multinomial models, conditioning on certain marginals, happen to be equivalent to Poisson glms with appropriate terms included. I would suggest in your data preparation that you glm.table[, "Age"] <- as.numeric(glm.table[, "Age"]) at the start, so that now you can think of your data in the right way.
Trevor Hastie --- Trevor Hastie [EMAIL PROTECTED] Professor & Chair, Department of Statistics, Stanford University Phone: (650) 725-2231 (Statistics) Fax: (650) 725-8977 (650) 498-5233 (Biostatistics) Fax: (650) 725-6951 URL: http://www-stat.stanford.edu/~hastie address: room 104, Department of Statistics, Sequoia Hall 390 Serra Mall, Stanford University, CA 94305-4065
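The recast model with a nonlinear Age effect can be sketched on made-up count data; ns() from the splines package (shipped with R) stands in for whichever restricted/natural cubic spline basis you prefer:

```r
library(splines)  # provides ns() natural cubic splines

set.seed(1)
toy <- data.frame(Age  = rep(1:10, times = 3),
                  Site = gl(3, 10, labels = c("A", "B", "C")))
toy$Freq <- rpois(nrow(toy), lambda = exp(0.5 + 0.1 * toy$Age))

# Age as numeric, with a nonlinear (spline) effect, Site as a factor
fit <- glm(Freq ~ ns(Age, df = 3) + Site, family = poisson, data = toy)
length(coef(fit))  # intercept + 3 spline terms + 2 site contrasts = 6
```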
[R] Implementation of Wei-Lachin Test
Greetings - Does anybody know if there is an R implementation of the Wei-Lachin tests for incomplete multivariate observations (JASA 79: 653-661)? The authors' example of data where this test can be applied is a study comparing treatment to placebo where the outcome is a series of time-to-event observations, i.e., time to development of a nonfatal symptom in each of several major body systems (dermal, musculoskeletal, neurologic, etc.). The null hypothesis here is that the joint distributions for the treatment and control groups are equal. Regards, Jarrod Dalton
[R] ARIMA and xreg
Hello, I am fairly new to time series analysis, and I am wondering if anyone could point me to some good references on the mechanics of what is going on when you fit a model using arima() that includes a set of external regressors (i.e., when you use the xreg= argument). Thank you, Spencer
[R] logistic regression packages
Hi All: I'm testing a set of data classification algorithms from this paper (www.stat.wisc.edu/~loh/treeprogs/quest1.7/mach1317.pdf). I couldn't find the following algorithms in R packages: 1. LOG: polytomous logistic regression (there was one in the MASS library: multinom. But after I updated the MASS library, multinom was lost.) 2. POL: POLYCLASS algorithm. There is an S-PLUS package (the polyclass library) for this algorithm, so there should be a corresponding package in R, but I haven't found it so far. Any advice is appreciated. Best, Feng
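For what it's worth, multinom() actually lives in the nnet package (documented in Venables & Ripley's MASS book, which may be the source of the confusion), so it should still be loadable after an update. A minimal sketch on made-up data:

```r
library(nnet)  # polytomous (multinomial) logistic regression lives here

set.seed(42)
toy <- data.frame(class = factor(sample(c("a", "b", "c"), 150, replace = TRUE)),
                  x1 = rnorm(150),
                  x2 = rnorm(150))
fit <- multinom(class ~ x1 + x2, data = toy, trace = FALSE)
coef(fit)  # one row of coefficients per non-baseline class
```

(If memory serves, the POLYCLASS routines are in Kooperberg's polspline package on CRAN, but do check that yourself.)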
[R] go or goto command
Some computer languages, including C, have a goto command which can be used to shift control to a different part of a function. Question: does the R language have a corresponding command? (Yes, I am aware that it can be abused; but, in the hands of a good programmer, it is simple to use.) Tom Jones
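R has no goto. The idiomatic substitutes are structured jumps: next and break inside loops, early return() from a function, and repeat for loop-with-exit-in-the-middle patterns. A small sketch:

```r
# find the index of the first even element, using the jumps R does provide
first_even <- function(x) {
  for (i in seq_along(x)) {
    if (x[i] %% 2 != 0) next  # "jump" to the next iteration
    return(i)                 # early exit replaces "goto end"
  }
  NA_integer_                 # fell through: no even element
}

first_even(c(3, 5, 8, 10))  # 3
first_even(c(1, 3, 5))      # NA
```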
Re: [R] correlation value and map
Hi Jenny! So if I understand your data file correctly, you have 960 cases per year, and you have 43 years, yes? I'm not sure you should use correlation in this situation, because of the autocorrelation in the data. There is strong autocorrelation in spatial data like yours, and there is also very strong autocorrelation in time series data. I think you have to decompose your time series and remove the trend (and maybe some kind of seasonality), and then compute the correlation on the residuals. You also have to filter out the autocorrelation in the spatial data somehow. And because of the problems above, don't calculate the correlation on the entire databases! bye, Zoltan
[R] map data.frame() data after having linked them to a read.shape() object
Dear all, I try to first link data in a data.frame() to a (polygon) read.shape() object and then to plot a polygon map showing the data in the data.frame. The linking step is called "link" in ArcView and "relate" in ArcMap. I use the code shown below, though without success. Help with this would be greatly appreciated. Thanks! Tord require(maptools) # Read shape file (one row per county) a = read.shape("myshp.shp", dbf.data=TRUE, verbose=TRUE) str(a) ..- attr(*, "maxbb")= num [1:4] -100 4900 ..- attr(*, "class")= chr "ShapeList" $ att.data:'data.frame': 490 obs. of 60 variables: ..$ STATE_FIPS: Factor w/ 12 levels "04","06","08",..: 11 11 11 11 4 5 5 5 5 5 ... [snip] ..$ STACOUNT4 : Factor w/ 489 levels "ArizonaApache",..: 437 460 451 453 147 207 195 198 231 206 ... ..- attr(*, "data_types")= chr [1:60] "C" "C" "C" "C" ... - attr(*, "class")= chr "Map" # Read case data (one row per case) cases = read.table("cases.txt", h=T) str(cases) 'data.frame': 431 obs. of 8 variables: $ Year: int 1950 1950 1950 1951 1956 1957 1959 1959 1959 1959 ... $ Case: int 3 1 2 1 1 1 2 4 1 3 ... $ stacount: Factor w/ 106 levels "ArizonaApache",..: 1 66 76 66 26 29 15 25 30 60 ... # table the cases data PER Year, PER County (County = stacount) temp = t(table(cases[, c("Year","stacount")])) stacount = dimnames(temp)$stacount temp = cbind(stacount, as.data.frame(temp[, 1:ncol(temp)], row.names=F)) str(temp) 'data.frame': 106 obs. of 50 variables: $ stacount: Factor w/ 106 levels "ArizonaApache",..: 1 2 3 4 5 6 7 8 9 10 ... $ 1950: int 1 0 0 0 0 0 0 0 0 0 ... [snip] $ 2005: int 0 0 0 0 0 0 0 0 0 0 ... # Pick out a temporary attribute data.frame datfr = a$att.data # Merge the temporary data frame with the tabled cases for(i in 2:ncol(temp)){ datfr = merge(datfr, temp[, c(1,i)], by.x="STACOUNT4", by.y="stacount", all.x=T, all.y=F) } # Replace NAs with 0: for(i in 61:109){ datfr[,i] = ifelse(is.na(datfr[,i])==T, 0, datfr[,i]) } str(a$att.data) 'data.frame': 490 obs.
of 60 variables: $ NAME : Factor w/ 416 levels "Ada","Adams",..: 120 352 265 277 33 210 122 135 372 209 ... [snip] $ STACOUNT4 : Factor w/ 489 levels "ArizonaApache",..: 437 460 451 453 147 207 195 198 231 206 ... - attr(*, "data_types")= chr "C" "C" "C" "C" ... # Note that the above data is of attribute type str(datfr) 'data.frame': 490 obs. of 109 variables: $ STACOUNT4 : Factor w/ 489 levels "ArizonaApache",..: 1 2 3 4 5 6 7 8 9 10 ... [snip] $ 1951 : num 0 0 0 0 0 0 0 0 0 0 ... [snip] $ 2005 : num 0 0 0 0 0 0 0 0 0 0 ... # Note that at the end of this, the data type is not described - it is a simple data frame # bind data together: # Alternative 1: a$att.data = cbind(a$att.data, datfr[,61:109]) # Other alternatives: test = matrix(ncol=49) a$att.data[,61:109] = test a$att.data[,61:109] = datfr[,61:109] # plot: plot(a, auxvar=a$att.data[,61], xlim=c(-125,-99), ylim=c(28,52), xlab="", ylab="", frame.plot=F, axes=F) There were 50 or more warnings (use warnings() to see the first 50) warnings() 49: "axes" is not a graphical parameter in: polygon(xy$x, xy$y, col, border, lty, ...) 50: "frame.plot" is not a graphical parameter in: polygon(xy$x, xy$y, col, border, lty, ...) # The a$att.data type has changed to being a typical data.frame - attr is not mentioned: str(a$att.data) [snip] $ 2003 : num 0 0 0 0 0 0 0 0 0 0 ... $ 2004 : num 0 0 0 0 0 0 0 0 0 0 ... $ 2005 : num 0 0 0 0 0 0 0 0 0 0 ... -- Tord Snäll Department of Conservation Biology Swedish University of Agricultural Sciences (SLU) P.O. 7002, SE-750 07 Uppsala, Sweden Office/Mobile/Fax +46-18-672612/+46-730-891356/+46-18-673537 E-mail: [EMAIL PROTECTED] www.nvb.slu.se/staff_tordsnall
Re: [R] correlation value and map
Hi Zoltan, Right, I have 30x32=960 data points per year (it is actually the mean February precipitation total, in case you were wondering) at each grid point over the world, so I have 960 data points in each of the 43 years. Can I therefore do anything with a trend and residuals? I don't think I can if it's just mean February precipitation, one data point per grid square per year... I appreciate your help very much, although I do still need to perform a spatial correlation, if anyone else can help? Many thanks, Jenny [snip] Jennifer Barnes PhD student - long range drought prediction Climate Extremes Department of Space and Climate Physics University College London Holmbury St Mary, Dorking Surrey RH5 6NT 01483 204149 07916 139187 Web: http://climate.mssl.ucl.ac.uk
Re: [R] problems with optim, for-loops and machine precision
Without more detail - a reproducible example - it is hard to give you concrete advice. I wonder if the functions NLL23 and NLL21 depend on numerical solutions of a system of ODEs, since you invoke the odesolve package? If so, try switching to the Nelder-Mead optimizer, enforcing the parameter constraints using transformations. Probably you are using the finite-difference derivatives calculated internally by optim for the gradient information used in the L-BFGS-B optimizer. These can be unstable when based on numerical solutions of ODEs, causing the optimizer to fail, or sometimes to converge to a non-optimal point. Some other points: - you cannot change machine precision by changing values in .Machine. To change the number of digits printed, use options(digits=8). - use 'library()' instead of 'require()', unless you need to use the return value from 'require()'. R. Woodrow Setzer, Ph.D. National Center for Computational Toxicology US Environmental Protection Agency Mail Drop B205-01/US EPA/RTP, NC 27711 Ph: (919) 541-0128 Fax: (919) 541-1194 From: Simon Ruegg [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: 01/10/2007 07:18 AM Subject: [R] problems with optim, for-loops and machine precision Dear R experts, I have been encountering problems with the optim routine using for loops. I am determining the optimal parameters of several nested models by minimizing the negative log-likelihood (NLL) of a dataset. The aim is to find the model which best describes the data. To this end, I am simulating artificial data sets based on the model with the least number of parameters (6) and the parameters determined with the field data. For each artificial set I estimate the parameters of the model with 6 parameters and of the next more complex model with 7 parameters (two of these parameters are equal in the 6-parameter model) by minimizing the corresponding NLL with optim.
In theory the 7-parameter model should fit the data equally well as or better than the 6-parameter model. Therefore the difference of the minimal NLLs should be 0 or larger. For 500 data sets I use the following code:

require(odesolve)
res=matrix(nrow=500,ncol=18)
colnames(res)=c("a_23","beta_23","mu_23","d_23","I_23","M_23","NLL_23", "a_21","beta_21","mu_21","e_21","d_21","I_21","M_21","NLL_21", "NLL23_min_21","conv23","conv21")
for (s in 1:500) {
assign("data",read.table(paste("populations/TE23simset_",s,".txt",sep="")),env=MyEnv) # reading a data set
M23=optim(rep(0.1,6),NLL23,method="L-BFGS-B",lower=0, upper=c(Inf,Inf,Inf,Inf,1,1),control=c(maxit=150))
if (M23$convergence==0) {
M21=optim(rep(0.1,7),NLL21,method="L-BFGS-B",lower=0, upper=c(Inf,Inf,Inf,Inf,Inf,1,1),control=c(maxit=150))
}
res[s,1]=M23$par[1] res[s,2]=M23$par[2] res[s,3]=M23$par[3] res[s,4]=M23$par[4] res[s,5]=M23$par[5] res[s,6]=M23$par[6] res[s,7]=M23$value
res[s,8]=M21$par[1] res[s,9]=M21$par[2] res[s,10]=M21$par[3] res[s,11]=M21$par[4] res[s,12]=M21$par[5] res[s,13]=M21$par[6] res[s,14]=M21$par[7] res[s,15]=M21$value
res[s,16]=M23$value-M21$value res[s,17]=M23$convergence res[s,18]=M21$convergence
write.table(res,"compare23_21TE061221.txt")
rm(M23,M21)
}

For some strange reason the results do not correspond to what I expect: about 10% of the solutions have a difference of NLLs smaller than 0. I have verified the optimisation of these results manually and found that a minimal NLL was ignored and a higher NLL was returned at convergence. To check what was happening I inserted a printing line in the NLL function to print all parameters and the NLL as the procedure goes on. To my surprise, optim then stopped at the minimal NLL which had been ignored before. I then reduced the machine precision to .Machine$double.digits=8, thinking that the printing was slowing down the procedure and by reducing the
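The transformation idea from the reply can be sketched on a toy objective (the quadratic below is made up; substitute your NLL): optimize unconstrained over log- and logit-transformed parameters with Nelder-Mead, then map the result back to the constrained scale.

```r
# toy 'NLL' with constraints a > 0 and 0 < p < 1,
# handled by optimizing over theta = (log a, logit p)
nll <- function(theta) {
  a <- exp(theta[1])     # guarantees a > 0
  p <- plogis(theta[2])  # guarantees 0 < p < 1
  (a - 2)^2 + (p - 0.25)^2
}

fit <- optim(c(0, 0), nll, method = "Nelder-Mead")
c(a = exp(fit$par[1]), p = plogis(fit$par[2]))  # close to 2 and 0.25
```

Because no finite-difference gradient is needed, this avoids the instability that numerical ODE solutions can introduce into L-BFGS-B.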
Re: [R] a question of substitute
Looks like oneway.test has been changed for R 2.5.0. Paste the code in this file: https://svn.r-project.org/R/trunk/src/library/stats/R/oneway.test.R into your session. Then fun.2 from your post will work without the workarounds I posted: fun.2(values ~ group)

On 1/9/07, Adrian Dusa wrote: On Tuesday 09 January 2007 15:14, Gabor Grothendieck wrote: oneway.test is using substitute on its arguments so it is literally getting 'formula' rather than the value of formula. Ah-haa... I understand now. Thanks for the tips, they both work as expected. Best, Adrian

Try these:

fun.3 <- function(formula) {
  mc <- match.call()
  mc[[1]] <- as.name("oneway.test")
  eval.parent(mc)
}
fun.3(values ~ group)

fun.4 <- function(formula) {
  do.call("oneway.test", list(formula))
}
fun.4(values ~ group)

On 1/9/07, Adrian Dusa wrote: Hi all, I want to write a wrapper for an analysis of variance and I face a curious problem. Here are two different wrappers:

fun.1 <- function(formula) { summary(aov(formula)) }
fun.2 <- function(formula) { oneway.test(formula) }

values <- c(15, 8, 17, 7, 26, 12, 8, 11, 16, 9, 16, 24, 20, 19, 9, 17, 11, 8, 15, 6, 14)
group <- rep(1:3, each=7)

# While the first wrapper works just fine:
fun.1(values ~ group)
# the second throws an error:
fun.2(values ~ group)
Error in substitute(formula)[[2]] : object is not subsettable

I also tried binding the two vectors in a data.frame, to no avail. I did find a hack, creating two new vectors inside the function and creating a fresh formula, so I presume this has something to do with environments. Could anybody give me a hint on this?
Thank you, Adrian -- Adrian Dusa, Romanian Social Data Archive, 1 Schitu Magureanu Bd, 050025 Bucharest sector 5, Romania. Tel./Fax: +40 21 3126618 \ +40 21 3120210 / int.101 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] go or goto command
On 1/10/2007 12:13 AM, Thomas L Jones wrote: Some computer languages, including C, have a go or go to command which can be used to shift control to a different part of a function. Question: Does the R language have a corresponding command? (Yes, I am aware that it can be abused; but, in the hands of a good programmer, it is simple to use.) No, it doesn't. Duncan Murdoch
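For readers wondering what stands in for goto in practice, the usual substitutes are loops with break/next and early return() from functions. An illustrative sketch, not from the thread:

```r
## A backward jump ("retry until done") becomes a repeat loop with break;
## a forward jump out of a routine becomes an early return().
attempts <- 0
repeat {
  attempts <- attempts + 1
  if (attempts >= 3) break   # where a goto-style language would jump to a label
}
```

The loop body runs until the condition holds, which covers the most common legitimate use of goto without unstructured control flow.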
Re: [R] logistic regression packages
1. multinom is in the nnet package. 2. There is a polyclass function in package polspline. On 10/01/07, Feng Qiu wrote: Hi All: I'm testing a set of data classification algorithms in this paper (www.stat.wisc.edu/~loh/treeprogs/quest1.7/mach1317.pdf). I couldn't find such algorithms in R packages: 1. LOG: polytomous logistic regression (there was one in the MASS library: multinom. But after I updated the MASS library, multinom was lost.) 2. POL: POLYCLASS algorithm. There is an S-Plus package (polyclass library) for this algorithm, so there should be a corresponding package in R, but I haven't found it so far. Any advice is appreciated. Best, Feng -- David Barron, Said Business School, University of Oxford, Park End Street, Oxford OX1 1HP
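A minimal sketch of the first answer above, assuming the recommended nnet package is installed; the iris model formula is purely illustrative:

```r
## Polytomous (multinomial) logistic regression lives in nnet, not MASS.
library(nnet)
fit <- multinom(Species ~ Sepal.Length + Sepal.Width,
                data = iris, trace = FALSE)  # three-class outcome
head(predict(fit))   # predicted class for the first few observations
```

MASS merely re-exports documentation pointers to nnet in some installations, which may explain multinom "disappearing" after an update of MASS alone.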
[R] SAS and R code hazard ratios
Greetings, I am new to R and have been comparing CPH survival analysis hazard ratios between R and SAS PHREG. The binary covariates' HRs are the same; however, the continuous variables, for example age, have quite different HRs, although in the same direction. SAS PHREG produces HRs which are the change in risk for every one-increment change in the independent variable. How do I interpret the HRs produced by R? Thanks much, C. Colleen Ross, MS, Clinical Research Unit, Kaiser Permanente Colorado, (303) 614-1244
Re: [R] a question of substitute
On Wednesday 10 January 2007 19:03, Gabor Grothendieck wrote: Looks like oneway.test has been changed for R 2.5.0. Paste the code in this file: https://svn.r-project.org/R/trunk/src/library/stats/R/oneway.test.R into your session. Then fun.2 from your post will work without the workarounds I posted: fun.2(values ~ group) Brilliant :) Super fast change, this is why I love R. Cheers, Adrian -- Adrian Dusa, Romanian Social Data Archive, 1 Schitu Magureanu Bd, 050025 Bucharest sector 5, Romania. Tel./Fax: +40 21 3126618 \ +40 21 3120210 / int.101
Re: [R] scripts with littler
Brian Ripley wrote: Exactly as documented. The argument is named 'new' and not 'add', BTW. Please 'be careful' in what you say about the work of others. Agreed, no criticism intended. I really like R. Sorry. Cheers, John. -- Contractor in Cambridge UK -- http://www.aspden.com
[R] 2 problems with latex.table (quantreg package) - reproducible
Dear all, When using latex.table from the quantreg package, I don't seem to be able to set table.env=FALSE: when I don't specify caption (as I think I should, if I understand the R help rightly(?)), I get an error message, and when I do specify it, of course I get one as well. The funny thing is that a table is indeed produced in the first case, so I get a nice tabular; but as I'm using the command within a for loop, the loop stops due to the error and only one latex table is produced. Example R code:

library(quantreg)
v1 <- c("val1","val1","val2")
v2 <- c("val1","val2","val2")
tab <- table(v1,v2)
latex.table(tab, table.env=FALSE)
# error - German R error message (saying that caption is missing and has no default):
# Fehler in cat(caption, "\n", file = fi, append = TRUE) :
#   Argument "caption" fehlt (ohne Standardwert)
latex.table(tab, table.env=FALSE, caption="nothing")
# error - German R error message:
# Fehler in latex.table(tab, table.env = FALSE, caption = "nothing") :
#   you must have table.env=TRUE if caption is given

The second problem is that - when using latex.table to produce a tabular within a table environment - I would like to specify cgroup with only one value: one multicolumn being a heading for both columns in the table. But I'm not able to produce latex-compilable code:

latex.table(tab, cgroup="v2", caption="my table")

gives me the following latex code:

\begin{table}[hptb]
\begin{center}
\begin{tabular}{|l||c|c|}
\hline
\multicolumn{1}{|l||}{\bf tab} & \multicolumn{}{c||}{} & \multicolumn{2}{c|}{\bf v2}\\
\cline{2-3}
\multicolumn{1}{|l||}{} & \multicolumn{1}{c|}{val1} & \multicolumn{1}{c|}{val2}\\
\hline
val1 & 1 & 1\\
val2 & 0 & 1\\
\hline
\end{tabular}
\vspace{3mm}
\caption{my table\label{tab}}
\end{center}
\end{table}

and within this code the problem is the second multicolumn (\multicolumn{}{c||}{}), as it has no number specifying how many columns the multicolumn should cover. Latex (at least my version) complains.
When deleting this part of the code, the table is compiled and looks exactly how I want it to look. I'm doing this with a system call and a shell script right now, but this seems pretty ugly to me... When I specify 2 columns, this problem doesn't occur: latex.table(tab, cgroup=c("blah","v2"), caption="my table"). I'm running R Version 2.3.0 (2006-04-24) on a linux machine, Fedora Core 5 (i386). Can anyone help me find my mistakes? Thanks a lot ... and sorry for my bad English and potential newbie mistakes!! Kati
[R] using DBI
Can anybody provide a reference to a document on using DBI and what is needed to make it work? I try: drv <- dbDriver("Oracle") and it complains about not finding a function. The DBI.pdf does not say that anything else needs to be installed, but looking at the archive I discovered there is more stuff to be installed. This is on Solaris 5.10 with a full Oracle install, so all Oracle libs are available and sqlplus works. I am trying to connect remotely to another Oracle instance. Thank you very much, Stephen
Re: [R] 2 problems with latex.table (quantreg package) - reproducible
The usual R-help etiquette recommends: 1. questions about packages go to the maintainer, not to R-help. 2. examples should be reproducible, i.e. self-contained. If you look carefully at the function latex.summary.rqs you will see that there is a failure to pass the argument ... on to latex.table. This _may_ be the source of your problem if in fact your v1 and v2 were summary.rqs objects, but I doubt that they are. You might try caption = "". More generally, there are much improved latex tools elsewhere in R; if you aren't making tables that are specific to quantreg, you might want to use them.

url: www.econ.uiuc.edu/~roger  Roger Koenker, email [EMAIL PROTECTED], Department of Economics, University of Illinois, Champaign, IL 61820. vox: 217-333-4558, fax: 217-244-6678

On Jan 10, 2007, at 12:23 PM, Kati Schweitzer wrote: Dear all, When using latex.table from the quantreg package, I don't seem to be able to set table.env=FALSE: when I don't specify caption (as I think I should, when understanding the R help rightly(?)), I get an error message, and when I do so, of course I get one, as well. The funny thing is, that a table is indeed produced in the first case, so I get a nice tabular, but as I'm using the command within a for loop, the loop stops due to the error and only one latex table is produced.
[R] Meeting announcement: An Introduction to Data Analysis Using R
*An Introduction to Data Analysis Using R* by Paul Mathews for The Quality Managers Network, Lakeland Community College, Kirtland, Ohio. 26 January 2007, 7:30-9:00 AM, Room T136. R is free software used for graphical and statistical data analysis that can be downloaded from the Internet. Because R code is written and shared by many people around the world, it has a tremendous range of capabilities. Consequently, most graphical and statistical methods that have been implemented in any commercially-available statistical software package have been implemented in R. Paul Mathews will present an introduction to data analysis using R including: * How to obtain and install R and R packages. * How to access limited R functions from the R Commander GUI package. * How to enter your data into R. * How to graph your data. * How to perform basic statistical analyses. * How to analyze designed experiments. * How to save your work. * How to learn more about R. -- Paul Mathews, Mathews Malnar and Bailey, Inc., 217 Third Street, Fairport Harbor, OH 44077. Phone: 440-350-0911 Fax: 440-350-7210 E-mail: Paul: [EMAIL PROTECTED], [EMAIL PROTECTED] Rebecca: [EMAIL PROTECTED], [EMAIL PROTECTED] Web: www.mmbstatistical.com
Re: [R] using DBI
For the way MySQL works, I use RMySQL to connect, which in turn uses DBI. There is a library ROracle which, from the manual, works the same way. Hence I would start by looking at ROracle for the connection and proceed from there. Kees

On Wednesday 10 January 2007 19:39, Bond, Stephen wrote: Can anybody provide a reference to a document on using DBI and what is needed to make it work? [...] This is on Solaris 5.10 with a full Oracle install, so all Oracle libs are available and sqlplus works. I am trying to connect remotely to another Oracle instance.
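A hypothetical connection sketch along the lines Kees describes. It assumes the ROracle package is installed on top of DBI, and every connection detail below (user, password, connect string, query) is a placeholder, not something from the thread:

```r
## dbDriver("Oracle") only works once a driver package such as ROracle
## is installed and loaded; DBI alone supplies no drivers.
library(DBI)
library(ROracle)                     # supplies the "Oracle" driver

drv <- dbDriver("Oracle")
con <- dbConnect(drv, username = "scott", password = "tiger",
                 dbname = "//remote-host:1521/ORCL")  # placeholder connect string
res <- dbGetQuery(con, "SELECT * FROM dual")
dbDisconnect(con)
```

The original error ("could not find function") is exactly what one sees when only DBI is installed: DBI defines the generic interface, and each backend package registers its own driver.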
Re: [R] map data.frame() data after having linked them to a read.shape() object
On Wed, 10 Jan 2007, Tord Snäll wrote: Dear all, I am trying first to link data in a data.frame() to a (polygon) read.shape() object and then to plot a polygon map showing the data in the data.frame. The first linking is called link in ArcView and relate in ArcMap. I use the code shown below, though without success. Help with this would be greatly appreciated.

Please consider continuing this thread on the R-sig-geo list. Searching the archives might also have given you some leads. For now, and apart from not using sp classes (see the note in R News in 2005), you have 490 polygons in the shapefile - probably one duplicate and 489 unique entities in STACOUNT4 - but only 106 unique stacount values among 431 observations in the data frame. The plot method for Map objects is deprecated. The classes in the sp package use S4, not S3, specifically to help users avoid putting things inside objects that break methods. In maptools, see ?readShapePoly and ?spCbind to see how to read a shapefile into an sp object (check the 490/489 issue), and how the Polygons IDs can be used as a key with the data frame row names to make this easier to do. Please also consider using FIPS numbers as IDs for counties; the five-digit ssccc ID is fairly standard, and avoids the problem of repetitive county names across US states. Thanks! Tord

require(maptools)
# Read shape file (one row per county)
a = read.shape("myshp.shp", dbf.data=TRUE, verbose=TRUE)
str(a)
..- attr(*, "maxbb")= num [1:4] -100 4900
..- attr(*, "class")= chr "ShapeList"
$ att.data:'data.frame': 490 obs. of 60 variables:
..$ STATE_FIPS: Factor w/ 12 levels "04","06","08",..: 11 11 11 11 4 5 5 5 5 5 ...
[snip]
..$ STACOUNT4 : Factor w/ 489 levels "ArizonaApache",..: 437 460 451 453 147 207 195 198 231 206 ...
..- attr(*, "data_types")= chr [1:60] "C" "C" "C" "C" ...
- attr(*, "class")= chr "Map"
# Read case data (one row per case)
cases = read.table("cases.txt", h=T)
str(cases)
'data.frame': 431 obs. of 8 variables:
$ Year: int 1950 1950 1950 1951 1956 1957 1959 1959 1959 1959 ...
$ Case: int 3 1 2 1 1 1 2 4 1 3 ...
$ stacount: Factor w/ 106 levels "ArizonaApache",..: 1 66 76 66 26 29 15 25 30 60 ...
# table the cases data PER Year, PER County (County = stacount)
temp = t(table(cases[,c("Year","stacount")]))
stacount = dimnames(temp)$stacount
temp = cbind(stacount, as.data.frame(temp[,1:ncol(temp)], row.names=F))
str(temp)
'data.frame': 106 obs. of 50 variables:
$ stacount: Factor w/ 106 levels "ArizonaApache",..: 1 2 3 4 5 6 7 8 9 10 ...
$ 1950: int 1 0 0 0 0 0 0 0 0 0 ...
[snip]
$ 2005: int 0 0 0 0 0 0 0 0 0 0 ...
# Pick out a temporary attribute data.frame
datfr = a$att.data
# Merge the temporary data frame with the tabled cases
for(i in 2:ncol(temp)){
  datfr = merge(datfr, temp[,c(1,i)], by.x="STACOUNT4", by.y="stacount",
                all.x=T, all.y=F)
}
# Replace NAs with 0:
for(i in 61:109){
  datfr[,i] = ifelse(is.na(datfr[,i]), 0, datfr[,i])
}
str(a$att.data)
'data.frame': 490 obs. of 60 variables:
$ NAME : Factor w/ 416 levels "Ada","Adams",..: 120 352 265 277 33 210 122 135 372 209 ...
[snip]
$ STACOUNT4 : Factor w/ 489 levels "ArizonaApache",..: 437 460 451 453 147 207 195 198 231 206 ...
- attr(*, "data_types")= chr "C" "C" "C" "C" ...
# Note that the above data is of attribute type
str(datfr)
'data.frame': 490 obs. of 109 variables:
$ STACOUNT4 : Factor w/ 489 levels "ArizonaApache",..: 1 2 3 4 5 6 7 8 9 10 ...
[snip]
$ 1951 : num 0 0 0 0 0 0 0 0 0 0 ...
[snip]
$ 2005 : num 0 0 0 0 0 0 0 0 0 0 ...
# Note that at the end of this, the data type is not described - it is a simple data frame
# bind data together:
# Alternative 1:
a$att.data = cbind(a$att.data, datfr[,61:109])
# Other alternatives:
test = matrix(ncol=49)
a$att.data[,61:109] = test
a$att.data[,61:109] = datfr[,61:109]
# plot:
plot(a, auxvar=a$att.data[,61], xlim=c(-125,-99), ylim=c(28,52),
     xlab="", ylab="", frame.plot=F, axes=F)
There were 50 or more warnings (use warnings() to see the first 50)
warnings()
49: "axes" is not a graphical parameter in: polygon(xy$x, xy$y, col, border, lty, ...)
50: "frame.plot" is not a graphical parameter in: polygon(xy$x, xy$y, col, border, lty, ...)
# The a$att.data type has changed to become a typical data.frame - attr is not mentioned:
str(a$att.data)
[snip]
$ 2003 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2004 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2005 : num 0 0 0 0 0 0 0 0 0 0 ...

-- Roger Bivand, Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax: +47 55 95 95 43; e-mail: [EMAIL PROTECTED]
[R] Installation problem with package mixtools
I am trying to install mixtools on Debian Etch and get the following error dell-2 /usr/lib # R CMD INSTALL mixtools_0.1.0.tar.gz * Installing *source* package 'mixtools' ... ** libs gcc -I/usr/share/R/include -I/usr/share/R/include -fpic -g -O2 -std=gnu99 -c new_svalues.c -o new_svalues.o gfortran -fpic -g -O2 -c sphericaldepth.f -o sphericaldepth.o make: gfortran: Command not found make: *** [sphericaldepth.o] Error 127 ERROR: compilation failed for package 'mixtools' ** Removing '/usr/local/lib/R/site-library/mixtools' I have upgraded my installation of libgfortran and have added /usr/lib to PATH but the error persists. The same file has installed perfectly on a similar machine. I am not sure how to proceed from here. thanks Dave
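A likely fix, assuming the error means what it says: the gfortran compiler itself is missing, and upgrading the libgfortran runtime library does not provide it. On Debian the compiler ships in its own package:

```shell
# "make: gfortran: Command not found" means the compiler binary is absent.
# Install it, then retry the source install of the package.
apt-get install gfortran
R CMD INSTALL mixtools_0.1.0.tar.gz
```

This would also explain why the same tarball installs on a similar machine: that machine presumably has the gfortran package installed.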
Re: [R] logistic regression packages
Hi David: Thanks for your information. Two further questions: 1. I found out that multinom is not doing polytomous logistic regression; do you know which function does this? 2. polyclass in polspline does polychotomous regression, while I'm looking for polytomous regression. Do you think these two are similar in terms of prediction? Best, Feng

----- Original Message ----- From: David Barron. To: r-help. Sent: Wednesday, January 10, 2007 12:14 PM. Subject: Re: [R] logistic regression packages. 1. multinom is in the nnet package. 2. There is a polyclass function in package polspline. -- David Barron, Said Business School, University of Oxford, Park End Street, Oxford OX1 1HP
Re: [R] go or goto command
Thomas L Jones wrote: Some computer languages, including C, have a go or go to command which can be used to shift control to a different part of a function. Question: Does the R language have a corresponding command? (Yes, I am aware that it can be abused; but, in the hands of a good programmer, it is simple to use.) Nope. Only break, next, and return.
Re: [R] SAS and R code hazard ratios
[EMAIL PROTECTED] wrote: Greetings, I am new to R and have been comparing CPH survival analysis hazard ratios between R and SAS PhReg. The binary covariates' HRs are the same, however the continuous variables, for example age, have quite different HRs although in the same direction. SAS PhReg produces HRs which are the change in risk for every one increment change in the independent variable. How do I interpret the HRs produced by R? Thanks much, C

I'm not aware of peculiarities. You're not giving us much to go on though. In fact, not even the function used to fit the model with.

Exactly...
Re: [R] SAS and R code hazard ratios
On Wed, 10 Jan 2007, [EMAIL PROTECTED] wrote: I am new to R and have been comparing CPH survival analysis hazard ratios between R and SAS PhReg. The binary covariates' HRs are the same, however the continuous variables, for example age, have quite different HRs although in the same direction. SAS PhReg produces HRs which are the change in risk for every one increment change in the independent variable. How do I interpret the HRs produced by R? Thanks much, C

What function did you use to fit the model in R? If you used coxph() in the survival package, then you should get the same answers as SAS. If you used cph() in the Design package, then the output says what the increment is that corresponds to the quoted hazard ratio, and the default is the interquartile range. -thomas
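The per-unit interpretation can be sketched with the survival package's lung data; the covariates here are illustrative, not the poster's:

```r
## With survival::coxph, exp(coef) is the hazard ratio for a one-unit
## increase in each covariate - the same convention as SAS PHREG.
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
exp(coef(fit))["age"]   # HR per additional year of age
```

Design::cph summaries, by contrast, report by default the hazard ratio for an interquartile-range change in a continuous covariate, which is the likely source of the discrepancy the poster saw.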
[R] TCL/TK and R documentation?
I am hoping something has changed since I last asked about this. Is there any good source of documentation, including examples, of using tcl/tk as a GUI for R applications?
[R] axis labels at subset of tick marks
For example, this works:

x = seq(-100, 1000, 25)
y = x * x
plot(x, y, xaxt="n")
axis(1, x, FALSE, tcl=-0.3)
axis(1, x[x %% 100 == 0])

It creates two axis objects and the values of the x-axis are the labels. The following scenario is more difficult, because it uses 'image' to plot a grid of values:

a = matrix(rnorm(100), ncol=10)
image(1:ncol(a), 1:nrow(a), a, xaxt="n", yaxt="n")
# adding arbitrary categorical labels to y-axis
ylabels = c('A','B','C','D','E','F','G','H','I','J')
axis(2, at=1:10, labels=ylabels, las=1)  # las=1: horizontal labels
# adding arbitrary categorical labels to x-axis,
# for every nth category; first add tick marks,
# then the labels
axis(1, at=1:10, FALSE)
xlabels = c('A','','C','','E','','G','','I','')  # this is length=10
axis(1, at=1:10, labels=xlabels)

This works, but when using axis, 'at=1:10' and length(xlabels) must be the same length. Is it ever possible to call axis when these values are not of the same length? That is, the following does not work:

x = seq(-100, 1000, 25)
y = x * x
plot(x, y, xaxt="n")
axis(1, x, labels=x[x %% 100 == 0])
Error in axis(side, at, labels, tick, line, pos, outer, font, lty, lwd, :
  'at' and 'label' lengths differ, 45 != 12

Would it be possible to have axis automatically find the right location within the x-axis for these labels? Perhaps something like:

xt = (x %% 100 == 0)
xlabels = x
xlabels[!xt] = ''

This would leave the intermittent label values. I guess a for loop is needed to generate xt for irregular intervals in the labels (e.g., c(10, 50, 75, 150, 400)). If the axis command could do this, it would not be necessary to call it twice to get a subset of labels for all the tick marks. That is, it would create tick marks at the 'at' specification and labels at the 'labels' specification (with no constraint that these be of equal lengths).
Regards, Darren
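Darren's own blank-label idea already works today with a single axis() call, since 'at' and 'labels' stay the same length when the unwanted labels are blanked out. A self-contained sketch (the throwaway pdf device is just to make the example runnable anywhere):

```r
## Keep one label per multiple of 100; every other tick gets an empty label.
x <- seq(-100, 1000, 25)
y <- x * x
xlabels <- ifelse(x %% 100 == 0, x, "")   # length 45, 12 non-blank entries

pdf(tempfile(fileext = ".pdf"))           # plot to a throwaway device
plot(x, y, xaxt = "n")
axis(1, at = x, labels = xlabels, tcl = -0.3)  # one call: all ticks, sparse labels
dev.off()
```

ifelse() generalises directly to irregular label positions, e.g. `ifelse(x %in% c(10, 50, 75, 150, 400), x, "")`, so no for loop is needed.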
Re: [R] problems with optim, for-loops and machine precision
Two possibilities for why your 7-parameter model fits worse than the 6 are that you are finding a local maximum, which might suggest using a different parameterisation, or that the functions are accessing some global data and so aren't behaving as expected. This could be why they work properly when run on their own. I would also check what happens if convergence fails for the 7-parameter model; in your code this isn't handled well. Also, if the constraint on parameters of >= 0 is actually > 0, it may be better to work with parameters on the log scale, avoiding the constraints. I have found with simulations it is useful to save the fitted objects, so they can be inspected later, or for the parameters to be extracted after the models are fitted. This method also allows restarting in case of computer crashes. Ken

Simon Ruegg [EMAIL PROTECTED] 01/10/07 11:18 PM wrote: Dear R experts, I have been encountering problems with the optim routine using for loops. I am determining the optimal parameters of several nested models by minimizing the negative Log-Likelihood (NLL) of a dataset. The aim is to find the model which best describes the data. To this end, I am simulating artificial data sets based on the model with the least number of parameters (6) and the parameters determined with the field data. For each artificial set I estimate the parameters of the model with 6 parameters and the next more complex model with 7 parameters (two of these parameters are equal in the 6-parameter model) by minimizing the corresponding NLL with optim. In theory the 7-parameter model should fit the data either equally or better than the 6-parameter model. Therefore the difference of the minimal NLLs should be 0 or larger.
For 500 data sets I use the code given in my original message above. For some strange reason the results do not correspond to what I expect: about 10% of the solutions have a difference of NLL smaller than 0. I have verified the optimisation of these results manually and found that a minimal NLL was ignored and a higher NLL was returned at convergence. To check what was happening I inserted a printing line in the NLL function to print all parameters and the NLL as the procedure goes on. To my surprise optim then stopped at the minimal NLL which had been ignored before. I have then reduced the machine precision to .Machine$double.digits=8, thinking that the printing was slowing down the procedure and that reducing the machine precision would speed up the calculations. For an individual calculation this solved my problem. However if I implemented the same procedure in the loop above, the same impossible results occurred again. Can anyone tell me where I should be looking for the problem, or what it is and how I could solve it?
Thanks a lot for your help. Sincerely, Simon

Simon Ruegg, Dr.med.vet., PhD candidate, Institute for Parasitology, Winterthurstr. 266a, 8057 Zurich, Switzerland. Phone: +41 44 635 85 93, fax: +41 44 635 89 07, e-mail: [EMAIL PROTECTED]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TCL/TK and R documentation?
Michael Prager wrote: I am hoping something has changed since I last asked about this. Is there any good source of documentation, including examples, of using Tcl/Tk as a GUI for R applications?

Well, this probably hasn't changed, but did you see it in the first place? http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/
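To complement the link above, a minimal tcltk sketch (my own illustration, not from the thread) showing the basic pattern most of those examples follow: create a toplevel window, add a widget with an R callback, and pack it.

```r
library(tcltk)  # bundled with standard R builds

## a window with one button whose callback is a plain R function
tt <- tktoplevel()
tkwm.title(tt, "Hello from R")
but <- tkbutton(tt, text = "Press me",
                command = function() cat("Button pressed\n"))
tkpack(but)
```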
Re: [R] A question about R environment
[Philippe Grosjean] Please, don't reinvent the wheel: putting functions in a dedicated environment is one of the things done by R packages (together with good documentation of the functions, and making them easily installable on any R installation). [...] this is probably the time for you to read the Writing R Extensions manual, and to start implementing your own R package!

Hi, Philippe, and gang! I read this manual long ago and have already used it to create packages. Did you really get the impression I had not read it? :-) You know, there are small wheels and huge wheels. I do not see why I would use complex devices for tiny problems merely because those complex devices exist. R packages undoubtedly have their virtues, of course. But just like many statistical tests, they do not always apply. Why go to the length of organising package directories populated with many files, resorting to initialisation scripts, using package validators, creating documentation files and processing them, and going through the cycle of creating a package and installing it, all that merely for a few small quickies that fit very well in the ubiquitous .Rprofile file? Why worry about installation on any R implementation for little things meant only for myself, and too simple to warrant publication anyway?

Keep happy, all. -- François Pinard http://pinard.progiciels-bpi.ca
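For quickies of that kind, a common .Rprofile idiom (a sketch, not François's actual setup) keeps personal helpers in a dedicated environment on the search path, so that rm(list = ls()) in the workspace does not wipe them:

```r
## in ~/.Rprofile: collect helpers in a private environment ...
.my <- new.env()
.my$ht <- function(d, n = 6) rbind(head(d, n), tail(d, n))  # head + tail
## ... and attach it under its own name on the search path
attach(.my, name = "my:utils")
```

After startup, ls("my:utils") lists the helpers, and detach("my:utils") removes them again.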
[R] axis date format in lattice
Hello list,

Plotting the following example 1 in lattice labels the x-axis with only one tick mark, at the beginning of February. In contrast, the standard plot() function labels the x-axis with equally spaced tick marks.

#--- example 1 ---
library(lattice)
date <- as.Date(c("1/10/2006", "01/11/2006", "01/12/2006",
                  "01/01/2007", "01/02/2007"), format = "%d/%m/%Y")
xx <- c(1, 3, 5, 6, 7)
xx1 <- as.data.frame(cbind(date, xx))
xyplot(xx ~ date, scales = list(x = list(format = "%b")))
plot(xx ~ date)

Interestingly, the behaviour in lattice only seems to arise when there is a change in calendar years. This is demonstrated by the second example, with dates from within one calendar year only, where lattice works as expected. I was wondering whether the behaviour in example 1 is due to a bug, or am I missing something, and, more importantly, is there a way around it?

#--- example 2 ---
date <- as.Date(c("1/08/2006", "01/09/2006", "01/10/2006",
                  "01/11/2006", "01/12/2006"), format = "%d/%m/%Y")
xx <- c(1, 3, 5, 6, 7)
xx1 <- as.data.frame(cbind(date, xx))
xyplot(xx ~ date, scales = list(x = list(format = "%b")))
plot(xx ~ date)

Cheers, Karl

Dr Karl J Sommer, Department of Primary Industries, Catchment Agriculture Services, PO Box 905, Mildura, VIC, Australia 3502. Tel: +61 (0)3 5051 4390, Fax: +61 (0)3 5051 4534, Email: [EMAIL PROTECTED]
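One possible workaround (my own suggestion, not a reply from the thread): pass explicit tick positions to lattice via the 'at' component of 'scales', so lattice does not have to choose the tick locations itself across the year boundary:

```r
library(lattice)

## data as in example 1, spanning a change of calendar year
date <- as.Date(c("01/10/2006", "01/11/2006", "01/12/2006",
                  "01/01/2007", "01/02/2007"), format = "%d/%m/%Y")
xx <- c(1, 3, 5, 6, 7)

## force a tick at every observed month instead of lattice's default choice
xyplot(xx ~ date,
       scales = list(x = list(at = date, format = "%b")))
```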
Re: [R] a question of substitute
The 'Right Thing' is for oneway.test() to allow a variable for the first argument, and I have altered it in R-patched and R-devel to do so. So if your students can make use of R-patched, that would be the best solution. If not, perhaps you could make a copy of oneway.test from R-patched available to them. Normally I would worry about namespace issues, but it seems unlikely they would matter here; if they did, assignInNamespace is likely to work to insert the fix.

Grothendieck's suggestions are steps towards a morass: they may work in simple cases but can make more complicated ones worse (such as looking for 'data' in the wrong place). These model fitting functions have rather precise requirements for where they look for their components:

  'data'      the environment of 'formula'
  'formula'   the environment of the caller

and that includes where they look for 'data'. It is easy to use substitute or such to make a literal formula out of 'formula', but doing so changes its environment. So one needs to either (a) fix up an environment within which to evaluate the modified call that emulates the scoping rules, or (b) create a new 'data' that has references to all the variables needed, and just call the function with the new 'formula' and new 'data'.

At first sight model.frame() looks the way to do (b), but it is not, since if there are function calls in the formula (e.g. log()) the model frame includes the derived variables and not the original ones. There are workarounds (e.g. in glmmPQL), like using all.vars, creating a formula from that, setting its environment to that of the original function and then calling model.frame. This comes up often enough that I have contemplated adding a solution for (b) to the stats package.

Doing either of these right is really pretty complicated, and not something to dash off code for in a fairly quick reply (or even to check that the code in glmmPQL was general enough to be applicable).
On Tue, 9 Jan 2007, Adrian Dusa wrote:

On Tuesday 09 January 2007 15:41, Prof Brian Ripley wrote: oneway.test expects a literal formula, not a variable containing a formula. The help page says

  formula: a formula of the form 'lhs ~ rhs' where 'lhs' gives the sample
           values and 'rhs' the corresponding groups.

Furthermore, if you had foo.2 <- function() oneway.test(value ~ group) it would still not work, as

  data: an optional matrix or data frame (or similar: see 'model.frame')
        containing the variables in the formula 'formula'. By default the
        variables are taken from 'environment(formula)'.

I could show you several complicated workarounds, but why do you want to do this?

Thank you for your reply. The data argument was exactly the next problem I faced. My workaround involves checking if(missing(data)) and then using different calls to oneway.test(). I am certainly interested in other solutions; this one is indeed limited. I do this for the students in the ANOVA class: first checking the homogeneity of variances with fligner.test(), printing the p-value, and based on that changing the var.equal argument in the oneway.test() call. It's just for convenience, but they do like having it all-in-one.

Best regards, Adrian

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK. Fax: +44 1865 272595
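A rough sketch of approach (b) above, combined with Adrian's all-in-one idea (my own illustration with a hypothetical name, oneway.auto; not code from the thread). It gathers the formula's variables into a fresh data frame, so oneway.test() finds everything in 'data' regardless of where the wrapper is called from:

```r
## hypothetical helper: Fligner-Killeen test first, then oneway.test()
## with var.equal chosen from the homogeneity-of-variances p-value
oneway.auto <- function(formula, data = NULL) {
  if (is.null(data)) {
    ## approach (b): collect every variable the formula mentions into a
    ## new 'data', looked up starting from where the formula was created
    vars <- all.vars(formula)
    data <- as.data.frame(mget(vars, envir = environment(formula),
                               inherits = TRUE))
  }
  p <- fligner.test(formula, data = data)$p.value
  cat("Fligner-Killeen p-value:", format.pval(p), "\n")
  oneway.test(formula, data = data, var.equal = p > 0.05)
}
```

Note that passing the formula variable on to oneway.test() relies on the R-patched fix described above (current R accepts it).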
Re: [R] posthoc tests with ANCOVA
## The WoodEnergy example in package HH (available on CRAN) is similar
## to the example you asked about. I have two factors and a
## covariate, and the interaction between the factors and covariate is
## significant. I construct the multiple comparisons of factor Stove
## at each level of factor Wood at a specified value of the covariate.

## Open a trellis.device() with history recording on.
## There are many plots and you will need to move among them.
require(HH)
source(paste(file.path(.path.package(package="HH")[1], "scripts"),
             "MMC.WoodEnergy-aov.R", sep="/"))

## This call prints an Error message that you can ignore because
## the relevant call is inside a try() expression.
anova(energy.aov.4)

## source() needs a print() statement, demo() doesn't.
source(paste(file.path(.path.package(package="HH")[1], "scripts"),
             "MMC.WoodEnergy.R", sep="/"))

## The contrast matrix is developed beginning on line 91 of file
## MMC.WoodEnergy.R

## The MMC (mean-mean multiple comparisons) plots are described in R
## with ?MMC. The MMC plots in R use glht() from the multcomp package.

## Both files are commented. Please read the comments.

## Those source() commands in HH_1.17 will be replaced by demo()
## commands in the next version of HH:
##   demo("MMC.WoodEnergy-aov", package="HH")
##   demo("MMC.WoodEnergy", package="HH")

## glht is not currently able to work with aov objects that have the
## Error() function. You will need to respecify your model formula
## using the terms() function. See the maiz.aov example in the ?MMC
## help page.
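For readers without HH installed, a minimal hedged sketch of the underlying glht() machinery from package multcomp, using the built-in warpbreaks data rather than the WoodEnergy example:

```r
library(multcomp)  # provides glht() and mcp()

## one-way aov, then Tukey all-pairwise comparisons of 'tension'
fit  <- aov(breaks ~ tension, data = warpbreaks)
comp <- glht(fit, linfct = mcp(tension = "Tukey"))
summary(comp)   # simultaneous tests
confint(comp)   # simultaneous confidence intervals
```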
[R] zero margin / marginless plots
Hi,

I'd like to produce a marginless or zero-margin plot so that the pixel coordinates represent the mathematics.

xy <- data.frame(x = c(0, 1, 1, 0, 0), y = c(0, 1, 0, 0, 1))
png('junk.png', width = 300, height = 300)
par(mar = c(0, 0, 0, 0))
plot(xy$x, xy$y, xlim = c(0, 1), ylim = c(0, 1))
dev.off()

The resultant file has about a 10-pixel margin around these lines, and I'm not sure what parameter or function is controlling this offset. Any hints?

Thanks for your time, Dave

-- Dr. David Forrest, [EMAIL PROTECTED] (804)684-7900 (w), [EMAIL PROTECTED] (804)642-0662 (h), http://maplepark.com/~drf5n/
Re: [R] zero margin / marginless plots
On Wed, 2007-01-10 at 21:18 -0600, David Forrest wrote: Hi, I'd like to produce a marginless or zero-margin plot so that the pixel coordinates represent the mathematics.

xy <- data.frame(x = c(0, 1, 1, 0, 0), y = c(0, 1, 0, 0, 1))
png('junk.png', width = 300, height = 300)
par(mar = c(0, 0, 0, 0))
plot(xy$x, xy$y, xlim = c(0, 1), ylim = c(0, 1))
dev.off()

The resultant file has about a 10-pixel margin around these lines, and I'm not sure what parameter or function is controlling this offset. Any hints?

By default, the axis ranges are extended by +/- 4%. You can change this by using:

plot(xy$x, xy$y, xlim = c(0, 1), ylim = c(0, 1), xaxs = "i", yaxs = "i")

where 'xaxs' and 'yaxs' set the axis ranges to the actual data ranges. See ?par for more information.

HTH, Marc Schwartz
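Putting the two pieces together (a sketch, not from Marc's reply): zero margins plus xaxs/yaxs = "i", with axes and annotation suppressed, so that user coordinates map linearly onto the full 300 x 300 pixel canvas:

```r
xy <- data.frame(x = c(0, 1, 1, 0, 0), y = c(0, 1, 0, 0, 1))

png("junk.png", width = 300, height = 300)
par(mar = c(0, 0, 0, 0),      # no figure margins
    xaxs = "i", yaxs = "i")   # no 4% axis-range expansion
plot(xy$x, xy$y, xlim = c(0, 1), ylim = c(0, 1),
     type = "l", axes = FALSE, ann = FALSE)
dev.off()
```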