Re: [R] merge large number of raster objects
...and ?mget for retrieving multiple objects as a list.

/Henrik

On Thu, Jun 9, 2011 at 10:32 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote:

See ?get and ?do.call. That should be enough. ?Reduce may be an alternative to do.call(), but could also be less memory efficient. My $.02

/Henrik

On Thu, Jun 9, 2011 at 7:57 PM, Ben Zaitchik zaitc...@jhu.edu wrote:

Hello,

I have a large number of raster objects in memory, with names RH100, RH101, RH102, etc. (myobjects <- ls(pattern='^RH')). These rasters are sections of a continuous map, and I would like to combine them using the RasterObject merge tool in package 'raster'. merge expects raster objects as input (outmap <- merge(x, y, ...), where x, y, and ... are raster objects). I can run the command successfully if I type in every raster object that I want to merge, but this is impractical for the large number of objects I'm combining. I would like to apply merge to a list of object names defined using some kind of wildcard-based list command, but I'm struggling to find the right data types in R. Is there some way to convert a vector of strings (e.g., as.vector(ls(pattern='^RH'))) to a vector of object names (as.names??) that could be specified as the input to the merge function? What I'd really like to do is something like outmap <- merge(myobjects[1:40]) in order to merge the 40 raster objects, but I recognize that it might not be so simple. Advance apologies if this is already dealt with on the help list . . . I've gone in circles on wildcard, lapply, and names threads and haven't managed to get anything to work.

Thank you,
Ben

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
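Henrik's two pointers combine into a short, runnable sketch. Plain numeric vectors and c() stand in for raster objects and merge() here (my substitution, so the example runs without the 'raster' package); with rasters loaded you would call do.call(merge, rasters) instead.

```r
# mget() turns a character vector of object names into a named list of
# the objects themselves; do.call() then passes that list as the
# arguments of a function.  c() stands in for raster::merge() so this
# runs in base R.
RH100 <- 1:3
RH101 <- 4:6
RH102 <- 7:9

myobjects <- ls(pattern = "^RH")   # character vector of object names
rasters   <- mget(myobjects)       # list of the actual objects
outmap    <- do.call(c, rasters)   # with rasters: do.call(merge, rasters)
```

With the real package, `do.call(merge, mget(ls(pattern = "^RH")))` is the whole answer to the original question.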
Re: [R] How to assess the accuracy of fitted logistic regression using glm
Hi Professor Brian,

Thanks for your reply. I think there are many statisticians here, and it is somewhat R related, so I hope someone can help me. I have done a simple test, using a sample csv file which I can post if needed.

donut <- read.csv(file = "D:/donut.csv", header = TRUE)
donut[["color"]] <- as.factor(donut[["color"]])
donut[["shape"]] <- as.factor(donut[["shape"]])
donut[["k"]]     <- as.factor(donut[["k"]])
donut[["k0"]]    <- as.factor(donut[["k0"]])
donut[["bias"]]  <- as.factor(donut[["bias"]])
lr <- glm(color ~ shape + x + y, family = binomial, data = donut)
summary(lr)

Call:
glm(formula = color ~ shape + x + y, family = binomial, data = donut)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.1079 -0.9476  0.5086  0.7518  1.4079

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.53010    1.65500   1.529   0.1263
shape22      0.05628    1.54990   0.036   0.9710
shape23     -0.74568    1.44813  -0.515   0.6066
shape24     -2.61896    1.38016  -1.898   0.0578 .
shape25     -2.07648    1.32818  -1.563   0.1180
x           -0.45885    1.52863  -0.300   0.7640
y           -0.59311    1.46999  -0.403   0.6866
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 50.446 on 39 degrees of freedom
Residual deviance: 42.473 on 33 degrees of freedom
AIC: 56.473

Number of Fisher Scoring iterations: 4

In the Coefficients section, is Pr(>|z|) the p-value for that variable? And there are a few other questions:
1. How to determine the predictive power of each variable?
2. How to determine the overall performance of the fitted model? Here, what's the difference between Deviance Residuals and Residual deviance?
3. How to compare the Null deviance and the Residual deviance?
4. What does AIC mean, and how to use this measure?
5. What does the Signif. codes section mean?
Regards,
Xiaobo Gu

On Mon, Jun 6, 2011 at 9:59 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

On Mon, 6 Jun 2011, Xiaobo Gu wrote:

Hi, I am trying glm with family = binomial to do binary logistic regression, but how can I assess the accuracy of the fitted model? The summary method can print a lot of information about the returned object, such as coefficients. Because statistics is not my speciality, can you share some rules of thumb to examine the fitted model from a practical perspective?

Regards, Xiaobo Gu

It depends entirely on why you did the fit. People have written whole books on assessing the performance of classification procedures such as binary logistic regression. For example, the residual deviance is closely related to log-probability scoring: for some purposes that is a good performance measure, for others (e.g. when you are going to threshold the predicted probabilities) it can be very misleading. In short, you need statistical advice, not R advice (the purpose of this list).

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
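Questions 2 to 4 above can be illustrated with a small, hedged sketch using the built-in mtcars data (not the poster's donut.csv, which isn't available): the drop from the null to the residual deviance is a likelihood-ratio statistic for the fitted terms as a whole, and AIC is the deviance plus a penalty for the number of estimated parameters.

```r
# Fit a binary logistic regression on built-in data (am is 0/1).
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# Comparing null and residual deviance: the drop is chi-squared
# distributed under the null that the added terms do nothing.
dev_drop <- fit$null.deviance - fit$deviance   # improvement over intercept-only
df_drop  <- fit$df.null - fit$df.residual      # number of parameters added
p_value  <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

# AIC for a binomial glm on 0/1 data: residual deviance plus 2 per
# estimated coefficient; lower is better when comparing models fit
# to the same data.
aic_by_hand <- fit$deviance + 2 * length(coef(fit))
```

A small p_value says the model explains substantially more than the intercept alone; it says nothing about predictive accuracy on new data, which is Ripley's point.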
[R] how to convert a vector to an upper matrix
Hi all,

If I have a vector A <- 1:6, I would like to convert A to an upper triangular matrix M = [0 1 2 3; 0 0 4 5; 0 0 0 6; 0 0 0 0]. That is, if the length of the known vector is n, then the order m of the matrix I'd like to have should satisfy m(m-1)/2 = n. How could I do this in R? I really don't want to do it using a loop. Is there a function in R to do this?

Thanks!

Best,
Zhishi
[R] Remove Duplicate numbers from a vector through writing set of commands
Hi, I want to write a set of commands instead of unique(x) to remove duplicate numbers from a vector. Would you please help me? I don't know how to compare the numbers in a vector with each other. I am new to R. Thanks.
Re: [R] Setting up a State Space Model in dlm
On Tue, 2011-06-07 at 17:24 +0100, Michael Ash wrote:

This question pertains to setting up a model in the package dlm (dynamic linear models, http://cran.r-project.org/web/packages/dlm/index.html).

The author of the dlm package has just published a paper on state space models in R, including details on setting up dlm: http://www.jstatsoft.org/v41/i04 That might help with your question - I haven't seen a reply on list, but am unable to help answer it either. HTH, G

I have read both the vignette and "An R Package for Dynamic Linear Models" (http://www.jstatsoft.org/v36/i12/paper), both of which are very helpful. There is also some discussion at https://stat.ethz.ch/pipermail/r-help/2009-May/198463.html I have what I think is a relatively straightforward state-space model but am unable to translate it into the terms of dlm. It would be very helpful to get a basic dlm setup for the problem, and I would guess that I can then modify it with more lags, etc. The main equation is

pi[t] = a * pi[t-1] + b*(U[t] - UN[t]) + e[t]

with pi and U observed, a and b fixed coefficients, and e a well-behaved error term (gaussian, say, variance unknown). The object of interest is the unobserved and time-varying component UN, which evolves according to

UN[t] = UN[t-1] + w[t]

that is, a random walk with a well-behaved error term with var(w) known (or assumed). I'm interested in the estimates of a and b and also in estimating the time series of UN. Note that the term b*(U[t] - UN[t]) makes this a nonlinear model. Below is code that does not work as expected. I see the model as having four parameters: a, b, var(e), and UN. (Or do I have a parameter UN[t] for every period?) I do not fully understand the dlm syntax. Is FF specified properly?
What should X look like? How does m0 relate to parm()? I would be grateful if someone would be willing to glance at the code. Thanks. Michael

library(quantmod)
library(dlm)

## Get and organize the data
getSymbols("UNRATE", src = "FRED")   ## Unemployment rate
getSymbols("GDPDEF", src = "FRED")   ## Quarterly GDP implicit price deflator
u <- aggregate(UNRATE, as.yearqtr, mean)
gdpdef <- aggregate(GDPDEF, as.yearqtr, mean)
pi <- diff(log(gdpdef)) * 400
pilag <- lag(pi, -1)
tvnairu <- cbind(pi, pilag, u)
tvnairu.df <- subset(data.frame(tvnairu),
                     !is.na(pi) & !is.na(u) & !is.na(pilag))

## First attempt
buildNAIRU <- function(x) {
  modNAIRU <- dlm(FF = t(matrix(c(1, 1, 1, 0))),
                  GG = diag(4),
                  W = matrix(c(0, 0, 0, 0,
                               0, 0, 0, 0,
                               0, 0, 0.04, 0,
                               0, 0, 0, 0), 4, 4),
                  V = exp(x[4]),
                  m0 = rep(0, 4),
                  C0 = diag(1e07, 4),
                  JFF = t(matrix(c(1, 1, 0, 0))),
                  X = cbind(tvnairu.df$pilag, tvnairu.df$u))
  return(modNAIRU)
}
(fitNAIRU <- dlmMLE(tvnairu.df$pi, parm = c(0, 0, 0, 0), build = buildNAIRU,
                    hessian = TRUE, control = list(maxit = 500)))
(dlmNAIRU <- buildNAIRU(fitNAIRU$par))

## Second attempt
buildNAIRU <- function(x) {
  modNAIRU <- dlm(FF = t(matrix(c(1, 1, 0, 0))),
                  GG = diag(4),
                  W = matrix(c(0, 0, 0, 0,
                               0, 0, 0, 0,
                               0, 0, 0.04, 0,
                               0, 0, 0, 0), 4, 4),
                  V = exp(x[4]),
                  m0 = c(0.7, -0.3, 5, 1),
                  C0 = diag(100, 4),
                  JFF = t(matrix(c(1, 1, 0, 0))),
                  X = cbind(tvnairu.df$pilag, tvnairu.df$u - x[3]))
  return(modNAIRU)
}
(fitNAIRU <- dlmMLE(tvnairu.df$pi, parm = c(0, 0, 0, 0), build = buildNAIRU,
                    hessian = TRUE, control = list(maxit = 500)))
(dlmNAIRU <- buildNAIRU(fitNAIRU$par))

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.
[w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] Help with plotting plsr loadings
Amit Patel amitrh...@yahoo.co.uk writes:

plot(BHPLS1, "loadings", comps = 1:2, legendpos = "topleft", labels = "numbers", xlab = "nm")
Error in loadingplot.default(x, ...) :
  Could not convert variable names to numbers.

str(BHPLS1_Loadings)
 loadings [1:8892, 1:60] -0.00717 0.00414 0.02611 0.00468 -0.00676 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:8892] "PCIList1" "PCIList2" "PCIList3" "PCIList4" ...
  ..$ : chr [1:60] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...
 - attr(*, "explvar")= Named num [1:60] 2.67 4.14 4.41 3.55 2.59 ...
  ..- attr(*, "names")= chr [1:60] "Comp 1" "Comp 2" "Comp 3" "Comp 4" ...

Can anyone see the problem?

By using `labels = "numbers"', you are asking the plot function to convert the names PCIList1, PCIList2, PCIList3, PCIList4, ... to numbers. It doesn't know how to do that. (See ?loadingplot for the details.) Your options are: use `labels = "names"', provide your own labels, don't use the `labels' argument at all, or convert the names manually.

--
Bjørn-Helge Mevik
Re: [R] how to convert a vector to an upper matrix
Zhishi Wang wrote:

Hi all, If I have a vector A <- 1:6, I would like to convert A to an upper triangular matrix M = [0 1 2 3; 0 0 4 5; 0 0 0 6; 0 0 0 0]. That is, if the length of the known vector is n, then the order m of the matrix I'd like to have should satisfy m(m-1)/2 = n. How could I do this in R? I really don't want to do it using a loop. Is there a function in R to do this?

?upper.tri
?uniroot

A <- 1:6
n <- length(A)
m <- round(uniroot(function(x) x * (x - 1) - 2 * n, c(1, n))$root)  # round, not truncate: uniroot's root is only approximate
U <- matrix(0, nrow = m, ncol = m)
U[upper.tri(U)] <- A
U

Berend
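One caveat worth adding to this answer (my observation, not from the thread): U[upper.tri(U)] <- A fills the upper triangle column by column, so row 1 becomes 0 1 2 4 rather than the row-wise 0 1 2 3 layout asked for in the question. A sketch that produces the row-wise order, together with the closed-form solution of m(m-1)/2 = n:

```r
A <- 1:6
n <- length(A)

# m*(m-1)/2 = n solved directly by the quadratic formula,
# avoiding uniroot altogether.
m <- (1 + sqrt(1 + 8 * n)) / 2   # 4 for n = 6

# upper.tri/lower.tri assignment is column-major, so to get a
# row-wise upper triangle, fill the lower triangle and transpose.
M <- matrix(0, m, m)
M[lower.tri(M)] <- A
M <- t(M)
# M is now  0 1 2 3 / 0 0 4 5 / 0 0 0 6 / 0 0 0 0
```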
Re: [R] scatterplot3d - help assign colors based on multiple conditions
On 09.06.2011 22:40, Karthik Kota wrote:

Thanks a lot! This is very helpful. If I have to extend this to one more condition, say assign blue if both the corresponding labels have _Tongue_dorsum, is there a straightforward function? Doing something like below overrides the colors assigned by the first statement.

col <- ifelse(grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels), "red", "black")
col <- ifelse(grepl("_Tongue_dorsum", xlabels) & grepl("_Tongue_dorsum", ylabels), "blue", "black")

In that case go for it with your original way by, e.g., replacing

col[grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels)] <- mycols[4]

Uwe

Again, your time is really appreciated. Karthik

On Jun 9, 2011, at 1:56 PM, Uwe Ligges wrote:

On 09.06.2011 16:51, Karthik Kota wrote:

Hi, I am relatively new to R and am trying to figure out how to plot a 3d scatter plot using defined colors based on x-axis and y-axis values. Right now in the code below, I assign colors based on certain values in the names of the x-axis. Now if I want to extend the condition to assign a color based on the names of both x-axis and y-axis values, what should I be doing? Any help or ideas would be greatly appreciated. For e.g. in my 3-column matrix below, if I want to assign red to all the values whose first column and second column contain Anterior_nares and assign black to any other combination.

Both question and answer are not really scatterplot3d related. You probably want

col <- ifelse(grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels), "red", "black")

Best, Uwe Ligges

Thanks!
Karthik

library(scatterplot3d)
chd1 <- read.table(file = "test.out", header = FALSE, sep = "\t")
col <- as.vector(chd1[, 1])
xlabels <- as.vector(chd1[, 1])
ylabels <- as.vector(chd1[, 2])
mycols <- c("red", "blue", "green", "chocolate", "orange", "brown")
col[grep("_Stool", xlabels)] <- mycols[1]
#col[grep("_Stool", xlabels) & grep("_Stool", ylabels)] <- mycols[1]
col[grep("_Tongue_dorsum", xlabels)] <- mycols[2]
col[grep("_Posterior_fornix", xlabels)] <- mycols[3]
col[grep("_Anterior_nares", xlabels)] <- mycols[4]
col[grep("_Buccal_mucosa", xlabels)] <- mycols[5]
col[grep("_Supragingival_plaque", xlabels)] <- mycols[6]
png(file = "3dplot_test.png", w = 700, h = 700)
scatterplot3d(chd1[, 1], chd1[, 2], chd1[, 3], main = "test",
              xlab = "sample", ylab = "sample", zlab = "kmers",
              color = col, type = "p")
dev.off()

my test.out matrix looks something like this:

A011132_Anterior_nares A011263_Anterior_nares 50130
A011132_Anterior_nares A011397_Stool 34748
A011132_Anterior_nares A012291_Tongue 40859
A011132_Anterior_nares A012663_Buccal_mucosa 76213
A011132_Anterior_nares A013155_Anterior_nares 36841
A011132_Anterior_nares A013269_Anterior_nares 45619
A011132_Anterior_nares A013637_Anterior_nares 56995
Re: [R] merge large number of raster objects
On 10/06/11 14:57, Ben Zaitchik wrote:

Hello, I have a large number of raster objects in memory, with names RH100, RH101, RH102, etc. (myobjects <- ls(pattern='^RH')). These rasters are sections of a continuous map, and I would like to combine them using the RasterObject merge tool in package 'raster'. merge expects raster objects as input (outmap <- merge(x, y, ...), where x, y, and ... are raster objects). I can run the command successfully if I type in every raster object that I want to merge, but this is impractical for the large number of objects I'm combining. I would like to apply merge to a list of object names defined using some kind of wildcard-based list command, but I'm struggling to find the right data types in R. Is there some way to convert a vector of strings (e.g., as.vector(ls(pattern='^RH'))) to a vector of object names (as.names??) that could be specified as the input to the merge function? What I'd really like to do is something like outmap <- merge(myobjects[1:40]) in order to merge the 40 raster objects, but I recognize that it might not be so simple. Advance apologies if this is already dealt with on the help list . . . I've gone in circles on wildcard, lapply, and names threads and haven't managed to get anything to work.

I think that

outmap <- do.call(merge, myobjects)

should work for you.

cheers,

Rolf Turner
Re: [R] scatterplot3d - help assign colors based on multiple conditions
On 06/10/2011 06:40 AM, Karthik Kota wrote:

Thanks a lot! This is very helpful. If I have to extend this to one more condition, say assign blue if both the corresponding labels have _Tongue_dorsum, is there a straightforward function? Doing something like below is overriding the colors assigned by the first statement.

col <- ifelse(grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels), "red", "black")
col <- ifelse(grepl("_Tongue_dorsum", xlabels) & grepl("_Tongue_dorsum", ylabels), "blue", "black")

Hi Karthik,

Your problem is that you are assigning all of the values to col twice. If you reverse the order of the statements you will get the red/black color set. Try this:

col <- rep("black", dim(chd1)[1])
col[grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels)] <- "red"
col[grepl("_Tongue_dorsum", xlabels) & grepl("_Tongue_dorsum", ylabels)] <- "blue"

If your two conditions specify disjoint sets, that is, no case satisfies both conditions, you should have the correct vector of colors.

Jim
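Jim's approach can be made self-contained with made-up labels (the poster's test.out file isn't available, so these sample values are assumptions, not real data):

```r
# Start every point at "black", then overwrite the positions where both
# labels match a given site.  Because each assignment only touches its
# own subset, later rules don't clobber earlier ones.
xlabels <- c("A01_Anterior_nares", "A02_Tongue_dorsum", "A03_Anterior_nares")
ylabels <- c("A04_Anterior_nares", "A05_Tongue_dorsum", "A06_Stool")

col <- rep("black", length(xlabels))
col[grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels)] <- "red"
col[grepl("_Tongue_dorsum",  xlabels) & grepl("_Tongue_dorsum",  ylabels)] <- "blue"
# col is now c("red", "blue", "black"): the third pair mixes two sites.
```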
Re: [R] lattice plot query
Dear Dennis,

Thanks for the comments. I read through the script that was originally created and gathered that the issue was caused by the use of index.cond in the xyplot function. I modified the script to drop the index.cond parameter and now it works, giving the subject order indicated by the ordered levels of the Subj factor. Following this, however, I have another requirement for which I am seeking the help of the R Group again. I am providing the illustrative data and the script to run the lattice plot. What I require is to get the individual panels in each lattice plot to carry a red or a blue background when the slope of the fitted line is negative or positive respectively. Would be grateful to receive help on how this could be achieved.

library(lattice)
MyData <- structure(list(Subj = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("S1", "S2", "S3", "S4"), class = "factor"), Product = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"), Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("L", "R"), class = "factor"), Attribute = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Time = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("T0", "T1"), class = "factor"), value = c(44, 33, 31, 34, 36, 33, 38, 32, 33, 35, 44, 38, 33, 36, 37, 34, 32, 35, 37, 33, 35, 41, 39, 40, 35, 32, 32, 44, 42, 30, 43, 38, 13, 13, 14, 12, 15, 11, 15, 14, 10, 15, 11, 15, 15, 10, 13, 13, 15, 11, 11, 12, 14, 10, 13, 13, 10, 12, 10, 13, 13, 15, 13, 10)), .Names = c("Subj", "Product", "Arm", "Attribute", "Time", "value"), row.names = c(NA, 64L), class = "data.frame", na.action = structure(65:352, .Names = as.character(65:352), class = "omit"))

MyData$Product <- factor(MyData$Product, levels = c("a", "b"))
MyData$Time <- factor(MyData$Time, levels = c("T0", "T1"))
MyData$Subj <- factor(MyData$Subj, levels = paste("S", 1:4, sep = ""))
Attributeindata <- levels(MyData$Attribute)
productindata <- levels(MyData$Product)
MyData <- MyData[order(MyData$Subj), ]

pdf(file = "D:/latticeplotsbysubject.pdf")
for (i in 1:length(Attributeindata)) {
  data <- subset(MyData, Attribute == Attributeindata[i])
  for (j in 1:length(productindata)) {
    data2 <- subset(data, Product == productindata[j])
    print(xyplot(value ~ Time | Subj, data = data2, layout = c(6, 3),
                 aspect = "xy", as.table = TRUE, type = c("g", "p", "r"),
                 drop.unused.levels = TRUE,
                 ylab = paste("value of", Attributeindata[i], "for
Re: [R] Calculating a mean based on a factor range
On 06/10/2011 08:10 AM, Sam Albers wrote:

Hello all, I have been using an instrument that collects a temperature profile of a water column. The instrument records the temperature and depth any time it takes a reading. I was sampling many times at discrete depths rather than taking a complete profile of the water column (e.g. interested in the 5m, 10m and 20m depth positions only). The issue was that these measurements were taken with the instrument hanging off the side of a boat, so a big enough wave moved the instrument such that it recorded a slightly different depth. For my purposes, however, this difference is negligible and I wish to consider all those readings at close depths as a single depth. So for example:

library(plyr)  ## for ddply (the original post loaded ggplot2, which pulls plyr in)
eg <- read.csv("http://dl.dropbox.com/u/1574243/example_data.csv", header = TRUE, sep = ",")

## Calculating an average value from all the readings for each depth reading
eg.avg <- ddply(eg, c("site", "depth"),
                function(df) c(temp = mean(df$temperature),
                               num_samp = length(df$temperature)))

## An example of my problem
eg.avg[eg.avg$num_samp > 10 & eg.avg$site == "Station 3", ]
          site    depth     temp num_samp
154  Station 3  1.09000 4.073667       30
159  Station 3  2.49744 3.950972       72
175  Station 3  7.96332 3.903188       69
208  Station 3 19.37708 4.066393       61
209  Station 3 19.54096 4.025385       13

So here you will notice that records 208 and 209, by my criteria, should be considered samples at the same depth and lumped together. Yet I can't figure out a way to coerce R to calculate a mean value of temperature based on a threshold depth range (say +/- 0.25). Generally speaking, this can be said to be calculating a mean (temperature) based on a factor (depth) range.

Hi Sam,

I suspect that you have a set of nominal depths in which you are interested.
If so, I would suggest using cut as Peter suggested, but specifying the breaks like this:

depth_cat <- cut(depth, breaks = c(0, 2.5, 7.5, 15, 30), labels = c("0", "5", "10", "25"))

so that the values nearest to your nominal depths will be aggregated into the correct categories.

Jim
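A runnable sketch of this suggestion, using depths rounded from Sam's printed output: the breaks sit between the nominal depths, so the two readings that drifted apart (19.38 and 19.54) land in the same category and can then be averaged together.

```r
# Depths as recorded (slight drift around the nominal sampling depths).
depth <- c(1.09, 2.50, 7.96, 19.38, 19.54)
temp  <- c(4.07, 3.95, 3.90, 4.07, 4.03)   # illustrative temperatures

# cut() bins each depth into the interval it falls in; intervals are
# right-closed by default, so 2.50 goes into (0, 2.5].
depth_cat <- cut(depth,
                 breaks = c(0, 2.5, 7.5, 15, 30),
                 labels = c("0", "5", "10", "25"))

# Mean temperature per nominal depth: the two ~19.5 m readings collapse
# into the single "25" category.
mean_by_depth <- tapply(temp, depth_cat, mean)
```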
Re: [R] Remove Duplicate numbers from a vector through writing set of commands
?duplicated

Sent from my iPad

On Jun 10, 2011, at 1:22, Z B z20@googlemail.com wrote:

Hi, I want to write a set of commands instead of unique(x) to remove duplicate numbers from a vector. Would you please help me? I don't know how to compare the numbers in a vector with each other. I am new to R. Thanks.
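A minimal sketch of the idiom the reply points at: duplicated(x) flags every element that has already appeared earlier in the vector, so negating it keeps only first occurrences, which is essentially what unique(x) does internally.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)

# duplicated(x) is FALSE the first time a value appears, TRUE afterwards;
# subsetting with its negation drops the repeats while preserving order.
no_dupes <- x[!duplicated(x)]
# no_dupes is c(3, 1, 4, 5, 9, 2, 6), the same result as unique(x)
```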
Re: [R] error with geomap in googleVis
Hi Mike,

I believe you are trying to put two charts on the same page. The demo(googleVis) provides you with an example for this. So applying the same approach to your data sets would require:

library(googleVis)
df <- data.frame(foo = c("Brazil", "Canada"), bar = c(123, 456))
map1 <- gvisGeoMap(df, locationvar = 'foo', numvar = 'bar')
m <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")

## Build a new gvis object based on the existing ones
ComboChart <- structure(
  list(type = 'GeoMapMotionChart',
       chartid = 'myComboChart',
       html = list(header = map1$html$header,
                   chart = list(map1$html$chart,  # Geo Map
                                m$html$chart),    # Motion Chart
                   caption = "Combination of various googleVis outputs <br />",
                   footer = map1$html$footer)),
  class = c("gvis", "list"))

plot(ComboChart)

I hope this helps.

Kind regards,
Markus

-----Original Message-----
From: Mike Marchywka [mailto:marchy...@hotmail.com]
Sent: 09 June 2011 19:12
To: Gesmann, Markus; mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
Subject: RE: [R] error with geomap in googleVis

I still got blanks with Firefox with the two examples below; I put the html up here if you want to look at it: http://98.129.232.232/xxx.html I just downloaded googleVis from mirror 68 and it claimed it was 0.2.5 (I thought, but maybe I should check again).
install.packages("googleVis", dep = TRUE)
library(googleVis)
df <- data.frame(foo = c("Brazil", "Canada"), bar = c(123, 456))
map1 <- gvisGeoMap(df, locationvar = 'foo', numvar = 'bar')
cat(map1$html$header, filename = "xxx.html", append = FALSE)
cat(map1$html$header, file = "xxx.html", append = FALSE)
cat(map1$html$chart, file = "xxx.html", append = TRUE)
cat(map1$html$caption, file = "xxx.html", append = TRUE)
cat(map1$html$footer, file = "xxx.html", append = TRUE)
m <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")
str(m)
cat(m$html$header, file = "xxx.html", append = FALSE)
cat(m$html$chart, file = "xxx.html", append = TRUE)
cat(m$html$caption, file = "xxx.html", append = TRUE)
cat(m$html$footer, file = "xxx.html", append = TRUE)

Subject: RE: [R] error with geomap in googleVis
Date: Thu, 9 Jun 2011 14:06:22 +0100
From: markus.gesm...@lloyds.com
To: marchy...@hotmail.com; mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch

Hi all,

This issue occurs with googleVis 0.2.4 and RJSONIO 0.7.1. Version 0.2.5 of the googleVis package was uploaded to CRAN two days ago and should have fixed this issue. Can you please try to update to that version, e.g. from http://cran.r-project.org/web/packages/googleVis/ Further, version 0.2.5 provides new interfaces to more interactive Google charts:
- gvisLineChart
- gvisBarChart
- gvisColumnChart
- gvisAreaChart
- gvisScatterChart
- gvisPieChart
- gvisGauge
- gvisOrgChart
- gvisIntensityMap

Additionally a new demo 'AnimatedGeoMap' has been added which shows how a Geo Map can be animated with additional JavaScript. Thanks to Manoj Ananthapadmanabhan and Anand Ramalingam, who provided the idea and initial code.
For more information and examples see: http://code.google.com/p/google-motion-charts-with-r/

I hope this helps

Markus

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Marchywka
Sent: 09 June 2011 11:19
To: mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
Subject: Re: [R] error with geomap in googleVis

To: r-h...@stat.math.ethz.ch
From: mjphi...@tpg.com.au
Date: Wed, 8 Jun 2011 10:14:01 +
Subject: Re: [R] error with geomap in googleVis

SNV Krishna primps.com.sg writes:

Hi All, I am unable to get the geomap plot in the googleVis package. The data is as follows:

head(index.ret)
    country    ytd
1 Argentina -10.18
2 Australia  -3.42
3   Austria  -2.70
4   Belgium   1.94
5    Brazil  -7.16
6    Canada   0.56

map1 = gvisGeoMap(index.ret, locationvar = 'country', numvar = 'ytd')
plot(map1)

But it just displays a blank page, showing an error symbol at the bottom right corner. I tried demo(googleVis); it also had a similar problem. The demo showed all other plots/maps except for those geomaps. Could anyone please hint me what/where the problem could be? Many thanks for the idea and support.

Regards, SNV Krishna

I had never used this until yesterday but it seems to generate html. I didn't manage to get a chart to display, but if you are familiar with this package and html perhaps you could look at map1$html and see if anything is obvious. One great thing about html/js is that it is human readable and you can integrate it well with other page material without much in the way of special tools.

Hi All, I have also encountered this problem. I have tested the problem in Windows XP 3.0. I have the latest Java and Flash and I have tried
Re: [R] error with geomap in googleVis
Subject: RE: [R] error with geomap in googleVis Date: Fri, 10 Jun 2011 11:06:47 +0100 From: markus.gesm...@lloyds.com To: marchy...@hotmail.com; r-h...@stat.math.ethz.ch Hi Mike, I believe you are trying to put two charts on the same page. The demo(googleVis) provides you with an example of this. Applying the same approach to your data sets would require: Thanks. I thought, however, that I had tried two different examples, the first being my own, which failed, and then the Fruits example as a second test. I was lazy and used install.packages() from probably the wrong mirror, as in the past on 'dohs I had problems with R CMD INSTALL. I reinstalled as per your suggestion and tried the first example. The output did change, but it is still lacking a chart. Do I need an API key or something? I've never used Google visualization before and was just reacting to the OP, as I had wanted to try it. The process is below; the final result may be here, http://98.129.232.232/xxx.html , but I will be varying it as I get time. Thanks again.

--2011-06-10 05:17:47-- http://cran.r-project.org/src/contrib/googleVis_0.2.5.tar.gz
Resolving cran.r-project.org... 137.208.57.37
Connecting to cran.r-project.org|137.208.57.37|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 10:17:47 GMT
Server: Apache/2.2.9 (Debian)
Last-Modified: Tue, 07 Jun 2011 06:35:56 GMT
ETag: a83451-b36e9-4a5196ea64b00
Accept-Ranges: bytes
Content-Length: 734953
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-gzip
Length: 734953 (718K) [application/x-gzip]
Saving to: `googleVis_0.2.5.tar.gz'
100%[==] 734,953 527K/s in 1.4s
2011-06-10 05:17:49 (527 KB/s) - `googleVis_0.2.5.tar.gz' saved [734953/734953]

[marchywka@351915-www1 downloads]$ R CMD INSTALL googleVis_0.2.5.tar.gz
* installing to library `/usr/local/lib64/R/library'
* installing *source* package `googleVis' ...
** R ** data ** moving datasets to lazyload DB ** demo ** inst ** preparing package for lazy loading ** help *** installing help indices ** building package indices ... ** testing if installed package can be loaded * DONE (googleVis) [marchywka@351915-www1 downloads]$ R R version 2.13.0 (2011-04-13) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-unknown-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

> library(googleVis)
Loading required package: RJSONIO
To suppress the following message use the statement:
suppressPackageStartupMessages(library(googleVis))
Welcome to googleVis version 0.2.5

Type ?googleVis to access the overall documentation and vignette('googleVis') for the package vignette. You can execute the demo of the package via: demo(googleVis) More information is available on the googleVis project web-site: http://code.google.com/p/google-motion-charts-with-r/ Please read also the Google Visualisation API Terms of Use: http://code.google.com/apis/visualization/terms.html Feel free to send us an email rvisualisat...@gmail.com if you would like to be kept informed of new versions, or if you have any feedback, ideas, suggestions or would like to collaborate.
> df <- data.frame(foo=c("Brazil","Canada"), bar=c(123,456))
> map1 <- gvisGeoMap(df, locationvar='foo', numvar='bar')
> cat(map1$html)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 1 (type 'list') cannot be handled by 'cat'
> cat(map1$html$header)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>GeoMapID23b504e5</title>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<style type="text/css">
body {
  color: #44;
  font-family: Arial,Helvetica,sans-serif;
  font-size: 75%;
}
a {
  color: #4D87C7;
  text-decoration: none;
}
</style>
</head>
<body>
> cat(map1$html$header, file="xxx.html", appeand=F)
> cat(map1$html$chart, file="xxx.html", appeand=t)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 2 (type 'closure') cannot be handled by 'cat'
> str(map1$html)
List of 4
 $ header : chr "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<ht"| __truncated__
 $ chart  : Named chr [1:7] "<!-- GeoMap generated in R 2.13.0 by googleVis
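For reference, the write-to-file pattern used in the session above can be sketched with a plain named list standing in for map1$html, so it runs without googleVis installed (the file name xxx.html and the HTML strings here are placeholders, not googleVis output):

```r
# Stand-in for map1$html: a named list of HTML fragments.
html <- list(header  = "<html><body>",
             chart   = "<div>chart goes here</div>",
             caption = "caption",
             footer  = "</body></html>")
outfile <- file.path(tempdir(), "xxx.html")
# First write truncates the file; the rest append, in document order.
cat(html$header, "\n", file = outfile, append = FALSE)
for (part in c("chart", "caption", "footer"))
  cat(html[[part]], "\n", file = outfile, append = TRUE)
```

The same loop works on a real gvis object, since its $html component is such a named list.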
Re: [R] Logistic Regression
Which statistical principles are you invoking on which to base such analyses? Frank Sergio Della Franca wrote: Dear R-Helpers, I want to perform a logistic regression on my dataset (y). I used the following code:

logistic <- glm(formula = interest_variable ~ ., family = binomial(link = logit), data = y)

This ran correctly. Then I wanted to develop the logistic regression with three different methods: forward, backward, and stepwise. I used these procedures:

forward <- step(logistic, direction = "forward")
backward <- step(logistic, direction = "backward")
stepwise <- step(logistic, direction = "both")

These also ran correctly, but I obtained the same results with the three different procedures, so I thought I had made some mistakes. My questions are: Is what I did correct? Is it correct that three different methods return the same results? If I made some mistakes, what is the correct way to perform the three different logistic regressions? Thank you in advance. Sergio Della Franca. [[alternative HTML version deleted]] - Frank Harrell Department of Biostatistics, Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] k-nn hierarchical clustering
Hi, Yes, check the function ann in package yaImpute. Regards, Carlos Ortega www.qualityexcellence.es On Thu, Jun 9, 2011 at 1:51 PM, Christian Hennig chr...@stats.ucl.ac.uk wrote: Hi there, is there any R function for k-nearest-neighbour agglomerative hierarchical clustering? By this I mean standard agglomerative hierarchical clustering as in hclust or agnes, but with the k-nearest-neighbour distance between clusters used on the higher levels, where there are at least k+1 distances between two clusters (single linkage is 1-nearest-neighbour clustering)? Best regards, Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche [[alternative HTML version deleted]]
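For reference, the k = 1 special case mentioned in the question (single linkage) is available in base R. This sketch is only that special case, not the k > 1 generalisation being asked about:

```r
# Single-linkage agglomerative clustering = 1-nearest-neighbour
# distance between clusters, on small simulated data.
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)          # 20 points in 2-D
hc <- hclust(dist(x), method = "single")  # single linkage
cl <- cutree(hc, k = 3)                   # cut the dendrogram at 3 clusters
```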
[R] Odp: what is the mistake?? the coding still not function. no result display
Hi

[R] what is the mistake?? the coding still not function. no result display

# lda.r
#
# Author:  Amsha Nahid, Jairus Bowne, Gerard Murray
# Purpose: Perform Linear Discriminant Analysis (LDA)
#
# Input:   Data matrix as specified in Data-matrix-format.pdf
# Output:  LDA plot
#
# Notes:   Missing values (if any) are replaced by half of the lowest
#          value in the entire data matrix.

# Load necessary libraries, and install them if they are missing
tryCatch(library(MASS), error = function(err)
    # if this produces an error:
    install.packages("MASS", repos = "http://cran.ms.unimelb.edu.au/"))

# Prepare the data matrix
# Read in the .csv file
data <- read.csv("C:/Users/nadya/Desktop/praktikal UTM/TASK2/new(40data)S2 100 EMS EPI 300-399.csv",
                 sep = ",", row.names = 1, header = TRUE)
# Get groups information
groups <- data[, 1]
# Remove groups for data processing
lda_data <- data[, -1]
# Replace any missing values (see Notes)
lda_data[is.na(lda_data)] <- 0.5 * min(lda_data, na.rm = TRUE)

# Perform the LDA
lda_result <- lda(lda_data, groups)
Error in lda.default(x, grouping, ...) :
  variables 1 3 5 8 10 15 17 20 27 29 34 appear to be constant within groups

The problem seems to be with your data, which in some groups are constant. Try split(lda_data, groups) or sapply(split(lda_data, groups), length) and see your result.
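The error can be reproduced in miniature with assumed (made-up) data, together with a per-group spread check in the spirit of the split() suggestion above (requires the recommended package MASS):

```r
library(MASS)

# Tiny assumed dataset: v1 is constant (so constant within every group),
# v2 varies within groups -- lda() refuses to fit because of v1.
d <- data.frame(v1 = rep(1, 10),
                v2 = as.numeric(1:10))
groups <- rep(c("a", "b"), each = 5)

# Capture the error message instead of stopping the script.
res <- tryCatch(lda(d, groups), error = function(e) conditionMessage(e))

# Diagnostic: per-group standard deviation of each variable;
# a zero flags a variable that is constant within that group.
sapply(split(d, groups), function(g) sapply(g, sd))
```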
Regards Petr

# Generate the figures (on screen)
# Image generation - function definition
pic_onscr <- function(matrix, title, cex_val = 1) {
    x11()
    par(mgp = c(5, 2, 0),    # axis margins (title, labels, line)
        mar = c(7, 4, 4, 2), # plot margins (b,l,t,r)
        las = 1)             # horizontal labels
    plot(matrix,             # data to plot
         cex = cex_val,      # font size
         dimen = 2)          # dimensions to plot
    title(main = title)      # title of plot
}

# Plot LDA scores with sample names
pic_onscr(lda_result, "Linear Discriminant Analysis")
Error in plot(matrix, cex = cex_val, dimen = 2) :
  error in evaluating the argument 'x' in selecting a method for function 'plot':
  Error: object 'lda_result' not found

# For plotting with larger font size, use a different value of cex_val:
# pic_onscr(lda_result, "LDA Plot", cex_val = 3)

# Generate figures as image files
# (Uncomment blocks as necessary)

# jpeg
# pic_jpg <- function(filename, matrix, title, cex_val = 1) {
#     # Start jpeg device with basic settings
#     jpeg(filename,
#          quality = 100,     # image quality (percent)
#          bg = "white",      # background colour
#          res = 300,         # image resolution (dpi)
#          units = "in", width = 8.3, height = 5.8)  # image dimensions (inches)
#     par(mgp = c(5, 2, 0),    # axis margins (title, labels, line)
#         mar = c(7, 4, 4, 2), # plot margins (b,l,t,r)
#         las = 1)             # horizontal labels
#     # Draw the plot
#     plot(matrix, cex = cex_val, dimen = 2)
#     title(main = title)
#     dev.off()
# }
# pic_jpg("LDA.jpg", lda_result, "Linear Discriminant Analysis")
# end jpeg

# png
# pic_png <- function(filename, matrix, title, cex_val = 1) {
#     # Start png device with basic settings
#     png(filename,
#         bg = "white",      # background colour
#         res = 300,         # image resolution (dpi)
#         units = "in", width = 8.3, height = 5.8)  # image dimensions (inches)
#     par(mgp = c(5, 2, 0),    # axis margins (title, labels, line)
#         mar = c(7, 4, 4, 2), # plot margins (b,l,t,r)
#         las = 1)             # horizontal labels
#     # Draw the plot
#     plot(matrix, cex = cex_val, dimen = 2)
#     title(main = title)
#     dev.off()
# }
# pic_png(LDA.png, lda_result, Linear
[R] Odp: How to subset based on column name that is a number ?
Hi

[R] How to subset based on column name that is a number ? Hi, I have a data frame with column names 1, 2, 3, ... and I'd like to extract a subset based on the values in the first column. None of the methods I tried worked (below).

x <- subset(dframe, 1 == "My Text")
x <- subset(dframe, "1" == "My Text")
x <- subset(dframe, names(dframe)[1] == "My Text")
Q <- dframe[1 == "FY11_Q4", ]
Q <- dframe['1' == "FY11_Q4", ]
Q <- dframe[names(dframe)[1] == "FY11_Q4", ]

Did you try

dframe[dframe[, 1] == "My Text", ]

Regards Petr

Might anyone have a suggestion? Many thanks, Mauricio [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
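A small sketch of the situation, with a hypothetical two-row data frame whose first column is literally named "1" (check.names = FALSE preserves the numeric name); both selections below follow Petr's answer:

```r
# A data frame with a numeric column name; without check.names = FALSE
# R would rename "1" to "X1".
d <- data.frame("1" = c("FY11_Q4", "FY12_Q1"), v = c(10, 20),
                check.names = FALSE, stringsAsFactors = FALSE)

by_pos  <- d[d[, 1] == "FY11_Q4", ]    # by position: first column
by_name <- d[d[["1"]] == "FY11_Q4", ]  # by the quoted name "1"
```

The failed attempts in the question compare the literal value 1 (or the string "1") against "My Text", which is always FALSE; the fix is to index into the data frame first and compare the column's values.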
Re: [R] merge large number of raster objects
Thank you Henrik and Rolf. My migraine just disappeared. :)

myobjects <- ls(pattern = '^RH')
dataobj <- mget(myobjects, envir = .GlobalEnv)
map <- Reduce(merge, dataobj)

did the trick. do.call() didn't cut it for some reason . . . it returned an 'x' is missing error. But Reduce works quite nicely for the rasters I'm working with. Thanks again, Ben On 6/10/2011 2:06 AM, Henrik Bengtsson wrote: ...and ?mget for retrieving multiple objects as a list. /Henrik On Thu, Jun 9, 2011 at 10:32 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote: See ?get and ?do.call. That should be enough. ?Reduce may be an alternative to do.call(), but could also be less memory efficient. My $.02 /Henrik On Thu, Jun 9, 2011 at 7:57 PM, Ben Zaitchik zaitc...@jhu.edu wrote: Hello, I have a large number of raster objects in memory, with names RH100, RH101, RH102, etc. (myobjects <- ls(pattern='^RH')). These rasters are sections of a continuous map, and I would like to combine them using the RasterObject merge tool in package 'raster'. Merge expects an input list of raster objects (outmap <- merge(x, y, ...), where x, y, and ... are raster objects). I can run the command successfully if I type in every raster object that I want to merge, but this is impractical for the large number of objects I'm combining. I would like to apply merge to a list of object names defined using some kind of wildcard-based list command, but I'm struggling to find the right data types in R. Is there some way to convert a vector of strings (e.g., as.vector(ls(pattern='^RH'))) to a vector of object names (as.names??) that could be specified as the input to the merge function? What I'd really like to do is something like outmap <- merge(myobjects[1:40]), in order to merge the 40 raster objects, but I recognize that it might not be so simple. Advance apologies if this is already dealt with on the help list . . . I've gone in circles on wildcard, lapply, and names threads and haven't managed to get anything to work.
Thank you, Ben
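The mget()/Reduce()/do.call() pattern from this thread can be sketched with plain vectors in place of rasters, so it runs without the 'raster' package; here c() stands in for raster's merge():

```r
# Assumed stand-in objects following the thread's naming scheme.
RH100 <- 1:3; RH101 <- 4:6; RH102 <- 7:9

myobjects <- ls(pattern = "^RH")                  # character vector of names
dataobj   <- mget(myobjects, envir = .GlobalEnv)  # named list of the objects

map  <- Reduce(c, dataobj)   # fold pairwise, left to right
map2 <- do.call(c, dataobj)  # one call with all elements as arguments
```

The difference that can matter: Reduce() makes repeated two-argument calls, while do.call() builds a single call with every object as a separate argument, which is what a variadic merge(x, y, ...) expects.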
Re: [R] Logistic Regression
First a word of caution: forward, backward, and stepwise regression analyses are not well received among statisticians. There are many reasons for this, including: (1) The p value computed at each step is computed ignoring all the previous steps. This can lead to incorrect inferences. The p value should be conditioned on (i.e. computed taking into account) all the previous steps, i.e. the path used to get to the current model. (2) When entering interaction terms, the three methods do not make sure that the main effects included in the interaction are in the model before the interaction is added. (3) The analysis strategy substitutes a thoughtless mechanical procedure for the modeler's knowledge of the problem at hand. (4) When your data are collinear (i.e. there is significant correlation among your independent variables), the three techniques may choose different models not because one model is better than another, but simply because of collinearity. (5) None of the three techniques gives absolute truth; each can give a glimpse of truth, but they can also lead to a false sense of truth, a trip down a rabbit hole if you will. Given these caveats, it remains to explain why your three analyses gave the same results. Fortunately, the explanation is simple. The three analyses (one which starts with no terms in the model and then adds one at a time [forward], one that starts with all terms in the model and then removes terms one at a time [backward], and one that starts with no terms in the model and then adds and removes terms one at a time [stepwise]) all wound up with the same results. This is not a problem; it simply reflects the relations in your data. In fact, the agreement of the three methods makes me, at least, feel a bit better about the results you obtained with a method that is far from optimal.
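The three-way comparison can be illustrated on simulated (assumed) data. One mechanical point worth noting: step(fit, direction = "forward") called on the full model with no 'scope' argument has nothing left to add, so it necessarily returns the starting model; a genuine forward search needs a null starting model and an explicit scope:

```r
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$y <- rbinom(200, 1, plogis(1.5 * d$x1))  # only x1 truly matters

full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
null <- glm(y ~ 1, family = binomial, data = d)

fwd  <- step(full, direction = "forward",  trace = 0)  # no scope: a no-op
bwd  <- step(full, direction = "backward", trace = 0)
both <- step(full, direction = "both",     trace = 0)

# A real forward search, starting from the null model:
fwd2 <- step(null, scope = formula(full), direction = "forward", trace = 0)
```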
In general it is considered better to model your data using standard modeling techniques that make use of your knowledge of the field, rather than using one of the three techniques you used. That said, sometimes when one has many independent variables the three techniques can help one understand what is happening, but any inference drawn from the models must be taken with many a grain of salt. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Frank Harrell f.harr...@vanderbilt.edu 6/10/2011 7:06 AM Which statistical principles are you invoking on which to base such analyses? Frank Sergio Della Franca wrote: Dear R-Helpers, I want to perform a logistic regression on my dataset (y). I used the following code:

logistic <- glm(formula = interest_variable ~ ., family = binomial(link = logit), data = y)

This ran correctly. Then I wanted to develop the logistic regression with three different methods: forward, backward, stepwise. I used these procedures:

forward <- step(logistic, direction = "forward")
backward <- step(logistic, direction = "backward")
stepwise <- step(logistic, direction = "both")

These also ran correctly, but I obtained the same results with the three different procedures, so I thought I had made some mistakes. My questions are: Is what I did correct? Is it correct that three different methods return the same results? If I made some mistakes, what is the correct way to perform the three different logistic regressions? Thank you in advance. Sergio Della Franca.
[[alternative HTML version deleted]] - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] gam() (in mgcv) with multiple interactions
On Jun 9, 2011, at 11:35 AM, Simon Wood wrote: I think that the main problem here is that smooths are not constrained to pass through the origin, so the covariate taking the value zero doesn't correspond to no effect in the way that you would like it to. Another way of putting this is that smooths are translation invariant, you get essentially the same inference from the model y_i = f(x_i) + e_i as from y_i = f(x_i + k) + e_i (which implies that x_i=0 can have no special status). OK, I understand the translation invariance, I think, and why x_i=0 has no special status. I don't understand the consequence that you are saying follows from that, though. Onward... All mgcv does in the case of te(a) + te(b) + te(d) + te(a, b) + te(a, d) is to remove the bases for te(a), te(b) and te(d) from the basis of te(a,b) and te(a,d). Further constraining te(a,b) and te(a,d) so that te(0,b) = te(a,0) = 0 etc wouldn't make much sense (in general 0 might not even be in the range of a and b). Hmm. Perhaps my question would be better phrased as a question of model interpretation. Can I think of the smooth found for te(a) as the main effect of a? If so, should I just not be bothered by the fact that te(a,b) at b=0 has a different shape from te(a)? Or is the main effect of a really, here, te(a) + te(a,b | b=0) + te(a,d | d=0) (if my notation makes any sense, I'm not much at math)? Or is the whole concept of main effect meaningless for these kinds of models -- in which case, how do I interpret what te(a) means? Or perhaps I should not be trying to interpret a by itself; perhaps I can only interpret the interactions, not the main effects. In that case, do I interpret te(a,b) by itself, or do I need to conceptually add in te(a) to understand what te(a,b) is telling me? My head is spinning. Clearly I just don't understand what these GAM models even mean, on a fundamental conceptual level. 
In general I find functional ANOVA not entirely intuitive to think about, but there is a very good book on it by Chong Gu (Smoothing Spline ANOVA, 2002, Springer), and the associated package gss is on CRAN. Is what I'm doing functional ANOVA? I realize ANOVA and regression are fundamentally related, but I think of ANOVA as involving discrete levels (factors) in the independent variables, like treatment groups. My independent variables are all continuous, so I would not have thought of this as ANOVA. Anyhow, OK. I will go get that book today, and see if I can figure all this out. Thanks for your help! Ben Haller McGill University http://biology.mcgill.ca/grad/ben/ On 07/06/11 17:00, Ben Haller wrote: Hi!
I'm learning mgcv, and reading Simon Wood's book on GAMs, as recommended to me earlier by some folks on this list. I've run into a question to which I can't find the answer in his book, so I'm hoping somebody here knows. My outcome variable is binary, so I'm doing a binomial fit with gam(). I have five independent variables, all continuous, all uniformly distributed in [0, 1]. (This dataset is the result of a simulation model.) Let's call them a,b,c,d,e for simplicity. I'm interested in interactions such as a*b, so I'm using tensor product smooths such as te(a,b). So far so good. But I'm also interested in, let's say, a*d. So ok, I put te(a,d) in as well. Both of these have a as a marginal basis (if I'm using the right terminology; all I mean is, both interactions involve a), and I would have expected them to share that basis; I would have expected them to be constrained such that the effect of a when b=0, for one, would be the same as the effect of a when d=0, for the other. This would be just as, in a GLM with formula a*b + a*d, that
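As a runnable footnote to this thread (simulated data, requires the recommended package mgcv): later versions of mgcv added ti(), which constructs tensor interactions with the marginal main effects excluded, i.e. essentially the main-effect-plus-pure-interaction decomposition under discussion. The model structure below mirrors the te(a) + te(b) + te(d) + te(a,b) + te(a,d) formula from the question:

```r
library(mgcv)

set.seed(1)
n <- 400
a <- runif(n); b <- runif(n); d <- runif(n)
# Assumed data-generating process, loosely following the question's setup:
# binary outcome, continuous covariates on [0, 1].
y <- rbinom(n, 1, plogis(sin(2 * pi * a) + a * b - d))
dat <- data.frame(y, a, b, d)

# ti() terms: marginal smooths plus interactions with main effects removed.
m <- gam(y ~ ti(a) + ti(b) + ti(d) + ti(a, b) + ti(a, d),
         family = binomial, data = dat)
```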
Re: [R] How to assess the accuracy of fitted logistic regression using glm
On 10.06.2011 08:54, Xiaobo Gu wrote: Hi Professor Brian, Thanks for your reply. I think there are many statisticians here, and it is somewhat R related, so I hope someone can help me. I have done a simple test, using a sample csv data file which I can post if needed.

donut <- read.csv(file = "D:/donut.csv", header = TRUE)
donut[["color"]] <- as.factor(donut[["color"]])
donut[["shape"]] <- as.factor(donut[["shape"]])
donut[["k"]]     <- as.factor(donut[["k"]])
donut[["k0"]]    <- as.factor(donut[["k0"]])
donut[["bias"]]  <- as.factor(donut[["bias"]])
lr <- glm(color ~ shape + x + y, family = binomial, data = donut)
summary(lr)

Call:
glm(formula = color ~ shape + x + y, family = binomial, data = donut)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1079  -0.9476   0.5086   0.7518   1.4079

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.53010    1.65500   1.529   0.1263
shape22      0.05628    1.54990   0.036   0.9710
shape23     -0.74568    1.44813  -0.515   0.6066
shape24     -2.61896    1.38016  -1.898   0.0578 .
shape25     -2.07648    1.32818  -1.563   0.1180
x           -0.45885    1.52863  -0.300   0.7640
y           -0.59311    1.46999  -0.403   0.6866
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 50.446 on 39 degrees of freedom
Residual deviance: 42.473 on 33 degrees of freedom
AIC: 56.473

Number of Fisher Scoring iterations: 4

In the Coefficients section, is Pr(>|z|) the p-value for that variable? And there are a few other questions: 1. How do I determine the predictive power of each variable? 2. How do I determine the overall performance of the fitted model, and what's the difference between Deviance Residuals and Residual deviance? 3. How do I compare Null deviance and Residual deviance? 4. What does AIC mean, and how do I use this measure? 5. What does the Signif. codes section mean? To answer your questions, we'd need to write half a book, at least. This cannot be answered in an e-mail message.
Hence please re-read Brian Ripley's advice and try to get statistical advice from a local consultant, or read elementary textbooks on the subject. Uwe Ligges Regards, Xiaobo Gu On Mon, Jun 6, 2011 at 9:59 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Mon, 6 Jun 2011, Xiaobo Gu wrote: Hi, I am trying glm with family = binomial to do binary logistic regression, but how can I assess the accuracy of the fitted model? The summary method can print a lot of information about the returned object, such as coefficients. Because statistics is not my speciality, can you share some rules of thumb to examine the fitted model from a practical perspective? It depends entirely on why you did the fit. People have written whole books on assessing the performance of classification procedures such as binary logistic regression. For example, the residual deviance is closely related to log-probability scoring: for some purposes that is a good performance measure, for others (e.g. when you are going to threshold the predicted probabilities) it can be very misleading. In short, you need statistical advice, not R advice (the purpose of this list). Regards, Xiaobo Gu -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
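As a minimal, rule-of-thumb sketch for questions 3 and 4 above, on simulated data (the caveats in the replies still apply; these are starting points, not a full assessment of model performance):

```r
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(d$x))          # assumed toy data
fit <- glm(y ~ x, family = binomial, data = d)

# Q3: the drop from null to residual deviance is a likelihood-ratio
# statistic; compare it to a chi-squared distribution on the
# difference in degrees of freedom.
dev_drop <- fit$null.deviance - fit$deviance
df_drop  <- fit$df.null - fit$df.residual
p_value  <- pchisq(dev_drop, df_drop, lower.tail = FALSE)

# Q4: AIC penalises deviance by model size; lower is better when
# comparing models fitted to the same data.
aic <- AIC(fit)
```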
[R] Fwd: package.skeleton() does not create 'data' folder
Dear all, could somebody please look into my problem here? Does it not contain sufficient information? Please let me know how I can make my question more complete. Thanks, ---------- Forwarded message ---------- From: Nipesh Bajaj bajaj141...@gmail.com Date: Fri, Jun 10, 2011 at 12:57 AM Subject: package.skeleton() does not create 'data' folder To: r-help@r-project.org Hi again, yesterday I mailed this query, however I could not see it on the mailing list; therefore I am reposting it. I was using the package.skeleton() function to create the skeleton of my package on Windows. Here is my attempt:

rm(list = ls())
setwd("F:/R_PackageBuild")
package.skeleton("trial1", namespace = TRUE, code_files = "F:/R_PackageBuild/trial.r")

In the trial.r file there are two objects, one a function and one data:

fn1 <- Vectorize(function(x, y, z) {
    return(x + y + z)
}, SIMPLIFY = TRUE)
Data <- rnorm(20)

However, my problem is that package.skeleton() does not create any data folder in the skeleton tree, although in the man folder there are 3 Rd files (as expected), named: Data, fn1, trial1-package. On the other hand, if I enter the code interactively, package.skeleton() does create the data folder:

fn1 <- Vectorize(function(x, y, z) {
    return(x + y + z)
}, SIMPLIFY = TRUE)
Data <- rnorm(20)
setwd("F:/R_PackageBuild")
package.skeleton("trial2")

So is it that if I use the 'code_files' argument, R will not create the data folder? Can somebody tell me what I am missing in this process? Until now I manually create the data folder and put in it an RData file whose only object is 'Data'. However, I believe there must be a more elegant way of doing that. While searching the net to settle this issue, I found a thread with the heading 'How to create .rda file to be used in package building', but it does not answer my question.
Thanks,
Re: [R] Fwd: package.skeleton() does not create 'data' folder
On 10/06/2011 9:01 AM, Nipesh Bajaj wrote: Dear all, somebody please look into my problem here? Does it not contain sufficient information? Please let me know how can I post my question more complete. You could try following the posting guidelines, which suggest this question belongs in the R-devel list, rather than R-help. Nevertheless, an answer below. Thanks, -- Forwarded message -- From: Nipesh Bajajbajaj141...@gmail.com Date: Fri, Jun 10, 2011 at 12:57 AM Subject: package.skeleton() does not create 'data' folder To: r-help@r-project.org Hi again, yesterday I mailed this query however I could not see this on the mail list. Therefore, I am reposting it again. I was using package.skeleton() function to create the skeleton of my package in windows. Here is my attempt: rm(list = ls()) setwd(F:/R_PackageBuild) package.skeleton(trial1, namespace = TRUE, code_files = F:/R_PackageBuild/trial.r) In the trial.r file, there are 2 objects, one is a function and another is data. Here they are: fn1- Vectorize(function(x,y,z) { return(x + y +z) }, SIMPLIFY = TRUE) Data- rnorm(20) However my problem is that package.skeleton() does not create any data folder in the skeleton tree. However in the man folder there are 3 Rd files (as expected), naming: Data, fn1, trial1-package However on the contrary if my code is like below then, package.skeleton() creates data folder. fn1- Vectorize(function(x,y,z) { + return(x + y +z) + }, SIMPLIFY = TRUE) Data- rnorm(20) setwd(F:/R_PackageBuild) package.skeleton(trial2) So is it that if I use 'code_files ' argument then, R would not create data folder? Can somebody help me what I am missing in this process? Till now, I create ___manually the data folder and within that folder manually put a RData file where only object is that 'Data'. However I believe there must be more elegant way to doing that. Why? You only call package.skeleton once in the life of your package. You will be making hundreds of other changes to the package. 
What's wrong with making this one? If you want to know how package.skeleton decides whether to create a data folder, just look at its source.

Duncan Murdoch

While searching the net to settle this issue, I found a thread with the heading 'How to create .rda file to be used in package building', but it does not answer my question.

Thanks,

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
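The manual workaround the poster describes can be scripted with save(). A sketch (not from the thread): the package path is relocated to a temporary directory here so the example runs anywhere; the poster's real path would be F:/R_PackageBuild/trial1/data.

```r
# Save a data object by hand into the package's data/ directory,
# mimicking what package.skeleton() would otherwise set up.
pkgdata <- file.path(tempdir(), "trial1", "data")  # stand-in for the real package path
dir.create(pkgdata, recursive = TRUE, showWarnings = FALSE)

Data <- rnorm(20)
save(Data, file = file.path(pkgdata, "Data.rda"))

file.exists(file.path(pkgdata, "Data.rda"))  # TRUE
```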
Re: [R] Setting up a State Space Model in dlm
Thank you. The new article linked below is excellent. BTW, I have had trouble getting the sample code to run. In particular, there is a variable k that does not appear to be set when it is used at line 227. Here is the line where it is called:

fit <- optim(par = rep(0, 2 * k), fn = logLik, method = "BFGS",
             control = list(maxit = 500))

Here is the error message:

$ R --vanilla < kfas.R > kfas.Rlog
Error in rep.int(1, length(par)) : object 'k' not found
Calls: optim -> rep.int
Execution halted

Any advice? Thanks. Best, Michael

On Fri, Jun 10, 2011 at 8:22 AM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote: On Tue, 2011-06-07 at 17:24 +0100, Michael Ash wrote: This question pertains to setting up a model in the package dlm (dynamic linear models, http://cran.r-project.org/web/packages/dlm/index.html). The author of the dlm package has just published a paper on state space models in R, including details on setting up dlm: http://www.jstatsoft.org/v41/i04
Re: [R] Help on selecting genes showing highest variance
# Let's say your expression data is in a matrix
# named expression, in which the rows are genes
# and the columns are samples
myvars <- apply(expression, 1, var, na.rm = TRUE)
myvars <- sort(myvars, decreasing = TRUE)
myvars <- myvars[1:200]
expression <- expression[names(myvars), ]
dim(expression)

Also check out the genefilter package in Bioconductor. You may find the Bioconductor mailing list is better for questions like this one.

On Tue, Jun 7, 2011 at 9:47 AM, GIS Visitor 33 gis...@gis.a-star.edu.sg wrote: Hi, I have a problem for which I would like to know a solution. I have gene expression data and I would like to choose only, let's say, the top 200 genes that had the highest expression variance across patients. How do I do this in R? I tried

x = apply(leukemiadata, 1, var)
x1 = x[order(-1*x)]

but the problem here is that x and x1 are numeric vectors, so if I choose the first 200 after sorting in descending order, I do not know how to choose the associated samples with just the numeric values. Kindly help! Regards, Ap
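The answer above keeps the gene names by subsetting with names(); the same selection can be done in one indexing step with order(). A sketch with invented data (the 3-gene matrix expr and the cutoff of 2 are made up for illustration):

```r
# Pick the k most variable rows of a matrix in one step.
expr <- matrix(c(1, 1, 1,
                 0, 5, 10,
                 2, 3, 4),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), NULL))

vars <- apply(expr, 1, var, na.rm = TRUE)          # row-wise variances
top  <- expr[order(vars, decreasing = TRUE)[1:2], , drop = FALSE]
rownames(top)   # "geneB" "geneC" -- the two most variable genes
```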
[R] ggplot2 avoid automatic color selection
Hey all, I'm trying to replicate some plots with ggplot2. The problem is that I need to specify the color for every attribute (drug). If I use the code below the colors get automatically assigned, but I need to plot drug1 in black, drug2 in blue, etc. How do I do that?

q = qplot(days, vol, data = cellLine7064, color = drug,
          geom = c("line", "point"))

best regards, Immanuel
Re: [R] ggplot2 avoid automatic color selection
See ?scale_colour_manual

Best, Ista

On Fri, Jun 10, 2011 at 9:55 AM, Immanuel B mane.d...@googlemail.com wrote: Hey all, I'm trying to replicate some plots with ggplot2. The problem is that I need to specify the color for every attribute (drug). If I use the code below the colors get automatically assigned, but I need to plot drug1 in black, drug2 in blue, etc. How do I do that?

q = qplot(days, vol, data = cellLine7064, color = drug,
          geom = c("line", "point"))

best regards, Immanuel

-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
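To make the pointer concrete, here is a minimal sketch of scale_colour_manual(); the data frame and the drug level names ("drug1", "drug2") are invented stand-ins for the poster's cellLine7064:

```r
library(ggplot2)

# Toy data in the shape the qplot() call implies.
cellLine <- data.frame(days = rep(1:3, 2),
                       vol  = c(1, 2, 3, 2, 4, 6),
                       drug = rep(c("drug1", "drug2"), each = 3))

# Map each factor level to a fixed colour instead of the defaults.
q <- qplot(days, vol, data = cellLine, colour = drug,
           geom = c("line", "point")) +
  scale_colour_manual(values = c(drug1 = "black", drug2 = "blue"))
```

Unnamed values (values = c("black", "blue")) also work; naming them ties each colour to a specific level regardless of level order.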
[R] R program writing standard/practices
Dear Experts, I notice that there are different ways of writing programs. I was wondering if there is anything like a standard which could be used to write good, complete R programs, maintain quality, and make debugging easy; a standard/practice that can be applied consistently in an enterprise environment. Also, are there any beacon lights that could help navigate through lines of code, programming styles, understand the scope of a program, identify gaps, etc.? Thanks, Santosh
Re: [R] Linear multivariate regression with Robust error
On 6/10/2011 12:23 AM, Barjesh Kochar wrote: Dear all, I am doing linear regression with robust errors to learn the effect of one (x) variable on another (y). If I execute the command I find a positive trend. But if I check the effect of a number of variables (x, x1, x2, x3) on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation. Barjesh Kochar, Research scholar

You don't give any data or provide any code (as the posting guide requests), so I have to guess that you have just rediscovered Simpson's paradox: the coefficient of a variable in a marginal regression can have an opposite sign to that in a joint model with other predictors. I have no idea what you mean by 'robust error'. One remedy is an added-variable plot, which will show you the partial contributions of each predictor in the joint model, as well as whether there are any influential observations that are driving the estimated coefficients.

-- Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept., York University
Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
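Added-variable plots are implemented in the car package as avPlots(). A sketch on simulated data (the data here are invented, not the poster's):

```r
library(car)   # provides avPlots()

set.seed(1)
d <- data.frame(x1 = rnorm(50))
d$x2 <- d$x1 + rnorm(50, sd = 0.2)   # strongly correlated predictors
d$y  <- d$x1 - d$x2 + rnorm(50)

fit <- lm(y ~ x1 + x2, data = d)
avPlots(fit)   # one partial-regression panel per predictor
```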
Re: [R] giving factor names
Hi Kieran, I'm not very familiar with lattice, but here's a workaround that works for me. Basically, I just created a new data.frame column that was a factor (combo$zf) and forced its levels to be what you're looking for here.

require(lattice)
x <- c(1, 2, 3)
y <- c(2, 4, 6)
z <- c(0.1, 0.5, 2)
combo <- expand.grid(x, y, z)
combo <- data.frame(combo)
names(combo) <- c("x", "y", "z")
outcome <- function(l) { (l[1] * l[2]) / l[3] }
resp <- apply(combo, 1, outcome)

## Create new column and assign levels
combo$zf <- as.factor(combo$z)
levels(combo$zf) <- paste("z=", levels(combo$zf), sep = "")

## Now I use the new variable as the conditioning variable in the plot
levelplot(resp ~ x * y | zf, data = combo,
          pretty = TRUE, region = TRUE, contour = FALSE)

In the future, it would help if you could specify the packages you're using, since I had to do a little research to find where the levelplot function is from. Hope this helps, Ethan

On Tue, Jun 7, 2011 at 9:30 AM, kieran martin kieranjmar...@gmail.com wrote: Hi, I've been driving myself insane with this problem. I have a trellis plot of contours, and I want each panel to be labelled z=value. I can get each one to say z, or each one to say the value (by using as.factor), but not both. Here's an artificial example to show what I mean (as my actual data set is much larger!):

x <- c(1, 2, 3)
y <- c(2, 4, 6)
z <- c(0.1, 0.5, 2)
combo <- expand.grid(x, y, z)
combo <- data.frame(combo)
names(combo) <- c("x", "y", "z")
outcome <- function(l) { (l[1] * l[2]) / l[3] }
resp <- apply(combo, 1, outcome)
levelplot(resp ~ x * y | z, data = combo,
          pretty = TRUE, region = TRUE, contour = FALSE)

So in this final graph I want z=0.1, z=0.5 and z=2 in turn. Thanks, Kieran Martin, University of Southampton
[R] Sorting Data Frame Without Loop
Hello all! I am currently trying to reshape a data frame in a particular way, but I am having some difficulties with this. Specifically, I want to collapse the dataset below so that there is only one line per ProteinID, and if there are multiple GeneID or GeneName entries for a single ProteinID, they should be concatenated with a comma separating them. The way I have done it so far worked fine for small datasets, but as I am working with around 30,000 entries, it proved too slow and I'm not sure how to do it in another way.

Here is an example of the input:

ProteinID    GeneID     GeneName
M-012847-00  NM_057175  NARG1
M-012847-00  NM_057175  TBDN100
M-015544-00  NM_153008  FLJ30277
M-015544-00  NM_153027  FLJ31659
M-024202-00  NM_207330  NIPAL1
M-024202-00  NM_207330  NPAL1

Here is an example showing what I want:

ProteinID    GeneID                GeneName
M-012847-00  NM_057175             NARG1, TBDN100
M-015544-00  NM_153008, NM_153027  FLJ30277, FLJ31659
M-024202-00  NM_207330             NIPAL1, NPAL1

Here is the code I have been using so far. I have only managed to get this to work by using a loop, which I know is not the best way, but at the moment I'm stuck.

colA <- c("M-012847-00", "M-012847-00", "M-015544-00",
          "M-015544-00", "M-024202-00", "M-024202-00")
colB <- c("NM_057175", "NM_057175", "NM_153008",
          "NM_153027", "NM_207330", "NM_207330")
colC <- c("NARG1", "TBDN100", "FLJ30277", "FLJ31659", "NIPAL1", "NPAL1")
dupes <- data.frame(ProteinID = colA, GeneID = colB, GeneName = colC)

idVec <- character()
geneIDVec <- character()
geneNameVec <- character()
dataType <- "ProteinID"
annotation <- data.frame()
for (id in unique(dupes[[dataType]])) {
  print(id)
  idVec <- c(idVec, id)
  geneIDVec <- c(geneIDVec,
                 paste(unique(dupes$GeneID[dupes[[dataType]] == id]),
                       collapse = ", "))
  geneNameVec <- c(geneNameVec,
                   paste(unique(dupes$GeneName[dupes[[dataType]] == id]),
                         collapse = ", "))
  annotation[[dataType]][annotation[[dataType]] == id] <- NA
}
filtered <- data.frame(ProteinID = idVec, GeneID = geneIDVec,
                       GeneName = geneNameVec)

Thanks!
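A vectorised base-R alternative (a sketch, not from the thread) is aggregate() with a paste/unique collapse function; unlike the loop above, it avoids growing vectors one element at a time, which is what makes the loop slow at 30,000 rows:

```r
# Rebuild the poster's example data.
colA <- c("M-012847-00", "M-012847-00", "M-015544-00",
          "M-015544-00", "M-024202-00", "M-024202-00")
colB <- c("NM_057175", "NM_057175", "NM_153008",
          "NM_153027", "NM_207330", "NM_207330")
colC <- c("NARG1", "TBDN100", "FLJ30277", "FLJ31659", "NIPAL1", "NPAL1")
dupes <- data.frame(ProteinID = colA, GeneID = colB, GeneName = colC,
                    stringsAsFactors = FALSE)

# Collapse each column to its unique, comma-separated values per ProteinID.
collapse <- function(v) paste(unique(v), collapse = ", ")
filtered <- aggregate(dupes[c("GeneID", "GeneName")],
                      by = list(ProteinID = dupes$ProteinID),
                      FUN = collapse)

filtered$GeneName[filtered$ProteinID == "M-012847-00"]  # "NARG1, TBDN100"
```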
[R] read.graph: vertex labels in GML file?
I can successfully read a GML file using read.graph (e.g. those at http://www-personal.umich.edu/~mejn/netdata/ , the football data set in particular). However, I don't seem to be able to read the vertex labels that are in the GML file; or maybe I am, but I cannot get at them? I suspect I'm not reading the original labels, since the command

get.vertex.attribute(net, 'name', index = V(net))

just provides a numeric list of vertex names rather than the vertex labels as they appeared in the original GML file. Any suggestions would be greatly appreciated. Cheers, =Dave
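A hedged sketch (not from the thread): in igraph, the attributes actually imported from a GML file can be listed with list.vertex.attributes(), and GML label fields, if imported, would be expected under the attribute "label" rather than "name". A tiny GML file is written inline here so the example is self-contained; the football.gml file from the question would be read the same way.

```r
library(igraph)

# Write a minimal GML file with labelled nodes.
gml <- c("graph [",
         "  node [ id 0 label \"Alice\" ]",
         "  node [ id 1 label \"Bob\" ]",
         "  edge [ source 0 target 1 ]",
         "]")
tmp <- tempfile(fileext = ".gml")
writeLines(gml, tmp)

net <- read.graph(tmp, format = "gml")
list.vertex.attributes(net)          # which attributes were imported?
get.vertex.attribute(net, "label")   # the labels from the GML file
```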
Re: [R] Linear multivariate regression with Robust error
Date: Fri, 10 Jun 2011 09:53:20 +0530 From: bkkoc...@gmail.com To: r-help@r-project.org Subject: [R] Linear multivariate regression with Robust error

Dear all, I am doing linear regression with robust errors to learn the effect of one (x) variable on another (y). If I execute the command I find a positive trend. But if I check the effect of a number of variables (x, x1, x2, x3) on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation.

Take y as goodness, and suppose x and x1 have something to do with a product. The first analysis is from company A, the second is from company B, and the underlying relationship is given with some noise. LOL (I'm still on my first cup of coffee; this was the first example to come to mind, as these questions keep coming up here every day):

x = 1:100
x1 = x*x
y <- x - x1 + runif(100)

lm(y ~ x)

Call: lm(formula = y ~ x)
Coefficients:
(Intercept)            x
       1718         -100

lm(y ~ x + x1)

Call: lm(formula = y ~ x + x1)
Coefficients:
(Intercept)            x           x1
     0.5253       1.0024          -1.

Barjesh Kochar Research scholar
[R] Multilevel pseudo maximum likelihood
Dear all, I posted this two years ago, getting no answers or suggestions; now I am trying again, hoping something new is available in R. I am interested in an application of a linear multilevel model with unequal selection probabilities at both levels. Do you know if there is an R function for multilevel pseudo-maximum likelihood estimation? Or is it possible to obtain these estimates using the nlme package? In practice, what I need is to insert the individual- and cluster-level design weights into the likelihood. Thanks in advance, Caterina
Re: [R] How to log-2 transform a matrix
CGH <- as.matrix(CGH)
mode(CGH) <- "numeric"
log2(CGH)
[R] Counting the Number of Letters in a row
I'm trying to find the total number of letters in a row of a data frame. Let's say I have the following data frame:

f1 <- data.frame(keyword = c("I live in Denver",
                             "I live in Kansas City, MO",
                             "Pizza is good"))

The following call gives me the number of characters in each word. So for "I live in Denver", I get 1, 4, 2, and 6. However, I want to know the total number of characters (13).

sapply(strsplit(as.character(f1$keyword), " "), nchar)

Thanks, Abraham
Re: [R] Sorting Data Frame Without Loop
Is this what you want:

colA <- c("M-012847-00", "M-012847-00", "M-015544-00",
          "M-015544-00", "M-024202-00", "M-024202-00")
colB <- c("NM_057175", "NM_057175", "NM_153008",
          "NM_153027", "NM_207330", "NM_207330")
colC <- c("NARG1", "TBDN100", "FLJ30277", "FLJ31659", "NIPAL1", "NPAL1")
dupes <- data.frame(ProteinID = colA, GeneID = colB, GeneName = colC)
require(sqldf)
sqldf("select ProteinID
         , group_concat(GeneID) as GeneID
         , group_concat(GeneName) as GeneName
       from dupes
       group by ProteinID",
      method = 'raw')

    ProteinID              GeneID          GeneName
1 M-012847-00 NM_057175,NM_057175     NARG1,TBDN100
2 M-015544-00 NM_153008,NM_153027 FLJ30277,FLJ31659
3 M-024202-00 NM_207330,NM_207330      NIPAL1,NPAL1

On Fri, Jun 10, 2011 at 4:35 AM, hi Berven thebe...@hotmail.com wrote: Hello all! I am currently trying to reshape a data frame in a particular way, but I am having some difficulties with this. Specifically, I want to collapse the dataset below so that there is only one line per ProteinID, and if there are multiple GeneID or GeneName entries for a single ProteinID, they should be concatenated with a comma separating them. The way I have done it so far worked fine for small datasets, but as I am working with around 30,000 entries, it proved too slow and I'm not sure how to do it in another way.

Here is an example of the input:

ProteinID    GeneID     GeneName
M-012847-00  NM_057175  NARG1
M-012847-00  NM_057175  TBDN100
M-015544-00  NM_153008  FLJ30277
M-015544-00  NM_153027  FLJ31659
M-024202-00  NM_207330  NIPAL1
M-024202-00  NM_207330  NPAL1

Here is an example showing what I want:

ProteinID    GeneID                GeneName
M-012847-00  NM_057175             NARG1, TBDN100
M-015544-00  NM_153008, NM_153027  FLJ30277, FLJ31659
M-024202-00  NM_207330             NIPAL1, NPAL1

Here is the code I have been using so far. I have only managed to get this to work by using a loop, which I know is not the best way, but at the moment I'm stuck.
colA <- c("M-012847-00", "M-012847-00", "M-015544-00",
          "M-015544-00", "M-024202-00", "M-024202-00")
colB <- c("NM_057175", "NM_057175", "NM_153008",
          "NM_153027", "NM_207330", "NM_207330")
colC <- c("NARG1", "TBDN100", "FLJ30277", "FLJ31659", "NIPAL1", "NPAL1")
dupes <- data.frame(ProteinID = colA, GeneID = colB, GeneName = colC)

idVec <- character()
geneIDVec <- character()
geneNameVec <- character()
dataType <- "ProteinID"
annotation <- data.frame()
for (id in unique(dupes[[dataType]])) {
  print(id)
  idVec <- c(idVec, id)
  geneIDVec <- c(geneIDVec,
                 paste(unique(dupes$GeneID[dupes[[dataType]] == id]),
                       collapse = ", "))
  geneNameVec <- c(geneNameVec,
                   paste(unique(dupes$GeneName[dupes[[dataType]] == id]),
                         collapse = ", "))
  annotation[[dataType]][annotation[[dataType]] == id] <- NA
}
filtered <- data.frame(ProteinID = idVec, GeneID = geneIDVec,
                       GeneName = geneNameVec)

Thanks!

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
Re: [R] Sorting Data Frame Without Loop
This will give you the unique gene values:

colA <- c("M-012847-00", "M-012847-00", "M-015544-00",
          "M-015544-00", "M-024202-00", "M-024202-00")
colB <- c("NM_057175", "NM_057175", "NM_153008",
          "NM_153027", "NM_207330", "NM_207330")
colC <- c("NARG1", "TBDN100", "FLJ30277", "FLJ31659", "NIPAL1", "NPAL1")
dupes <- data.frame(ProteinID = colA, GeneID = colB, GeneName = colC)
require(sqldf)
sqldf("select ProteinID
         , group_concat(distinct GeneID) as GeneID
         , group_concat(distinct GeneName) as GeneName
       from dupes
       group by ProteinID",
      method = 'raw')

    ProteinID              GeneID          GeneName
1 M-012847-00           NM_057175     NARG1,TBDN100
2 M-015544-00 NM_153008,NM_153027 FLJ30277,FLJ31659
3 M-024202-00           NM_207330      NIPAL1,NPAL1

On Fri, Jun 10, 2011 at 4:35 AM, hi Berven thebe...@hotmail.com wrote: Hello all! I am currently trying to reshape a data frame in a particular way, but I am having some difficulties with this. Specifically, I want to collapse the dataset below so that there is only one line per ProteinID, and if there are multiple GeneID or GeneName entries for a single ProteinID, they should be concatenated with a comma separating them. The way I have done it so far worked fine for small datasets, but as I am working with around 30,000 entries, it proved too slow and I'm not sure how to do it in another way.

Here is an example of the input:

ProteinID    GeneID     GeneName
M-012847-00  NM_057175  NARG1
M-012847-00  NM_057175  TBDN100
M-015544-00  NM_153008  FLJ30277
M-015544-00  NM_153027  FLJ31659
M-024202-00  NM_207330  NIPAL1
M-024202-00  NM_207330  NPAL1

Here is an example showing what I want:

ProteinID    GeneID                GeneName
M-012847-00  NM_057175             NARG1, TBDN100
M-015544-00  NM_153008, NM_153027  FLJ30277, FLJ31659
M-024202-00  NM_207330             NIPAL1, NPAL1

Here is the code I have been using so far. I have only managed to get this to work by using a loop, which I know is not the best way, but at the moment I'm stuck.
colA <- c("M-012847-00", "M-012847-00", "M-015544-00",
          "M-015544-00", "M-024202-00", "M-024202-00")
colB <- c("NM_057175", "NM_057175", "NM_153008",
          "NM_153027", "NM_207330", "NM_207330")
colC <- c("NARG1", "TBDN100", "FLJ30277", "FLJ31659", "NIPAL1", "NPAL1")
dupes <- data.frame(ProteinID = colA, GeneID = colB, GeneName = colC)

idVec <- character()
geneIDVec <- character()
geneNameVec <- character()
dataType <- "ProteinID"
annotation <- data.frame()
for (id in unique(dupes[[dataType]])) {
  print(id)
  idVec <- c(idVec, id)
  geneIDVec <- c(geneIDVec,
                 paste(unique(dupes$GeneID[dupes[[dataType]] == id]),
                       collapse = ", "))
  geneNameVec <- c(geneNameVec,
                   paste(unique(dupes$GeneName[dupes[[dataType]] == id]),
                         collapse = ", "))
  annotation[[dataType]][annotation[[dataType]] == id] <- NA
}
filtered <- data.frame(ProteinID = idVec, GeneID = geneIDVec,
                       GeneName = geneNameVec)

Thanks!

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
Re: [R] Counting the Number of Letters in a row
something like this:

sapply(strsplit(as.character(f1$keyword), " "),
       function(x) { sum(nchar(x)) })

hth, Ingmar

On Fri, Jun 10, 2011 at 6:48 PM, Abraham Mathew abra...@thisorthat.com wrote: I'm trying to find the total number of letters in a row of a data frame. Let's say I have the following data frame:

f1 <- data.frame(keyword = c("I live in Denver",
                             "I live in Kansas City, MO",
                             "Pizza is good"))

The following call gives me the number of characters in each word. So for "I live in Denver", I get 1, 4, 2, and 6. However, I want to know the total number of characters (13).

sapply(strsplit(as.character(f1$keyword), " "), nchar)

Thanks, Abraham
Re: [R] Counting the Number of Letters in a row
Hi Abraham, Try

foo <- function(x) {
  x <- as.character(x)
  sapply(strsplit(x, " "), function(s) sum(nchar(s)))
}
foo(f1$keyword)

HTH, Jorge

On Fri, Jun 10, 2011 at 12:48 PM, Abraham Mathew wrote: I'm trying to find the total number of letters in a row of a data frame. Let's say I have the following data frame:

f1 <- data.frame(keyword = c("I live in Denver",
                             "I live in Kansas City, MO",
                             "Pizza is good"))

The following call gives me the number of characters in each word. So for "I live in Denver", I get 1, 4, 2, and 6. However, I want to know the total number of characters (13).

sapply(strsplit(as.character(f1$keyword), " "), nchar)

Thanks, Abraham
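An equivalent one-liner (a sketch, not from the thread): delete the spaces with gsub() and count the characters that are left.

```r
f1 <- data.frame(keyword = c("I live in Denver",
                             "I live in Kansas City, MO",
                             "Pizza is good"))

# Strip the spaces, then count the remaining characters per row.
nchar(gsub(" ", "", as.character(f1$keyword)))   # 13 20 11
```

Note this counts every non-space character, so punctuation such as the comma in "Kansas City, MO" is included, just as in the strsplit approaches above.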
[R] Plotting NLS profiles
Hello list, I'm trying to plot nls profiles, but the plot.profile.nls function in R doesn't seem to accept any plot.default variables. Specifically, I'd like to be able to change the x-axis title and the colors to black and white. Has anyone had any luck with this? If not, is there a way to override the plotting colors, perhaps in par()? Thanks, Sam

fm1 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
pr1 <- profile(fm1, alpha = 0.05)
opar <- par(mfrow = c(2, 2), oma = c(1.1, 0, 1.1, 0), las = 1)
plot(pr1, conf = c(95, 90, 80, 50)/100)  # works fine
plot(pr1, conf = c(95, 90, 80, 50)/100,
     xlab = expression(theta), col = 1)  # doesn't change
[R] Double x grid in ggplot2
I am trying to overlay raw data with a boxplot as follows:

pp = qplot(factor(time, levels = 0:60, ordered = TRUE), error, data = dfsub,
           size = I(1), main = title, ylab = "Error (min)",
           xlab = "Time before ON (min)", alpha = I(1/10),
           ylim = c(-30, 40), geom = "jitter") +
  facet_wrap(~ runway, ncol = 2) +
  geom_boxplot(alpha = .5, color = "blue", outlier.colour = "green",
               outlier.size = 1) +
  scale_x_discrete(breaks = seq(from = 0, to = 60, by = 10))
print(pp)

But I think ggplot2 is getting confused about factors versus numbers:

sapply(dfsub, class)
   time   error  runway  flight
numeric numeric  factor  factor

because when I do the plot, the vertical grid lines come in pairs, one line at (0, 10, 20, ...) and the other at (-1, 9, 19, ...). If I remove the scale_x_discrete, I get a grid line at every minute and things look right, but the labels for the minutes all overlap. Thanks for the help, Jim Rome
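One hedged guess at the paired gridlines (untested on the poster's data, which is not available): on a discrete axis the breaks are matched against the factor levels as character strings, so numeric breaks may land beside the labelled ticks rather than on them. Supplying the breaks as characters makes them identical to the levels:

```r
# Character breaks that match the factor levels "0", "10", ..., "60" exactly;
# the idea is to use scale_x_discrete(breaks = brks) in the plot above.
brks <- as.character(seq(from = 0, to = 60, by = 10))
brks   # "0" "10" "20" "30" "40" "50" "60"
```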
Re: [R] Item analysis
Here are two different ways, with your data as the data.frame xx:

# Basic R
aggregate(xx$Tip, list(xx$Time), sum)

# Using the reshape2 package
library(reshape2)
yy <- melt(xx, id = c("Time"), measure.vars = c("Tip"))
dcast(yy, Time ~ variable, sum)

--- On Thu, 6/9/11, Trying To learn again tryingtolearnag...@gmail.com wrote: From: Trying To learn again tryingtolearnag...@gmail.com Subject: [R] Item analysis To: r-help@r-project.org Received: Thursday, June 9, 2011, 4:41 PM

Hi all, for several reasons I have not used R in months. I have an idea and I want to try to learn R again. I know I need to formulate more intelligent questions, but I will explain, and if someone can help me I would be very grateful. I promise to try to learn again. The question: I have a data file like this:

Date, Time, Tip
13/11/2008,23:16:00,432
13/01/2009,23:17:00,633
13/11/2009,23:16:00,134
13/12/2009,23:17:00,234
13/01/2010,23:16:00,111

I want to compute a statistic (the sum of Tip), but with one sum for each different minute in Time. So in this easy example I would have a final table with two items:

Item  Sum
16    (432+134+111)
17    (633+234)

Of course in my file (which is bigger) I have minutes from 0 to 59. Would it be too difficult to do this analysis but with one table for each day of the week (identified in the column Date)? For example, 13/11/2008 was a Thursday, so you go through each day and, if that day is a Thursday, sum over minutes 0 to 59... Many thanks for all; for each tip you give me I will send thousands of thanks!
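Note the poster actually wants sums per minute of the hour, not per full time stamp, so the minute has to be extracted first. A sketch (not from the reply; the substr() positions assume the fixed "HH:MM:SS" format of the sample, and the data frame is rebuilt from the post):

```r
xx <- data.frame(Date = c("13/11/2008", "13/01/2009", "13/11/2009",
                          "13/12/2009", "13/01/2010"),
                 Time = c("23:16:00", "23:17:00", "23:16:00",
                          "23:17:00", "23:16:00"),
                 Tip  = c(432, 633, 134, 234, 111),
                 stringsAsFactors = FALSE)

# Characters 4-5 of "HH:MM:SS" are the minute.
xx$Minute <- as.integer(substr(xx$Time, 4, 5))
res <- aggregate(xx$Tip, by = list(Minute = xx$Minute), FUN = sum)
res
#   Minute   x
# 1     16 677
# 2     17 867
```

For the per-weekday tables, weekdays(as.Date(xx$Date, format = "%d/%m/%Y")) gives a day name to condition or subset on.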
[R] Problem loading packages in R 2.13.0 on Mac
I am having problems loading packages in the newest version of R (2.13.0) on my Mac. I have tried to install various packages (e.g. lawstat, Rcmdr, car) and load them using the Package Installer and Package Manager menu options, but I get the following error:

library(lawstat)
Loading required package: mvtnorm
Error: package 'mvtnorm' could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
  there is no package called 'mvtnorm'

When I click on the box for loading lawstat in the Package Manager, it immediately reverts back to an unchecked box. I have tried to load mvtnorm and then load lawstat, but I get the same error. Help? Thanks, Adrienne
Re: [R] R program writing standard/practices
Check this out: http://www1.maths.lth.se/help/R/RCC/
[R] Test if data uniformly distributed (newbie)
Hello, I have a bunch of files containing 300 data points each, with values from 0 to 1 which also sum to 1 (I don't think the last point is relevant, though). In addition, each data point is annotated as an a or a b. I would like to know in which files (if any) the data is uniformly distributed. I used Google and found out that a Kolmogorov-Smirnov or a chi-square goodness-of-fit test could be used. Then I looked up ?kolmogorov and found ks.test, but the example there is for the normal distribution and I am not sure how to adapt it for the uniform distribution. I did ?runif and read about the uniform distribution, but it doesn't say what the cumulative distribution is. Is it punif, like pnorm? I thought of that because I found a message on this list where someone was told to use pnorm instead of dnorm. But the help page on the uniform distribution says punif is the distribution function. Are the cumulative distribution and the distribution function the same thing? Having several names for the same thing has always confused me very much in statistics. Also, I am not sure whether I need to specify any parameters for the distribution, and which. I thought maybe I should specify min=0 and max=1, but those appear to be the defaults. Do I need to specify q, the vector of quantiles? So is ks.test(x, punif) correct or not for what I am attempting to do? After this I will also need to find out whether the a's and b's are distributed randomly in each file. I would be grateful for any pointers, although I have not researched this issue yet. Kairavi.
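For the record (this is general R knowledge, not a reply from the thread): yes, "distribution function" and "cumulative distribution function" mean the same thing, punif is the uniform CDF, and the one-sample call is ks.test(x, "punif"), with min = 0 and max = 1 as the defaults; the q argument of punif is supplied internally by ks.test. A sketch with simulated data:

```r
set.seed(42)

x <- runif(300)                 # genuinely uniform sample
ks.test(x, "punif")$p.value     # large p-value: no evidence against uniformity

y <- rbeta(300, 5, 1)           # clearly non-uniform sample
ks.test(y, "punif")$p.value     # tiny p-value: uniformity rejected
```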
Re: [R] Plotting NLS profiles
On Fri, 10 Jun 2011, R Help wrote: Hello list, I'm trying to plot nls profiles, but the plot.profile.nls function in R doesn't seem to accept any plot.default variables. Specifically, I'd like to be able to change the x-axis title and the colors to black and white. Has anyone had any luck with this? If not, is there a way to override the plotting colors, perhaps in par()? No. The authors hardcoded all these. Take a copy of the function and modify it to suit your purposes. Thanks, Sam fm1 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD) pr1 <- profile(fm1, alpha = 0.05) opar <- par(mfrow = c(2,2), oma = c(1.1, 0, 1.1, 0), las = 1) plot(pr1, conf = c(95, 90, 80, 50)/100) # works fine plot(pr1, conf = c(95, 90, 80, 50)/100, xlab = expression(theta), col = 1) # doesn't change -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] Plotting NLS profiles
R Help wrote: I'm trying to plot nls profiles, but the plot.profile.nls function in R doesn't seem to accept any plot.default variables. Package nlstools has a few nice graphical display methods. Dieter
Re: [R] Plotting NLS profiles
On Fri, 10 Jun 2011, Prof Brian Ripley wrote: No. The authors hardcoded all these. Take a copy of the function and modify it to suit your purposes. But as they hardcoded the colours to numbers, see ?palette .
Re: [R] scatterplot3d - help assign colors based on multiple conditions
Thanks! That did the trick. On Jun 10, 2011, at 3:16 AM, Jim Lemon wrote: On 06/10/2011 06:40 AM, Karthik Kota wrote: Thanks a lot! This is very helpful. If I have to extend this to one more condition, say assign blue if both the corresponding labels have _Tongue_dorsum, is there a straightforward function? Doing something like below overrides the colors assigned by the first statement. col <- ifelse(grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels), "red", "black") col <- ifelse(grepl("_Tongue_dorsum", xlabels) & grepl("_Tongue_dorsum", ylabels), "blue", "black") Hi Karthik, Your problem is that you are assigning all of the values to col twice. If you reverse the order of the statements you will get the red/black color set. Try this: col <- rep("black", dim(cdh1)[1]) col[grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels)] <- "red" col[grepl("_Tongue_dorsum", xlabels) & grepl("_Tongue_dorsum", ylabels)] <- "blue" If your two conditions specify disjoint sets, that is, no case satisfies both conditions, you should have the correct vector of colors. Jim
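The indexed-assignment pattern Jim describes can be seen on a toy example (the labels below are made up; cdh1, xlabels and ylabels in the thread are the poster's own objects):

```r
# Toy sketch: start with a default color, then overwrite the elements
# that match each condition.  Later assignments only touch their own
# subset, so they cannot clobber the earlier ones.
xlabels <- c("s1_Anterior_nares", "s2_Tongue_dorsum", "s3_Stool")
ylabels <- c("t1_Anterior_nares", "t2_Tongue_dorsum", "t3_Stool")
col <- rep("black", length(xlabels))
col[grepl("_Anterior_nares", xlabels) & grepl("_Anterior_nares", ylabels)] <- "red"
col[grepl("_Tongue_dorsum", xlabels) & grepl("_Tongue_dorsum", ylabels)] <- "blue"
col  # -> "red" "blue" "black"
```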
Re: [R] Plotting NLS profiles
Wow, that's really clever, I didn't know you could manipulate the palette like that. Thanks Dr Ripley, Sam On Fri, Jun 10, 2011 at 3:45 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: [earlier discussion snipped] But as they hardcoded the colours to numbers, see ?palette .
Re: [R] Problem loading packages in R 2.13.0 on Mac
Hi Adrienne, I'm not familiar with your interface, but it sounds like R thinks the package mvtnorm is not installed. You can see the packages you've installed with: row.names(installed.packages()) Is mvtnorm in the output of that command? You could test with the command: "mvtnorm" %in% row.names(installed.packages()) If the result of the above command is FALSE, you can install it with: install.packages("mvtnorm") Best, Ethan On Fri, Jun 10, 2011 at 12:00 PM, Adrienne Keller adrienne.kel...@umontana.edu wrote: I am having a problem loading packages in the newest version of R (2.13.0) on my Mac. I have tried to install various packages (e.g. lawstat, Rcmdr, car) and load them using the Package Installer and Package Manager menu options, but I get the following error: library(lawstat) Loading required package: mvtnorm Error: package 'mvtnorm' could not be loaded In addition: Warning message: In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) : there is no package called 'mvtnorm' When I click on the box for loading lawstat in the Package Manager, it immediately reverts back to an unchecked box. I have tried to load mvtnorm and then load lawstat, but I get the same error. Help? Thanks, Adrienne
Re: [R] Problem loading packages in R 2.13.0 on Mac
Adrienne Keller wrote: [original message snipped] The error message would imply that the package mvtnorm is not installed. 1. This is a Mac issue. Use the R-SIG-Mac mailing list. 2. Assuming you are using the R Mac GUI, you use the Package Installer to install packages. If you want dependencies installed, don't forget to check the Install dependencies checkbox in the lower right corner of the Package Installer window. Select lawstat and then click Install Selected. 3. Now you should be able to load it. Berend
Re: [R] Test if data uniformly distributed (newbie)
Yes, punif is the function to use; however, the KS test (and the others) are based on an assumption of independence, and if you know that your data points sum to 1, then they are not independent (and not uniform if there are more than 2). Also note that these tests can only rule out distributions (with a given type I error rate), but cannot confirm that the data comes from a given distribution (just that either it does, or there is not enough power to distinguish between the actual and the test distributions). What is your ultimate question/goal? Why do you care if the data is uniform or not? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -----Original Message----- From: Kairavi Bhakta Sent: Friday, June 10, 2011 11:24 AM To: r-help@r-project.org Subject: [R] Test if data uniformly distributed (newbie) [original question snipped]
Re: [R] Test if data uniformly distributed (newbie)
OK, that is not the correct format for the KS test (which is expecting data ranging from 0 to 1 with a fairly flat histogram). You could possibly test this with a chi-squared test. Can you tell us more about how the numbers you are looking at are generated? The chi-squared test could be used on counts of 1-5 and compared to the assumption that each is equally likely, but there still is the question of power and how close to uniform is uniform enough. You would need huge samples to find a difference if the true distribution is only slightly non-uniform. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: Kairavi Bhakta Sent: Friday, June 10, 2011 2:16 PM To: Greg Snow; r-help@r-project.org Subject: RE: [R] Test if data uniformly distributed (newbie) [Kairavi's reply and the earlier exchange snipped]
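Greg's chi-squared suggestion could look like the following sketch (the counts below are invented for illustration; the idea is to compare observed category counts against equal expected probabilities):

```r
# Hypothetical counts of categories 1-5; chisq.test with a p= vector
# performs a goodness-of-fit test against those probabilities.
counts <- c(58, 63, 60, 55, 64)
chisq.test(counts, p = rep(1/5, 5))
```

As Greg notes, a non-significant result here does not prove uniformity; it only says the sample could not distinguish the data from uniform at the given power.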
Re: [R] 'PROC CONTENTS' = ?
See the contents function in the Hmisc package. karena wrote: Is there any function in R which does the same thing as what 'PROC CONTENTS' does in SAS? thanks, karena - Frank Harrell Department of Biostatistics, Vanderbilt University
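A minimal sketch of the suggestion (assumes Hmisc is installed; the tiny data frame is made up, and base R's str() gives a similar quick overview):

```r
library(Hmisc)
df <- data.frame(age = c(21, 34, 28), sex = factor(c("m", "f", "f")))
contents(df)  # variable names, storage mode, factor levels - roughly PROC CONTENTS
str(df)       # base R alternative
```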
[R] running R commands asynchronously
I am interested in running R commands asynchronously. My first choice is in the same R session that I am currently in. Here, the goal would be to run something like RunAsynchSameSession(myfunction(), "outputname.rda") Once RunAsynchSameSession had started myfunction(), RunAsynchSameSession would complete immediately. myfunction would keep going. It is OK if execution of the myfunction() command prevents new input to R until it has completed. The important feature is that RunAsynchSameSession must tell the program that called it that it was done. Second choice is to start an independent process, BATCH or something similar, and save the resulting data objects in an .rda file. RunAsynchBatch("myfile.r", "outputname.rda") RunAsynchBatch would start a batch process and complete immediately after starting the batch file. The batch file would run independently until it was completed. While I know how to do this, for example with system(wait=FALSE), I would appreciate seeing a worked collection of statements, including getting outputname.rda back to the original R session. I am working on Windows. Rich
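One possible sketch of the second (batch) route, using the function name from the post. This is an assumption of how it might be wired together, not a tested recipe: the launched script is responsible for save()-ing its own results, and the caller later load()s them.

```r
# Sketch: launch an R script in a separate process and return at once.
# The script (e.g. myfile.r) should end with something like
#   save(result, file = "outputname.rda")
RunAsynchBatch <- function(script, outfile) {
  # wait = FALSE makes system() return immediately; the child process
  # keeps running on its own.
  system(paste("Rscript", shQuote(script)), wait = FALSE)
  invisible(outfile)
}

# Later, back in the original session, once the batch job has finished:
# load("outputname.rda")
```

Signalling completion back to the caller is the hard part; polling file.exists("outputname.rda") is one crude option.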
[R] How do I make proper use of the by() function?
Dear list, I have a function that uses values from two vectors and spits out one new vector based on all the values of the two original vectors, and with the same length as them. OK, I would now like to be able to apply that function simply on two columns in a data frame, divided by the levels of factors in the data frame. I'm trying to use by() for this, but the output is too hard to use. What I am doing is this: by(df, list(df$Factor1, df$Factor2), function(x) my_function(x$col1, x$col2), simplify=TRUE) and the output is too complex to be used in a simple way. Actually, I just want something like a data frame, where the result vectors are placed in one column and the conditions under which they were produced (i.e. the values of the factors according to which the data set was divided) in other columns. This does not seem to be provided by by(), and aggregate only provides me with the ability to use values from one column, right? So, are there other functions I could use? Thanks! /Fredrik -- Life is like a trumpet - if you don't put anything into it, you don't get anything out of it.
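One common pattern that yields exactly the requested shape is to have the by() function return a small data frame (result plus grouping values) and rbind the pieces back together. A sketch, with a trivial stand-in for my_function and made-up data:

```r
my_function <- function(a, b) a + b  # stand-in for the real function

df <- data.frame(Factor1 = rep(c("x", "y"), each = 3),
                 col1 = 1:6, col2 = 6:1)

# Each group returns a data frame; do.call(rbind, ...) stacks them.
res <- do.call(rbind, by(df, df$Factor1, function(d)
  data.frame(Factor1 = d$Factor1,
             result  = my_function(d$col1, d$col2))))
row.names(res) <- NULL
res  # one row per input row, grouping level alongside the result
```

With two grouping factors, pass list(df$Factor1, df$Factor2) as the second argument in the same way.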
Re: [R] Double x grid in ggplot2
This toy example works as you seem to expect: library(ggplot2) dat <- data.frame(time = factor(rep(0:59, each = 100), levels = 0:59), error = rnorm(6000)) qplot(time, error, data = dat, size = I(1), main = "title", ylab = "Error (min)", xlab = "Time before ON (min)", alpha = I(1/10), geom = "jitter") + geom_boxplot(alpha = .5, color = "blue", outlier.colour = "green", outlier.size = 1) + scale_x_discrete(breaks = seq(from = 0, to = 60, by = 10)) sessionInfo() R version 2.13.0 Patched (2011-04-19 r55523) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets grid methods [8] base other attached packages: [1] raster_1.8-31 geoR_1.6-35 sp_0.9-81 googleVis_0.2.5 [5] RJSONIO_0.5-0 sos_1.3-0 brew_1.0-6 lattice_0.19-26 [9] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2 loaded via a namespace (and not attached): [1] digest_0.5.0 tools_2.13.0 I don't see any of the problems that you've experienced, so you should provide a minimal reproducible example that illustrates the problem - it doesn't have to be your actual data, just something that mimics it and reproduces the error you're seeing. There's likely to be something in the way your data are constructed that generates the error, but since you failed to provide it, who can tell? Q: Why do you have levels 0 to 60 instead of either 0 to 59 or 1 to 60? Do you think that might be the cause of the problem?
Dennis On Fri, Jun 10, 2011 at 11:08 AM, James Rome jamesr...@gmail.com wrote: I am trying to overlay raw data with a boxplot as follows: pp <- qplot(factor(time, levels=0:60, ordered=TRUE), error, data=dfsub, size=I(1), main = title, ylab="Error (min)", xlab="Time before ON (min)", alpha=I(1/10), ylim=c(-30,40), geom="jitter") + facet_wrap(~ runway, ncol=2) + geom_boxplot(alpha=.5, color="blue", outlier.colour="green", outlier.size=1) + scale_x_discrete(breaks = seq(from=0, to=60, by=10)) print(pp) But I think ggplot2 is getting confused about factors versus numbers: sapply(dfsub, class) time error runway flight numeric numeric factor factor because when I do the plot, the vertical grid lines are in pairs, one line at (0, 10, 20, ...) and the other at (-1, 9, 19, ...). If I remove the scale_x_discrete, I get a grid line at every minute and things look right, but the labels for the minutes all overlap. Thanks for the help, Jim Rome
Re: [R] Linear multivariate regression with Robust error
I am with Michael. It is almost impossible to figure out what you are trying to do. However, I assume, like Michael, that you regress y on x2 and find, say, a negative effect. But when you regress y on x1 and x2, then you find a positive effect of x2. The short answer to your question is that in this case your restricted model (the one only containing x2) suffers from omitted variable bias. Here is an example, assuming you are interested in the effect of x2. Let's say we have 100 observations and that y depends on x1 and x2. Furthermore, let us assume that x1 and x2 are positively correlated. x1 <- rnorm(100) x2 <- x1 + rnorm(100) # x2 is correlated with x1 e <- rnorm(100) # random error term y <- -3*x1 + x2 + e # dependent variable Note that x1 has a negative relationship to y, but x2 has a positive relationship to y. Note also that the effect of x1 on y is larger in size (minus 3) than the effect of x2 on y (positive 1). Now let's run some regressions. First, let's run y on x1 only. An unbiased estimate should reproduce the coefficient of -3 within the confidence interval. However, the estimated x1 coefficient is much smaller in magnitude than we would expect. The reason is that because we omit x2, x1 picks up some of the effect of x2, since x1 and x2 are correlated. Hence the coefficient for x1 is diluted. reg1 <- lm(y ~ x1) summary(reg1) Now, let's run y on x2. An unbiased estimate should reproduce the coefficient of 1 within the confidence interval. However, the estimated effect of x2 is negative and significant. Obviously, the estimate for x2 is severely biased. The reasons are the following. First, x2 correlates with x1. Hence, when you regress y only on x2, the coefficient will pick up some of the effect of x1 on y. This will generally lead to biased estimates of the coefficient for x2.
The reason why the coefficient has the opposite sign from what it is supposed to have (and why it is not just a little bit biased like the coefficient on x1 in the previous regression) is that 1. x1 and x2 are positively correlated, 2. x1 has a negative effect while x2 has a positive effect on y (opposite signs), and 3. the effect of x1 is much larger in size than the effect of x2. reg2 <- lm(y ~ x2) summary(reg2) Hence, if we account for both x1 and x2 in our regression of y, both coefficients should be consistently estimated, because then we do not suffer from the omission of important predictors of y that are correlated with each other. reg3 <- lm(y ~ x1 + x2) summary(reg3) Ta-dah. Problem solved (most likely). So the answer to your question is that the correct coefficient is likely the one in which you include the other control variables. You should read up on omitted variable bias. If that is not the problem, you have to give us more information/reproducible code. Hope that helps, Daniel
Re: [R] Counting the Number of Letters in a row
Is this what you want? sapply(sapply(strsplit(as.character(f1$keyword), ' '), nchar), sum) Weidong Gu On Fri, Jun 10, 2011 at 1:25 PM, Jorge Ivan Velez jorgeivanve...@gmail.com wrote: Hi Abraham, Try foo <- function(x){ x <- as.character(x) sapply(strsplit(x, " "), function(s) sum(nchar(s))) } foo(f1$keyword) HTH, Jorge On Fri, Jun 10, 2011 at 12:48 PM, Abraham Mathew wrote: I'm trying to find the total number of letters in a row of a data frame. Let's say I have the following data frame: f1 <- data.frame(keyword = c("I live in Denver", "I live in Kansas City, MO", "Pizza is good")) The following function gives me the number of characters in each string. So for "I live in Denver", I get 1, 4, 2, and 6. However, I want to know the total number of characters (13). sapply(strsplit(as.character(f1$keyword), " "), nchar) Thanks, Abraham
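An equivalent one-liner, for what it's worth: deleting the spaces and counting what remains gives the same totals. A sketch using the data frame from the post:

```r
f1 <- data.frame(keyword = c("I live in Denver",
                             "I live in Kansas City, MO",
                             "Pizza is good"))
# Strip spaces, then count the remaining characters per row.
nchar(gsub(" ", "", as.character(f1$keyword)))
# -> 13 20 11
```

Note that punctuation (the comma in "City,") is counted, just as in the strsplit-based solutions.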
Re: [R] Test if data uniformly distributed (newbie)
Thanks for your answer. The reason I want the data to be uniform: It's the first step in a machine learning project I am working on. If I know the data isn't uniformly distributed, then this means there is probably something wrong and the following steps will be biased by the non-uniform input data. I'm not checking an assumption for another statistical test. Actually, the data has been normalized because it is supposed to represent a probability distribution. That's why it sums to 1. My assumption is that, for a vector of 5, the data at that point should look like 0.20 0.20 0.20 0.20 0.20, but of course there is variation, and I would like to test whether the data comes close enough or not. At the moment I am only testing whether there are more a's than b's in the top and bottom portion of the each file (with a wilcoxon test, I have 8 reps of the model I am trying to build). But that sort of felt like a very adhoc solution and I figured maybe testing for uniformity would be better, or at least a important addition. I've also been looking into testing for the randomness of the sequence of a's and b's instead of the wilcoxon test, although that may or may not involve R. Kairavi. Yes, punif is the function to use, however the KS test (and the others) are based on an assumption of independence, and if you know that your data points sum to 1, then they are not independent (and not uniform if there are more than 2). Also note that these tests only rule out distributions (with a given type I error rate), but cannot confirm that the data comes from a given distribution (just that either they do, or there is not enough power to distinguish between the actual and the test distributions). What is your ultimate question/goal? Why do you care if the data is uniform or not? -- Gregory (Greg) L. Snow Ph.D. 
Statistical Data Center Intermountain Healthcare greg.s...@imail.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599# 801.408.8111 [Hide Quoted Text] -Original Message- From: r-help-boun...@r-project.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599#[mailto: r-help-bounces@r-https://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599# project.org] On Behalf Of Kairavi Bhakta Sent: Friday, June 10, 2011 11:24 AM To: r-help@r-project.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599# Subject: [R] Test if data uniformly distributed (newbie) Hello, I have a bunch of files containing 300 data points each with values from 0 to 1 which also sum to 1 (I don't think the last element is relevant though). In addition, each data point is annotated as an a or a b. I would like to know in which files (if any) the data is uniformly distributed. I used Google and found out that a Kolmogorov-Smirnov or a Chi-square goodness-of-fit test could be used. Then I looked up ?kolmogorov and found ks.test, but the example there is for the normal distribution and I am not sure how to adapt it for the uniform distribution. I did ?runif and read about the uniform distribution but it doesn't say what the cumulative distribution is. Is it punif, like pnorm? I thought of that because I found a message on this list where someone was told to use pnorm instead of dnorm. But the help page on the uniform distribution says punif is the distribution function. Are the cumulative distribution and the distribution function the same thing? Having several names for the same thing has always confused me very much in statistics. Also, I am not sure whether I need to specify any parameters for the distribution and which. I thought maybe I should specify min=0 and max=1 but those appear to be the defaults. Do I need to specify q, the vector of quantiles? So is ks.test(x, punif) correct or not for what I am attempting to do? 
After this I will also need to find out whether the a's and b's are distributed randomly in each file. I would be grateful for any pointers, although I have not researched this issue yet. Kairavi. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
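[Editorial note: punif is indeed the cumulative distribution function of the uniform, and ks.test passes extra arguments straight through to it. A minimal sketch of the call the poster is after; the data vector here is simulated stand-in data, not from the files:]

```r
# Sketch: test 300 values in [0, 1] against the uniform distribution.
# In practice x would be read from one of the files.
x <- runif(300)
ks.test(x, "punif", min = 0, max = 1)  # min/max are the defaults anyway
```

A small p-value would argue against uniformity; note ks.test assumes no ties in x, which real data rounded to few digits may violate.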
[R] Help with Median test and Coxon-Mann-Whittney test
Hi All, I have the following dataset (triplicate values of 5 independent measurements and duplicate values of a control):

      1      2      3      4      5      C
  181.8   58.2  288.9  273.2  290.9   53.9
  120.3  116.8  108.9  281.3  446.0   39.6
   86.1  148.5   52.9  126.0  150.3

My aim was to find out whether the mean values of samples 1-5 are significantly higher than the mean value for C (control). At first, I computed means and SDs and ran one-tailed t-tests. Although the SD error bars are large, two of the samples have mean values that are significantly higher than that of C. Second, I ran a median test and a Wilcoxon-Mann-Whitney test because of my worry about the size of my data and the high variation in the replicates. I was surprised, however, that the median and Wilcoxon-Mann-Whitney tests did not reveal statistically significant results. I would be happy if anyone could advise me on the best way to analyse this dataset. Regards, Rob [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
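[Editorial note: the non-significance here has an arithmetic explanation. With a triplicate (n = 3) against a duplicate control (n = 2), an exact Wilcoxon-Mann-Whitney test has only choose(5, 2) = 10 possible orderings, so the smallest attainable one-sided p-value is 0.1 and significance at 0.05 is impossible regardless of the data. A sketch; the grouping of values into columns is reconstructed from the garbled table and is an assumption:]

```r
# Sketch: one sample (triplicate) vs the control (duplicate).
sample4 <- c(273.2, 281.3, 126.0)   # column assignment is an assumption
control <- c(53.9, 39.6)
wilcox.test(sample4, control, alternative = "greater")
# Even when every sample value exceeds both controls, the exact one-sided
# p-value is 1/choose(5, 2) = 0.1 -- the best this design can do.
```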
[R] change value in one cell
Hello all, I am wondering if there is a way to change the value of one cell in R. For instance, let's say I have a hypothetical data frame that looks like this:

Obs X Y Z
  1 1 0 1
  2 0 0 1
  3 1 1 1
  4 0 1 1
  5 1 1 0
  6 0 0 0

I would like to change the value of the 4th observation in the Y column from 1 to 0. It should look like this:

Obs X Y Z
  1 1 0 1
  2 0 0 1
  3 1 1 1
  4 0 0 1
  5 1 1 0
  6 0 0 0

Is it possible to change the value in one command? Thanks, Carlos -- View this message in context: http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3589456.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RES: Linear multivariate regression with Robust error
---BeginMessage--- Hi Barjesh, I am not sure which data you analyze, but I once had a similar situation and it was a multicollinearity issue. I realized this after finding a huge correlation between the independent variables; I then dropped one of them and the signs of the slopes made sense. Beforehand, have a glance at the correlation matrix of your independent variables to see the relationship between them. Not sure how helpful my advice will be, but it did the trick for me back then ;) Cheers, Filipe Botelho
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Michael Friendly Sent: Friday, June 10, 2011 12:51 To: Barjesh Kochar Cc: r-help@r-project.org Subject: Re: [R] Linear multivariate regression with Robust error
On 6/10/2011 12:23 AM, Barjesh Kochar wrote: Dear all, I am doing linear regression with robust errors to see the effect of one (x) variable on another (y). If I execute the command, I find a positive trend. But if I check the effect of a number of variables (x, x1, x2, x3) on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation. Barjesh Kochar Research scholar
You don't give any data or provide any code (as the posting guide requests), so I have to guess that you have just rediscovered Simpson's paradox -- that the coefficient of a variable in a marginal regression can have an opposite sign to that in a joint model with other predictors. I have no idea what you mean by 'robust error'. One remedy is an added-variable plot, which will show you the partial contributions of each predictor in the joint model, as well as whether there are any influential observations that are driving the estimated coefficients. -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ---End Message--- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
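[Editorial note: the sign flip Friendly describes is easy to reproduce. A tiny simulation (mine, not from the thread) with two highly correlated predictors:]

```r
# Sketch: a slope's sign flips between the marginal and the joint model.
set.seed(1)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.2)      # x2 nearly duplicates x1
y  <- x1 - 2 * x2 + rnorm(200)
coef(lm(y ~ x1))["x1"]               # marginal slope: about -1 (negative)
coef(lm(y ~ x1 + x2))["x1"]          # joint slope: about +1 (positive)
cor(x1, x2)                          # very high -- check this before trusting signs
```

This is exactly the glance at the correlation matrix Filipe recommends: a near-unit correlation between predictors is the warning sign.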
[R] Memory(RAM) issues
Dear All, I have been working with R (desktop version) on Vista, with the latest version, R 2.13.0. I have been working with a few data sets of size 12 lakh x 15, and my code is quite computing-intensive (applying an MCMC Gibbs sampler to a posterior of 144 variables); it often runs into a memory issue such as "cannot allocate vector" at full size (memory use shows it has reached something like 1.5 GB), or something to this effect. I have 2 GB of RAM. I checked options like memory.size, and the documentation says that a 64-bit R on 64-bit Windows has a maximum memory capability of 8 TB. Now I don't have the background to understand the definitions of and differences between 32- and 64-bit machines, or other technical requirements like servers, but it would be of great help if anyone could give me a feel for it. Could any of you tell me whether some server version of R would resolve my issue or not (I am not sure what kind of server my company would allow R to be installed on at this point, maybe a Linux type), and if that's the case, could any of you guide me on how to go about installing it on a server? Thank you, Abhisek [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
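[Editorial note: on the Windows builds of that era, the current ceiling can be inspected and, memory permitting, raised. A sketch, assuming R 2.13.0 on Windows; these functions are Windows-only:]

```r
# Windows-only memory helpers (R 2.13-era):
memory.size()              # MB currently in use by this R session
memory.limit()             # current ceiling in MB
memory.limit(size = 3500)  # raise the ceiling; on 32-bit Windows roughly
                           # 3.5 GB is the practical maximum for one process
```

On a 2 GB machine, though, no setting conjures more RAM; a 64-bit build on a machine with more physical memory (desktop or server) is the real fix.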
[R] Is there an implementation of loess with more than 4 parametric predictors, or a trick to similar effect?
Dear R experts, I have a problem that is a related to the question raised in this earlier post https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html My situation is different in that I have only 2 predictors (coordinates x,y) for local regression but a number of global (parametric) offsets that I need to consider. Essentially, I have a spatial distortion overlaid over a number of measurements. These measurements can be grouped by the same underlying value for each group. The groups are known, but the values are not. We need to estimate the spatial trend, which we then want to remove. In our application, the spatial trend is two-dimensional (x,y), and there are about 20 groups of about 50 measurements each. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets. The below code for toy data (spatial trend in one dimension x) works for two or three offset groups, although I have not yet found out how to extract the fitted values for the globally fit parameters (the offsets). Unfortunately, the loess call fails for four or more offset groups with the error message Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : only 1-4 predictors are allowed Does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here, or is there a better way of dealing with this? Any comments would be greatly appreciated! Best regards, David Kreil. 
###
# loess with parametric offsets - toy data demo
x <- seq(0, 9, .1); x.N <- length(x)
o <- c(0.4, -0.8, 1.2  # ,-0.2  # works for three but not four
       )  # these are the (unknown) offsets
o.N <- length(o)
f <- sapply(seq(o.N), function(n) {
  ifelse(seq(x.N) <= n * x.N / (o.N + 1) & seq(x.N) > (n - 1) * x.N / (o.N + 1), 1, 0)
})
f <- f[sample(NROW(f)), ]
y <- sin(x) + rnorm(length(x), 0, .1) + f %*% o
s.fs <- sapply(seq(NCOL(f)), function(i) paste('f', i, sep = ''))
s <- paste(c('y~x', s.fs), collapse = '+')
d <- data.frame(x, y, f)
names(d) <- c('x', 'y', s.fs)
l <- loess(formula(s), parametric = s.fs, drop.square = s.fs, normalize = FALSE,
           data = d, span = 0.4)
yp <- predict(l, newdata = d)
plot(x, y - f %*% o, ylim = c(-3, 3))   # spatial distortion
lines(x, yp - f %*% o)                  # estimate of that
points(x, y, pch = '+', col = 'red')    # input data
points(x, yp, pch = 'o', col = 'blue')  # fit of that
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
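[Editorial note: one workaround (my suggestion, not from the thread) is mgcv's gam(), which has no four-predictor cap and mixes a smooth spatial term with ordinary parametric terms, so the group offsets come out as named coefficients. A sketch on the same kind of 1-D toy data, assuming package mgcv is installed; for the real 2-D problem the smooth would be s(x, y):]

```r
# Sketch: smooth trend plus parametric group offsets via mgcv::gam.
library(mgcv)
set.seed(1)
n   <- 300
d   <- data.frame(x = runif(n), grp = factor(sample(1:5, n, replace = TRUE)))
off <- c(0, 0.4, -0.8, 1.2, -0.2)   # true offsets, group 1 as reference
d$y <- sin(2 * pi * d$x) + off[d$grp] + rnorm(n, 0, 0.1)
fit <- gam(y ~ s(x) + grp, data = d)
coef(fit)[2:5]   # estimated offsets for groups 2..5, relative to group 1
```

This also answers the side question of extracting the fitted offsets: they are ordinary parametric coefficients of the fit, not buried inside the local-regression machinery.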
Re: [R] change value in one cell
You probably need to read the Introduction to R to understand indexing:

> x
  Obs X Y Z
1   1 1 0 1
2   2 0 0 1
3   3 1 1 1
4   4 0 1 1
5   5 1 1 0
6   6 0 0 0
> x[4, "Y"] <- 0
> x
  Obs X Y Z
1   1 1 0 1
2   2 0 0 1
3   3 1 1 1
4   4 0 0 1
5   5 1 1 0
6   6 0 0 0

On Fri, Jun 10, 2011 at 5:42 PM, jour4life jour4l...@gmail.com wrote: Hello all, I am wondering if there is a way to change the value of one cell in R. For instance, let's say I have a hypothetical data frame (Obs 1-6 with columns X, Y, Z as above). I would like to change the value of the 4th observation in the Y column from 1 to 0. Is it possible to change the value in one command? Thanks, Carlos -- View this message in context: http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3589456.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
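[Editorial note: equivalent spellings of the same one-command update, for the record:]

```r
# Three equivalent ways to flip one cell (toy data frame):
x <- data.frame(Obs = 1:6,
                X = c(1, 0, 1, 0, 1, 0),
                Y = c(0, 0, 1, 1, 1, 0),
                Z = c(1, 1, 1, 1, 0, 0))
x[4, "Y"] <- 0   # row index, column name
x$Y[4]    <- 0   # column first, then position
x[4, 3]   <- 0   # purely positional (column 3 is Y here)
```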
Re: [R] running R commands asynchronously
Tk windows run asynchronously from the rest of R, so you could write a small function that lets you type the command into a Tk window and runs it while you continue to work in the main R session; the window could then signal you when the function finishes. -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Richard M. Heiberger Sent: Friday, June 10, 2011 3:30 PM To: r-help Subject: [R] running R commands asynchronously I am interested in running R commands asynchronously. My first choice is in the same R session that I am currently in. Here, the goal would be to run something like RunAsynchSameSession(myfunction(), "outputname.rda"). Once RunAsynchSameSession had started myfunction(), RunAsynchSameSession would complete immediately. myfunction would keep going. It is OK if execution of the myfunction() command prevents new input to R until it has completed. The important feature is that RunAsynchSameSession must tell the program that called it that it was done. Second choice is to start an independent process, BATCH or something similar, and save the resulting data objects in an .rda file: RunAsynchBatch("myfile.r", "outputname.rda"). RunAsynchBatch would start a batch process and complete immediately after starting the batch file. The batch file would run independently until it was completed. While I know how to do this, for example with system(wait=FALSE), I would appreciate seeing a worked collection of statements, including getting outputname.rda back to the original R session. I am working on Windows. Rich [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
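[Editorial note: for the batch-process route, a minimal Windows-oriented sketch. The helper name matches the poster's proposed interface; the command line assumes R is on the PATH, and the polling is left to the caller:]

```r
# Sketch: fire off a script with R CMD BATCH and return immediately.
# The caller checks for the .rda file later and load()s it when it appears.
RunAsynchBatch <- function(script, outfile) {
  cmd <- sprintf('R CMD BATCH --vanilla "%s"', script)
  system(cmd, wait = FALSE)   # returns at once; the batch run continues
  invisible(outfile)
}
# later, by hand or on a timer:
#   if (file.exists("outputname.rda")) load("outputname.rda")
```

The script itself must end with something like save(result, file = "outputname.rda") so the objects come back; file.exists() is the simplest done-signal.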
Re: [R] R program writing standard/practices
On 11/06/11 03:35, Santosh wrote: Dear Experts, I notice that there are different ways of writing programs. Was wondering if there is anything like a standard which could be used to write good/complete R programs, maintain quality, easy to debug, a standard/practice that can be consistent in an enterprise environment. Also, are there any beacon lights that could help navigate through lines of code, programming styles, understand the scope of a program, identify gaps etc? This issue was recently discussed on this list. One suggested source of info was http://4dpiecharts.com/r-code-style-guide/ cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] NDCG in R?
Does R have a function to calculate NDCG? http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problems with geom_vline, histograms, scale=free and facets in ggplot
Dear list, I'm having a little trouble with adding vertical lines to a histogram. I need to draw a matrix of histograms (using facets), and in each histogram add a vertical line indicating the mean value of the data in that facet. According to the last example in the geom_hline help, to display different lines in each facet I need to provide a data frame. I modified this example to make a plot similar to what I need:

# BEGIN R CODE
# This example works and does exactly what I need
library(reshape)
library(ggplot2)
# First, create a data frame with the mean value, to add to each facet
mean.data = tapply(mtcars$wt, INDEX = list(mtcars$vs, mtcars$am), mean)
mean.data = melt(mean.data)
colnames(mean.data) = c("vs", "am", "wt.mean")
# Now do the plot
p <- qplot(wt, data = mtcars)
p <- p + facet_grid(vs ~ am, scale = "free")
p <- p + geom_vline(aes(xintercept = wt.mean), mean.data, col = "red")
p
###
# This one does not work
# Create a test data set
set.seed(100)
my.data = data.frame(code1 = rep(c("A", "B", "C"), each = 30),
                     code2 = rep(rep(c("D", "E", "F"), each = 10), 3),
                     n = runif(90))
# Create a data frame with the mean value, to add to each facet
mean.data = tapply(my.data$n, INDEX = list(my.data$code1, my.data$code2), mean)
mean.data = melt(mean.data)
colnames(mean.data) = c("code1", "code2", "n.mean")
# Now do the plot
p = qplot(n, data = my.data) + facet_grid(code2 ~ code1, scales = "free")
p + geom_vline(aes(xintercept = n.mean), mean.data, col = "red")
# With this I get the following error:
#   Error in if (length(range) == 1 || diff(range) == 0) { :
#     missing value where TRUE/FALSE needed
# If I remove the scales="free" argument, I get the lines but also a lot of
# extra empty facets:
p = qplot(n, data = my.data) + facet_grid(code2 ~ code1)
p + geom_vline(aes(xintercept = n.mean), mean.data, col = "red")
# END R CODE

I can't figure out why the second example does not work with the scales="free" argument, and why I get the extra facets when removing it. Is this a bug? Any help will be very welcome.
Julian -- Julian Mariano Burgos Hafrannsóknastofnunin/Marine Research Institute Skúlagata 4, 121 Reykjavík, Iceland Sími/Telephone : +354-5752037 Bréfsími/Telefax: +354-5752001 Netfang/Email: jul...@hafro.is, jmbur...@uw.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory(RAM) issues
It will be helpful on this forum to use metric measures: 12 Lakh is 1.2 million, thus your data is 1.2 million observations x 15 variables. I do not know the intricacies of R. You may have to wait for someone with that knowledge to respond. Including some relevant portions of error messages and code in your query can also help someone to respond to your message. Anupam. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Abhisek Saha Sent: Saturday, June 11, 2011 6:25 AM To: r-help@r-project.org Subject: [R] Memory(RAM) issues Dear All, I have been working with R(desktop version) in VISTA. I have the latest version R 2.13.0. I have been working with few data-sets of size 12Lakh * 15 and my code is quite computing intensive ( applying MCMC gibbs sampler on a posterior of 144 variables) that often runs into a memory issue such as memory can not allocate the vector ,full size(shows to have reached something like 1.5 GB) reached or something to this effect. I have a RAM of 2GB. I checked with the option like memory.size and it says a 64 bit R if sat on 64 bit windows then max memory capability is 8TB. Now I don't have background to understand the definitions and differences between 32 and 64 bit machines and other technical requirements like servor etc but it would be of great help if anyone can let me have a feel of it. Could any of you tell me whether some servor version of R would resolve my issue or not (I am not sure now what kind of servor my company would allow R to be installed at this point ,may be linux type) and if that's the case could any of you guide me about how to go about installing that onto a sevor. Thank you, Abhisek [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I make proper use of the by() function?
You may want to read about several apply functions. They may help you do what you are trying. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Fredrik Karlsson Sent: Saturday, June 11, 2011 3:13 AM To: R-Help List Subject: [R] How do I make proper use of the by() function? Dear list, I have a function that uses values from two vectors and spits out one new vector based on all the values of the two original vectors, and with the same length as them. Ok, I would now like to be able to apply that function simply on two columns in a data frame, divided by the levels of factors in the data frame. I'm trying to use by() for this, but the output is too hard to use. What I am doing is this: by(df, list(df$Factor1,df$Factor2),function(x) my_function(x$col1,x$col2),simplify=TRUE) and the output is too complex to be used in a simple way. Actually, I just want something like a data frame, where the results vectors are placed in one column and the conditions under which they were produced (i.e. the values of the factors according to which the data set were divided) in other columns. This does not seem to be provided by by(), and aggregate only provides me with the ability to use values from one column, right? So, are there other functions I could use? Thanks! /Fredrik -- Life is like a trumpet - if you don't put anything into it, you don't get anything out of it. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
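[Editorial note: a sketch of the pattern the apply family gives here — split the frame by the factors, run the two-column function on each piece, and rbind the pieces back into one long data frame. my_function below is a placeholder for the poster's real function, which is assumed to return a vector as long as its two inputs:]

```r
# Sketch: per-group two-column computation, results stacked with the factors.
my_function <- function(a, b) a / (a + b)   # placeholder for the real function
set.seed(1)
df <- data.frame(Factor1 = rep(c("p", "q"), each = 10),
                 Factor2 = rep(c("u", "v"), 10),
                 col1 = runif(20), col2 = runif(20))
pieces <- lapply(split(df, list(df$Factor1, df$Factor2)), function(d)
  data.frame(Factor1 = d$Factor1, Factor2 = d$Factor2,
             result  = my_function(d$col1, d$col2)))
out <- do.call(rbind, pieces)   # result vectors in one column, factors alongside
head(out)
```

Unlike by(), this returns an ordinary data frame, which is exactly the "results in one column, conditions in other columns" layout the poster asks for.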
Re: [R] NDCG in R?
Hi: I couldn't find one by searching with RSiteSearch() or package sos, but it's not hard to write a simple function that reproduces the example on the Wikipedia page:

ndcg <- function(x) {
  # x is a vector of relevance scores
  ideal_x <- rev(sort(x))
  DCG <- function(y) y[1] + sum(y[-1] / log(2:length(y), base = 2))
  DCG(x) / DCG(ideal_x)
}
rel <- c(3, 2, 3, 0, 1, 2)   # example from Wikipedia
ndcg(rel)
[1] 0.9315085

The difference between this and the result on the Wikipedia page is due to rounding. You can see this by pulling out the DCG function and evaluating it separately:

DCG <- function(y) y[1] + sum(y[-1] / log(2:length(y), base = 2))
DCG(rel)
[1] 8.097171
DCG(rev(sort(rel)))
[1] 8.692536

The Wikipedia page clearly indicates that there are several considerations that need to be addressed in the generation and processing of relevance scores. The above function obviously does not cover all the bases, but it does cover the situation corresponding to the given example. It's a start... HTH, Dennis On Fri, Jun 10, 2011 at 6:14 PM, Jim Cheng jxch...@gmail.com wrote: Does R have a function to calculate NDCG? http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] running R commands asynchronously
On Fri, 10 Jun 2011, Greg Snow wrote: Tk windows run asynchronously from the rest of R, so you could write a small function that lets you type the command into a Tk window and runs it while you continue to work in the main R, then the window could signal you when the function finishes. That is really not advised. R is not designed to run asynchronous R commands and is not protected against quite a lot of things which can happen if you try that (many low-level routines are not re-entrant, for example). Allowing callbacks to either run simple things (as happens from e.g. the GUI menus) whilst a task is running, or to do almost all the running (as happens with a menu system built in Tk) is fairly safe just because only one task is likely to run at a time. For over a decade Duncan TL has promised a facility to run tasks in parallel in R (as I recall the estimate at DSC 2001 was 12 months). So the only safe way at present (and the foreseeable future) is to run separate processes. Packages snow and multicore provide means to do that (and to wait for and collect results): in the case of multicore the parallel tasks work on a copy of the current session, so you do come close to the appearance of running asynchronous tasks. (By choosing to use Windows you exclude yourself from multicore.)
The important feature is that RunAsynchSameSession must tell the program that called it that it was done. Second choice is to start an independent process, BATCH or something similar, and save the resulting data objects in an .rda file. RunAsynchBatch(myfile.r, outputname.rda) The RunAsynchBatch would start a batch process and complete immediately after starting the batch file. The batch file would run independently until it was completed. While I know how to do this, for example with system(wait=FALSE), I would appreciate seeing a worked collection of statements, including getting outputname.rda back to the original R session. I am working on Windows. Rich -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
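[Editorial note: for the separate-process route on Windows that Ripley points to, package snow gives roughly this shape. A sketch; assumes snow is installed and two local workers suffice:]

```r
# Sketch: run tasks in worker R processes and collect the results.
library(snow)
cl  <- makeCluster(2, type = "SOCK")        # socket cluster, works on Windows
res <- parLapply(cl, 1:4, function(i) i^2)  # blocks until workers finish
stopCluster(cl)
unlist(res)                                 # 1 4 9 16
```

parLapply itself waits for the results, so this parallelizes work rather than giving a fire-and-forget call; for the truly asynchronous case the batch-file approach from earlier in the thread remains the Windows option.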