Re: [R] NADA Package install disappearance
On 22/05/2013 21:06, Rich Shepard wrote:

On Wed, 22 May 2013, rwillims wrote: I have been using the NADA package to do some statistical analysis; however, I have just found that the package is no longer available for install. I've downloaded an older version (NADA_1.5-4.tar.gz) and tried to use install.packages to install it in two versions of R (3.0.0 and 2.15.1), and I have gotten the same error message for both: package 'path/NADA_1.5-4.tar.gz' is not available (for R version 2.15.1).

Rachel, I'm running R-3.0.0 and had no problems re-installing NADA from the osuosl.org ftp server. I've no idea what version of NADA is installed. Have you tried another repository? Rich

A good idea for things like this is to check the CRAN package web page: http://cran.r-project.org/package=NADA That shows it is archived, and last updated over a year ago. You should be able to download the source file, and

install.packages('NADA_1.5-4.tar.gz', repos=NULL)

worked for me in 3.0.1. The package was archived because it was unmaintained and uses the long-obsolete \synopsis syntax that is being removed.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] convert a character string to a name
Hi,

From time to time I need to do aggregation. To illustrate, I present a toy example below. In this example, the task is to aggregate x and y by z with the function mean. Could I call the aggregation function with x_test, where x_test = c("x","y")?

Thanks,
Miao

dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
dftest
    x y z
1   1 1 1
2   2 2 0
3   3 3 1
4   4 0 0
5   5 1 1
6   6 2 0
7   7 3 1
8   8 0 0
9   9 1 1
10 10 2 0
11 11 3 1
12 12 0 0
aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2
x_test <- c("x","y")
aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
a1 <- aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
aggregate(factor(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
Re: [R] adding rows without loops
Merge should do the trick. How best to use it will depend on what you want to do with the data afterwards. The following is an example of what you could do. This will perform best if the missing rows occur at random and do not cluster.

DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29,24,28,27,35,32,32))
DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29,24,28,27,35,32,32))
DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
while(any(is.na(DFm))){
  if (any(is.na(DFm[1,]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind=TRUE)
  prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
  DFm[is.na(DFm)] <- DFm[prind]
}
DFm

Best,
Nello

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adeel Amin
Sent: Donnerstag, 23. Mai 2013 07:01
To: r-help@r-project.org
Subject: [R] adding rows without loops

I'm comparing a variety of datasets with over 4M rows. I've solved this problem 5 different ways using a for/while loop, but the processing time is murder (over 8 hours doing this row by row per data set). As such I'm trying to find out whether this is possible without a loop, or with one in which the processing time is much faster.

Each dataset is a time series, as such:

DF1:
  X.DATE   X.TIME VALUE VALUE2
1 01052007 0200   37    29
2 01052007 0300   42    24
3 01052007 0400   45    28
4 01052007 0500   45    27
5 01052007 0700   45    35
6 01052007 0800   42    32
7 01052007 0900   45    32
... n

DF2:
  X.DATE   X.TIME VALUE VALUE2
1 01052007 0200   37    29
2 01052007 0300   42    24
3 01052007 0400   45    28
4 01052007 0500   45    27
5 01052007 0600   45    35
6 01052007 0700   42    32
7 01052007 0800   45    32
... n+4000

In other words, there are 4000 more rows in DF2 than in DF1, so the datasets are of unequal length. I'm trying to ensure that all dataframes have the same number of X.DATE and X.TIME entries. Where they are missing, I'd like to insert a new row.
In the above example, when comparing DF2 to DF1, the entry 01052007 0600 is missing in DF1. The solution would add a row to DF1 at the appropriate index, so the new dataframe would be

  X.DATE   X.TIME VALUE VALUE2
1 01052007 0200   37    29
2 01052007 0300   42    24
3 01052007 0400   45    28
4 01052007 0500   45    27
5 01052007 0600   45    27
6 01052007 0700   45    35
7 01052007 0800   42    32
8 01052007 0900   45    32

VALUE and VALUE2 would be the same as row 4. Of course this is simple to accomplish using a row-by-row analysis, but with over 4M rows the processing time for destroying and rebinding the datasets is very time consuming, and I believe highly un-R'ish. What am I missing?

Thanks!
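The merge-then-fill approach in the reply can also be written without the while loop, using a vectorized last-observation-carried-forward pass. A base-R sketch (the `locf` helper is my own illustration, not from the thread, and it assumes the first row of the merged frame is complete, as the while-loop version also requires):

```r
# Merge the two series, then fill each NA with the last non-missing
# value above it (last observation carried forward).
DF1 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = c(2:5, 7:9) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = c(2:8) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DFm <- merge(DF1, DF2, by = c("X.DATE", "X.TIME"), all = TRUE)

locf <- function(x) {
  filled <- !is.na(x)
  # index of the most recent non-NA position at each point
  x[cummax(seq_along(x) * filled)]
}
DFm[] <- lapply(DFm, locf)  # fill every column at once
DFm
```

The `zoo` package's `na.locf()` does the same fill, but the version above avoids the extra dependency.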
Re: [R] convert a character string to a name
with(dftest, aggregate(cbind(x,y), list(z), FUN=mean))
#  Group.1 x y
#1       0 7 1
#2       1 6 2

# or
library(plyr)
ddply(dftest, .(z), numcolwise(mean))
#  z x y
#1 0 7 1
#2 1 6 2

A.K.

----- Original Message -----
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Sent: Thursday, May 23, 2013 3:05 AM
Subject: [R] convert a character string to a name
Re: [R] convert a character string to a name
If you want to use the character string:

attach(dftest)
aggregate(cbind(sapply(x_test, get))~z, data=dftest, FUN=mean)
# or
with(dftest, aggregate(cbind(sapply(x_test, get)), list(z), FUN=mean))
detach(dftest)

Cheers,
Nello

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun
Sent: Donnerstag, 23. Mai 2013 09:19
To: jpm miao
Cc: R help
Subject: Re: [R] convert a character string to a name
Re: [R] ordered and unordered variables
Hi,

Try putting your question on stackexchange; maybe it is already answered there. I am not a statistical expert, but based on common sense (which can be counter-intuitive sometimes) I would use an ordered factor if I expect an influence of the tension value on breaks. Anyway, I would probably consult more experienced people, or some textbook.

Regards,
Petr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of meng
Sent: Thursday, May 23, 2013 4:44 AM
To: Uwe Ligges
Cc: R help
Subject: Re: [R] ordered and unordered variables

It's not homework. I met this question during my practical work with R. The boss is an expert in biology, but he doesn't know statistics. So I must find the right method for this work.

At 2013-05-22 17:30:34, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

On 22.05.2013 07:09, meng wrote: Thanks. As to the data warpbreaks, if I want to analyse the impact of tension (L, M, H) on breaks, should I order the tension or not?

No homework questions on this list, please ask your teacher.

Best, Uwe Ligges

Many thanks.

At 2013-05-21 20:55:18, David Winsemius dwinsem...@comcast.net wrote:

On May 20, 2013, at 10:35 PM, meng wrote:

Hi all: If the explanatory variables are ordinal, the result of the regression is different from that with unordered variables. But I can't understand the result of the regression with the ordered variable. The data is warpbreaks, which comes with R.

If I use the unordered variable (tension): Levels: L M H. The result is easy to understand:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    36.39       2.80  12.995  < 2e-16 ***
tensionM      -10.00       3.96  -2.525 0.014717 *
tensionH      -14.72       3.96  -3.718 0.000501 ***

If I use the ordered variable (tension): Levels: L < M < H. I don't know how to explain the result:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   28.148      1.617  17.410  < 2e-16 ***
tension.L    -10.410      2.800  -3.718 0.000501 ***
tension.Q      2.155      2.800   0.769 0.445182

What do tension.L and tension.Q stand for? And how should the result be explained?
Ordered factors are handled by the R regression mechanism with orthogonal polynomial contrasts: .L for linear and .Q for quadratic. If the term had 4 levels there would also have been a .C (cubic) term. Treatment contrasts are used for unordered factors.

Generally one would want to do predictions for explanations of the results. Trying to explain the individual coefficient values from polynomial contrasts is similar to, and just as unproductive as, trying to explain the individual coefficients involving interaction terms.

--
David Winsemius
Alameda, CA, USA
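The two parameterizations described above can be compared directly on the built-in warpbreaks data. A small sketch (the fitted values agree; only the meaning of the coefficients changes):

```r
# Unordered factor: treatment contrasts (differences from level L).
fit.unordered <- lm(breaks ~ tension, data = warpbreaks)

# Ordered factor: orthogonal polynomial contrasts (.L linear, .Q quadratic).
warpbreaks$tensionO <- factor(warpbreaks$tension, ordered = TRUE)
fit.ordered <- lm(breaks ~ tensionO, data = warpbreaks)

coef(fit.unordered)  # (Intercept), tensionM, tensionH
coef(fit.ordered)    # (Intercept), tensionO.L, tensionO.Q

# Both are the same model in different coordinates: identical fit.
all.equal(fitted(fit.unordered), fitted(fit.ordered))
```

This is why predictions, rather than individual coefficients, are the safer thing to interpret: the model is unchanged, only the contrast coding differs.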
Re: [R] convert a character string to a name
try this:

dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2
x_test <- c("x", "y")
a <- formula(paste0('cbind('
                    , x_test[1]
                    , ','
                    , x_test[2]
                    , ') ~ z'
                    ))
a
cbind(x, y) ~ z
aggregate(a, data = dftest, FUN = mean)
  z x y
1 0 7 1
2 1 6 2

On Thu, May 23, 2013 at 3:05 AM, jpm miao miao...@gmail.com wrote:

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
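The paste0() construction in the reply generalizes to any number of columns. A sketch using as.formula() with paste(collapse=):

```r
# Build an aggregation formula from a character vector of column names.
dftest <- data.frame(x = 1:12, y = (1:12) %% 4, z = (1:12) %% 2)
x_test <- c("x", "y")

f <- as.formula(paste0("cbind(", paste(x_test, collapse = ", "), ") ~ z"))
f
# cbind(x, y) ~ z
res <- aggregate(f, data = dftest, FUN = mean)
res
```

Because the names are pasted into the formula before model.frame() sees them, this sidesteps the "variable lengths differ" error from passing the character vector itself.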
Re: [R] convert a character string to a name
Sorry, didn't read your question properly.

# Just a modification without attach():
aggregate(cbind(sapply(x_test, get, dftest))~z, data=dftest, FUN=mean)
#  z x y
#1 0 7 1
#2 1 6 2

# if you need to aggregate() all the columns except the grouping column
aggregate(.~z, data=dftest, FUN=mean)
#  z x y
#1 0 7 1
#2 1 6 2

A.K.

----- Original Message -----
From: Blaser Nello nbla...@ispm.unibe.ch
To: arun smartpink...@yahoo.com; jpm miao miao...@gmail.com
Cc: R help r-help@r-project.org
Sent: Thursday, May 23, 2013 3:29 AM
Subject: RE: [R] convert a character string to a name
[R] Transform Coordinate System of an ASCII-Grid
Dear all,

I have an ASCII-Grid for Switzerland in the Swiss national coordinate system CH1903. For a web application, I now need to deliver the ASCII-Grid in the WGS84 system. Via coordinates(ascii) I can export the coordinates and convert them with a formula into WGS84. My problem is now: how can I put these back into the ASCII-Grid, so that the whole grid structure will from now on be saved in the WGS84 coordinate format? (Important: I don't want to change the projection; I want to actually change the numeric format of the coordinates.)

Thank you so much for your help,
jas
Re: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)
The mean percentage change and the raw mean change are not directly comparable, even after standardization based on the SD of the percentage change or raw change values. So I would not mix those in the same analysis.

Best, Wolfgang

-----Original Message-----
From: Qiang Yue [mailto:qiangm...@gmail.com]
Sent: Wednesday, May 22, 2013 20:38
To: Viechtbauer Wolfgang (STAT); r-help
Subject: Re: RE: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

Dear Dr. Viechtbauer:

Thank you very much for sparing your precious time to answer my question. I still want to make sure about the third question below: for studies which only reported percentage changes (something like: the metabolite concentration increased by 20% +/- 5% after intervention), we cannot use the percentage change to calculate the SMCC, but have to get the raw change first?

With best wishes,
Qiang Yue

From: Viechtbauer Wolfgang (STAT)
Date: 2013-05-21 10:09
To: Moon Qiang; r-help
Subject: RE: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

Please see my answers below.

Best, Wolfgang

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Moon Qiang
Sent: Thursday, May 16, 2013 19:12
To: r-help
Subject: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

Hello. I am trying to perform a meta-analysis on some before-after studies. These studies are designed to clarify whether there is any significant metabolic change before and after an intervention. There is only one group in these studies, i.e., no control group. I followed the e-mail communication on R-help (https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html) and the metafor manual (version 1.8-0, released 2013-04-11; the relevant contents can be found on pages 59-61 under 'Outcome Measures for Individual Groups'). I made a trial analysis and attached the output here; I wonder if anyone can look through it and give me some comments.
I have three questions about the analysis:

1) Most studies reported the before-and-after raw change as mean +/- SD, but few of them reported the values before intervention (mean_r and sd_r) and after intervention (mean_s and sd_s), and none of them reported the r value (the correlation between the before- and after-intervention measurements). Based on the guideline of the metafor manual, I set the raw mean change as m1i (i.e., raw mean change = mean_s = m1i), set the standard deviation of the raw change as sd1i (i.e., SD of raw change = sd_s = sd1i), set all other arguments including m2i, sd2i, ri to 0, and then calculated the standardized mean change using the change score (SMCC). I am not sure if all these settings are correct.

This is correct. The escalc() function will still compute (m1i-m2i)/sqrt(sd1i^2 + sd2i^2 - 2*ri*sd1i*sd2i), but since m2i=sd2i=ri=0, this is equivalent to mean_change / SD_change, which is what you want. Make sure that mean_s is NOT the standard error (SE) of the change scores, but really the SD.

2) A few studies have specified individual values of m1i, m2i, sd1i, sd2i, but did not report the change score or its SD. So can I set r=0 and use these values to calculate the SMCC? Since the SMCC is not calculated in the same way as in 1), will this be a problem?

Yes, this will be a problem, since you now really assume that r=0, which is not correct. Maybe you can back-calculate r from other information (e.g., the p or t value from a t-test; see https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html). Or you could try to get r from the authors (then you could also just directly ask for the change score mean and SD). If that is not successful, you will have to impute some kind of reasonable value for r and do a sensitivity analysis in the end.
3) Some studies reported percentage mean changes instead of the raw mean change (percentage change = (value after intervention - value before intervention) / value before intervention). I think it may not be right to simply substitute the percentage mean change for the raw mean change. Is there any method to deal with this problem?

Don't know anything off the top of my head.

Any comments are welcome.

With best regards,

--
Qiang Yue
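For question 1), the change-score standardization discussed in the thread can be written out in a few lines of plain R. A sketch of the usual definitions (this is my own illustration; metafor's escalc(measure="SMCC") is the authoritative implementation and uses an exact rather than approximate bias correction):

```r
# Standardized mean change using change-score standardization (SMCC):
# mean_change and sd_change are the mean and SD of the raw change
# scores, n is the sample size.
smcc <- function(mean_change, sd_change, n) {
  cm <- 1 - 3 / (4 * (n - 1) - 1)   # approximate small-sample bias correction
  yi <- cm * mean_change / sd_change
  vi <- 1 / n + yi^2 / (2 * n)      # large-sample sampling variance
  c(yi = yi, vi = vi)
}

smcc(mean_change = 5, sd_change = 10, n = 25)
# yi ~ 0.484, vi ~ 0.0447
```

This also makes Wolfgang's point concrete: the formula needs the SD of the change scores themselves, which is why assuming r = 0 (question 2) or substituting percentage changes (question 3) distorts yi.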
[R] Fwd: Merge
-- Forwarded message --
From: Keniajin Wambui kiang...@gmail.com
Date: Thu, May 23, 2013 at 11:36 AM
Subject: Merge
To: r-help@r-project.org

I am using R 3.0.1 in RStudio to merge two data sets, one with approx. 120 variables and the other with 140 variables, with serialno as the unique identifier. I.e.,

Serialno name  year outcome
1        ken   1989 d
2        mary  1989 a
4        john  1989 a
5        tom   1989 a
6        jolly 1989 d

and

Serialno name    year disch_type
11       mwai    1990 d
21       wanjiku 1990 a
43       maina   1990 a
55       john    1990 a
67       welly   1990 d

How can I merge them into a common data set without having name.x and name.y, or year.x and year.y, after merging?

--
Mega Six Solutions
Web Designer and Research Consultant
Kennedy Mwai
25475211786
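The .x/.y suffixes appear because merge() finds columns with the same name in both data frames that are not part of `by`. When those overlapping columns really do carry the same information, naming them all in `by` avoids the suffixed pairs. A minimal sketch with hypothetical data (the serial numbers and names here are made to match across the two frames, unlike in the excerpt above, so that the multi-column join has something to match on):

```r
df1 <- data.frame(serialno = c(1, 2, 4),
                  name = c("ken", "mary", "john"),
                  year = 1989,
                  outcome = c("d", "a", "a"))
df2 <- data.frame(serialno = c(1, 2, 4),
                  name = c("ken", "mary", "john"),
                  year = 1989,
                  disch_type = c("d", "a", "a"))

m1 <- merge(df1, df2, by = "serialno")                    # name.x, name.y, year.x, year.y
m2 <- merge(df1, df2, by = c("serialno", "name", "year")) # no suffixes
names(m2)
```

If the overlapping columns do NOT agree (as in the original post, where the two files seem to describe different people and years), the .x/.y columns are telling you something real, and the fix is to drop or rename the redundant columns before merging rather than to join on them.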
Re: [R] Transform Coordinate System of an ASCII-Grid
Hello,

Your question is a bit unclear. Do you just want to change to decimal degrees? Can you please provide an example of your code, and include a small example ASCII grid?

On Thu, May 23, 2013 at 5:44 PM, jas jacqueline.schwei...@wuestundpartner.com wrote:

--
Daisy Englert Duursma
Department of Biological Sciences
Room E8C156
Macquarie University, North Ryde, NSW 2109
Australia
Tel +61 2 9850 9256
Re: [R] point.in.polygon help
It would be useful to know what your ultimate goal is.

On Wed, May 22, 2013 at 6:29 AM, karengrace84 kgfis...@alumni.unc.edu wrote:

I am new to mapping with R, and I would like to use the point.in.polygon function from the sp package, but I am unsure how to get my data into the correct format for the function. The generic form of the function is as follows:

point.in.polygon(point.x, point.y, pol.x, pol.y, mode.checked=FALSE)

I have no problem with the point.x and point.y inputs: I have a list of GPS longitudes and latitudes that will go in fine. My problem is with the pol.x and pol.y inputs. My polygon is currently in the form of a SpatialPolygonsDataFrame, created by reading shp files with the rgdal package. How do I get a numerical array of the x- and y-coordinates from my polygon that will go into the point.in.polygon function?

--
Daisy Englert Duursma
Department of Biological Sciences
Room E8C156
Macquarie University, North Ryde, NSW 2109
Australia
Tel +61 2 9850 9256
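For background, the even-odd ray-casting rule that functions like point.in.polygon implement can be sketched in plain R. This is an illustration of the algorithm, not sp's actual implementation, and it ignores the edge/vertex cases that sp classifies separately:

```r
# Even-odd ray casting: cast a horizontal ray from (px, py) and count
# how many polygon edges it crosses; an odd count means "inside".
point_in_polygon <- function(px, py, pol.x, pol.y) {
  n <- length(pol.x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (pol.y[i] > py) != (pol.y[j] > py)
    if (crosses &&
        px < (pol.x[j] - pol.x[i]) * (py - pol.y[i]) /
             (pol.y[j] - pol.y[i]) + pol.x[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# unit square
sq.x <- c(0, 1, 1, 0)
sq.y <- c(0, 0, 1, 1)
point_in_polygon(0.5, 0.5, sq.x, sq.y)  # TRUE
point_in_polygon(1.5, 0.5, sq.x, sq.y)  # FALSE
```

As for extracting pol.x and pol.y from the SpatialPolygonsDataFrame: in sp's S4 layout the ring coordinates of the first polygon sit in something like spdf@polygons[[1]]@Polygons[[1]]@coords (inspect your object with str() to confirm the nesting for multi-part polygons).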
[R] SEM: multigroup model
Dear R Gurus,

I am trying to run a multigroup SEM using Prof. John Fox's sem package. The two groups are Ready to Eat, denoted by RTE, and Ready to Cook, denoted by RTC. I ran an omnibus CFA on the data of consumer perceptions and preferences and am satisfied with what I got.

When I tried to do a multigroup SEM (my understanding is limited to the sem manual on CRAN) using the code below, I get the following message:

Error in summary.msemObjectiveML(sem.MG) :
  no 'dimnames' attribute for array
Execution halted

The relevant part of my code follows:

mod.mg <- multigroupModel(sbmod.cfa, groups=c("RTC", "RTE"))
sem.MG <- sem(mod.mg, data=srt, group="RTind",
              formula = ~ inv + imp + tch + emo + loy + usg + sig + dif +
                ndif + vda + vdb + vdc + vdd + vde + vdf + vdg + vdh +
                riskT + riskP + riskS + riskFi + riskFu + riskPs)
summary(sem.MG)

I was expecting two sets of fit indices, for RTE and RTC, and want to do an ANOVA across the models, as well as possibly check for loading equivalence. Can somebody please throw some light on where I am making a mistake?

Thanks,
Amarnath Bose

--
Amarnath Bose
Associate Professor
Decision Sciences Department
Birla Institute of Management Technology
Tel: +91 120 2323001 - 10 Ext.: 398
Cell: +91 9873179813
Re: [R] Transform Coordinate System of an ASCII-Grid
On Thu, May 23, 2013 at 8:44 AM, jas jacqueline.schwei...@wuestundpartner.com wrote:

You can't change the numeric format of the coordinates without changing the projection (unless changing from km to m). In your original coordinate system your grid is a bunch of rectangles with straight sides and right angles. In your WGS84 system the squares are no longer square, the sides are no longer straight, and the angles are no longer 90 degrees. This is all too complicated for a simple grid data structure to comprehend.

The solution may be to reproject your grid. This is a transformation of values, much like stretching an image file, from one grid to another. raster::projectRaster can do this for you.

For a dataset with a small extent, for some small values of small, you may be able to get away with transforming the corner coordinates and ignoring the fact that the earth is not flat. But this will make everyone who thinks the earth is round cry.

You should also look into the raster package for more info. You've not said what you're using to read the data. You should probably ask in r-sig-geo anyway, where the mappers hang out.

Barry
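For the coordinate arithmetic itself (as opposed to reprojecting the whole grid, which is what the advice above recommends), swisstopo publishes approximate formulas for converting CH1903 to WGS84. A sketch; the constants below are transcribed from memory and should be checked against the official swisstopo document before serious use:

```r
# Approximate CH1903 (E/N in metres) -> WGS84 (lon/lat in degrees).
# Accuracy is on the order of a metre; for production work use a proper
# reprojection (e.g. raster::projectRaster with the CH1903 CRS).
ch1903_to_wgs84 <- function(E, N) {
  y <- (E - 600000) / 1e6   # auxiliary values relative to Bern
  x <- (N - 200000) / 1e6
  lon <- 2.6779094 + 4.728982 * y + 0.791484 * y * x +
         0.1306 * y * x^2 - 0.0436 * y^3
  lat <- 16.9023892 + 3.238272 * x - 0.270978 * y^2 -
         0.002528 * x^2 - 0.0447 * y^2 * x - 0.0140 * x^3
  # lon/lat are in units of 10000 seconds of arc; convert to degrees
  c(lon = lon * 100 / 36, lat = lat * 100 / 36)
}

ch1903_to_wgs84(600000, 200000)  # Bern: roughly 7.4386 E, 46.9511 N
```

Note this only relabels point coordinates; as Barry explains, the grid cells themselves stop being rectangles in WGS84, so the grid structure still needs a genuine reprojection.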
Re: [R] SEM: multigroup model
Dear Amarnath Bose, There's nothing obviously wrong with the commands that you report -- in fact, your commands have the same structure as the multigroup SEM example in ?sem -- so the usual advice about including reproducible code producing the error applies. If you like, you could send me your data and the complete R script that you used. Best, John
---
John Fox
Senator William McMaster Professor of Social Statistics
Department of Sociology, McMaster University, Hamilton, Ontario, Canada

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Amarnath Bose
Sent: Thursday, May 23, 2013 6:54 AM
To: r-help@r-project.org
Subject: [R] SEM: multigroup model

Dear R Gurus, I am trying to run a multigroup SEM using Prof. John Fox's sem package. The two groups are Ready to Eat, denoted by RTE, and Ready to Cook, denoted by RTC. I ran an omnibus CFA on the data of consumer perceptions and preferences and am satisfied with what I got. When I tried to do a multigroup SEM - my understanding is limited to the sem manual on CRAN - using the code below, I get the following message:

Error in summary.msemObjectiveML(sem.MG) : no 'dimnames' attribute for array
Execution halted

The relevant part of my code follows:

mod.mg <- multigroupModel(sbmod.cfa, groups = c("RTC", "RTE"))
sem.MG <- sem(mod.mg, data = srt, group = "RTind",
              formula = ~ inv + imp + tch + emo + loy + usg + sig + dif + ndif +
                vda + vdb + vdc + vdd + vde + vdf + vdg + vdh +
                riskT + riskP + riskS + riskFi + riskFu + riskPs)
summary(sem.MG)

I was expecting two sets of fit indices for RTE and RTC and want to do an ANOVA across the models, as well as possibly check for loading equivalence. Can somebody please throw some light on where I am making a mistake?
Thanks
Amarnath Bose
--
Amarnath Bose, Associate Professor, Decision Sciences Department, Birla Institute of Management Technology
Tel: +91 120 2323001 - 10 Ext.: 398 Cell: +91 9873179813
[[alternative HTML version deleted]]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transform Coordinate System of an ASCII-Grid
Hello Barry, thank you for your reply. Yes, the flat-versus-round-earth projection is a difficulty; as my grid isn't that far spread out, I thought I would just use the method anyway. I usually use raster or maptools (readAsciiGrid). I am going to look into the mappers' forum, thank you for that tip :) Jacqueline -- View this message in context: http://r.789695.n4.nabble.com/Transform-Coordinate-System-of-an-ASCII-Grid-tp4667786p4667799.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fwd: Merge
Hello, Try the following.

rm(list = ls())
dat1 <- read.table(text = "
Serialno name year outcome
1 ken 1989 d
2 mary 1989 a
4 john 1989 a
5 tom 1989 a
6 jolly 1989 d
", header = TRUE, stringsAsFactors = FALSE)
dat2 <- read.table(text = "
Serialno name year disch_type
11 mwai 1990 d
21 wanjiku 1990 a
43 maina 1990 a
55 john 1990 a
67 welly 1990 d
", header = TRUE, stringsAsFactors = FALSE)

res <- merge(dat1[, c(1, 4)], dat2[, c(1, 4)], all = TRUE)
res <- merge(merge(res, dat1, all.y = TRUE), merge(res, dat2, all.y = TRUE), all = TRUE)
res <- res[, c(1, 4, 5, 2, 3)]
res

Hope this helps, Rui Barradas

Em 23-05-2013 09:41, Keniajin Wambui escreveu: -- Forwarded message -- From: Keniajin Wambui kiang...@gmail.com Date: Thu, May 23, 2013 at 11:36 AM Subject: Merge To: r-help@r-project.org I am using R 3.0.1 on RStudio to merge two data sets, one with approx. 120 variables and the other with 140 variables, with serialno as the unique identifier. i.e.

Serialno name year outcome
1 ken 1989 d
2 mary 1989 a
4 john 1989 a
5 tom 1989 a
6 jolly 1989 d

and

Serialno name year disch_type
11 mwai 1990 d
21 wanjiku 1990 a
43 maina 1990 a
55 john 1990 a
67 welly 1990 d

How can I merge them to a common data set without having name.x and name.y or year.x and year.y after merging? -- Mega Six Solutions Web Designer and Research Consultant Kennedy Mwai 25475211786 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding rows without loops
Thank you Blaser: This is the exact solution I came up with, but when comparing 8M rows even on an 8G machine, one runs out of memory. To run this effectively, I have to break the DF into smaller DFs, loop through them and then do a massive merge at the end. That's what takes 8+ hours to compute. Even the bigmemory package is causing OOM issues.

-----Original Message----- From: Blaser Nello [mailto:nbla...@ispm.unibe.ch] Sent: Thursday, May 23, 2013 12:15 AM To: Adeel Amin; r-help@r-project.org Subject: RE: [R] adding rows without loops

Merge should do the trick. How to best use it will depend on what you want to do with the data after. The following is an example of what you could do. This will perform best if the rows are missing at random and do not cluster.

DF1 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = c(2:5, 7:9) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = (2:8) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DFm <- merge(DF1, DF2, by = c("X.DATE", "X.TIME"), all = TRUE)
while (any(is.na(DFm))) {
  if (any(is.na(DFm[1, ]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind = TRUE)
  prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
  DFm[is.na(DFm)] <- DFm[prind]
}
DFm

Best, Nello

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adeel Amin Sent: Donnerstag, 23. Mai 2013 07:01 To: r-help@r-project.org Subject: [R] adding rows without loops

I'm comparing a variety of datasets with over 4M rows. I've solved this problem 5 different ways using a for/while loop, but the processing time is murder (over 8 hours doing this row by row per data set). As such I'm trying to find whether this solution is possible without a loop, or one in which the processing time is much faster.
Each dataset is a time series, as such:

DF1:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
... n

DF2:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32
... n+4000

In other words there are 4000 more rows in DF2 than DF1; thus the datasets are of unequal length. I'm trying to ensure that all dataframes have the same number of X.DATE and X.TIME entries. Where they are missing, I'd like to insert a new row. In the above example, when comparing DF2 to DF1, the 01052007 0600 entry is missing in DF1. The solution would add a row to DF1 at the appropriate index, so the new dataframe would be

  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32

VALUE and VALUE2 would be the same as row 4. Of course this is simple to accomplish using a row-by-row analysis, but with 4M rows the processing time destroying and rebinding the datasets is very time consuming and, I believe, highly un-R'ish. What am I missing? Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] point.in.polygon help
I am looking at fish tagging data. I have gps coordinates of where each fish was tagged and released, and I have a map of 10 coastal basins of the state of Louisiana. I am trying to determine which basin each fish was tagged in. -- View this message in context: http://r.789695.n4.nabble.com/point-in-polygon-help-tp4667645p4667808.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
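The basin lookup described above is a point-in-polygon problem; sp::point.in.polygon (or sp::over with a SpatialPolygonsDataFrame of the basins) handles it robustly. As a self-contained illustration of the underlying even-odd (ray-casting) rule, here is a base-R sketch with a made-up unit-square polygon standing in for a basin:

```r
# Minimal even-odd (ray-casting) point-in-polygon test in base R.
# For real work, sp::point.in.polygon or sp::over is the robust route.
point_in_poly <- function(px, py, poly.x, poly.y) {
  n <- length(poly.x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # does the horizontal ray from (px, py) cross edge (j, i)?
    if (((poly.y[i] > py) != (poly.y[j] > py)) &&
        (px < (poly.x[j] - poly.x[i]) * (py - poly.y[i]) /
              (poly.y[j] - poly.y[i]) + poly.x[i]))
      inside <- !inside
    j <- i
  }
  inside
}

# hypothetical basin: the unit square
sq.x <- c(0, 1, 1, 0)
sq.y <- c(0, 0, 1, 1)
point_in_poly(0.5, 0.5, sq.x, sq.y)  # TRUE
point_in_poly(1.5, 0.5, sq.x, sq.y)  # FALSE
```

With the real data you would test each tagging location against each of the ten basin polygons (or, better, do it in one call with sp::over) and record the basin that returns TRUE.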
Re: [R] adding rows...
Hi Rainer: Thanks for the reply. Posting the large dataset is a task. There are 8M rows between the two of them, and the first discrepancy in the data doesn't happen until at least the 40,000th row in each dataframe. The examples I posted are a pretty good abstraction of the root of the issue. The problem isn't the data. The problem is Out Of Memory issues when doing any operations like merge, rbind, etc. The solution that Blaser suggested in his post works great, but the systems quickly run out of memory. What does work without OOM issues are for/while loops, but on average they take an inordinate time to compute and tie up a machine for hours and hours at a time. Essentially I break the data apart, add rows and rebind. It's a brute-force type of approach, and run times are in excess of 48 hours for one full iteration across 25 data frames. Terrible. I am about to go down the road of using the data.table package, as it is far more memory efficient, but the documentation is cryptic. Your idea of creating a super set has some merit, and it's what I was experimenting with prior to my original post.

-----Original Message----- From: Rainer Schuermann [mailto:rainer.schuerm...@gmx.net] Sent: Thursday, May 23, 2013 12:19 AM To: Adeel Amin Subject: adding rows...

Can I suggest that you post the output of

dput(DF1)
dput(DF2)

rather than pictures of your data? Any solution attempt will depend upon the data types... Just shooting in the dark: Have you tried just row-binding the missing 4k lines to DF1 and then ordering DF1 as you like? It looks as if the data are ordered by time / date? Rgds, Rainer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
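For the data.table route mentioned above, a keyed rolling join does the insert-missing-rows-and-carry-values-forward step in one call, avoiding the repeated copying that makes merge/rbind run out of memory. This is a sketch on the small example data from earlier in the thread, not benchmarked on 8M rows:

```r
library(data.table)

DT1 <- data.table(X.DATE = rep("01052007", 7), X.TIME = c(2:5, 7:9) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DT2 <- data.table(X.DATE = rep("01052007", 7), X.TIME = (2:8) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
setkey(DT1, X.DATE, X.TIME)
setkey(DT2, X.DATE, X.TIME)

# the union of all date/time keys seen in either table, sorted
keys <- unique(rbind(DT1[, .(X.DATE, X.TIME)], DT2[, .(X.DATE, X.TIME)]))
setkey(keys, X.DATE, X.TIME)

# roll = TRUE fills each missing time with the last preceding observation,
# e.g. the absent 0600 row in DT1 gets the 0500 values (45, 27)
DF1.filled <- DT1[keys, roll = TRUE]
```

The same two lines applied with the roles of DT1 and DT2 swapped would equalize both tables.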
Re: [R] Fwd: Merge
You could also do:

library(plyr)
res1 <- join(dat1, dat2, type = "full")
res1
#    Serialno    name year outcome disch_type
# 1         1     ken 1989       d         NA
# 2         2    mary 1989       a         NA
# 3         4    john 1989       a         NA
# 4         5     tom 1989       a         NA
# 5         6   jolly 1989       d         NA
# 6        11    mwai 1990      NA          d
# 7        21 wanjiku 1990      NA          a
# 8        43   maina 1990      NA          a
# 9        55    john 1990      NA          a
# 10       67   welly 1990      NA          d

identical(res, res1)
#[1] TRUE

# or
lst1 <- list(dat1, dat2)
Reduce(function(...) merge(..., by = c("Serialno", "name", "year"), all = TRUE), lst1)
#    Serialno    name year outcome disch_type
# 1         1     ken 1989       d         NA
# 2         2    mary 1989       a         NA
# 3         4    john 1989       a         NA
# 4         5     tom 1989       a         NA
# 5         6   jolly 1989       d         NA
# 6        11    mwai 1990      NA          d
# 7        21 wanjiku 1990      NA          a
# 8        43   maina 1990      NA          a
# 9        55    john 1990      NA          a
# 10       67   welly 1990      NA          d

A.K.

- Original Message - From: Rui Barradas ruipbarra...@sapo.pt To: Keniajin Wambui kiang...@gmail.com Cc: r-help@r-project.org Sent: Thursday, May 23, 2013 8:36 AM Subject: Re: [R] Fwd: Merge

Hello, Try the following.

rm(list = ls())
dat1 <- read.table(text = "
Serialno name year outcome
1 ken 1989 d
2 mary 1989 a
4 john 1989 a
5 tom 1989 a
6 jolly 1989 d
", header = TRUE, stringsAsFactors = FALSE)
dat2 <- read.table(text = "
Serialno name year disch_type
11 mwai 1990 d
21 wanjiku 1990 a
43 maina 1990 a
55 john 1990 a
67 welly 1990 d
", header = TRUE, stringsAsFactors = FALSE)

res <- merge(dat1[, c(1, 4)], dat2[, c(1, 4)], all = TRUE)
res <- merge(merge(res, dat1, all.y = TRUE), merge(res, dat2, all.y = TRUE), all = TRUE)
res <- res[, c(1, 4, 5, 2, 3)]
res

Hope this helps, Rui Barradas

Em 23-05-2013 09:41, Keniajin Wambui escreveu: -- Forwarded message -- From: Keniajin Wambui kiang...@gmail.com Date: Thu, May 23, 2013 at 11:36 AM Subject: Merge To: r-help@r-project.org I am using R 3.0.1 on RStudio to merge two data sets, one with approx. 120 variables and the other with 140 variables, with serialno as the unique identifier.
i.e.

Serialno name year outcome
1 ken 1989 d
2 mary 1989 a
4 john 1989 a
5 tom 1989 a
6 jolly 1989 d

and

Serialno name year disch_type
11 mwai 1990 d
21 wanjiku 1990 a
43 maina 1990 a
55 john 1990 a
67 welly 1990 d

How can I merge them to a common data set without having name.x and name.y or year.x and year.y after merging? -- Mega Six Solutions Web Designer and Research Consultant Kennedy Mwai 25475211786 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)
Dear Dr. Viechtbauer: Thanks so much! Now all these issues are clear. With best regards. Qiang Yue

From: Viechtbauer Wolfgang (STAT) Date: 2013-05-23 05:06 To: qiangmoon; r-help Subject: RE: RE: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

The mean percentage change and the raw mean change are not directly comparable, even after standardization based on the SD of the percentage change or raw change values. So, I would not mix those in the same analysis. Best, Wolfgang

-----Original Message----- From: Qiang Yue [mailto:qiangm...@gmail.com] Sent: Wednesday, May 22, 2013 20:38 To: Viechtbauer Wolfgang (STAT); r-help Subject: Re: RE: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

Dear Dr. Viechtbauer: Thank you very much for sparing your precious time to answer my question. I still want to make sure about the third question below: for studies which only reported percentage changes (something like: the metabolite concentration increased by 20% +/- 5% after intervention), we cannot use the percentage change to calculate the SMCC, but have to get the raw change first? With best wishes. Qiang Yue

From: Viechtbauer Wolfgang (STAT) Date: 2013-05-21 10:09 To: Moon Qiang; r-help Subject: RE: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

Please see my answers below. Best, Wolfgang

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Moon Qiang Sent: Thursday, May 16, 2013 19:12 To: r-help Subject: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

Hello. I am trying to perform a meta-analysis on some before-after studies. These studies are designed to clarify if there is any significant metabolic change before and after an intervention. There is only one group in these studies, i.e., no control group.
I followed the e-mail communication on R-help (https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html) and the metafor manual (version 1.8-0, released 2013-04-11; the relevant contents can be found on pages 59-61 under 'Outcome Measures for Individual Groups'). I made a trial analysis and attached the output here; I wonder if anyone can look through it and give me some comments. I have three questions about the analysis:

1) Most studies reported the before-and-after raw change as Mean +/- SD, but few of them reported the values before intervention (mean_r and sd_r) and after intervention (mean_s and sd_s), and none of them reported the r value (the correlation between the before- and after-intervention measurements). Based on the guideline of the metafor manual, I set the raw mean change as m1i (i.e., raw mean change = mean_s = m1i), set the standard deviation of the raw change as sd1i (i.e., SD of raw change = sd_s = sd1i), set all other arguments including m2i, sd2i, ri to 0, and then calculated the standardized mean change using change scores (SMCC). I am not sure if all these settings are correct.

This is correct. The escalc() function will still compute (m1i - m2i)/sqrt(sd1i^2 + sd2i^2 - 2*ri*sd1i*sd2i), but since m2i = sd2i = ri = 0, this is equivalent to mean_change / SD_change, which is what you want. Make sure that mean_s is NOT the standard error (SE) of the change scores, but really the SD.

2) A few studies have specified individual values of m1i, m2i, sd1i, sd2i, but did not report the change score or its SD. So can I set r = 0 and use these values to calculate the SMCC? Since the SMCC is not calculated in the same way as in 1), will this be a problem?

Yes, this will be a problem, since you now really assume that r = 0, which is not correct. Maybe you can back-calculate r from other information (e.g., the p or t value from a t-test -- see https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html).
Or you could try to get r from the authors (then you could also just directly ask for the change score mean and SD). If that is not successful, you will have to impute some kind of reasonable value for r and do a sensitivity analysis in the end.

3) Some studies reported the percentage mean change instead of the raw mean change (percentage change = (value after intervention - value before intervention) / value before intervention). I think it may not be right to simply substitute the raw mean change with the percentage mean change. Is there any method to deal with this problem?

Don't know anything off the top of my head.

Any comments are welcome. With best regards. -- Qiang Yue [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
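Point 1) above can be written down as a concrete escalc() call; the numbers below are invented purely to show the argument pattern, not taken from any study:

```r
library(metafor)

# Two hypothetical studies reporting only the mean and SD of the raw change
dat <- data.frame(mean_change = c(2.1, 1.4),
                  sd_change   = c(0.8, 0.6),
                  n           = c(30, 25))

# With m2i = sd2i = ri = 0 the SMCC reduces to mean_change / sd_change,
# as described in the answer to question 1)
es <- escalc(measure = "SMCC",
             m1i = mean_change, sd1i = sd_change,
             m2i = 0, sd2i = 0, ri = 0, ni = n,
             data = dat)
es
```

The resulting yi/vi columns can then be fed to rma() in the usual way.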
Re: [R] ordered and unordered variables
Meng, This really comes down to what question you are trying to answer. Before worrying about details of default contrasts and issues like that you first need to work out what is really the question of interest. The main difference between declaring a variable ordered or not is the default contrasts. Defaults are provided because there are many cases where which contrasts are used internally does not matter, so why make someone think about it. In cases where the choice of contrasts matter, it is rare that any default coding is the correct/best choice and you should really think through what contrasts answer the question of interest and use those custom contrasts. For example, to answer the question if Tension has any overall effect it does not matter which contrast encoding you use (as long as it is full rank), the test statistic and p-value for testing the whole effect will be the same. The predictions of the means of groups will also be the same regardless of which contrasts are used (and this is often a clearer way to present/explain the results). A case where the specific contrasts would matter would be if we want to see if we can reduce the number of groups by combining groups together, or interpolate to certain groups. The treatment contrasts will test if low and medium can be combined (which makes sense) and if low and high can be combined (which does not make sense unless the first is true and in fact the overall factor is not significant), what makes more sense would be to compare low to medium and medium to high (it could be that low is different from the other 2, but med and high can be combined). The polynomial contrasts give a different view, the quadratic term in this case tests whether the medium group is the average of the low group and the high group (so we could interpolate medium), this only makes sense if the medium tension is centered (in some sense) between the other 2, i.e. 
the difference from low to medium is exactly the same as the difference from medium to high, but if that were the case then I would expect a numerical term rather than an ordered factor. So, to summarize, it depends on the question of interest. For some questions the contrasts don't matter, in which case it does not matter; in other cases the correct contrasts to use are determined by the question, and you should use the contrasts that answer that question (which are rarely a default).

On Tue, May 21, 2013 at 11:09 PM, meng laomen...@163.com wrote: Thanks. As to the data warpbreaks, if I want to analyse the impact of tension (L, M, H) on breaks, should I order the tension or not? Many thanks.

At 2013-05-21 20:55:18, David Winsemius dwinsem...@comcast.net wrote: On May 20, 2013, at 10:35 PM, meng wrote: Hi all: If the explanatory variables are ordinal, the result of the regression is different from unordered variables, but I can't understand the result of the regression from the ordered variable. The data is warpbreaks, which belongs to R. If I use the unordered variable (tension): Levels: L M H. The result is easy to understand:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    36.39       2.80  12.995  < 2e-16 ***
tensionM      -10.00       3.96  -2.525 0.014717 *
tensionH      -14.72       3.96  -3.718 0.000501 ***

If I use the ordered variable (tension): Levels: L < M < H. I don't know how to explain the result:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   28.148      1.617  17.410  < 2e-16 ***
tension.L    -10.410      2.800  -3.718 0.000501 ***
tension.Q      2.155      2.800   0.769 0.445182

What do tension.L and tension.Q stand for? And how should I explain the result?

Ordered factors are handled by the R regression mechanism with orthogonal polynomial contrasts: .L for linear and .Q for quadratic. If the term had 4 levels there would also have been a .C (cubic) term. Treatment contrasts are used for unordered factors. Generally one would want to do predictions for explanations of the results.
Trying to explain the individual coefficient values from polynomial contrasts is similar to and just as unproductive as trying to explain the individual coefficients involving interaction terms. -- David Winsemius Alameda, CA, USA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
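Greg's point that the fitted group means do not depend on the contrast coding can be checked directly with warpbreaks (base R only):

```r
w <- warpbreaks

# Unordered factor: treatment contrasts (L is the baseline level)
fit.u <- lm(breaks ~ tension, data = w)

# Ordered factor: orthogonal polynomial contrasts (.L linear, .Q quadratic)
w$tension.o <- factor(w$tension, ordered = TRUE)
fit.o <- lm(breaks ~ tension.o, data = w)

# The coefficients differ, but the fitted group means are identical
all.equal(unname(fitted(fit.u)), unname(fitted(fit.o)))  # TRUE
```

Both models span the same three group means; only the parameterization of those means changes, which is why the overall F test and the predictions agree.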
Re: [R] adding rows without loops
Using the data generated with your code below, does

DF1 <- rbind(DF1, DF2[!(DF2$X.TIME %in% DF1$X.TIME), ])
DF1 <- DF1[order(DF1$X.DATE, DF1$X.TIME), ]

do the job? Rgds, Rainer

On Thursday 23 May 2013 05:54:26 Adeel - SafeGreenCapital wrote: Thank you Blaser: This is the exact solution I came up with, but when comparing 8M rows even on an 8G machine, one runs out of memory. To run this effectively, I have to break the DF into smaller DFs, loop through them and then do a massive merge at the end. That's what takes 8+ hours to compute. Even the bigmemory package is causing OOM issues.

-----Original Message----- From: Blaser Nello [mailto:nbla...@ispm.unibe.ch] Sent: Thursday, May 23, 2013 12:15 AM To: Adeel Amin; r-help@r-project.org Subject: RE: [R] adding rows without loops

Merge should do the trick. How to best use it will depend on what you want to do with the data after. The following is an example of what you could do. This will perform best if the rows are missing at random and do not cluster.

DF1 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = c(2:5, 7:9) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = (2:8) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DFm <- merge(DF1, DF2, by = c("X.DATE", "X.TIME"), all = TRUE)
while (any(is.na(DFm))) {
  if (any(is.na(DFm[1, ]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind = TRUE)
  prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
  DFm[is.na(DFm)] <- DFm[prind]
}
DFm

Best, Nello

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adeel Amin Sent: Donnerstag, 23. Mai 2013 07:01 To: r-help@r-project.org Subject: [R] adding rows without loops

I'm comparing a variety of datasets with over 4M rows. I've solved this problem 5 different ways using a for/while loop, but the processing time is murder (over 8 hours doing this row by row per data set).
As such I'm trying to find whether this solution is possible without a loop, or one in which the processing time is much faster. Each dataset is a time series, as such:

DF1:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
... n

DF2:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32
... n+4000

In other words there are 4000 more rows in DF2 than DF1, thus the datasets are of unequal length. I'm trying to ensure that all dataframes have the same number of X.DATE and X.TIME entries. Where they are missing, I'd like to insert a new row. In the above example, when comparing DF2 to DF1, the 01052007 0600 entry is missing in DF1. The solution would add a row to DF1 at the appropriate index, so the new dataframe would be

  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32

VALUE and VALUE2 would be the same as row 4. Of course this is simple to accomplish using a row-by-row analysis, but with 4M rows the processing time destroying and rebinding the datasets is very time consuming and, I believe, highly un-R'ish. What am I missing? Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] convert a character string to a name
Here are a couple of approaches:

dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
x_test <- c("x", "y")

aggregate(dftest[, x_test], dftest['z'], FUN=mean)
  z x y
1 0 7 1
2 1 6 2

### Or
tmp.f <- as.formula(paste('cbind(', paste(x_test, collapse=','), ') ~ z'))
aggregate(tmp.f, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2

The first just uses x_test to subset the data frame and sends the constructed subset to aggregate. The second constructs the formula from the strings and passes the formula to aggregate.

On Thu, May 23, 2013 at 1:05 AM, jpm miao miao...@gmail.com wrote: Hi, From time to time I need to do aggregation. To illustrate, I present a toy example below. In this example, the task is to aggregate x and y by z with the function mean. Could I call the aggregation function with x_test, where x_test = c("x", "y")? Thanks Miao

dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
dftest
    x y z
1   1 1 1
2   2 2 0
3   3 3 1
4   4 0 0
5   5 1 1
6   6 2 0
7   7 3 1
8   8 0 0
9   9 1 1
10 10 2 0
11 11 3 1
12 12 0 0

aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2

x_test <- c("x", "y")
aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) : variable lengths differ (found for 'z')

aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data = dftest) : variable lengths differ (found for 'z')

aggregate(factor(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) : variable lengths differ (found for 'z')

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D.
538...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Removing rows w/ smaller value from data frame
Hello, I have a column called max_dt in my data frame, and I only want to keep the bigger values for the same activity. How can I do that? Data frame:

activity max_dt
A 2013-03-05
B 2013-03-28
A 2013-03-28
C 2013-03-28
B 2013-03-01

Thank you for your help -- View this message in context: http://r.789695.n4.nabble.com/Removing-rows-w-smaller-value-from-data-frame-tp4667816.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Removing rows w/ smaller value from data frame
Hi, change max_dt to POSIXct class, use a standard comparison operator, and use the result for selecting rows.

s <- seq(c(ISOdate(2000, 3, 20)), by = "day", length.out = 10)
s < s[5]
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE

Regards Petr

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ramoss Sent: Thursday, May 23, 2013 4:24 PM To: r-help@r-project.org Subject: [R] Removing rows w/ smaller value from data frame

Hello, I have a column called max_dt in my data frame, and I only want to keep the bigger values for the same activity. How can I do that? Data frame:

activity max_dt
A 2013-03-05
B 2013-03-28
A 2013-03-28
C 2013-03-28
B 2013-03-01

Thank you for your help -- View this message in context: http://r.789695.n4.nabble.com/Removing-rows-w-smaller-value-from-data-frame-tp4667816.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
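A compact base-R answer to the original question is to compare each row's date against its per-activity maximum with ave(); the data frame below is re-typed from the post:

```r
df <- data.frame(activity = c("A", "B", "A", "C", "B"),
                 max_dt = as.Date(c("2013-03-05", "2013-03-28", "2013-03-28",
                                    "2013-03-28", "2013-03-01")))

# keep, for each activity, only the row(s) carrying the latest date
keep <- df$max_dt == ave(df$max_dt, df$activity, FUN = max)
df[keep, ]
```

For this toy input, one row per activity survives (A, B and C, each dated 2013-03-28); ties would keep all tied rows, which is usually what is wanted.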
[R] error message solution: cannot allocate vector of size 200Mb
Dear All, I wrote a program using R 2.15.2, but the error message "cannot allocate vector of size 200Mb" appeared. I want to ask in general how to handle this situation. I tried running the same program on other computers and it is perfectly fine there. Can anybody help? Thank you very much in advance. Best Regards, Ray [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] error message solution: cannot allocate vector of size 200Mb
Try in R 64 bit. Thanks Gyanendra Pokharel University of Guelph Guelph, ON On Thu, May 23, 2013 at 10:53 AM, Ray Cheung ray1...@gmail.com wrote: Dear All, I wrote a program using R 2.15.2 but this error message cannot allocate vector of size 200Mb appeared. I want to ask in general how to handle this situation. I try to run the same program on other computers. It is perfectly fine. Can anybody help? Thank you very much in advance. Best Regards, Ray [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
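Beyond switching to 64-bit R, the generic first step when "cannot allocate vector of size ..." appears is to find out what is occupying memory. A sketch (the object names are whatever happens to be in your workspace; memory.limit() is Windows-only):

```r
gc()   # force a garbage collection and report current memory usage

# list the largest objects in the workspace, biggest first
sort(sapply(ls(), function(nm) object.size(get(nm))), decreasing = TRUE)

# On 32-bit R for Windows only (capped around 2-3 Gb):
# memory.limit()             # current limit in Mb
# memory.limit(size = 4000)  # attempt to raise it
```

Removing large intermediate objects with rm() followed by gc(), or processing the data in chunks, are the usual remedies when a larger address space is not available.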
Re: [R] group data based on row value
The OP indicated that the middle group should be closed on both ends, i.e. [0.1, 0.6].

dat2 <- rbind(dat, 0.1, 0.6)
dat2$group <- factor(ifelse(dat2$Var < 0.1, "A",
                     ifelse(dat2$Var > 0.6, "C", "B")))
dat2
  Var group
1 0.0     A
2 0.2     B
3 0.5     B
4 1.0     C
5 4.0     C
6 6.0     C
7 0.1     B
8 0.6     B

does it, but would be clumsy for more than three groups. Depending on the precision of the numbers, something like

dat2$group <- cut(dat2$Var, breaks=c(-Inf, 0.1-.0001, 0.6+.0001, Inf),
                  labels=LETTERS[1:3])
dat2
  Var group
1 0.0     A
2 0.2     B
3 0.5     B
4 1.0     C
5 4.0     C
6 6.0     C
7 0.1     B
8 0.6     B

would also work.

- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77840-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Newmiller Sent: Wednesday, May 22, 2013 5:27 PM To: Ye Lin; R help Subject: Re: [R] group data based on row value

dat$group <- cut(dat$Var, breaks=c(-Inf, 0.1, 0.6, Inf))
levels(dat$group) <- LETTERS[1:3]

--- Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity.

Ye Lin ye...@lbl.gov wrote: hey, I want to divide my data into three groups based on the value in one column, with group names. dat: Var 0 0.2 0.5 1 4 6 I tried: dat <- cbind(dat, group=cut(dat$Var, breaks=c(0.1,0.6))) But it doesn't work. I want to group those < 0.1 as group A, 0.1-0.6 as group B, > 0.6 as group C. Thanks for your help! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
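Since cut() produces half-open intervals, a middle group closed on both ends, [0.1, 0.6], can also be expressed directly with comparisons, avoiding the epsilon adjustment discussed above. A sketch using the thread's Var column:

```r
dat <- data.frame(Var = c(0, 0.2, 0.5, 1, 4, 6, 0.1, 0.6))

# < 0.1 -> "A", [0.1, 0.6] -> "B" (both endpoints included), > 0.6 -> "C"
dat$group <- factor(ifelse(dat$Var < 0.1, "A",
                    ifelse(dat$Var <= 0.6, "B", "C")))
dat
```

This scales to more groups less gracefully than cut(), but it makes the endpoint handling explicit instead of relying on a small numeric offset.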
Re: [R] sample(c(0, 1)...) vs. rbinom
After a bit of playing around, I discovered that sample() does something similar in other situations: set.seed(105021) sample(1:5,1,prob=c(1,1,1,1,1)) [1] 3 set.seed(105021) sample(1:5,1) [1] 2 set.seed(105021) sample(1:5,5,prob=c(1,1,1,1,1)) [1] 3 4 2 1 5 set.seed(105021) sample(1:5,5) [1] 2 5 1 4 3 albyn On 2013-05-22 22:24, peter dalgaard wrote: On May 23, 2013, at 07:01 , Jeff Newmiller wrote: You seem to be building an elaborate structure for testing the reproducibility of the random number generator. I suspect that rbinom is calling the random number generator a different number of times when you pass prob=0.5 than otherwise. Nope. It's switching 0 and 1: set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp)); set.seed(1); rbinom(10,1,pp) [1] 1 1 0 0 1 0 0 0 0 1 [1] 0 0 1 1 0 1 1 1 1 0 which is curious, but of course has no implication for the distributional properties. Curiouser, if you drop the prob= in sample. set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1); rbinom(10,1,pp) [1] 0 0 1 1 0 1 1 1 1 0 [1] 0 0 1 1 0 1 1 1 1 0 However, it was never a design goal that two different random functions (or even two code paths within the same function) should give exactly the same values, even if they simulate the same distribution, so this is nothing more than a curiosity. 
Appendix A: some R code that exhibits the problem = ppp - seq(0, 1, by = 0.01) result - do.call(rbind, lapply(ppp, function(p) { set.seed(1) sampleRes - sample(c(0, 1), size = 1, replace = TRUE, prob=c(1-p, p)) set.seed(1) rbinomRes - rbinom(1, size = 1, prob = p) data.frame(prob = p, equivalent = all(sampleRes == rbinomRes)) })) result Appendix B: the output from the R code == prob equivalent 1 0.00 TRUE 2 0.01 TRUE 3 0.02 TRUE 4 0.03 TRUE 5 0.04 TRUE 6 0.05 TRUE 7 0.06 TRUE 8 0.07 TRUE 9 0.08 TRUE 10 0.09 TRUE 11 0.10 TRUE 12 0.11 TRUE 13 0.12 TRUE 14 0.13 TRUE 15 0.14 TRUE 16 0.15 TRUE 17 0.16 TRUE 18 0.17 TRUE 19 0.18 TRUE 20 0.19 TRUE 21 0.20 TRUE 22 0.21 TRUE 23 0.22 TRUE 24 0.23 TRUE 25 0.24 TRUE 26 0.25 TRUE 27 0.26 TRUE 28 0.27 TRUE 29 0.28 TRUE 30 0.29 TRUE 31 0.30 TRUE 32 0.31 TRUE 33 0.32 TRUE 34 0.33 TRUE 35 0.34 TRUE 36 0.35 TRUE 37 0.36 TRUE 38 0.37 TRUE 39 0.38 TRUE 40 0.39 TRUE 41 0.40 TRUE 42 0.41 TRUE 43 0.42 TRUE 44 0.43 TRUE 45 0.44 TRUE 46 0.45 TRUE 47 0.46 TRUE 48 0.47 TRUE 49 0.48 TRUE 50 0.49 TRUE 51 0.50 FALSE 52 0.51 TRUE 53 0.52 TRUE 54 0.53 TRUE 55 0.54 TRUE 56 0.55 TRUE 57 0.56 TRUE 58 0.57 TRUE 59 0.58 TRUE 60 0.59 TRUE 61 0.60 TRUE 62 0.61 TRUE 63 0.62 TRUE 64 0.63 TRUE 65 0.64 TRUE 66 0.65 TRUE 67 0.66 TRUE 68 0.67 TRUE 69 0.68 TRUE 70 0.69 TRUE 71 0.70 TRUE 72 0.71 TRUE 73 0.72 TRUE 74 0.73 TRUE 75 0.74 TRUE 76 0.75 TRUE 77 0.76 TRUE 78 0.77 TRUE 79 0.78 TRUE 80 0.79 TRUE 81 0.80 TRUE 82 0.81 TRUE 83 0.82 TRUE 84 0.83 TRUE 85 0.84 TRUE 86 0.85 TRUE 87 0.86 TRUE 88 0.87 TRUE 89 0.88 TRUE 90 0.89 TRUE 91 0.90 TRUE 92 0.91 TRUE 93 0.92 TRUE 94 0.93 TRUE 95 0.94 TRUE 96 0.95 TRUE 97 0.96 TRUE 98 0.97 TRUE 99 0.98 TRUE 100 0.99 TRUE 101 1.00 TRUE Appendix C: Session information === sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] 
LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible
[R] Could graph objects be stored in a two-dimensional list?
Hi, I have a few graph objects created by some graphic package (say, ggplot2, which I use frequently). Because of the existent relation between the graphs, I'd like to index them in two dimensions as p[1,1], p[1,2], p[2,1], p[2,2] for convenience. To my knowledge, the only data type capable of storing graph objects (and any R object) is list, but unfortunately it is available in only one dimension. Could the graphs be stored in any two-dimensional data type? One remedy that comes to my mind is to build a function f so that f(1,1)=1 f(1,2)=2 f(2,1)=3 f(2,2)=4 With functions f and f^{-1} (inverse function of f) , the two-dimensional indices could be mapped to and from a set of one-dimensional indices, and the functions are exactly the way R numbers elements in a matrix. Does R have this built-in function for a m by n matrix or more generally, m*n*p array? (I know this function is easy to write, but just want to make sure whether it exists already) Thanks, Miao [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calcul of the mean in a period of time
HI GG, I should had checked with multiple t=0 only rows. Apologies! Check if this work: (Changed the thread name as the solution applies to that problem) dat2- read.csv(dat6.csv,header=TRUE,sep=\t,row.names=1) str(dat2) #'data.frame': 3896 obs. of 3 variables: # $ patient_id: int 2 2 2 2 2 2 2 2 2 2 ... # $ t : int 0 1 2 3 4 5 6 7 8 9 ... # $ basdai : num 2.83 4.05 3.12 3.12 2.42 ... library(plyr) dat2New-ddply(dat2,.(patient_id),summarize,t=seq(min(t),max(t))) res-join(dat2New,dat2,type=full) lst1-lapply(split(res,res$patient_id),function(x) {x1-x[x$t!=0,];do.call(rbind,lapply(split(x1,((x1$t-1)%/%3)+1),function(y) {y1-if(any(y$t==1)) rbind(x[x$t==0,],y) else y; data.frame(patient_id=unique(y1$patient_id),t=head(y1$t,1),basdai=mean(y1$basdai,na.rm=TRUE))}) ) }) dat3-dat2[unlist(with(dat2,tapply(t,patient_id,FUN=function(x) x==0 length(x)==1)),use.names=FALSE),] head(dat3,3) # patient_id t basdai #143 10 0 5.225 #555 37 0 2.450 #627 42 0 6.950 lst2-split(dat3,seq_len(nrow(dat3))) lst1[lapply(lst1,length)==0]-mapply(rbind,lst1[lapply(lst1,length)==0],lst2,SIMPLIFY=FALSE) res1-do.call(rbind,lst1) row.names(res1)- 1:nrow(res1) res2- res1[,-2] res2$period-with(res2,ave(patient_id,patient_id,FUN=seq_along)) #res2 #selected rows res2[c(48:51,189:192,210:215),] # patient_id basdai period #48 9 3.625000 8 #49 10 5.225000 1 #t=0 only row #50 11 6.018750 1 #51 11 6.00 2 #189 36 6.17 1 #190 37 2.45 1 #t=0 only row #191 38 3.10 1 #192 38 3.575000 2 #210 41 1.918750 1 #211 41 4.025000 2 #212 41 2.975000 3 #213 41 1.725000 4 #214 42 6.95 1 #t=0 only row #215 44 4.30 1 A.K. From: GUANGUAN LUO guanguan...@gmail.com To: arun smartpink...@yahoo.com Sent: Thursday, May 23, 2013 9:50 AM Subject: Re: how to calculate the mean in a period of time? Hello, Arun, sorry to trouble you again, I tried your method and i found that for patient_id==10 et patient_id==37 ect, the scores are repeated 51 times, I don't understand why this occured. Thank you so much. 
GG __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] apply function within different groups
Hi, I have a very big data frame and I would like to apply a function to one of the columns within different groups and obtain another dataframe. My data frame is like this:

group  var1 var2 myvar
group1    1    a   100
group2    2    b   200
group2   34    c   300
group3    5    d   400
group3    6    e   500
group4    7    f   600

and I would like to apply this function to column myvar:

mifunc = function(vec) {
  vec = as.vector(vec)
  for (i in 1:(length(vec)-1)) {
    vec[i] = vec[i+1] - 1
  }
  return(vec)
}

by the groups in column group. I would like to obtain the same dataframe but with f(myvar) instead of myvar. How can I do this? Thanks, Estefania [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Removing rows w/ smaller value from data frame
Hi, Try:

datNew <- read.table(text="
activity max_dt
A 2013-03-05
B 2013-03-28
A 2013-03-28
C 2013-03-28
B 2013-03-01
", header=TRUE, stringsAsFactors=FALSE)
datNew$max_dt <- as.Date(datNew$max_dt)
aggregate(max_dt~activity, data=datNew, max)
#  activity     max_dt
#1        A 2013-03-28
#2        B 2013-03-28
#3        C 2013-03-28

#or
library(plyr)
ddply(datNew,.(activity),summarize, max_dt=max(max_dt))
#  activity     max_dt
#1        A 2013-03-28
#2        B 2013-03-28
#3        C 2013-03-28

#or
ddply(datNew,.(activity),summarize, max_dt=tail(sort(max_dt),1))
#  activity     max_dt
#1        A 2013-03-28
#2        B 2013-03-28
#3        C 2013-03-28

A.K.

- Original Message - From: ramoss ramine.mossad...@finra.org To: r-help@r-project.org Cc: Sent: Thursday, May 23, 2013 10:23 AM Subject: [R] Removing rows w/ smaller value from data frame Hello, I have a column called max_date in my data frame and I only want to keep the bigger values for the same activity. How can I do that? data frame: activity max_dt A 2013-03-05 B 2013-03-28 A 2013-03-28 C 2013-03-28 B 2013-03-01 Thank you for your help -- View this message in context: http://r.789695.n4.nabble.com/Removing-rows-w-smaller-value-from-data-frame-tp4667816.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
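Note that aggregate() returns only the grouping column and the maximum. If the original rows carry further columns that should survive, one option (my addition, not from the thread) is to merge the per-group maxima back onto the full data, which keeps every column of the winning rows:

```r
# same illustrative data as in the reply above
datNew <- data.frame(activity = c("A", "B", "A", "C", "B"),
                     max_dt = as.Date(c("2013-03-05", "2013-03-28", "2013-03-28",
                                        "2013-03-28", "2013-03-01")))

maxes <- aggregate(max_dt ~ activity, data = datNew, max)

# merging on both activity and max_dt retains only the rows that
# match the per-activity maximum, with all other columns intact
merge(datNew, maxes)
```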
[R] xml newbie
Dear r-helpers, I am trying to extract quantities of interest from my iTunes library xml file. For example, I'd like to be able to run a simple regression of playcount on track number, under the theory that tracks near the beginning of albums get played more (either because they are better or because people listen to the beginnings of albums). I have an xml file that is of the following form:

<key>13162</key>
<dict>
  <key>Track ID</key><integer>13162</integer>
  <key>Name</key><string>I'm A Wheel</string>
  <key>Artist</key><string>Wilco</string>
  <key>Composer</key><string>Jeff Tweedy</string>
  <key>Album</key><string>A Ghost is Born</string>
  <key>Genre</key><string>Rock</string>
  <key>Kind</key><string>Matched AAC audio file</string>
  <key>Size</key><integer>6248701</integer>
  <key>Total Time</key><integer>154648</integer>
  <key>Disc Number</key><integer>1</integer>
  <key>Disc Count</key><integer>1</integer>
  <key>Track Number</key><integer>9</integer>
  <key>Track Count</key><integer>12</integer>
  <key>Year</key><integer>2004</integer>
  <key>Date Modified</key><date>2012-07-26T22:29:15Z</date>
  <key>Date Added</key><date>2010-01-27T00:02:21Z</date>
  <key>Bit Rate</key><integer>256</integer>
  <key>Sample Rate</key><integer>44100</integer>
  <key>Play Count</key><integer>3</integer>
  <key>Play Date</key><integer>3434905791</integer>
  <key>Play Date UTC</key><date>2012-11-05T00:29:51Z</date>
  <key>Artwork Count</key><integer>1</integer>
  <key>Sort Album</key><string>Ghost is Born</string>
  <key>Persistent ID</key><string>A8B0E5CF2E86A4C6</string>
  <key>Track Type</key><string>File</string>
  <key>Location</key><string>file://localhost/Users/Alex/Music/iTunes/iTunes%20Media/Music/Wilco/A%20Ghost%20is%20Born/09%20I'm%20A%20Wheel.m4a</string>
  <key>File Folder Count</key><integer>5</integer>
  <key>Library Folder Count</key><integer>1</integer>
</dict>

From each entry, I'd like to extract: Track ID, Track Number and Play Count. In this case, it would be 13162, 9, 3. My guess is that this can be done using library(XML). If anyone has any guidance, it would be appreciated.
Please note: a) I do not understand XML data structures, so please explain what you mean by children etc. b) Not every entry in my database has a track number and a play count -- I'd like to have NAs associated with the appropriate Track ID, which all entries have. c) It'd also be OK if this XML database just got turned into a normal R data frame. Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
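One possible approach with the XML package, sketched below. The iTunes plist format pairs each <key> with the value node that immediately follows it, so the XPath walks to the first following sibling of the matching <key>; missing fields (e.g. an absent Play Count) come back as NA. The file path is hypothetical, and this is untested against a real library file:

```r
library(XML)  # assumes the XML package is installed

doc <- xmlParse("iTunes Music Library.xml")  # hypothetical path

# every track is a <dict> containing a "Track ID" key
tracks <- getNodeSet(doc, "//dict[key='Track ID']")

# fetch the value node immediately following the named <key>;
# returns NA when the field is absent from that track's <dict>
grab <- function(tr, field) {
  v <- xpathSApply(tr,
                   sprintf("key[.='%s']/following-sibling::*[1]", field),
                   xmlValue)
  if (length(v)) as.numeric(v[1]) else NA
}

lib <- data.frame(TrackID     = sapply(tracks, grab, "Track ID"),
                  TrackNumber = sapply(tracks, grab, "Track Number"),
                  PlayCount   = sapply(tracks, grab, "Play Count"))

# then e.g. lm(PlayCount ~ TrackNumber, data = lib)
```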
[R] Error in png: unable to start png() device
Hi, I use R 2.14.0 on Win XP Pro SP3 and it behaves the same - sometimes. After I draw a lot of plots (more than 200, with 2 concurrent Rgui processes running in parallel) to png, I get the same error message. bmp(), jpeg(), png() - same error. Restarting Rgui helps nothing. Solution: restart the system and voila, everything is ok. I suspect that there might be something wrong with allocation/deallocation of Windows resources in the windows() function. Ondrej Novak [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Removing rows w/ smaller value from data frame
From your email, it seems like aggregate() is working. Could you please provide the sessionInfo()? My guess is that some other loaded library is masking the summarize(). For example, if I load library(Hmisc) #The following object is masked from ‘package:plyr’: # # is.discrete, summarize ddply(datNew,.(activity),summarize, max_dt=max(max_dt)) # #Error in is.list(by) : 'by' is missing ddply(datNew,.(activity),plyr::summarize,max_dt=max(max_dt)) # activity max_dt #1 A 2013-03-28 #2 B 2013-03-28 #3 C 2013-03-28 A.K. - Original Message - From: Mossadegh, Ramine N. ramine.mossad...@finra.org To: arun smartpink...@yahoo.com Cc: Sent: Thursday, May 23, 2013 10:44 AM Subject: RE: [R] Removing rows w/ smaller value from data frame Thank but I get : Error in is.list(by) : 'by' is missing When I tried ddply(datNew,.(activity),summarize, max_dt=max(max_dt)) -Original Message- From: arun [mailto:smartpink...@yahoo.com] Sent: Thursday, May 23, 2013 10:40 AM To: Mossadegh, Ramine N. Cc: R help Subject: Re: [R] Removing rows w/ smaller value from data frame Hi, Try: datNew- read.table(text= activity max_dt A 2013-03-05 B 2013-03-28 A 2013-03-28 C 2013-03-28 B 2013-03-01 ,sep=,header=TRUE,stringsAsFactors=FALSE) datNew$max_dt- as.Date(datNew$max_dt) aggregate(max_dt~activity,data=datNew,max) # activity max_dt #1 A 2013-03-28 #2 B 2013-03-28 #3 C 2013-03-28 #or library(plyr) ddply(datNew,.(activity),summarize, max_dt=max(max_dt)) # activity max_dt #1 A 2013-03-28 #2 B 2013-03-28 #3 C 2013-03-28 #or ddply(datNew,.(activity),summarize, max_dt=tail(sort(max_dt),1)) # activity max_dt #1 A 2013-03-28 #2 B 2013-03-28 #3 C 2013-03-28 A.K. - Original Message - From: ramoss ramine.mossad...@finra.org To: r-help@r-project.org Cc: Sent: Thursday, May 23, 2013 10:23 AM Subject: [R] Removing rows w/ smaller value from data frame Hello, I have a column called max_date in my data frame and I only want to keep the bigger values for the same activity. How can I do that? 
data frame: activity max_dt A 2013-03-05 B 2013-03-28 A 2013-03-28 C 2013-03-28 B 2013-03-01 Thank you for your help -- View this message in context: http://r.789695.n4.nabble.com/Removing-rows-w-smaller-value-from-data-frame-tp4667816.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Notice: This email, including attachments, may include non-public, proprietary, confidential or legally privileged information. If you are not an intended recipient or an authorized agent of an intended recipient, you are hereby notified that any dissemination, distribution or copying of the information contained in or transmitted with this e-mail is unauthorized and strictly prohibited. If you have received this email in error, please notify the sender by replying to this message and permanently delete this e-mail, its attachments, and any copies of it immediately. You should not retain, copy or use this e-mail or any attachment for any purpose, nor disclose all or any part of the contents to any other person. Thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] apply function within different groups
Hi, Maybe this helps:

dat1 <- read.table(text="
group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600
", header=TRUE, stringsAsFactors=FALSE)
library(plyr)
ddply(dat1,.(group),summarize, f_myvar=mifunc(myvar))
#   group f_myvar
#1 group1      NA
#2 group2     299
#3 group2     300
#4 group3     499
#5 group3     500
#6 group4      NA

A.K.

- Original Message - From: Estefanía Gómez Galimberti tef...@yahoo.com To: r help help r-help@r-project.org Cc: Sent: Thursday, May 23, 2013 11:30 AM Subject: [R] apply function within different groups Hi, I have a very big data frame and I would like to apply a function to one of the columns within different groups and obtain another dataframe. My data frame is like this: group var1 var2 myvar group1 1 a 100 group2 2 b 200 group2 34 c 300 group3 5 d 400 group3 6 e 500 group4 7 f 600 and I would like to apply this function to column myvar: mifunc = function(vec) { vec=as.vector(vec) for (i in 1:(length(vec)-1)){ vec[i]=vec[i+1]-1 } return(vec) } by the groups in column group. I would like to obtain the same dataframe but with f(myvar) instead of myvar. How can I do this? Thanks, Estefania [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding rows without loops
"This is the exact solution I came up with ..." - exact, really? Is the time-consuming part the initial merge

DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

or the postprocessing to turn runs of NAs into the last non-NA value in the column

while(any(is.na(DFm))){
  if (any(is.na(DFm[1,]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind=TRUE)
  prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
  DFm[is.na(DFm)] <- DFm[prind]
}

If it is the latter, you may get better results from applying zoo::na.locf() to each non-key column of DFm. E.g.,

library(zoo)
f2 <- function(DFm) {
  for(i in 3:length(DFm)) {
    DFm[[i]] <- na.locf(DFm[[i]])
  }
  DFm
}

f2(DFm) gives the same result as Blaser's algorithm

f1 <- function (DFm) {
  while (any(is.na(DFm))) {
    if (any(is.na(DFm[1, ]))) stop("Complete first row required!")
    ind <- which(is.na(DFm), arr.ind = TRUE)
    prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
    DFm[is.na(DFm)] <- DFm[prind]
  }
  DFm
}

If there are not a huge number of columns I would guess that f2() would be much faster. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adeel - SafeGreenCapital Sent: Thursday, May 23, 2013 5:54 AM To: 'Blaser Nello'; r-help@r-project.org Subject: Re: [R] adding rows without loops Thank you Blaser: This is the exact solution I came up with but when comparing 8M rows even on an 8G machine, one runs out of memory. To run this effectively, I have to break the DF into smaller DFs, loop through them and then do a massive rmerge at the end. That's what takes 8+ hours to compute. Even the bigmemory package is causing OOM issues. -----Original Message----- From: Blaser Nello [mailto:nbla...@ispm.unibe.ch] Sent: Thursday, May 23, 2013 12:15 AM To: Adeel Amin; r-help@r-project.org Subject: RE: [R] adding rows without loops Merge should do the trick. How to best use it will depend on what you want to do with the data after.
The following is an example of what you could do. This will perform best if the rows are missing at random and do not cluster.

DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALE2=c(29,24,28,27,35,32,32))
DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALE2=c(29,24,28,27,35,32,32))
DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
while(any(is.na(DFm))){
  if (any(is.na(DFm[1,]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind=TRUE)
  prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
  DFm[is.na(DFm)] <- DFm[prind]
}
DFm

Best, Nello

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adeel Amin Sent: Donnerstag, 23. Mai 2013 07:01 To: r-help@r-project.org Subject: [R] adding rows without loops I'm comparing a variety of datasets with over 4M rows. I've solved this problem 5 different ways using a for/while loop but the processing time is murder (over 8 hours doing this row by row per data set). As such I'm trying to find whether this solution is possible without a loop or one in which the processing time is much faster. Each dataset is a time series as such:

DF1:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
... n

DF2:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32
... n+4000

In other words there are 4000 more rows in DF2 than DF1, thus the datasets are of unequal length. I'm trying to ensure that all dataframes have the same number of X.DATE and X.TIME entries. Where they are missing, I'd like to insert a new row. In the above example, when comparing DF2 to DF1, entry 01052007 0600 is missing in DF1.
The solution would add a row to DF1 at the appropriate index, so the new dataframe would be

  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32

Value and Value2 would be the same as row 4. Of course this is simple to accomplish using a row by row analysis but with 4M rows the processing time destroying and rebinding the datasets is very time
Re: [R] Could graph objects be stored in a two-dimensional list?
You could use lists of lists, and index them with vectors.

a <- list()
a[[1]] <- list()
a[[2]] <- list()
a[[c(1,1)]] <- g11
a[[c(1,2)]] <- g12
a[[c(2,1)]] <- g21
a[[c(2,2)]] <- g22
print(a[[c(2,1)]])

but this seems like an inefficient use of memory because your indexed data is stored more compactly than the graph object is. I would index the data and generate the graph object on the fly when I wanted to see it.

--- Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity.

jpm miao miao...@gmail.com wrote: Hi, I have a few graph objects created by some graphic package (say, ggplot2, which I use frequently). Because of the existent relation between the graphs, I'd like to index them in two dimensions as p[1,1], p[1,2], p[2,1], p[2,2] for convenience. To my knowledge, the only data type capable of storing graph objects (and any R object) is list, but unfortunately it is available in only one dimension. Could the graphs be stored in any two-dimensional data type? One remedy that comes to my mind is to build a function f so that f(1,1)=1 f(1,2)=2 f(2,1)=3 f(2,2)=4 With functions f and f^{-1} (inverse function of f), the two-dimensional indices could be mapped to and from a set of one-dimensional indices, and the functions are exactly the way R numbers elements in a matrix. Does R have this built-in function for a m by n matrix or more generally, m*n*p array? (I know this function is easy to write, but just want to make sure whether it exists already) Thanks, Miao [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] apply function within different groups
Thanks a lot!!! It works perfectly! Just one thing, is there a way to preserve my original data frame so I don't need to join both tables? I could do it with rbind but my original data frame is not in order, so Thanks again!

From: arun smartpink...@yahoo.com To: Estefanía Gómez Galimberti tef...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, May 23, 2013 12:48 PM Subject: Re: [R] apply function within different groups Hi, Maybe this helps:

dat1 <- read.table(text="
group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600
", header=TRUE, stringsAsFactors=FALSE)
library(plyr)
ddply(dat1,.(group),summarize, f_myvar=mifunc(myvar))
#   group f_myvar
#1 group1      NA
#2 group2     299
#3 group2     300
#4 group3     499
#5 group3     500
#6 group4      NA

A.K.

- Original Message - From: Estefanía Gómez Galimberti tef...@yahoo.com To: r help help r-help@r-project.org Cc: Sent: Thursday, May 23, 2013 11:30 AM Subject: [R] apply function within different groups Hi, I have a very big data frame and I would like to apply a function to one of the columns within different groups and obtain another dataframe. My data frame is like this: group var1 var2 myvar group1 1 a 100 group2 2 b 200 group2 34 c 300 group3 5 d 400 group3 6 e 500 group4 7 f 600 and I would like to apply this function to column myvar: mifunc = function(vec) { vec=as.vector(vec) for (i in 1:(length(vec)-1)){ vec[i]=vec[i+1]-1 } return(vec) } by the groups in column group. I would like to obtain the same dataframe but with f(myvar) instead of myvar. How can I do this? Thanks, Estefania [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
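For keeping the original data frame intact (same rows, same order) and simply adding the transformed column, base R's ave() applies a function within groups and returns a vector aligned with the input rows, so no join is needed afterwards. A sketch with the thread's data and function:

```r
# data and function from the thread
dat1 <- read.table(text="
group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600
", header=TRUE, stringsAsFactors=FALSE)

mifunc <- function(vec) {
  vec <- as.vector(vec)
  for (i in 1:(length(vec)-1)) {
    vec[i] <- vec[i+1] - 1
  }
  return(vec)
}

# ave() applies mifunc within each group and preserves row order
dat1$f_myvar <- ave(dat1$myvar, dat1$group, FUN = mifunc)
dat1
```

This works here because mifunc returns a vector of the same length as its input, which is what ave() requires.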
[R] FW: Kernel smoothing with bandwidth which varies with x
Hello all, I would like to use the Nadaraya-Watson estimator assuming a Gaussian kernel. So far I used the sm library:

library(sm)
x <- runif(5000)
y <- rnorm(5000)
plot(x, y, col='black')
h1 <- h.select(x, y, method='aicc')
lines(ksmooth(x, y, bandwidth=h1))

which works fine. What if my data were clustered, requiring a bandwidth that varies with x? How can I do that? Thanks in advance, Ioanna __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
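One way to get a bandwidth that adapts to the local density of x - a suggestion of mine, not something from the sm package - is the np package, whose bandwidth objects support adaptive nearest-neighbour bandwidths: the effective window widens where observations are sparse and narrows where they cluster. A sketch (smaller n than the original, since cross-validated bandwidth selection in np is slow):

```r
library(np)  # assumes the np package is installed

set.seed(1)
x <- runif(500)
y <- rnorm(500)

# regtype = "lc" is local-constant regression, i.e. Nadaraya-Watson;
# bwtype = "adaptive_nn" selects a k-nearest-neighbour (x-varying) bandwidth
bw  <- npregbw(y ~ x, regtype = "lc", bwtype = "adaptive_nn")
fit <- npreg(bw)

plot(x, y, col = "grey")
ord <- order(x)
lines(x[ord], fitted(fit)[ord], lwd = 2)
```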
Re: [R] Could graph objects be stored in a two-dimensional list?
To my knowledge, the only data type capable of storing graph objects (and any R object) is list, but unfortunately it is available in only one dimension. Could the graphs be stored in any two-dimensional data type? Lists can have any number of dimensions you want, just as with other vector types. The default printout of such a thing is not very pretty, but the information is in the object.

M <- matrix(list(as.roman(99), "Two", c(3,pi), c(4,44,444)), nrow=2, ncol=2)
M
#      [,1]  [,2]
# [1,] 99    Numeric,2
# [2,] "Two" Numeric,3
M[[1,1]]
# [1] XCIX
M[[1,2]]
# [1] 3.000000 3.141593

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Newmiller Sent: Thursday, May 23, 2013 9:06 AM To: jpm miao; r-help Subject: Re: [R] Could graph objects be stored in a two-dimensional list? You could use lists of lists, and index them with vectors.

a <- list()
a[[1]] <- list()
a[[2]] <- list()
a[[c(1,1)]] <- g11
a[[c(1,2)]] <- g12
a[[c(2,1)]] <- g21
a[[c(2,2)]] <- g22
print(a[[c(2,1)]])

but this seems like an inefficient use of memory, because your indexed data is stored more compactly than the graph object is. I would index the data and generate the graph object on the fly when I wanted to see it. --- Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us. Research Engineer (Solar/Batteries, with /Software/Embedded Controllers). Sent from my phone. Please excuse my brevity. --- jpm miao miao...@gmail.com wrote: Hi, I have a few graph objects created by some graphic package (say, ggplot2, which I use frequently). Because of the existing relation between the graphs, I'd like to index them in two dimensions as p[1,1], p[1,2], p[2,1], p[2,2] for convenience. To my knowledge, the only data type capable of storing graph objects (and any R object) is list, but unfortunately it is available in only one dimension.
Could the graphs be stored in any two-dimensional data type? One remedy that comes to my mind is to build a function f so that f(1,1)=1, f(1,2)=2, f(2,1)=3, f(2,2)=4. With functions f and f^{-1} (the inverse function of f), the two-dimensional indices could be mapped to and from a set of one-dimensional indices, and such functions are exactly the way R numbers elements in a matrix. Does R have this built-in function for an m by n matrix or, more generally, an m*n*p array? (I know this function is easy to write; I just want to make sure whether it exists already.) Thanks, Miao [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
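On the index-mapping question in this thread: R already provides both directions, with one caveat — R numbers matrix elements column-major, so the built-in map gives f(2,1)=2 rather than the row-major f(1,2)=2 in the post. A minimal sketch of the forward map and of `arrayInd()` as its inverse:

```r
# Forward map: (i, j) in an m-by-n matrix -> linear index, column-major,
# exactly the numbering R itself uses for matrix elements.
f <- function(i, j, m) (j - 1L) * m + i

m <- 2L; n <- 2L
f(2, 1, m)                     # 2 (column-major!)
f(1, 2, m)                     # 3

# Inverse map: base R's arrayInd() recovers (i, j) from a linear index,
# and also handles general m*n*p arrays via .dim.
arrayInd(3L, .dim = c(m, n))   # row 1, column 2
```

So no helper function needs to be written by hand; `arrayInd()` (and plain `M[[k]]` vs `M[[i, j]]` indexing) covers both directions.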
Re: [R] sample(c(0, 1)...) vs. rbinom
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Albyn Jones Sent: Thursday, May 23, 2013 8:30 AM To: r-help@r-project.org Subject: Re: [R] sample(c(0, 1)...) vs. rbinom After a bit of playing around, I discovered that sample() does something similar in other situations: set.seed(105021) sample(1:5,1,prob=c(1,1,1,1,1)) [1] 3 set.seed(105021) sample(1:5,1) [1] 2 set.seed(105021) sample(1:5,5,prob=c(1,1,1,1,1)) [1] 3 4 2 1 5 set.seed(105021) sample(1:5,5) [1] 2 5 1 4 3 albyn What is the something similar you are referring to? And I guess I still don't understand what it is that concerns you about the sample function. Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 On 2013-05-22 22:24, peter dalgaard wrote: On May 23, 2013, at 07:01 , Jeff Newmiller wrote: You seem to be building an elaborate structure for testing the reproducibility of the random number generator. I suspect that rbinom is calling the random number generator a different number of times when you pass prob=0.5 than otherwise. Nope. It's switching 0 and 1: set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp)); set.seed(1); rbinom(10,1,pp) [1] 1 1 0 0 1 0 0 0 0 1 [1] 0 0 1 1 0 1 1 1 1 0 which is curious, but of course has no implication for the distributional properties. Curiouser, if you drop the prob= in sample. set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1); rbinom(10,1,pp) [1] 0 0 1 1 0 1 1 1 1 0 [1] 0 0 1 1 0 1 1 1 1 0 However, it was never a design goal that two different random functions (or even two code paths within the same function) should give exactly the same values, even if they simulate the same distribution, so this is nothing more than a curiosity. 
Appendix A: some R code that exhibits the problem = ppp - seq(0, 1, by = 0.01) result - do.call(rbind, lapply(ppp, function(p) { set.seed(1) sampleRes - sample(c(0, 1), size = 1, replace = TRUE, prob=c(1-p, p)) set.seed(1) rbinomRes - rbinom(1, size = 1, prob = p) data.frame(prob = p, equivalent = all(sampleRes == rbinomRes)) })) result Appendix B: the output from the R code == prob equivalent 1 0.00 TRUE 2 0.01 TRUE 3 0.02 TRUE 4 0.03 TRUE 5 0.04 TRUE 6 0.05 TRUE 7 0.06 TRUE 8 0.07 TRUE 9 0.08 TRUE 10 0.09 TRUE 11 0.10 TRUE 12 0.11 TRUE 13 0.12 TRUE 14 0.13 TRUE 15 0.14 TRUE 16 0.15 TRUE 17 0.16 TRUE 18 0.17 TRUE 19 0.18 TRUE 20 0.19 TRUE 21 0.20 TRUE 22 0.21 TRUE 23 0.22 TRUE 24 0.23 TRUE 25 0.24 TRUE 26 0.25 TRUE 27 0.26 TRUE 28 0.27 TRUE 29 0.28 TRUE 30 0.29 TRUE 31 0.30 TRUE 32 0.31 TRUE 33 0.32 TRUE 34 0.33 TRUE 35 0.34 TRUE 36 0.35 TRUE 37 0.36 TRUE 38 0.37 TRUE 39 0.38 TRUE 40 0.39 TRUE 41 0.40 TRUE 42 0.41 TRUE 43 0.42 TRUE 44 0.43 TRUE 45 0.44 TRUE 46 0.45 TRUE 47 0.46 TRUE 48 0.47 TRUE 49 0.48 TRUE 50 0.49 TRUE 51 0.50 FALSE 52 0.51 TRUE 53 0.52 TRUE 54 0.53 TRUE 55 0.54 TRUE 56 0.55 TRUE 57 0.56 TRUE 58 0.57 TRUE 59 0.58 TRUE 60 0.59 TRUE 61 0.60 TRUE 62 0.61 TRUE 63 0.62 TRUE 64 0.63 TRUE 65 0.64 TRUE 66 0.65 TRUE 67 0.66 TRUE 68 0.67 TRUE 69 0.68 TRUE 70 0.69 TRUE 71 0.70 TRUE 72 0.71 TRUE 73 0.72 TRUE 74 0.73 TRUE 75 0.74 TRUE 76 0.75 TRUE 77 0.76 TRUE 78 0.77 TRUE 79 0.78 TRUE 80 0.79 TRUE 81 0.80 TRUE 82 0.81 TRUE 83 0.82 TRUE 84 0.83 TRUE 85 0.84 TRUE 86 0.85 TRUE 87 0.86 TRUE 88 0.87 TRUE 89 0.88 TRUE 90 0.89 TRUE 91 0.90 TRUE 92 0.91 TRUE 93 0.92 TRUE 94 0.93 TRUE 95 0.94 TRUE 96 0.95 TRUE 97 0.96 TRUE 98 0.97 TRUE 99 0.98 TRUE 100 0.99 TRUE 101 1.00 TRUE Appendix C: Session information === sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8
Re: [R] apply function within different groups
Hi, No problem. Try:

dat2 <- within(dat1, f_myvar <- ave(myvar, group, FUN=mifunc))
dat2
# group var1 var2 myvar f_myvar
#1 group1 1 a 100 NA
#2 group2 2 b 200 299
#3 group2 34 c 300 300
#4 group3 5 d 400 499
#5 group3 6 e 500 500
#6 group4 7 f 600 NA

A.K. From: Estefanía Gómez Galimberti tef...@yahoo.com To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, May 23, 2013 12:08 PM Subject: Re: [R] apply function within different groups Thanks a lot!!! It works perfectly! Just one thing: is there a way to preserve my original data frame so I don't need to join both tables? I could do it with rbind, but my original data frame is not in order. Thanks again! From: arun smartpink...@yahoo.com To: Estefanía Gómez Galimberti tef...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, May 23, 2013 12:48 PM Subject: Re: [R] apply function within different groups Hi, Maybe this helps:

dat1 <- read.table(text="
group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600
", header=TRUE, stringsAsFactors=FALSE)
library(plyr)
ddply(dat1, .(group), summarize, f_myvar=mifunc(myvar))
# group f_myvar
#1 group1 NA
#2 group2 299
#3 group2 300
#4 group3 499
#5 group3 500
#6 group4 NA

A.K. - Original Message - From: Estefanía Gómez Galimberti tef...@yahoo.com To: r help r-help@r-project.org Cc: Sent: Thursday, May 23, 2013 11:30 AM Subject: [R] apply function within different groups Hi, I have a very big data frame and I would like to apply a function to one of the columns within different groups and obtain another data frame. My data frame is like this:

group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600

and I would like to apply this function to column myvar:

mifunc <- function(vec) {
  vec <- as.vector(vec)
  for (i in 1:(length(vec)-1)) {
    vec[i] <- vec[i+1] - 1
  }
  return(vec)
}

by the groups in column group.
I would like to obtain the same data frame but with f(myvar) instead of myvar. How can I do this? Thanks, Estefania __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
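The `ave()` pattern arun uses above is worth seeing in isolation: it applies a function within each group and returns a vector aligned with the original rows, so the result drops straight into a new column. A minimal sketch with a simple centering function (names and data are illustrative, not from the thread):

```r
dat <- data.frame(group = c("g1", "g1", "g2", "g2"),
                  myvar = c(10, 20, 100, 200))

# ave() applies FUN within each level of group and returns a vector
# the same length and order as the input, so it can be stored directly.
dat$centered <- ave(dat$myvar, dat$group, FUN = function(v) v - mean(v))
dat$centered   # -5 5 -50 50
```

This is why the original data frame is preserved: unlike ddply/summarize, `ave()` never reorders or collapses rows.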
Re: [R] apply function within different groups
Using the previous solution:

dat3 <- mutate(dat1, f_myvar = ddply(dat1, .(group), summarize, f_myvar = mifunc(myvar))[,2])
identical(dat2, dat3)
#[1] TRUE

A.K. - Original Message - From: arun smartpink...@yahoo.com To: Estefanía Gómez Galimberti tef...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, May 23, 2013 1:01 PM Subject: Re: [R] apply function within different groups Hi, No problem. Try:

dat2 <- within(dat1, f_myvar <- ave(myvar, group, FUN=mifunc))
dat2
# group var1 var2 myvar f_myvar
#1 group1 1 a 100 NA
#2 group2 2 b 200 299
#3 group2 34 c 300 300
#4 group3 5 d 400 499
#5 group3 6 e 500 500
#6 group4 7 f 600 NA

A.K. From: Estefanía Gómez Galimberti tef...@yahoo.com To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, May 23, 2013 12:08 PM Subject: Re: [R] apply function within different groups Thanks a lot!!! It works perfectly! Just one thing: is there a way to preserve my original data frame so I don't need to join both tables? I could do it with rbind, but my original data frame is not in order. Thanks again! From: arun smartpink...@yahoo.com To: Estefanía Gómez Galimberti tef...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, May 23, 2013 12:48 PM Subject: Re: [R] apply function within different groups Hi, Maybe this helps:

dat1 <- read.table(text="
group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600
", header=TRUE, stringsAsFactors=FALSE)
library(plyr)
ddply(dat1, .(group), summarize, f_myvar=mifunc(myvar))
# group f_myvar
#1 group1 NA
#2 group2 299
#3 group2 300
#4 group3 499
#5 group3 500
#6 group4 NA

A.K.
- Original Message - From: Estefanía Gómez Galimberti tef...@yahoo.com To: r help r-help@r-project.org Cc: Sent: Thursday, May 23, 2013 11:30 AM Subject: [R] apply function within different groups Hi, I have a very big data frame and I would like to apply a function to one of the columns within different groups and obtain another data frame. My data frame is like this:

group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600

and I would like to apply this function to column myvar:

mifunc <- function(vec) {
  vec <- as.vector(vec)
  for (i in 1:(length(vec)-1)) {
    vec[i] <- vec[i+1] - 1
  }
  return(vec)
}

by the groups in column group. I would like to obtain the same data frame but with f(myvar) instead of myvar. How can I do this? Thanks, Estefania __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sample(c(0, 1)...) vs. rbinom
the something similar is return a different result in two situations where one might expect the same result, ie when a probability vector with equal probabilities is supplied versus the default of equal probabilities. And, assuming that by concerns me you mean worries me, I have no clue why you think it does! It is a curiosity. albyn On Thu, May 23, 2013 at 04:38:18PM +, Nordlund, Dan (DSHS/RDA) wrote: -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Albyn Jones Sent: Thursday, May 23, 2013 8:30 AM To: r-help@r-project.org Subject: Re: [R] sample(c(0, 1)...) vs. rbinom After a bit of playing around, I discovered that sample() does something similar in other situations: set.seed(105021) sample(1:5,1,prob=c(1,1,1,1,1)) [1] 3 set.seed(105021) sample(1:5,1) [1] 2 set.seed(105021) sample(1:5,5,prob=c(1,1,1,1,1)) [1] 3 4 2 1 5 set.seed(105021) sample(1:5,5) [1] 2 5 1 4 3 albyn What is the something similar you are referring to? And I guess I still don't understand what it is that concerns you about the sample function. Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 On 2013-05-22 22:24, peter dalgaard wrote: On May 23, 2013, at 07:01 , Jeff Newmiller wrote: You seem to be building an elaborate structure for testing the reproducibility of the random number generator. I suspect that rbinom is calling the random number generator a different number of times when you pass prob=0.5 than otherwise. Nope. It's switching 0 and 1: set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp)); set.seed(1); rbinom(10,1,pp) [1] 1 1 0 0 1 0 0 0 0 1 [1] 0 0 1 1 0 1 1 1 1 0 which is curious, but of course has no implication for the distributional properties. Curiouser, if you drop the prob= in sample. 
set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1); rbinom(10,1,pp) [1] 0 0 1 1 0 1 1 1 1 0 [1] 0 0 1 1 0 1 1 1 1 0 However, it was never a design goal that two different random functions (or even two code paths within the same function) should give exactly the same values, even if they simulate the same distribution, so this is nothing more than a curiosity. Appendix A: some R code that exhibits the problem = ppp - seq(0, 1, by = 0.01) result - do.call(rbind, lapply(ppp, function(p) { set.seed(1) sampleRes - sample(c(0, 1), size = 1, replace = TRUE, prob=c(1-p, p)) set.seed(1) rbinomRes - rbinom(1, size = 1, prob = p) data.frame(prob = p, equivalent = all(sampleRes == rbinomRes)) })) result Appendix B: the output from the R code == prob equivalent 1 0.00 TRUE 2 0.01 TRUE 3 0.02 TRUE 4 0.03 TRUE 5 0.04 TRUE 6 0.05 TRUE 7 0.06 TRUE 8 0.07 TRUE 9 0.08 TRUE 10 0.09 TRUE 11 0.10 TRUE 12 0.11 TRUE 13 0.12 TRUE 14 0.13 TRUE 15 0.14 TRUE 16 0.15 TRUE 17 0.16 TRUE 18 0.17 TRUE 19 0.18 TRUE 20 0.19 TRUE 21 0.20 TRUE 22 0.21 TRUE 23 0.22 TRUE 24 0.23 TRUE 25 0.24 TRUE 26 0.25 TRUE 27 0.26 TRUE 28 0.27 TRUE 29 0.28 TRUE 30 0.29 TRUE 31 0.30 TRUE 32 0.31 TRUE 33 0.32 TRUE 34 0.33 TRUE 35 0.34 TRUE 36 0.35 TRUE 37 0.36 TRUE 38 0.37 TRUE 39 0.38 TRUE 40 0.39 TRUE 41 0.40 TRUE 42 0.41 TRUE 43 0.42 TRUE 44 0.43 TRUE 45 0.44 TRUE 46 0.45 TRUE 47 0.46 TRUE 48 0.47 TRUE 49 0.48 TRUE 50 0.49 TRUE 51 0.50 FALSE 52 0.51 TRUE 53 0.52 TRUE 54 0.53 TRUE 55 0.54 TRUE 56 0.55 TRUE 57 0.56 TRUE 58 0.57 TRUE 59 0.58 TRUE 60 0.59 TRUE 61 0.60 TRUE 62 0.61 TRUE 63 0.62 TRUE 64 0.63 TRUE 65 0.64 TRUE 66 0.65 TRUE 67 0.66 TRUE 68 0.67 TRUE 69 0.68 TRUE 70 0.69 TRUE 71 0.70 TRUE 72 0.71 TRUE 73 0.72 TRUE 74 0.73 TRUE 75 0.74 TRUE 76 0.75 TRUE 77 0.76 TRUE 78 0.77 TRUE 79 0.78 TRUE 80 0.79 TRUE 81 0.80 TRUE 82 0.81 TRUE 83 0.82 TRUE 84 0.83
Re: [R] data frame sum
Hi,

ab <- cbind(a, b)
indx <- duplicated(names(ab)) | duplicated(names(ab), fromLast=TRUE)
res1 <- cbind(ab[!indx], v2=rowSums(ab[indx]))
res1[, order(as.numeric(gsub("[A-Za-z]", "", names(res1))))]
# v1 v2 v3
#1 3 4 5

# Another example:
a2 <- data.frame(v1=c(3,6,7), v2=c(2,4,8))
b2 <- data.frame(v2=c(2,6,7), v3=c(5,4,9))
ab2 <- cbind(a2, b2)
indx <- duplicated(names(ab2)) | duplicated(names(ab2), fromLast=TRUE)
res1 <- cbind(ab2[!indx], v2=rowSums(ab2[indx]))
res1[, order(as.numeric(gsub("[A-Za-z]", "", names(res1))))]
# v1 v2 v3
#1 3 4 5
#2 6 10 4
#3 7 15 9

A.K. Dear R expert, I have two data frames a and b:

a <- data.frame(v1=3, v2=2)
b <- data.frame(v2=2, v3=5)

Is it possible to obtain a new data frame resulting from the sum of the previous data frames with the 3 variables? Namely v1,v2,v3 = 3,4,5. Thanks, Gianandrea __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
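The core of the trick above, stripped of the column reordering, is: bind the two data frames side by side, find the column names that occur twice, and row-sum exactly those. A minimal sketch (note it assumes exactly one column name is shared; with several distinct shared names the single rowSums call would lump them together):

```r
a <- data.frame(v1 = 3, v2 = 2)
b <- data.frame(v2 = 2, v3 = 5)
ab <- cbind(a, b)                 # columns: v1 v2 v2 v3

# Mark every column whose name occurs more than once (both copies).
dup <- duplicated(names(ab)) | duplicated(names(ab), fromLast = TRUE)

# Keep the singleton columns, and add the row-wise sum of the duplicates.
res <- cbind(ab[!dup], v2 = rowSums(ab[dup]))
res[, sort(names(res))]           # v1 = 3, v2 = 4, v3 = 5
```

For many shared names, the same idea generalizes by summing per name, e.g. `sapply(unique(names(ab)), function(nm) rowSums(ab[names(ab) == nm, drop = FALSE]))`.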
[R] strings
I have two files containing words. I want to print the words that are in file 1 but NOT in file 2. How do I go about it? file 1: ABL1 1 ALKBH1 2 ALKBH2 3 ALKBH3 4 ANKRD17 5 APEX1 6 APEX2 7 APTX 8 ASF1A 9 ASTE1 10 ATM 11 ATR 12 ATRIP 13 ATRX 14 ATXN3 15 BCCIP 16 BLM 17 BRCA1 18 BRCA2 file2: ALKBH2 1 ALKBH3 2 APEX1 3 APEX2 4 APLF 5 APTX 6 ATM 7 ATR 8 ATRIP 9 BLM 10 BRCA1 11 BRCA2 12 BRIP1 13 BTBD12 14 CCNH __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] strings
Hi, Try:

dat1 <- structure(list(V2 = c("ALKBH1", "ALKBH2", "ALKBH3", "ANKRD17",
  "APEX1", "APEX2", "APTX", "ASF1A", "ASTE1", "ATM", "ATR", "ATRIP",
  "ATRX", "ATXN3", "BCCIP", "BLM", "BRCA1", "BRCA2")),
  .Names = "V2", class = "data.frame", row.names = c(NA, -18L))
dat2 <- structure(list(V2 = c("ALKBH3", "APEX1", "APEX2", "APLF", "APTX",
  "ATM", "ATR", "ATRIP", "BLM", "BRCA1", "BRCA2", "BRIP1", "BTBD12",
  "CCNH")),
  .Names = "V2", class = "data.frame", row.names = c(NA, -14L))
library(sqldf)
sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')
# V2
#1 ALKBH1
#2 ALKBH2
#3 ANKRD17
#4 ASF1A
#5 ASTE1
#6 ATRX
#7 ATXN3
#8 BCCIP

# or
dat2$id <- 1
res <- merge(dat1, dat2, all=TRUE)
subset(res, is.na(res$id))[1]
# V2
#1 ALKBH1
#2 ALKBH2
#4 ANKRD17
#9 ASF1A
#10 ASTE1
#14 ATRX
#15 ATXN3
#16 BCCIP

A.K. I have two files containing words. I want to print the words that are in file 1 but NOT in file 2. How do I go about it? file 1: ABL1 1 ALKBH1 2 ALKBH2 3 ALKBH3 4 ANKRD17 5 APEX1 6 APEX2 7 APTX 8 ASF1A 9 ASTE1 10 ATM 11 ATR 12 ATRIP 13 ATRX 14 ATXN3 15 BCCIP 16 BLM 17 BRCA1 18 BRCA2 file2: ALKBH2 1 ALKBH3 2 APEX1 3 APEX2 4 APLF 5 APTX 6 ATM 7 ATR 8 ATRIP 9 BLM 10 BRCA1 11 BRCA2 12 BRIP1 13 BTBD12 14 CCNH __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] point.in.polygon help
In that case I'd definitely look more at the over() function than that ugly bit I suggested before. Get your fish info into a SpatialPointsDataFrame. Since your polygons are in a SpatialPolygonsDataFrame, I would expect the data frame part has one row per basin, and it contains the basin names or other unique identifier. Loop through the basin names, subsetting the SpatialPolygonsDataFrame for each basin, then use the over() function with the fish SpatialPointsDataFrame to tell you which fish are in the current basin. That's an outline; there are obviously lots of details that would be needed. This should work even if, for example, a single basin consists of more than one polygon (presumably non-overlapping). There may be a more efficient way, but I don't know it off the top of my head. -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 5/23/13 6:03 AM, karengrace84 kgfis...@alumni.unc.edu wrote: I am looking at fish tagging data. I have gps coordinates of where each fish was tagged and released, and I have a map of 10 coastal basins of the state of Louisiana. I am trying to determine which basin each fish was tagged in. -- View this message in context: http://r.789695.n4.nabble.com/point-in-polygon-help-tp4667645p4667808.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
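Don's outline can be written down in a few lines with the sp package; note that over() applied once with all basins avoids the per-basin loop entirely. This is a sketch under stated assumptions: the object names `fish` (a data frame with lon/lat columns) and `basins` (a SpatialPolygonsDataFrame with a NAME column) are hypothetical stand-ins for the poster's data.

```r
library(sp)

# Assumed objects (hypothetical names): 'fish' is a data.frame with
# lon/lat columns, 'basins' a SpatialPolygonsDataFrame with a NAME
# column, and both must be in the same coordinate reference system.
pts <- SpatialPointsDataFrame(coords = fish[, c("lon", "lat")],
                              data = fish,
                              proj4string = CRS(proj4string(basins)))

# over() returns, for each point, the attribute row of the polygon
# containing it (NA for points falling in no basin), so one call
# labels every fish at once.
fish$basin <- over(pts, basins)$NAME
```

If some fish get NA, they fall outside all basin polygons (or the two layers are in different projections, which spTransform() would fix).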
Re: [R] strings
See the setdiff() function -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 5/23/13 11:04 AM, Robin Mjelle robinmje...@gmail.com wrote: I have two files containing words. I want to print the are in file 1 but NOT in file 2. How do I go about? file 1: ABL1 1 ALKBH1 2 ALKBH2 3 ALKBH3 4ANKRD17 5 APEX1 6 APEX2 7 APTX 8 ASF1A 9 ASTE1 10 ATM 11 ATR 12 ATRIP 13 ATRX 14 ATXN3 15 BCCIP 16 BLM 17 BRCA1 18 BRCA2 file2: ALKBH2 1ALKBH3 2 APEX1 3 APEX2 4 APLF 5 APTX 6 ATM 7 ATR 8 ATRIP 9 BLM 10BRCA1 11BRCA2 12BRIP1 13 BTBD12 14 CCNH [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
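Don's one-line suggestion, applied end-to-end to the poster's two files — a sketch assuming each file really does contain a word followed by an index per row, as the post suggests (the file names here are placeholders):

```r
# Read the first column (the words) of each two-column file;
# "file1.txt" / "file2.txt" are assumed names.
f1 <- read.table("file1.txt", stringsAsFactors = FALSE)[[1]]
f2 <- read.table("file2.txt", stringsAsFactors = FALSE)[[1]]

# Words present in file 1 but not in file 2.
setdiff(f1, f2)
```

setdiff() treats its arguments as sets, so duplicates in file 1 appear once in the result; see Bill Dunlap's follow-up in this thread for the duplicate-preserving variant.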
Re: [R] strings
You recommended library(sqldf) sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2') Using nothing but the core R packages setdiff() returns the difference between two sets. setdiff(dat1$V2, dat2$V2) [1] ALKBH1 ALKBH2 ANKRD17 ASF1A ASTE1 ATRXATXN3 BCCIP If there are possibly duplicates in dat1$V2, so it is not a set, and you want the duplicates in the result, use dat1$V2[ !is.element(dat1$V2, dat2$V2) ] [1] ALKBH1 ALKBH2 ANKRD17 ASF1A ASTE1 ATRXATXN3 BCCIP a - c(1, 2, 3, 2, 1, 4) b - c(1, 3) setdiff(a, b) [1] 2 4 a[ !is.element(a, b) ] [1] 2 2 4 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun Sent: Thursday, May 23, 2013 12:05 PM To: R help Subject: Re: [R] strings Hi, Try: dat1- structure(list(V2 = c(ALKBH1, ALKBH2, ALKBH3, ANKRD17, APEX1, APEX2, APTX, ASF1A, ASTE1, ATM, ATR, ATRIP, ATRX, ATXN3, BCCIP, BLM, BRCA1, BRCA2)), .Names = V2, class = data.frame, row.names = c(NA, 18L)) dat2- structure(list(V2 = c(ALKBH3, APEX1, APEX2, APLF, APTX, ATM, ATR, ATRIP, BLM, BRCA1, BRCA2, BRIP1, BTBD12, CCNH)), .Names = V2, class = data.frame, row.names = c(NA, 14L)) library(sqldf) sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2') # V2 #1 ALKBH1 #2 ALKBH2 #3 ANKRD17 #4 ASF1A #5 ASTE1 #6 ATRX #7 ATXN3 #8 BCCIP #or dat2$id- 1 res-merge(dat1,dat2,all=TRUE) subset(res,is.na(res$id))[1] # V2 #1 ALKBH1 #2 ALKBH2 #4 ANKRD17 #9 ASF1A #10 ASTE1 #14 ATRX #15 ATXN3 #16 BCCIP A.K. I have two files containing words. I want to print the are in file 1 but NOT in file 2. How do I go about? 
file 1: ABL1 1 ALKBH1 2 ALKBH2 3 ALKBH3 4 ANKRD17 5 APEX1 6 APEX2 7 APTX 8 ASF1A 9 ASTE1 10 ATM 11 ATR 12 ATRIP 13 ATRX 14 ATXN3 15 BCCIP 16 BLM 17 BRCA1 18 BRCA2 file2: ALKBH2 1 ALKBH3 2 APEX1 3 APEX2 4 APLF 5 APTX 6 ATM 7 ATR 8 ATRIP 9 BLM 10 BRCA1 11 BRCA2 12 BRIP1 13 BTBD12 14 CCNH __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] strings
#or dat1$V2[is.na(match(dat1$V2,dat2$V2))] #[1] ALKBH1 ALKBH2 ANKRD17 ASF1A ASTE1 ATRX ATXN3 #[8] BCCIP a[is.na(match(a,b))] #[1] 2 2 4 A.K. - Original Message - From: William Dunlap wdun...@tibco.com To: arun smartpink...@yahoo.com; R help r-help@r-project.org Cc: Sent: Thursday, May 23, 2013 3:18 PM Subject: RE: [R] strings You recommended library(sqldf) sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2') Using nothing but the core R packages setdiff() returns the difference between two sets. setdiff(dat1$V2, dat2$V2) [1] ALKBH1 ALKBH2 ANKRD17 ASF1A ASTE1 ATRX ATXN3 BCCIP If there are possibly duplicates in dat1$V2, so it is not a set, and you want the duplicates in the result, use dat1$V2[ !is.element(dat1$V2, dat2$V2) ] [1] ALKBH1 ALKBH2 ANKRD17 ASF1A ASTE1 ATRX ATXN3 BCCIP a - c(1, 2, 3, 2, 1, 4) b - c(1, 3) setdiff(a, b) [1] 2 4 a[ !is.element(a, b) ] [1] 2 2 4 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun Sent: Thursday, May 23, 2013 12:05 PM To: R help Subject: Re: [R] strings Hi, Try: dat1- structure(list(V2 = c(ALKBH1, ALKBH2, ALKBH3, ANKRD17, APEX1, APEX2, APTX, ASF1A, ASTE1, ATM, ATR, ATRIP, ATRX, ATXN3, BCCIP, BLM, BRCA1, BRCA2)), .Names = V2, class = data.frame, row.names = c(NA, 18L)) dat2- structure(list(V2 = c(ALKBH3, APEX1, APEX2, APLF, APTX, ATM, ATR, ATRIP, BLM, BRCA1, BRCA2, BRIP1, BTBD12, CCNH)), .Names = V2, class = data.frame, row.names = c(NA, 14L)) library(sqldf) sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2') # V2 #1 ALKBH1 #2 ALKBH2 #3 ANKRD17 #4 ASF1A #5 ASTE1 #6 ATRX #7 ATXN3 #8 BCCIP #or dat2$id- 1 res-merge(dat1,dat2,all=TRUE) subset(res,is.na(res$id))[1] # V2 #1 ALKBH1 #2 ALKBH2 #4 ANKRD17 #9 ASF1A #10 ASTE1 #14 ATRX #15 ATXN3 #16 BCCIP A.K. I have two files containing words. I want to print the are in file 1 but NOT in file 2. How do I go about? 
file 1: ABL1 1 ALKBH1 2 ALKBH2 3 ALKBH3 4 ANKRD17 5 APEX1 6 APEX2 7 APTX 8 ASF1A 9 ASTE1 10 ATM 11 ATR 12 ATRIP 13 ATRX 14 ATXN3 15 BCCIP 16 BLM 17 BRCA1 18 BRCA2 file2: ALKBH2 1 ALKBH3 2 APEX1 3 APEX2 4 APLF 5 APTX 6 ATM 7 ATR 8 ATRIP 9 BLM 10 BRCA1 11 BRCA2 12 BRIP1 13 BTBD12 14 CCNH __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glmnet package: command meanings
Hi List, I am a little confused about when to use glmnet() vs cv.glmnet(). I know that glmnet() gives the fit, and cv.glmnet() does the cross-validation after the fit. I just want to get the beta coefficients after the fit, that's it! But in all the glmnet examples I've seen, the beta coefficients are obtained ONLY AFTER cv.glmnet(). Why is that? Also, why are there so many extra betas after the fit? Thanks, Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
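The "extra betas" have a simple explanation: glmnet() fits a whole path of models, one coefficient vector per value of the penalty lambda, so coef() on the fit returns a matrix with one column per lambda. cv.glmnet() appears in the examples only because cross-validation is the usual way to pick a single lambda from that path. A sketch with simulated data (the glmnet calls follow the package's documented interface):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- rnorm(100)

fit <- glmnet(x, y)            # fits the full lambda path in one call
# One column of coefficients per lambda in the path -- these are
# the "many extra betas".
dim(coef(fit))

# To get a single beta vector, pick one lambda, e.g. by cross-validation:
cv <- cv.glmnet(x, y)
coef(fit, s = cv$lambda.min)   # betas at the CV-selected lambda
```

Nothing forces the CV choice: `coef(fit, s = 0.1)` with any lambda you prefer works just as well, which is why cv.glmnet() is optional in principle even though every example uses it.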
Re: [R] Error in png: unable to start png() device
On 23.05.2013 17:06, Ondrej Novak wrote: Hi, I use R 2.14.0 on Win XP Pro SP3 and it sometimes behaves the same way. After I draw a lot of plots (more than 200, with 2 concurrent Rgui processes running in parallel) to png, I get the same error message. bmp(), jpeg(), png() - same error. Restarting Rgui helps nothing. Solution: restart the system and voila, everything is ok. I suspect that there might be something wrong with allocation/deallocation of Windows resources in the windows() function. R-2.14.0 is ancient; can you try this with a recent R such as R-3.0.1 please, and if the problem persists, please provide reproducible code so that we can try to reproduce it in order to find the problem. Best, Uwe Ligges Ondrej Novak __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] FW: Kernel smoothing with bandwidth which varies with x
On 23.05.2013 18:10, IOANNA wrote: Hello all, I would like to use the Nadaraya-Watson estimator assuming a Gaussian kernel. So far I used the sm package:

library(sm)
x <- runif(5000)
y <- rnorm(5000)
plot(x, y, col='black')
h1 <- h.select(x, y, method='aicc')
lines(ksmooth(x, y, bandwidth=h1))

which works fine. What if my data were clustered, requiring a bandwidth that varies with x? How can I do that? I'd start with trying to transform x so that the bandwidth can be fixed. Uwe Ligges Thanks in advance, Ioanna __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
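If transforming x is not enough, the Nadaraya–Watson estimator with an x-dependent ("balloon") bandwidth is short enough to write directly. A sketch, with the bandwidth function h an arbitrary illustrative choice, not a recommendation:

```r
# Nadaraya-Watson with a bandwidth that varies with the evaluation point.
# h is a user-supplied function giving the bandwidth at each x0.
nw_varh <- function(x, y, xgrid, h) {
  sapply(xgrid, function(x0) {
    w <- dnorm((x - x0) / h(x0))  # Gaussian kernel weights at x0
    sum(w * y) / sum(w)           # weighted local mean
  })
}

set.seed(1)
x <- runif(500)
y <- sin(2 * pi * x) + rnorm(500, sd = 0.3)
xg <- seq(0, 1, length.out = 100)

# Toy bandwidth rule: wider smoothing at larger x.
fit <- nw_varh(x, y, xg, h = function(x0) 0.05 + 0.1 * x0)
# plot(x, y); lines(xg, fit)
```

Choosing h(x) well (e.g. from a pilot density estimate so that sparse regions get wider windows) is the real statistical question, which this sketch leaves open.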
[R] Distance calculation
Dear useRs, I have the following data arranged in two columns:

structure(c(0.492096635764151, 0.42688044914, 0.521585941816778,
1.66472272302545, 2.61878329527404, 2.19154489521664, 0.493876245329722,
0.4915787202584, 0.889477365620806, 0.609135860199222, 0.739201878930367,
0.854663750519518, 2.06195904001605, 1.41493262330451, 1.35748791897328,
1.19490680241894, 0.702488756183322, 0.338258418490199, 0.123398398622741,
0.138548982660226, 0.16170889185798, 0.414543218677095, 1.84629295875002,
2.24547399004563), .Dim = c(12L, 2L))

The distance is to be calculated by subtracting each value of each column from the corresponding column value in the following way. The column values are cyclic: after row 12 there is once again row 1, so in a way row 3 is closer to row 12 than to row 8. The peak value is the maximum value of a column; values falling within 80% of the maximum can also be considered maxima, provided they do not fall immediately next to each other. If we plot column 1 and column 2, the peak of column 1 is at x=5 and the peak of column 2 at x=12. For column 2, the value at x=1 is very close to the value at x=12 (within 80% of it), but it cannot be considered a peak because it falls immediately next to the maximum.
Now the peaks are moved towards each other in the shortest possible way until the maxima are under each other. More precisely:

column 1: 1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2: 1 2 3 4 5 6 7 8 9 10 11 12(max)

Now distance is measured in the following way:

column 1: 1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2: 12(max) 1 2 3 4 5 6 7 8 9 10 11
a <- sum(abs(col1-col2))

column 1: 1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2: 11 12(max) 1 2 3 4 5 6 7 8 9 10
b <- sum(abs(col1-col2))

column 1: 1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2: 10 11 12(max) 1 2 3 4 5 6 7 8 9
c <- sum(abs(col1-col2))

column 1: 1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2: 9 10 11 12(max) 1 2 3 4 5 6 7 8
d <- sum(abs(col1-col2))

column 1: 1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2: 8 9 10 11 12(max) 1 2 3 4 5 6 7
e <- sum(abs(col1-col2))

total distance = a+b+c+d+e

For the following two columns it should work this way:

structure(c(0.948228727226247, 1.38569091844218, 0.910510759802679,
1.25991218521949, 0.993123416952421, 0.553640392997634, 0.357487763503204,
0.368328033777003, 0.344255688489322, 0.423679560916755, 1.32093576037521,
3.13420679229785, 0.766278117577654, 0.751997501086888, 0.836280758630117,
1.188156460303, 1.56771616670373, 1.15928168139479, 0.522523036011874,
0.561678840701488, 1.11155735914479, 1.26467106348848, 1.09378883406298,
1.17607018089421), .Dim = c(12L, 2L))

column 1: 1 2 3 4 5 6 7 8 9 10 11 12(max)
column 2: 1 2 3 4 5(max) 6 7 8 9 10(max) 11 12

Now, as the 10th value of column 2 is closer to column 1's maximum, the distance is measured in the following way:

column 1: 1 2 3 4 5 6 7 8 9 10 11 12(max)
column 2: 12 1 2 3 4 5 6 7 8 9 10(max) 11
a <- sum(abs(col1-col2))

column 1: 1 2 3 4 5 6 7 8 9 10 11 12(max)
column 2: 11 12 1 2 3 4 5 6 7 8 9 10(max)
b <- sum(abs(col1-col2))

total distance = a+b

How can I do it?
Thank you very, very much in advance, Elisa
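One way to compute the shift-and-accumulate distance described above can be sketched as follows. This is only my reading of the rules: it aligns the single global peaks and does not handle the secondary "within 80% of the maximum" peaks from the second example. `rotate()` is a helper I define here, not a package function, and the data are the first structure() above rounded to 2 decimals.

```r
# rotate(): cyclic right shift of a vector by k positions.
rotate <- function(x, k) {
  n <- length(x)
  x[((seq_len(n) - 1 - k) %% n) + 1]
}

# First example data, rounded for brevity.
m <- structure(c(0.49, 0.43, 0.52, 1.66, 2.62, 2.19,
                 0.49, 0.49, 0.89, 0.61, 0.74, 0.85,
                 2.06, 1.41, 1.36, 1.19, 0.70, 0.34,
                 0.12, 0.14, 0.16, 0.41, 1.85, 2.25), .Dim = c(12L, 2L))

p1 <- which.max(m[, 1])    # peak of column 1 (row 5)
p2 <- which.max(m[, 2])    # peak of column 2 (row 12)
nshift <- (p1 - p2) %% 12  # cyclic shifts needed to align the peaks

# Accumulate sum(abs(col1 - col2)) over every intermediate shift,
# matching the a + b + c + d + e tally in the question.
total <- sum(sapply(seq_len(nshift),
                    function(k) sum(abs(m[, 1] - rotate(m[, 2], k)))))
total
```

For the first data set this performs the five shifts (a through e) shown above.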
Re: [R] Could graph objects be stored in a two-dimensional list?
On May 23, 2013, at 8:30 AM, jpm miao wrote: Hi, I have a few graph objects created by some graphics package (say, ggplot2, which I use frequently). Because of the relation between the graphs, I'd like to index them in two dimensions as p[1,1], p[1,2], p[2,1], p[2,2] for convenience. To my knowledge, the only data type capable of storing graph objects (This will all depend on what you mean by graph objects.) (and any R object) is a list, but unfortunately it is available in only one dimension. I think both of these presumptions are incorrect. Could the graphs be stored in any two-dimensional data type? One remedy that comes to my mind is to build a function f so that f(1,1)=1, f(1,2)=2, f(2,1)=3, f(2,2)=4. With functions f and f^{-1} (the inverse function of f), the two-dimensional indices could be mapped to and from a set of one-dimensional indices, and these functions are exactly the way R numbers the elements of a matrix. Does R have this built-in function for an m by n matrix or, more generally, an m*n*p array? (I know this function is easy to write, but I just want to make sure whether it exists already.)

Matrices can hold list elements:

matrix( list(a="a"), 2, 2)
     [,1] [,2]
[1,] "a"  "a"
[2,] "a"  "a"

matrix( list(a="a"), 2, 2)[1,1]
[[1]]
[1] "a"

And lists may be nested in a regular list:

list( list( list(a="a"), list(b="bb") ), list( list(c="ccc"), list(d=) ) )[[1]][[2]]
$b
[1] "bb"

So storing in this manner for access by an appropriately designed function should also be straightforward. You could argue that the lattice-object panel structure depends on this fact. Please learn to post in plain text.

David Winsemius Alameda, CA, USA

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
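David's two points, that a matrix can carry list elements and that R already maps between linear and two-dimensional indices, can be sketched together. The stored strings are placeholders standing in for real plot objects; `arrayInd()` covers one direction of the index mapping the poster asked about, and column-major arithmetic covers the other.

```r
# A matrix whose underlying vector is a list can hold arbitrary R objects
# (here, strings standing in for ggplot objects) with [i, j] indexing.
p <- matrix(vector("list", 4), nrow = 2, ncol = 2)
p[[1, 1]] <- "plot-1-1"   # a real ggplot object would be stored the same way
p[[2, 2]] <- "plot-2-2"
p[[1, 1]]

# Built-in index mapping: arrayInd() goes from a linear index to (row, col);
# column-major arithmetic goes the other way.
arrayInd(3, .dim = c(2, 2))   # linear index 3 is (row 1, col 2) in a 2 x 2
(2 - 1) * 2 + 1               # (row 1, col 2) back to linear index 3
```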
[R] order panels in xyplot by increasing slope
I am creating a few dozen multi-panel time series plots using lattice graphics in the lme4 package. Each panel in a given plot represents a tree. Each multipanel plot is a particular treatment. Here's my issue: when you use xyplot() to plot this data it orders the panels alphabetically. I would prefer to have them in order of increasing slope of the regression line plotted in each panel. I've read everything I can find regarding the index.cond argument, and the best I can come up with is to manually order them to get the correct increasing slope order, i.e. index.cond=c(7,6,23,4,15,8,...). This would take an inordinate amount of time and I'm sure there is a better, more elegant solution. Please help! Sorry for the long dataset below, I'm unsure of how to create a reproducible example otherwise.

example.plot = xyplot(ht ~ time|tree, data=data, type = c("r", "g", "p"), par.settings=simpleTheme(col="blue"), main="abc")
example.plot

data=
tree treat site plot rx mr rxl t w l spp time dia ht
1 1 1 C-H 2002 1 1 Mn N N 14.55 ac1 9.6 74.5
2 2 1 C-H 2002 1 1 Mn N N 14.55 ac1 7.4 69.5
3 3 1 C-H 2003 1 1 Mn N N 13.34 ac1 6.0 66.7
4 4 1 C-H 2003 1 1 Mn N N 13.34 ac1 7.1 75.4
5 5 1 C-H 2003 1 1 Mn N N 13.34 ac1 7.5 57.5
6 6 1 C-H 2008 2 1 Mc N N 11.63 ac1 5.7 71.5
7 7 1 C-H 2008 2 1 Mc N N 11.63 ac1 5.2 50.0
8 8 1 C-H 2011 2 1 Mc N N 13.04 ac1 6.3 62.0
9 9 1 C-H 2011 2 1 Mc N N 13.04 ac1 6.7 60.5
10 10 1 C-H 2017 3 1 H N N 11.38 ac1 10.7 82.0
11 11 1 C-H 2017 3 1 H N N 11.38 ac1 4.4 27.0
12 12 1 C-H 2018 3 1 H N N 11.08 ac1 5.8 49.0
13 13 1 C-H 2018 3 1 H N N 11.08 ac1 4.3 64.2
14 14 1 C-H 2013 4 1 McH N N 15.09 ac1 11.4 86.0
15 15 1 C-H 2013 4 1 McH N N 15.09 ac1 7.6 87.5
16 16 1 C-H 2014 4 1 McH N N 14.17 ac1 5.8 60.1
17 17 1 C-H 2014 4 1 McH N N 14.17 ac1 11.5 100.5
18 18 1 C-H 2014 4 1 McH N N 14.17 ac1 4.7 53.2
19 19 1 C-H 2019 5 1 MnH N N 11.72 ac1 8.1 56.0
20 20 1 C-H 2019 5 1 MnH N N 11.72 ac1 7.1 56.0
21 21 1 C-H 2019 5 1 MnH N N 11.72 ac1 7.1 56.0
22 22 1 C-H 2020 5 1 MnH N N
14.71 ac1 7.0 78.2 23 23 1 C-H 2020 5 1 MnH N N 14.71 ac1 5.2 47.2 24 24 1 C-H 2020 5 1 MnH N N 14.71 ac1 7.0 83.5 595 1 1 C-H 2002 1 1 Mn N N 14.55 ac2 9.6 96.0 596 2 1 C-H 2002 1 1 Mn N N 14.55 ac2 6.0 72.0 597 3 1 C-H 2003 1 1 Mn N N 13.34 ac2 5.7 75.0 598 4 1 C-H 2003 1 1 Mn N N 13.34 ac2 7.5 101.0 599 5 1 C-H 2003 1 1 Mn N N 13.34 ac2 6.9 58.0 600 6 1 C-H 2008 2 1 Mc N N 11.63 ac2 6.0 84.0 601 7 1 C-H 2008 2 1 Mc N N 11.63 ac2 6.3 72.0 602 8 1 C-H 2011 2 1 Mc N N 13.04 ac2 7.4 101.0 603 9 1 C-H 2011 2 1 Mc N N 13.04 ac2 5.6 62.0 60410 1 C-H 2017 3 1 H N N 11.38 ac2 10.7 110.0 60511 1 C-H 2017 3 1 H N N 11.38 ac2 4.7 60.0 60612 1 C-H 2018 3 1 H N N 11.08 ac2 6.4 48.0 60713 1 C-H 2018 3 1 H N N 11.08 ac2 5.6 70.0 60814 1 C-H 2013 4 1 McH N N 15.09 ac2 11.0 116.0 60915 1 C-H 2013 4 1 McH N N 15.09 ac2 7.5 104.0 61016 1 C-H 2014 4 1 McH N N 14.17 ac2 6.5 61.0 61117 1 C-H 2014 4 1 McH N N 14.17 ac2 10.9 110.0 61218 1 C-H 2014 4 1 McH N N 14.17 ac2 5.9 50.0 61319 1 C-H 2019 5 1 MnH N N 11.72 ac2 8.1 76.0 61420 1 C-H 2019 5 1 MnH N N 11.72 ac2 7.1 82.0 61521 1 C-H 2019 5 1 MnH N N 11.72 ac2 7.1 82.0 61622 1 C-H 2020 5 1 MnH N N 14.71 ac2 7.6 98.0 61723 1 C-H 2020 5 1 MnH N N 14.71 ac2 6.1 70.0 61824 1 C-H 2020 5 1 MnH N N 14.71 ac2 8.4 95.0 11891 1 C-H 2002 1 1 Mn N N 14.55 ac3 13.0 109.0 11902 1 C-H 2002 1 1 Mn N N 14.55 ac3 9.8 77.0 11913 1 C-H 2003 1 1 Mn N N 13.34 ac3 8.0 80.0 11924 1 C-H 2003 1 1 Mn N N 13.34 ac3 13.0 113.0 11935 1 C-H 2003 1 1 Mn N N 13.34 ac3 NANA 11946 1 C-H 2008 2 1 Mc N N 11.63 ac3 7.7 89.0 11957 1 C-H 2008 2 1 Mc N N 11.63 ac3 9.5 84.0 11968 1 C-H 2011 2 1 Mc N N 13.04 ac3 6.2 122.0 11979 1 C-H 2011 2 1 Mc N N 13.04 ac3 NANA 1198 10 1 C-H 2017 3 1 H N N 11.38 ac3 11.5 104.0 1199 11 1 C-H 2017 3 1 H N N 11.38 ac3 6.1 62.0 1200 12 1 C-H 2018 3
Re: [R] order panels in xyplot by increasing slope
On 05/24/2013 06:21 AM, Belair, Ethan D wrote:
> example.plot = xyplot(ht ~ time|tree, data=data, type = c("r", "g", "p"), par.settings=simpleTheme(col="blue"), main="abc")
> example.plot
...
Hi Ethan,
This may be what you want:

panel.slope <- function(panel) {
  return(diff(range(panel$y, na.rm=TRUE)) /
         diff(range(panel$x, na.rm=TRUE)))
}
panel.order <- order(unlist(lapply(example.plot$panel.args, panel.slope)))

Jim
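Putting Jim's suggestion to work end to end might look like the sketch below. The data here are invented stand-ins for the original tree data (four panels with increasingly steep trends), and the last step passes the computed order to `update()` via `index.cond`.

```r
library(lattice)

# Self-contained stand-in for the original example.plot.
set.seed(1)
d <- data.frame(tree = factor(rep(letters[1:4], each = 10)),
                time = rep(1:10, 4),
                ht   = rep(c(1, 3, 2, 4), each = 10) * rep(1:10, 4) +
                       rnorm(40, sd = 0.5))
example.plot <- xyplot(ht ~ time | tree, data = d, type = c("r", "g", "p"))

# Jim's per-panel slope: rise over run of each panel's data.
panel.slope <- function(panel) {
  diff(range(panel$y, na.rm = TRUE)) / diff(range(panel$x, na.rm = TRUE))
}
panel.order <- order(unlist(lapply(example.plot$panel.args, panel.slope)))

# Redraw with panels ordered by increasing slope.
reordered <- update(example.plot, index.cond = list(panel.order))
```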
[R] subsetting and Dates
Hi, I am trying to understand why creating Date variables does not work if I subset to avoid NAs. I had problems creating these Date variables in my code and I thought that the presence of NAs was the cause, so I used a condition to avoid NAs. It turns out that NAs are not a problem and I do not need to subset, but I'd like to understand why subsetting causes the problem. The strange numbers I start with are what I get when I read an Excel sheet with the function read.xls() from package gdata.

dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 41397)
dat2 = dat1
dat2[c(5,9)] = NA
Data = data.frame(dat1, dat2)
keep1 = !is.na(Data$dat1)
keep2 = !is.na(Data$dat2)
Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30")
Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30")
Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30")
Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30")
Data
    dat1  dat2      Dat1a Dat1b      Dat2a Dat2b
1  41327 41327 2013-02-22 15758 2013-02-22 15758
2  41334 41334 2013-03-01 15765 2013-03-01 15765
3  41341 41341 2013-03-08 15772 2013-03-08 15772
4  41348 41348 2013-03-15 15779 2013-03-15 15779
5  41355    NA 2013-03-22 15786       <NA>    NA
6  41362 41362 2013-03-29 15793 2013-03-29 15793
7  41369 41369 2013-04-05 15800 2013-04-05 15800
8  41376 41376 2013-04-12 15807 2013-04-12 15807
9  41383    NA 2013-04-19 15814       <NA>    NA
10 41390 41390 2013-04-26 15821 2013-04-26 15821
11 41397 41397 2013-05-03 15828 2013-05-03 15828

So variables Dat1b and Dat2b are not converted to Date class.
sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gdata_2.12.0
loaded via a namespace (and not attached):
[1] gtools_2.7.0

Thanks in advance,
Denis
[R] When the interaction term should be interpreted in AIC table?
Hi, I would be very grateful if someone could help me figure out my problem. I used mixed-effects models to analyse my data and an AIC approach for model selection. I am studying the effect of Labrador tea on the basal diameter of spruce in 2 different habitats (wet and dry zones) during 3 years. This is one example of my AIC tables:

Candidate models                                    K    AICc   ΔAICc  AICc Wt
Zone + Labrador tea + Year                          9  -17.75    0.00     0.80
Zone + Labrador tea + Year + Zone × Labrador tea   10  -14.69    3.06     0.17
Zone + Labrador tea + Year + Year × Labrador tea   12  -11.21    6.53     0.03
Zone + Labrador tea                                 6   71.14   88.88     0.00
Zone + Labrador tea + Zone × Labrador tea           7   73.85   91.59     0.00

I interpreted the main effects of zone, Labrador tea and year. My question is: should I also interpret the interaction term Zone × Labrador tea? Normally I interpret the effect of variables that appear in models with ΔAICc < 4. One professor said I should not interpret an interaction term if the main effect is stronger. But at the same time I have seen articles where the authors interpreted interaction terms where the Akaike weight was still high. Thank you in advance. Galina
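As a side note on mechanics, a ΔAICc table like the one above can be built by hand. This is a hedged sketch on invented data, with plain lm() fits standing in for the original mixed models; `aicc()` is a helper defined here, not a base R function.

```r
# Invented data: zone, Labrador tea cover, year, and basal diameter.
set.seed(2)
d <- data.frame(zone = gl(2, 30, labels = c("wet", "dry")),
                tea  = runif(60),
                year = gl(3, 1, 60))
d$diam <- 1 + 0.5 * d$tea + rnorm(60, sd = 0.2)

# Small-sample corrected AIC: AICc = AIC + 2k(k+1)/(n - k - 1)
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

mods <- list(
  "zone + tea + year"            = lm(diam ~ zone + tea + year, d),
  "zone + tea + year + zone:tea" = lm(diam ~ zone + tea + year + zone:tea, d),
  "zone + tea"                   = lm(diam ~ zone + tea, d)
)
tab <- data.frame(K    = sapply(mods, function(m) attr(logLik(m), "df")),
                  AICc = sapply(mods, aicc))
tab$delta <- tab$AICc - min(tab$AICc)
tab[order(tab$delta), ]   # candidate models ranked by ΔAICc
```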
Re: [R] subsetting and Dates
You could convert those columns to Date class by:

Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin="1970-01-01")
#or
Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class="Date"))

#    dat1  dat2      Dat1a      Dat1b      Dat2a      Dat2b
#1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
#2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
#3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
#4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
#5  41355    NA 2013-03-22 2013-03-22         NA         NA
#6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
#7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
#8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
#9  41383    NA 2013-04-19 2013-04-19         NA         NA
#10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
#11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03

A.K.

- Original Message -
From: Denis Chabot chabot.de...@gmail.com
To: R-help@r-project.org
Sent: Thursday, May 23, 2013 5:35 PM
Subject: [R] subsetting and Dates
Re: [R] Create and read symbolic links in Windows
Dear R experts, This time I am unable to create symbolic links to files as I had done last time. I could not replicate what I had successfully tried last time (rerunning the same code without any modifications). I get the following error message:

[1] FALSE
Warning message:
In file.link(.file1, file2, : cannot link './File1' to './file2', reason 'The specified network name is no longer available'

file.exists(), however, returns TRUE when I test for the source and target folders and the source file. I tried mapping drives and relative folder paths, and nothing worked. The R version (on 64-bit Windows 7):

platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
major          3
minor          0.0
year           2013
month          04
day            03
svn rev        62481
language       R
version.string R version 3.0.0 (2013-04-03)
nickname       Masked Marvel

Any suggestions are highly welcome! Thanks, Santosh

On Fri, May 3, 2013 at 11:30 AM, Santosh santosh2...@gmail.com wrote: Just got it right, please ignore the previous posting... It worked! Prof Ripley made my day!! :) THANK YOU! On Fri, May 3, 2013 at 11:23 AM, Santosh santosh2...@gmail.com wrote: Thanks for your suggestion... I upgraded to R 3.0.0 in a 64-bit Windows 7 environment.. This time when I use file.link I get the following error message: 'Cannot create a file when that file already exists', and I don't see the link. The other function, file.copy, correctly copies to the target location. Still confused by the error messages... Thanks, Santosh On Thu, May 2, 2013 at 11:42 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On 03/05/2013 07:33, Santosh wrote: Thanks for the suggestions. In Windows (Windows 7, 64-bit), I couldn't get file.symlink to work, but file.link did return TRUE; however, at the target location I did not see any link. Not sure if I am missing anything more.. Hope it's nothing to do with administrator accounts and administrator rights... Is it something I should check with my system administrator?
You may need to update your R: although the posting guide asked you to do that before posting. There was a relevant bug fix in 2.15.3. Thanks, Santosh On Thu, May 2, 2013 at 12:22 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On 02/05/2013 19:50, Santosh wrote: Dear Rxperts.. Got a couple of quick q's.. I am using R in a Windows environment (both 32-bit and 64-bit) a) Is there a way to create symbolic links to some data files? See ?file.symlink. ??'symbolic link' should have got you there. Note that this is not very useful for files, but that is a Windows and not an R restriction. b) How do I read data from symbolic links? The same way you read data from files. Thanks so much.. Santosh -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 -- Brian D.
Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
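For reference, a minimal symlink round trip looks like this. It is only a sketch: on Windows, file.symlink() typically needs appropriate privileges, and network shares can fail with errors like the one reported above; the file names are invented.

```r
# Create a file, symlink it, and read back through the link.
src <- file.path(tempdir(), "file1.txt")
lnk <- file.path(tempdir(), "file1-link.txt")
writeLines("hello", src)

ok <- file.symlink(src, lnk)   # FALSE (with a warning) if the link fails
if (ok) {
  print(readLines(lnk))        # a symlink reads like any other file
  print(Sys.readlink(lnk))     # where the link points (where supported)
}
```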
Re: [R] subsetting and Dates
Thank you for the 2 methods to make the columns class Date, but I would really like to know why these variables were not of class Date with my code. Do you know?

Denis

On 2013-05-23 at 21:44, arun smartpink...@yahoo.com wrote:
> You could convert those columns to Date class by:
> Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin="1970-01-01")
> #or
> Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class="Date"))
> A.K.
Re: [R] ordered and unordered variables
Many thanks for your detailed reply. I'll read your mail thoroughly. Thanks! At 2013-05-23 21:56:29,Greg Snow 538...@gmail.com wrote: Meng, This really comes down to what question you are trying to answer. Before worrying about details of default contrasts and issues like that you first need to work out what is really the question of interest. The main difference between declaring a variable ordered or not is the default contrasts. Defaults are provided because there are many cases where which contrasts are used internally does not matter, so why make someone think about it. In cases where the choice of contrasts matter, it is rare that any default coding is the correct/best choice and you should really think through what contrasts answer the question of interest and use those custom contrasts. For example, to answer the question if Tension has any overall effect it does not matter which contrast encoding you use (as long as it is full rank), the test statistic and p-value for testing the whole effect will be the same. The predictions of the means of groups will also be the same regardless of which contrasts are used (and this is often a clearer way to present/explain the results). A case where the specific contrasts would matter would be if we want to see if we can reduce the number of groups by combining groups together, or interpolate to certain groups. The treatment contrasts will test if low and medium can be combined (which makes sense) and if low and high can be combined (which does not make sense unless the first is true and in fact the overall factor is not significant), what makes more sense would be to compare low to medium and medium to high (it could be that low is different from the other 2, but med and high can be combined). 
The polynomial contrasts give a different view: the quadratic term in this case tests whether the medium group is the average of the low group and the high group (so we could interpolate medium). This only makes sense if the medium tension is centered (in some sense) between the other 2, i.e. the difference from low to medium is exactly the same as the difference from medium to high, but if that were the case then I would expect a numerical term rather than an ordered factor. So, to summarize, it depends on the question of interest. For some questions the contrasts don't matter, in which case it does not matter; in other cases the correct contrasts to use are determined by the question and you should use the contrasts that answer that question (which are rarely a default). On Tue, May 21, 2013 at 11:09 PM, meng laomen...@163.com wrote: Thanks. As to the data warpbreaks, if I want to analyse the impact of tension (L, M, H) on breaks, should I order the tension or not? Many thanks. At 2013-05-21 20:55:18, David Winsemius dwinsem...@comcast.net wrote: On May 20, 2013, at 10:35 PM, meng wrote: Hi all: If the explanatory variables are ordinal, the result of regression is different from unordered variables, but I can't understand the result of regression from an ordered variable. The data is warpbreaks, which comes with R. If I use the unordered variable (tension): Levels: L M H. The result is easy to understand:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    36.39       2.80  12.995  < 2e-16 ***
tensionM      -10.00       3.96  -2.525 0.014717 *
tensionH      -14.72       3.96  -3.718 0.000501 ***

If I use the ordered variable (tension): Levels: L < M < H. I don't know how to explain the result:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   28.148      1.617  17.410  < 2e-16 ***
tension.L    -10.410      2.800  -3.718 0.000501 ***
tension.Q      2.155      2.800   0.769 0.445182

What do tension.L and tension.Q stand for? And how to explain the result then?
Ordered factors are handled by the R regression mechanism with orthogonal polynomial contrasts: .L for linear and .Q for quadratic. If the term had 4 levels there would also have been a .C (cubic) term. Treatment contrasts are used for unordered factors. Generally one would want to do predictions for explanations of the results. Trying to explain the individual coefficient values from polynomial contrasts is similar to, and just as unproductive as, trying to explain the individual coefficients involved in interaction terms. -- David Winsemius Alameda, CA, USA -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
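David's and Greg's point, that the parameterization differs but the fitted model does not, can be checked directly on warpbreaks (which ships with R):

```r
fit.unordered <- lm(breaks ~ tension, data = warpbreaks)           # treatment contrasts
fit.ordered   <- lm(breaks ~ ordered(tension), data = warpbreaks)  # polynomial contrasts

coef(fit.unordered)  # tensionM, tensionH: differences from the baseline L
coef(fit.ordered)    # tension.L, tension.Q: orthogonal linear/quadratic terms

# The fitted group means (and hence the overall F test) are identical:
all.equal(fitted(fit.unordered), fitted(fit.ordered))

# The two parameterizations come from these contrast matrices:
contr.treatment(3)
contr.poly(3)
```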
Re: [R] subsetting and Dates
On May 23, 2013, at 7:06 PM, Denis Chabot wrote: Thank you for the 2 methods to make the columns class Date, but I would really like to know why these variables were not of class Date with my code. Do you know?

I suspect that the problem lies in the dispatch to `[<-.class` or `$<-`. When the first argument is 'logical', then the first argument is not of class Date and so is not dispatched to `[<-.Date` but rather to .Primitive("[<-"), there being no `$<-.logical` or `[<-.logical`. Arguably, as it were, someone should write S4 methods for `[<-` and `$<-` that would dispatch to the expected method on a signature where the second argument is a Date or POSIXct class. We might also then want methods for the other two [/$ indexing classes, numeric and character.

-- David Winsemius Alameda, CA, USA
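David's diagnosis can be reproduced in a few lines (a sketch with invented column names): subset-assigning Date values into a column that is not already of class Date goes through the default `[<-`, which keeps only the underlying numbers.

```r
d <- data.frame(x = 1:3)

# Assigning a whole column at once keeps the "Date" class:
d$whole <- as.Date(c("2013-02-22", "2013-03-01", "2013-03-08"))
class(d$whole)                       # "Date"

# Subset-assignment into a column that does not exist yet goes through the
# default `[<-`: the Date values are stored as bare numbers and the class
# attribute is lost, which is what happened to Dat1b and Dat2b above.
keep <- c(TRUE, TRUE, FALSE)
d$part[keep] <- as.Date(c("2013-02-22", "2013-03-01"))
dropped <- !inherits(d$part, "Date") # TRUE: the class was dropped
d$part                               # plain numbers (days since 1970-01-01)

# One workaround: restore the class afterwards.
class(d$part) <- "Date"
d$part
```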
Re: [R] subsetting and Dates
I guess it is due to vectorization. vec1- as.Date(Data[,2],origin=1899-12-30) class(vec1) #[1] Date as.vector(vec1) # [1] 15758 15765 15772 15779 NA 15793 15800 15807 NA 15821 15828 head(as.list(vec1),2) #[[1]] #[1] 2013-02-22 # #[[2]] #[1] 2013-03-01 head(data.frame(vec1),2) # vec1 #1 2013-02-22 #2 2013-03-01 unlist(as.list(vec1)) # [1] 15758 15765 15772 15779 NA 15793 15800 15807 NA 15821 15828 Also, please check: http://r.789695.n4.nabble.com/as-vector-with-mode-quot-list-quot-and-POSIXct-td4667533.html A.K. - Original Message - From: Denis Chabot chabot.de...@gmail.com To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, May 23, 2013 10:06 PM Subject: Re: [R] subsetting and Dates Thank you for the 2 methods to make the columns class Date, but I would really like to know why these variables were not in Date class with my code. Do you know? Denis Le 2013-05-23 à 21:44, arun smartpink...@yahoo.com a écrit : You could convert those columns to Date class by: Data[,c(4,6)]-lapply(Data[,c(4,6)],as.Date,origin=1970-01-01) #or Data[,c(4,6)]-lapply(Data[,c(4,6)],function(x) structure(x,class=Date)) # dat1 dat2 Dat1a Dat1b Dat2a Dat2b #1 41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22 #2 41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01 #3 41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08 #4 41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15 #5 41355 NA 2013-03-22 2013-03-22 NA NA #6 41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29 #7 41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05 #8 41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12 #9 41383 NA 2013-04-19 2013-04-19 NA NA #10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26 #11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03 A.K. 
----- Original Message -----
From: Denis Chabot chabot.de...@gmail.com
To: R-help@r-project.org
Sent: Thursday, May 23, 2013 5:35 PM
Subject: [R] subsetting and Dates

Hi,

I am trying to understand why creating Date variables does not work when I subset to avoid NAs. I had problems creating these Date variables in my code and thought that the presence of NAs was the cause, so I used a condition to avoid NAs. It turns out that NAs are not the problem and I do not need to subset, but I'd like to understand why subsetting causes the problem.

The strange numbers I start with are what I get when I read an Excel sheet with the function read.xls() from package gdata.

dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 41397)
dat2 = dat1
dat2[c(5,9)] = NA
Data = data.frame(dat1, dat2)
keep1 = !is.na(Data$dat1)
keep2 = !is.na(Data$dat2)
Data$Dat1a = as.Date(Data[, "dat1"], origin = "1899-12-30")
Data$Dat1b[keep1] = as.Date(Data[keep1, "dat1"], origin = "1899-12-30")
Data$Dat2a = as.Date(Data[, "dat2"], origin = "1899-12-30")
Data$Dat2b[keep2] = as.Date(Data[keep2, "dat2"], origin = "1899-12-30")
Data
    dat1  dat2      Dat1a Dat1b      Dat2a Dat2b
1  41327 41327 2013-02-22 15758 2013-02-22 15758
2  41334 41334 2013-03-01 15765 2013-03-01 15765
3  41341 41341 2013-03-08 15772 2013-03-08 15772
4  41348 41348 2013-03-15 15779 2013-03-15 15779
5  41355    NA 2013-03-22 15786         NA    NA
6  41362 41362 2013-03-29 15793 2013-03-29 15793
7  41369 41369 2013-04-05 15800 2013-04-05 15800
8  41376 41376 2013-04-12 15807 2013-04-12 15807
9  41383    NA 2013-04-19 15814         NA    NA
10 41390 41390 2013-04-26 15821 2013-04-26 15821
11 41397 41397 2013-05-03 15828 2013-05-03 15828

So variables Dat1b and Dat2b are not converted to Date class.
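One workaround consistent with the code above (a sketch, not an answer from the thread itself): create the column as a Date vector of NAs first, so the subsetted assignment dispatches to the Date method of `[<-` and the class is preserved.

```r
# Sketch: preallocate the column as Date before the subsetted assignment
# (dat1/dat2 as in the message above).
dat1 <- c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 41397)
dat2 <- dat1
dat2[c(5, 9)] <- NA
Data <- data.frame(dat1, dat2)
keep2 <- !is.na(Data$dat2)

Data$Dat2b <- as.Date(rep(NA_character_, nrow(Data)))   # all-NA Date column
Data$Dat2b[keep2] <- as.Date(Data$dat2[keep2], origin = "1899-12-30")

class(Data$Dat2b)   # "Date" -- and rows 5 and 9 stay NA
```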
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] gdata_2.12.0

loaded via a namespace (and not attached):
[1] gtools_2.7.0

Thanks in advance,
Denis
[R] Download data
Hello again,

I need to download the 'WTI - Cushing, Oklahoma' series from http://www.eia.gov/dnav/pet/pet_pri_spt_s1_d.htm, which is available under the column 'View History'. I can get the data manually, but I was looking for an R implementation that downloads the data directly into R. Can somebody point me to how to achieve that?

Thanks for your help.
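Not having the exact answer on the list, here is a hedged sketch of the usual pattern: fetch the file behind the 'View History' link with download.file() and read it with read.xls() from gdata (the package used elsewhere in this digest). The URL below is an assumption, and the sheet/skip values are layout-dependent guesses; substitute the actual download link from the EIA page.

```r
# Sketch only -- the URL, sheet number, and skip count are assumptions;
# copy the real XLS link from the EIA 'View History' page instead.
library(gdata)  # for read.xls()

url <- "http://www.eia.gov/dnav/pet/hist_xls/RWTCd.xls"   # hypothetical link
tmp <- tempfile(fileext = ".xls")
download.file(url, tmp, mode = "wb")   # mode = "wb" matters on Windows

wti <- read.xls(tmp, sheet = 2, skip = 2)   # adjust to the file's layout
head(wti)
```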
Re: [R] adding rows without loops
Rainer... I can't believe this did the trick. You're a genius. Thank you, sir.

On Thu, May 23, 2013 at 7:07 AM, Rainer Schuermann rainer.schuerm...@gmx.net wrote:

Using the data generated with your code below, does

rbind( DF1, DF2[ !(DF2$X.TIME %in% DF1$X.TIME), ] )
DF1 <- DF1[ order( DF1$X.DATE, DF1$X.TIME ), ]

do the job?

Rgds,
Rainer

On Thursday 23 May 2013 05:54:26 Adeel - SafeGreenCapital wrote:

Thank you Blaser: This is the exact solution I came up with, but when comparing 8M rows, even on an 8 GB machine, one runs out of memory. To run this effectively, I have to break the DF into smaller DFs, loop through them, and then do a massive merge at the end. That's what takes 8+ hours to compute. Even the bigmemory package is causing OOM issues.

-----Original Message-----
From: Blaser Nello [mailto:nbla...@ispm.unibe.ch]
Sent: Thursday, May 23, 2013 12:15 AM
To: Adeel Amin; r-help@r-project.org
Subject: RE: [R] adding rows without loops

Merge should do the trick. How best to use it will depend on what you want to do with the data afterwards. The following is an example of what you could do. It will perform best if the missing rows are scattered at random rather than clustered.

DF1 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = c(2:5, 7:9) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE = rep("01052007", 7), X.TIME = c(2:8) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))

DFm <- merge(DF1, DF2, by = c("X.DATE", "X.TIME"), all = TRUE)
while (any(is.na(DFm))) {
  if (any(is.na(DFm[1, ]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind = TRUE)
  prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
  DFm[is.na(DFm)] <- DFm[prind]
}
DFm

Best,
Nello

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adeel Amin
Sent: Donnerstag, 23. Mai 2013 07:01
To: r-help@r-project.org
Subject: [R] adding rows without loops

I'm comparing a variety of datasets, each with over 4M rows.
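Rainer's one-liner, spelled out with the toy data from further down the thread (a sketch; note that, unlike a fill-forward approach, this copies the missing rows' values from DF2 rather than repeating DF1's previous row):

```r
# Append DF2's rows whose X.TIME is absent from DF1, then restore order.
DF1 <- data.frame(X.DATE = "01052007", X.TIME = c(2:5, 7:9) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE = "01052007", X.TIME = c(2:8) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))

out <- rbind(DF1, DF2[!(DF2$X.TIME %in% DF1$X.TIME), ])
out <- out[order(out$X.DATE, out$X.TIME), ]
out   # 8 rows; the added 0600 row carries DF2's values (45, 35)
```

With more than one date, the %in% test would need a combined key (e.g. paste(X.DATE, X.TIME)); matching on X.TIME alone only works for this single-date toy example.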
I've solved this problem five different ways using a for/while loop, but the processing time is murder (over 8 hours doing this row by row per dataset). As such, I'm trying to find out whether a solution is possible without a loop, or one in which the processing time is much faster.

Each dataset is a time series, like so:

DF1:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
...
n

DF2:
  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32
...
n+4000

In other words, there are 4000 more rows in DF2 than in DF1, so the datasets are of unequal length.

I'm trying to ensure that all dataframes have the same number of X.DATE and X.TIME entries. Where they are missing, I'd like to insert a new row. In the example above, when comparing DF2 to DF1, the entry 01052007 0600 is missing in DF1. The solution would add a row to DF1 at the appropriate index, so the new dataframe would be

  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32

VALUE and VALUE2 would be the same as row 4. Of course this is simple to accomplish using a row-by-row analysis, but with over 4M rows the processing time spent destroying and rebinding the datasets is very long, and I believe highly un-R'ish. What am I missing?

Thanks!
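An alternative sketch of the merge-then-fill idea using only base R (na_locf below is a hand-rolled stand-in for zoo::na.locf and assumes, like the while-loop earlier in the thread, that the first row is complete):

```r
# Carry the last non-NA observation forward (assumes no leading NAs).
na_locf <- function(x) {
  filled <- x[!is.na(x)]
  filled[pmax(cumsum(!is.na(x)), 1)]
}

DF1 <- data.frame(X.DATE = "01052007", X.TIME = c(2:5, 7:9) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))
DF2 <- data.frame(X.DATE = "01052007", X.TIME = c(2:8) * 100)  # keys only

# The outer merge creates an NA row for the missing 0600 timestamp ...
DFm <- merge(DF1, DF2, by = c("X.DATE", "X.TIME"), all = TRUE)
# ... which the fill replaces with row 4's values, as the question asks.
DFm[c("VALUE", "VALUE2")] <- lapply(DFm[c("VALUE", "VALUE2")], na_locf)
DFm   # the 0600 row now repeats 0500's values (45, 27)
```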
Re: [R] subsetting and Dates
On May 23, 2013, at 7:56 PM, arun wrote:

I guess it is due to vectorization.

The concept of vectorization is much broader than the activities of `as.vector`, and it needs a specific functional mechanism to be considered an explanation.

vec1 <- as.Date(Data[,2], origin = "1899-12-30")
class(vec1)
#[1] "Date"
as.vector(vec1)
# [1] 15758 15765 15772 15779    NA 15793 15800 15807    NA 15821 15828

It is certainly true that `as.vector` could unclass a Date-classed vector, but why do you believe this has anything to do with how `$<-` returns its functional result? Setting `trace` on `as.vector` does not result in any signal suggesting that it was called:

trace('as.vector')
Data$Dat1b = as.Date(Data[, "dat1"], origin = "1899-12-30")
# Nothing

trace(.Primitive("$<-"))
Data$Dat1b = as.Date(Data[, "dat1"], origin = "1899-12-30")
trace: `$<-`(`*tmp*`, Dat1b, value = c(15758, 15765, 15772, 15779, 15786, 15793, 15800, 15807, 15814, 15821, 15828))

head(as.list(vec1), 2)
#[[1]]
#[1] "2013-02-22"
#
#[[2]]
#[1] "2013-03-01"
head(data.frame(vec1), 2)
#        vec1
#1 2013-02-22
#2 2013-03-01
unlist(as.list(vec1))
# [1] 15758 15765 15772 15779    NA 15793 15800 15807    NA 15821 15828

Also, please check:
http://r.789695.n4.nabble.com/as-vector-with-mode-quot-list-quot-and-POSIXct-td4667533.html

Interesting, but I fail to see the connection to this instance other than R behaving somewhat differently than we might at one time have expected.

-- David.

A.K.

----- Original Message -----
From: Denis Chabot chabot.de...@gmail.com
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Thursday, May 23, 2013 10:06 PM
Subject: Re: [R] subsetting and Dates

Thank you for the two methods to make the columns class Date, but I would really like to know why these variables were not of class Date with my original code. Do you know?
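The mechanism David is probing can be shown in a minimal reproduction (a sketch; the data frame and column names here are invented): assigning into a column that does not yet exist routes the value through the default `[<-` on NULL, which keeps only the underlying numbers and loses the "Date" class, whereas a pre-existing Date column dispatches to the Date method and keeps it.

```r
x <- data.frame(a = 1:3)
d <- as.Date(c("2013-02-22", "2013-03-01", "2013-03-08"))

# Column "b" does not exist yet: `*tmp*`$b is NULL, so the default
# `[<-` builds a bare numeric vector and the "Date" class is lost.
x$b[1:3] <- d
class(x$b)    # not "Date" -- just the underlying day counts

# Pre-creating "c" as a Date column makes the Date method dispatch instead.
x$c <- as.Date(rep(NA_character_, 3))
x$c[1:3] <- d
class(x$c)    # "Date"
```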
Denis

Le 2013-05-23 à 21:44, arun smartpink...@yahoo.com a écrit :

You could convert those columns to Date class by:

Data[, c(4,6)] <- lapply(Data[, c(4,6)], as.Date, origin = "1970-01-01")
#or
Data[, c(4,6)] <- lapply(Data[, c(4,6)], function(x) structure(x, class = "Date"))

#    dat1  dat2      Dat1a      Dat1b      Dat2a      Dat2b
#1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
#2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
#3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
#4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
#5  41355    NA 2013-03-22 2013-03-22         NA         NA
#6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
#7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
#8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
#9  41383    NA 2013-04-19 2013-04-19         NA         NA
#10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
#11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03

A.K.

----- Original Message -----
From: Denis Chabot chabot.de...@gmail.com
To: R-help@r-project.org
Sent: Thursday, May 23, 2013 5:35 PM
Subject: [R] subsetting and Dates

Hi,

I am trying to understand why creating Date variables does not work when I subset to avoid NAs. I had problems creating these Date variables in my code and thought that the presence of NAs was the cause, so I used a condition to avoid NAs. It turns out that NAs are not the problem and I do not need to subset, but I'd like to understand why subsetting causes the problem.

The strange numbers I start with are what I get when I read an Excel sheet with the function read.xls() from package gdata.
dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 41397)
dat2 = dat1
dat2[c(5,9)] = NA
Data = data.frame(dat1, dat2)
keep1 = !is.na(Data$dat1)
keep2 = !is.na(Data$dat2)
Data$Dat1a = as.Date(Data[, "dat1"], origin = "1899-12-30")
Data$Dat1b[keep1] = as.Date(Data[keep1, "dat1"], origin = "1899-12-30")
Data$Dat2a = as.Date(Data[, "dat2"], origin = "1899-12-30")
Data$Dat2b[keep2] = as.Date(Data[keep2, "dat2"], origin = "1899-12-30")
Data
    dat1  dat2      Dat1a Dat1b      Dat2a Dat2b
1  41327 41327 2013-02-22 15758 2013-02-22 15758
2  41334 41334 2013-03-01 15765 2013-03-01 15765
3  41341 41341 2013-03-08 15772 2013-03-08 15772
4  41348 41348 2013-03-15 15779 2013-03-15 15779
5  41355    NA 2013-03-22 15786         NA    NA
6  41362 41362 2013-03-29 15793 2013-03-29 15793
7  41369 41369 2013-04-05 15800 2013-04-05 15800
8  41376 41376 2013-04-12 15807 2013-04-12 15807
9  41383    NA 2013-04-19 15814         NA    NA
10 41390 41390 2013-04-26 15821 2013-04-26 15821
11 41397 41397 2013-05-03 15828 2013-05-03 15828

So variables Dat1b and Dat2b are not converted to Date class.

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base