Re: [R] replacing ugly for loops
I am not sure you have expressed what you want to do correctly. See inline:

On Wed, Oct 10, 2012 at 9:10 PM, andrewH ahoer...@rprogress.org wrote:

I have a couple of hundred American Community Survey Summary Files containing rectangular arrays of data, mainly though not exclusively numeric. Each file is referred to as a sequence (henceforth seq).

-- so 1 seq (terrible identifier -- see below for why) = 1 file

From these files I am trying to extract particular subsets (tables), each consisting of a set of columns. These tables are defined by three numbers (now in columns in a data frame):
1. a file identifier (seq)
2. first column position number (startNo)
3. length of table (len)

-- So your data frame, call it yourframe, has columns named: seq startNo len

so the columns to select for one triple would consist of startNo:(startNo+len-1). I am trying to create for each sequence a vector of all the column numbers for tables in that sequence.

-- So for each seq id you want to find all the column numbers, right?

sq.n <- seq_len(nrow(yourframe)) ## Just to make it easier to read
colms <- tapply(sq.n, yourframe$seq, function(x)
    with(yourframe[x,], sort(unique(do.call(c,
        mapply(seq, from=startNo, length=len, SIMPLIFY = FALSE))))))

## Comments
In the mapply call, seq is the R function, ?seq. That's why using it as a name for a file id is terrible -- it causes confusion.

In the absence of data, this is untested -- and probably not quite right. But it should be close, I hope. The key idea is the use of mapply to get the sequence of columns for each row, over all the rows for each seq id. The SIMPLIFY = FALSE guarantees that this yields a list of vectors of column indices, which are then glopped together and cleaned up by the sort(unique(do.call( ... stuff. colms should then be a list giving the sorted column numbers to choose for each seq id.

I do not know whether (once cleaned up) this is either more elegant or more efficient than what you proposed. And I wouldn't be surprised if someone like Bill Dunlap comes up with a much better way, either. But it is different -- and perhaps amusing. ... If I have properly understood what you wanted. If not, ignore all.

Cheers,
Bert

Obviously I could do this with nested for loops, e.g.:

seq <- c(1,1,2,2)
startNo <- c(3, 10, 3, 15)
len <- c(4, 2, 5, 3)
data.df <- data.frame(seq, startNo, len)
seq.f <- factor(data.df$seq)
data.l <- split(data.df, seq.f)
selectColsList <- vector("list", length(levels(seq.f)))
for (i in seq_along(levels(seq.f))){
  selectCols <- numeric()
  for (j in seq_along(data.l[[i]]$startNo)){
    selectCols <- c(selectCols, data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] + data.l[[i]]$len[j]-1))
  }
  selectColsList[[i]] <- selectCols
}
selectColsList
[[1]]
[1]  3  4  5  6 10 11
[[2]]
[1]  3  4  5  6  7 15 16 17

But this code strikes me as inelegant and verbose. It seems to me that there ought to be a way to make the outer loop (indexed with i) into a tapply function (which is why I started with a split()), and the inner loop (indexed with j) into some cute recursive function, but I was not able to do so. If anyone could suggest some nicer (e.g. shorter, or faster, or just more sophisticated) way to do this instead, I would be most grateful.

Sincerely,
andrewH

-- View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
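The split-and-expand idea discussed in this thread can be sketched as follows. This is a minimal sketch, not a tested answer from the thread: it assumes the identifier column is renamed file_id (a hypothetical name, chosen only to avoid masking seq()) and it reuses the four example rows from the original post.

```r
# Example triples from the original post; 'file_id' replaces the name 'seq'
# (an assumption made here so that seq() is not masked).
tables <- data.frame(file_id = c(1, 1, 2, 2),
                     startNo = c(3, 10, 3, 15),
                     len     = c(4, 2, 5, 3))

# For each file, expand every (startNo, len) pair into its column numbers,
# then combine, de-duplicate and sort them.
cols_by_file <- lapply(split(tables, tables$file_id), function(d) {
  pieces <- Map(function(s, n) seq(from = s, length.out = n), d$startNo, d$len)
  sort(unique(unlist(pieces)))
})

cols_by_file
```

The result is a list named by file_id, so cols_by_file[["2"]] gives the columns to select from file 2.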
Re: [R] Connect R and Lyx in UBUNTU
By "connect" I meant being able to write code chunks in LyX and compile them within LyX (using R) to produce results along with the other material. There are many tutorials available for doing this under Windows, but I could not solve the problem for Linux (Ubuntu).

-Atanu

-- View this message in context: http://r.789695.n4.nabble.com/Connect-R-and-Lyx-in-UBUNTU-tp4645675p4645824.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Connect R and Lyx in UBUNTU
It is actually much easier to do it under Ubuntu; see a video here: http://yihui.name/knitr/demo/lyx/

If you want to use Sweave instead of knitr, there is also a module for it. The official documentation is here:
- https://github.com/downloads/yihui/lyx/sweave.pdf
- https://github.com/downloads/yihui/lyx/knitr.pdf

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
Re: [R] Exporting summary plm results to latex
Hi Sebastian

I think I found the package by accident when I searched the CRAN package page for "latex", but I did not use it because it could not handle one very particular problem.

If there is no other alternative, use the add.to.row argument of xtable. A while ago I needed to add some information from the summary of a glm, and I think I did it by using the add.to.row argument and \multicolumn{n}{l}{text}value, where n is the number of columns for the text and the value is the summary subscript object. Be careful of \, which has to be \\, and of carriage returns.

I know it's a bit kludgy, but if I were doing more of them I would make a template for my text editor, which cuts down the work.

Regards
Duncan

At 09:45 11/10/2012, you wrote:

I am also interested in the standard errors, but beneath rather than next to the point estimates, which is the standard in the xtable package. If you by any chance remember the name of the package or how to do it, that would be much appreciated!

Cheers,
Sebastian

On Oct 10, 2012, at 7:10 PM, Duncan Mackay mac...@northnet.com.au wrote:

Hi

If you just want the coefficients:

xtable(summary(fe)$coef)
% latex table generated in R 2.15.1 by xtable 1.7-0 package
% Thu Oct 11 09:04:59 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrrr}
\hline
 & Estimate & Std. Error & t-value & Pr($>$$|$t$|$) \\
\hline
x & 0.12 & 0.07 & 1.78 & 0.08 \\
\hline
\end{tabular}
\end{center}
\end{table}

There is another package, whose name eludes me, which may help for tables whose output differs from the output of lm etc.

HTH
Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

At 05:09 11/10/2012, you wrote:

Hi,

Maybe you can use library(texreg):

library(plm)
#generating some data
x <- rnorm(270)
y <- rnorm(270)
t <- rep(1:3,30)
i <- rep(1:90, each=3)
data <- data.frame(i,t,x,y)
fe <- plm(y~x,data=data,model="within")
summary(fe)
library(texreg)
fe1 <- extract.plm(fe) #extract the plm object
library(xtable)
xtable(do.call(rbind,lapply(fe1,function(x) data.frame(x))))
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Wed Oct 10 14:59:10 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rr}
\hline
 & x \\
\hline
Estimate & -0.03 \\
Std. Error & 0.08 \\
Pr($>$$|$t$|$) & 0.68 \\
R\$\verb|^|2\$ & 0.00 \\
Adj. R\$\verb|^|2\$ & 0.00 \\
Num. obs. & 270.00 \\
\hline
\end{tabular}
\end{center}
\end{table}

#Another example. In this case, you can create two tables from the zz1 list
data(Produc, package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year"))
zz1 <- extract.plm(zz)
lapply(lapply(zz1,function(x) data.frame(x)),xtable)
[[1]]
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Wed Oct 10 15:08:02 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrr}
\hline
 & Estimate & Std..Error & Pr...t.. \\
\hline
log(pcap) & -0.03 & 0.03 & 0.37 \\
log(pc) & 0.29 & 0.03 & 0.00 \\
log(emp) & 0.77 & 0.03 & 0.00 \\
unemp & -0.01 & 0.00 & 0.00 \\
\hline
\end{tabular}
\end{center}
\end{table}
[[2]]
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Wed Oct 10 15:08:02 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rr}
\hline
 & x \\
\hline
R\$\verb|^|2\$ & 0.94 \\
Adj. R\$\verb|^|2\$ & 0.88 \\
Num. obs. & 816.00 \\
\hline
\end{tabular}
\end{center}
\end{table}

Hope it helps.
A.K.

- Original Message -
From: Sebastian Barfort sb3...@nyu.edu
To: r-help@r-project.org
Cc:
Sent: Wednesday, October 10, 2012 1:07 PM
Subject: [R] Exporting summary plm results to latex

Dear all,

I am trying to export my fixed effect results to LaTeX. I am using the plm package with the summary function. However, it does not look like apsrtable, stargazer, or any other package can accommodate the plm package. I am interested in a classic table with the coefficient in one row, followed by the standard error in parentheses in the next row, and stars by the coefficient to show the significance level.

coefficient 1   xxx**
                (xxx)

Here is a reproducible example:

library(plm)
#generating some data
x <- rnorm(270)
y <- rnorm(270)
t <- rep(1:3,30)
i <- rep(1:90, each=3)
data <- data.frame(i,t,x,y)
fe <- plm(y~x,data=data,model="within")
summary(fe)

If there is an alternative to using the plm package that works with any of the export-to-LaTeX packages, I would be very interested to know. Otherwise, any ideas of how to solve this problem are very welcome. I almost exclusively use fixed effect panel models, and the problem of exporting results to LaTeX is one of the things preventing me from switching entirely from Stata to R.

Kind regards,
Sebastian
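The layout Sebastian asks for (estimate with stars, standard error in parentheses on the row beneath) can also be assembled by hand from a coefficient matrix. The sketch below is only an illustration under stated assumptions: it uses lm() as a stand-in so that it runs without plm, it assumes the coefficient matrix has columns named "Estimate", "Std. Error" and "Pr(>|t|)", and the star thresholds and the helper name stars() are hypothetical choices, not taken from the thread.

```r
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 0.5 * d$x + rnorm(50)

# Stand-in model; with plm one would presumably start from coef(summary(fe)).
fit <- lm(y ~ x, data = d)
ct  <- coef(summary(fit))

# Hypothetical significance thresholds for the stars.
stars <- function(p) ifelse(p < 0.01, "***", ifelse(p < 0.05, "**", ifelse(p < 0.1, "*", "")))

# Two LaTeX rows per coefficient: estimate with stars, then (standard error).
rows <- unlist(lapply(rownames(ct), function(nm) c(
  sprintf("%s & %.3f%s \\\\", nm, ct[nm, "Estimate"], stars(ct[nm, "Pr(>|t|)"])),
  sprintf(" & (%.3f) \\\\", ct[nm, "Std. Error"])
)))

tex <- c("\\begin{tabular}{lr}", "\\hline", rows, "\\hline", "\\end{tabular}")
cat(tex, sep = "\n")
```

Because the rows are plain strings, the same vector can be written to a file with writeLines(tex, "table.tex") and pulled into a document with \input.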
Re: [R] Connect R and Lyx in UBUNTU
I have pointed this out already: this is a LyX question. Please ask on their mailing list (http://www.lyx.org/MailingLists#toc2 and http://dir.gmane.org/gmane.editors.lyx.general).

There are many users who use LyX / Sweave or knitr / R under Ubuntu!

Rainer
Re: [R] How to replicate SAS by group processing in R
On Wed, Oct 10, 2012 at 7:09 PM, ramoss ramine.mossad...@finra.org wrote:

In SAS I use the following code:

proc sort data=upper;
  by tdate stock_symbol expire strike;
run;
data upper1;
  set upper;
  by tdate stock_symbol expire strike;
  if first.expire then output;
  rename strike=astrike;
run;

on the following data set:

tdate      stock_symbol  expiration  strike
9/11/2012  C             9/16/2012   11
9/11/2012  C             9/16/2012   12
9/11/2012  C             9/16/2012   13
9/12/2012  C             9/16/2012   14
9/12/2012  C             9/16/2012   15
9/12/2012  C             9/16/2012   16
9/12/2012  C             9/16/2012   17

to get the following results:

tdate      stock_symbol  expiration  strike
9/11/2012  C             9/16/2012   11
9/12/2012  C             9/16/2012   14

How would I replicate this kind of logic in R?

First, replicate it in some kind of universally understood language - like English. Nearly every alien in every sci-fi film I've seen speaks English, so that's a safe assumption :)

What does it do? Take the first record within groups defined by tdate? Why does your code say 'expire' but the data have 'expiration'?

Barry
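One possible reading of the SAS step, written out in base R. This is a sketch under assumptions, since the thread never settles what the code is meant to do: it assumes the grouping variables are tdate, stock_symbol and expiration (the data's name for what the SAS code calls expire), and that "first.expire" means the first sorted row within each such group.

```r
# The seven example rows from the post.
upper <- data.frame(
  tdate        = as.Date(c("9/11/2012", "9/11/2012", "9/11/2012", "9/12/2012",
                           "9/12/2012", "9/12/2012", "9/12/2012"), format = "%m/%d/%Y"),
  stock_symbol = rep("C", 7),
  expiration   = rep(as.Date("9/16/2012", format = "%m/%d/%Y"), 7),
  strike       = 11:17,
  stringsAsFactors = FALSE
)

# Equivalent of PROC SORT ... BY tdate stock_symbol expire strike.
upper <- upper[order(upper$tdate, upper$stock_symbol, upper$expiration, upper$strike), ]

# Keep the first row of each (tdate, stock_symbol, expiration) group.
upper1 <- upper[!duplicated(upper[c("tdate", "stock_symbol", "expiration")]), ]

# Equivalent of RENAME strike=astrike.
names(upper1)[names(upper1) == "strike"] <- "astrike"
upper1
```

On the example data this keeps the rows with strikes 11 and 14, matching the results table in the question.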
[R] Ptak and Candpara
Hi,

I am using the package PTAK, and in particular the command CANDPARA, to perform the PARAFAC factorization of a tensor. The results are not as encouraging as I expected, so I am starting a phase of analysis to see if there are errors. I have a question and I hope you can help me. The command to run the factorization is:

## CANDECOMP/PARAFAC
results <- CANDPARA(data_matrix, dim=3)
summary(results)
U <- results[[1]]$v
V <- results[[2]]$v
W <- results[[3]]$v

data_matrix is a tensor of 943x1682x4. What I want to understand is: are U, V, W really the three factors that I should get from the factorization?

I hope someone can help me. Thank you.
giuseppe
[R] Case study in forensic computing domain
Hi,

I am looking for case studies, preferably real-world, in the forensic domain that will entice forensic computing students and demonstrate the usefulness of machine learning in forensics. Does anyone know of any such case studies? Students should be able to replicate the case study, so it should have some public corpus data and R code to implement the machine learning approach.

I think a case study to determine the authorship of a document using machine learning would be good. The other case study could be a regression model to detect fake currency based on the size, weight and other attributes of a note. Any pointers would be welcome.

Thanks,
Ambi.
Re: [R] GAM without intercept
Hi Sergio,

Based on my understanding (see Wood, Generalized Additive Models), the smoothing basis already accounts for the intercept, because of identifiability issues. Therefore the intercept is always specified and you don't need to specify it yourself. I guess that your m2 model is simply not correct.

Hope it helps
Anna

Anna Freni Sterrantino
Department of Statistics
University of Bologna, Italy
via Belle Arti 41, 40124 BO.

From: SAEC sergio.es...@uach.cl
To: r-help@r-project.org
Sent: Thursday, 11 October 2012 0:22
Subject: [R] GAM without intercept

Hi everybody,

I am trying to fit a GAM model without intercept using library mgcv. However, the result has nothing to do with the observed data. In fact the predicted points are far from the predicted points obtained from the model with intercept. For example:

#First I generate some simulated data:
library(mgcv)
x <- seq(0,10,length=100)
y <- x^2+rnorm(100)
#then I fit a gam model with and without intercept
m1 <- gam(y~s(x,k=10,bs='cs'))
m2 <- gam(y~s(x,k=10,bs='cs')-1)
#and now I obtain predicted values for the interval 0-10
x1 <- seq(0,10,0.1)
y1 <- predict(m1,newdata=list(x=x1))
y2 <- predict(m2,newdata=list(x=x1))
#plotting predicted values
plot(x,y,ylim=c(0,100))
lines(x1,y1,lwd=4,col='red')
lines(x1,y2,lwd=4,col='blue')

In this example you can see that the red line, the predictions from the model with intercept, fits the data pretty well, but the blue line (without intercept) is far from the observed points. Probably I am misunderstanding some key elements of GAM modelling or using incorrect syntax. I don't know what the problem is. Any ideas will be helpful.

Sergio
--
Sergio A. Estay
Inst. Ciencias Ambientales y Evolutivas
Universidad Austral de Chile
Casilla 567, Valdivia, Chile
Phone: 5663-293913
http://www.ciencias.uach.cl/instituto/ciencias_ambientales_evolutivas/academicos/sergio-estay.php

-- View this message in context: http://r.789695.n4.nabble.com/GAM-without-intercept-tp4645786.html Sent from the R help mailing list archive at Nabble.com.
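The gap between the two fitted curves can be checked numerically. The sketch below reruns Sergio's two models and compares the averages of the fitted values. It rests on one assumption worth stating loudly: that mgcv centres each s() term with a sum-to-zero constraint over the observed x values, so that a model with no intercept can only produce fitted values that average to zero. The seed is an arbitrary addition for reproducibility.

```r
library(mgcv)

set.seed(1)  # arbitrary seed, not in the original post
x <- seq(0, 10, length = 100)
y <- x^2 + rnorm(100)

m1 <- gam(y ~ s(x, k = 10, bs = "cs"))      # with intercept
m2 <- gam(y ~ s(x, k = 10, bs = "cs") - 1)  # intercept removed

# With the intercept, the fitted values average to mean(y);
# without it, the centred smooth alone averages to (numerically) zero.
c(mean_y = mean(y), mean_fit_m1 = mean(fitted(m1)), mean_fit_m2 = mean(fitted(m2)))
```

If the assumption holds, the blue curve in the original plot is forced to sit roughly mean(y) below the data, rather than through the origin as "no intercept" might suggest.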
[R] Options to extend memory limit
Dear All,

At the moment I am using R for calculations on large databases. Unfortunately, R only manages to complete certain operations some of the time. I usually get the error message

cannot allocate vector of size XX

I am using the 64-bit version with Windows 7. While my computer has 8 GB of RAM, I have a feeling that R cannot use all of it. Searching online, I found that you can increase the memory with the options --max-mem-size / --max-ppsize, or change the environment variable R_MAX_MEM_SIZE, to allow deep recursion or large and complicated calculations to be done.

Unfortunately, I am not very knowledgeable yet about how to use R, and I did not quite manage to use the commands successfully. Could you please tell me whether these make sense for my case and, if so, how (and at what stage of the process) I can use them?

Thank you very, very much in advance.

Kind regards,
Jennifer
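Before changing any startup options, it can help to see which objects are actually occupying memory. The following is a small sketch using only base R; the helper name largest_objects() is hypothetical and not part of any package, and it does not by itself answer the question about the command-line options.

```r
# Report the n largest objects in an environment, in MiB.
# 'largest_objects' is an illustrative helper, not an existing R function.
largest_objects <- function(env = globalenv(), n = 5) {
  nms <- ls(env)
  if (!length(nms)) return(numeric(0))
  sizes <- vapply(nms,
                  function(nm) as.numeric(object.size(get(nm, envir = env))),
                  numeric(1))
  head(sort(sizes, decreasing = TRUE), n) / 1024^2
}

# Example with a throwaway environment.
e <- new.env()
assign("big", numeric(1e6), envir = e)   # about 8 MB of doubles
assign("small", 1, envir = e)
largest_objects(e, n = 2)
```

Removing objects that are no longer needed with rm() and then calling gc() returns their memory to R, which is often enough to let a later allocation succeed.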
Re: [R] multiple t-tests across similar variable names
Hello,

I have a problem: with your data example my results are different. I have changed the names of two of the variables, to allow for 'pre' and 'post' to be first in the names.

# auxiliary functions
ifswap <- function(x) if(x[1] %in% c("pre", "post")) x[2:1] else x
getpair <- function(i, post) post[ which(vmat[post, 1] == vmat[i, 1]) ]
makeLine <- function(h)
    c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1],
      CIupper = h$conf.int[2], p.value = h$p.value)
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
    orange_post = sample(18:28,5,replace=TRUE),
    pre_banana = sample(25:35,5,replace=TRUE), # here
    apple_post = sample(20:30,5,replace=TRUE),
    post_banana = sample(40:50,5,replace=TRUE), # and here
    orange_pre = sample(5:10,5,replace=TRUE))

#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result

In your results I believe that the values for meandifference are the means of x[, 1]; at least that's what I've got. Anyway, I'll look at both codes again, to try to see what's going on.

Hope this helps,
Rui Barradas

On 11-10-2012 05:31, arun wrote:

Hi,

If you have a lot of variables and in no order, then it would be better to order the data by column names. For example:

set.seed(432)
dat2 <- data.frame(apple_pre=sample(10:20,5,replace=TRUE),
    orange_post=sample(18:28,5,replace=TRUE),
    banana_pre=sample(25:35,5,replace=TRUE),
    apple_post=sample(20:30,5,replace=TRUE),
    banana_post=sample(40:50,5,replace=TRUE),
    orange_pre=sample(5:10,5,replace=TRUE))
dat3 <- dat2[order(colnames(dat2))] #order the columns
list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
res3 <- do.call(rbind,lapply(lapply(list3,function(x)
    t.test(x[,1],x[,2],paired=TRUE)),function(x)
    data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],
               CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3),"_"),`[`,1)))
res3
#        meandifference     CIlow   CIhigh      p.value
#apple             12.6  8.519476 16.68052 0.0010166626
#banana            15.0 12.088040 17.91196 0.0001388506
#orange            18.2 13.604166 22.79583 0.0003888560

A.K.

- Original Message -
From: Nundy, Shantanu snu...@chicagobooth.edu
To: r-help@r-project.org
Cc:
Sent: Wednesday, October 10, 2012 7:09 PM
Subject: Re: [R] multiple t-tests across similar variable names

Hi everyone-

I have a dataset with multiple pre and post variables I want to compare. The variables are named apple_pre or pre_banana, with the corresponding post variables named apple_post or post_banana. The variables are in no particular order.

          apple_pre orange_pre orange_post pre_banana apple_post post_banana
person_1
person_2
person_3
...
person_x

How do I:
1. Run a series of paired t-tests for the apple_pre variables and pre_banana variables? It would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1 = mean difference, col 2 = 95% conf interval, col 3 = p-value.

Thank you kindly,
-Shantanu

Shantanu Nundy, M.D.
University of Chicago
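The pairing step in this thread can also be done by stripping the pre/post token from each column name and matching on what is left. The sketch below is one way to do that, not the code of either respondent. It assumes every variable name contains exactly one "pre" or "post" token separated by an underscore, uses made-up normally distributed data, and reports post minus pre as the mean difference.

```r
set.seed(432)
dat <- data.frame(apple_pre   = rnorm(5, 15, 2),
                  orange_post = rnorm(5, 23, 2),
                  pre_banana  = rnorm(5, 30, 2),
                  apple_post  = rnorm(5, 25, 2),
                  post_banana = rnorm(5, 45, 2),
                  orange_pre  = rnorm(5, 8, 2))

# Stem = name with the pre/post token (and its underscore) removed.
stem   <- sub("(^|_)(pre|post)(_|$)", "", names(dat))
is_pre <- grepl("(^|_)pre(_|$)", names(dat))

# One paired t-test per stem; rows of the result are the stems.
res <- t(sapply(unique(stem), function(s) {
  pre  <- dat[[names(dat)[stem == s & is_pre]]]
  post <- dat[[names(dat)[stem == s & !is_pre]]]
  tt   <- t.test(post, pre, paired = TRUE)
  c(mean_diff = unname(tt$estimate),
    ci_lower  = tt$conf.int[1],
    ci_upper  = tt$conf.int[2],
    p_value   = tt$p.value)
}))
res
```

Matching on the stem means the column order in the data frame does not matter, and names such as pre_banana and apple_pre are handled by the same rule.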
Re: [R] practical to loop over 2million rows?
If I use a nested ifelse statement in a loop it takes me 13 minutes to get an answer on just 50,000 rows.
...
ifelse(strataID[i+1]==strataID[i], y<-x[i+1], y<-x[i-1]))

Maybe take a closer look at the ifelse help page and the examples?

First, ifelse is intended to be vectorized. If you nest it in a loop, you're effectively nesting a loop inside a loop. And by putting ifelse inside ifelse, you've done that twice. And then you've run the loops on vectors of length one, so 'twas all in vain...

Second, the two things after the condition in ifelse are not instructions, they are arguments to the function. Putting y<-something in as an argument means '(promise to) store something in a variable called y, and then pass y to the function'. You probably didn't mean that.

Third, ifelse returns a vector of the results; you're not using the return value for anything.

For a single 'if' that takes some action, you want 'if' and 'else' _separately_, not 'ifelse'.

y<-length(x) #length() already returns a numeric value.

So if you must do this with a loop, it would look more like

for(i in 2:(length(x)-1)) { #because x[i-1] and x[i+1] won't be there for all i otherwise
    if (!is.na(x[i])) y[i]<-x[i]
    if(strataID[i+1]==strataID[i]) y<-x[i+1] else y<-x[i]
    #I changed the second x index because I can't see why it differed from the strataID index
    #or, using the fact that 'if' also returns something:
    # y <- if(strataID[i+1]==strataID[i]) x[i+1] else x[i]
}

Finally, if you don't preallocate y at the length you want, R will have to move the whole of y to a new memory location with one more space every time you append something to it. There's a section on that in the R Inferno. It's a really good way of slowing R down.

So let's try something else.

strataID <- sample(letters[1:3], 200, replace=T) #a nice long strata identifier with some matches likely
x <- rnorm(200) #some random numbers
x <- ifelse(x < -2, NA, x) #a few NA's now in x, though it does take a few seconds for the 2 million observations
i <- 1:(length(x)-1) #A long indexing vector with space for the last x[i+1]
y <- x #That puts all the NA's in the right place in y, allocates y and happens to put all the current values of x into y too.
system.time( y[i]<-ifelse( strataID[i+1]==strataID[i], x[i+1], x[i] ) )
#does the whole loop and stores it in the 'right' places in y -
# though it will foul up those NA's because of your x indexing. And incidentally it doesn't change the last y either
#On my allegedly 2GHz machine the system.time result was 2.87 seconds for the 2 million 'rows'

#Incidentally, a look at what we ended up with:
data.frame(s=strataID, y=y)[1:30,]
#says you probably aren't getting anything useful from the exercise other than a feel for what can go wrong with loops.
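To make the comparison in this reply concrete, here is a small sketch that runs the same rule twice, once as a preallocated loop using if/else and once as a single vectorised ifelse call, and checks that both give the same answer. The data are made up, the seed is arbitrary, and the rule (take the next x when the next stratum matches, otherwise the current x) follows the vectorised line in the reply rather than the original poster's unstated intent.

```r
set.seed(1)  # arbitrary seed for reproducibility
n <- 200
strataID <- sample(letters[1:3], n, replace = TRUE)
x <- rnorm(n)

# Loop version: y is preallocated (as a copy of x), so nothing grows inside the loop.
y_loop <- x
for (k in seq_len(n - 1)) {
  y_loop[k] <- if (strataID[k + 1] == strataID[k]) x[k + 1] else x[k]
}

# Vectorised version: one ifelse call over the whole index vector.
i <- seq_len(n - 1)
y_vec <- x
y_vec[i] <- ifelse(strataID[i + 1] == strataID[i], x[i + 1], x[i])

identical(y_loop, y_vec)
```

Both versions leave the last element untouched, because there is no element n + 1 to compare it with.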
Re: [R] lm on matrix data
Baoqiang,

Here's an approach that should work:
(1) Make sure that the column names of trainx and testx are the same.
(2) Combine trainy and trainx into a data frame for fitting the model.
(3) Use the newdata= argument in the predict() function.
(4) Convert testx from matrix to data frame.

# some example data
nrow <- 5
ncol <- 3
colnames <- paste("x", seq(ncol), sep="")
nrow2 <- 8
trainx <- matrix(rnorm(nrow*ncol), ncol=ncol, dimnames=list(NULL, colnames))
trainy <- matrix(rnorm(nrow), ncol=1, dimnames=list(NULL, "y"))
testx <- matrix(rnorm(nrow2*ncol), ncol=ncol, dimnames=list(NULL, colnames))

# create data frames for model fitting and prediction
traindf <- data.frame(cbind(trainy, trainx))
testdf <- data.frame(testx)

# fit the model and make predictions for new data
fit <- lm(y ~ ., data=traindf)
py <- predict(fit, newdata=testdf)

Note that the lm() function you fit to the two matrices worked just fine

lm(trainy ~ trainx)

but the way that names are assigned to the predictor variables (trainxx1, trainxx2, etc.) makes it inconvenient for predicting on new data.

Jean

Baoqiang Cao bqcaom...@gmail.com wrote on 10/10/2012 09:35:47 AM:

Hi,

I have a question about using lm on a matrix. I have to admit it is very trivial, but I just couldn't find the answer after searching the mailing list and other online tutorials. It would be great if you could help.

I have a matrix trainx of 492 (rows) by 220 (columns) that is my x, and trainy is 492 by 1. Also, I have the new data testx, which is 240 (rows) by 220 (columns). Here is what I got:

py <- predict(lm(trainy ~ trainx ), data.frame(testx))
Warning message:
'newdata' had 240 rows but variable(s) found have 492 rows

The fitting formula I intended is: trainy ~ trainx[,1] + trainx[,2] + .. + trainx[,220].

Any help, please?

Best,
Baoqiang
Re: [R] Contacting Delphi ??
What does the sudden appearance of

Contacting Delphi ..the oracle is unavailable. We apologize for any inconvenience.

mean? A bug? It appears at plotting.

If you have an ordinary plot command, that is very strange indeed. It's a help message ... of sorts*. It should be no more likely to appear by accident in plotting than a manual page. What was the plot command that caused it to appear?

S

*It helps you realise you typed too many question marks
[R] performance analytics- package
In the PerformanceAnalytics performance summary section, I can't run this code:

charts.PerformanceSummary(datafrom_table, rf = 0, main = NULL, method = "ModifiedVaR", width = 0, event.labels = NULL, ylog = FALSE, wealth.index = FALSE, gap = 12)

It just returns a blank chart. datafrom_table holds data read from a csv file, and the rest is taken from https://www.rmetrics.org/files/Meielisalp2007/Presentations/Peterson.pdf, but I don't get the result. Could you please help me?

-- View this message in context: http://r.789695.n4.nabble.com/performance-analytics-package-tp4645834.html Sent from the R help mailing list archive at Nabble.com.
[R] R(BCA Package)
Hi all,

I have tried the jack.jill dataset in the BCA package. This dataset actually contains 557 observations and 8 variables, but I got only 2 observations. Has anybody tried the same thing? Did you get the same answer as me, or the usual values? Please reply.

Kokila.k

-- View this message in context: http://r.789695.n4.nabble.com/R-BCA-Package-tp4645835.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] nlmnib Package + Hessian Output
Sorry, but I haven't modified my function to use mle2 :( :( Can you give an example of how to obtain the Hessian with numDeriv?

Serdar

# Function
Linn=function(param){
  phi1=((param[1]^2/(1+param[1]^2)))
  phi2=((param[2]^2/(1+param[2]^2)))
  phi3=((param[3]^2/(1+param[3]^2)))
  phi4=((param[4]^2/(1+param[4]^2)))
  sigw1=sqrt(exp(param[5]))
  sigw2=sqrt(exp(param[6]))
  sigw3=sqrt(exp(param[7]))
  sigw4=sqrt(exp(param[8]))
  sigv=sqrt(exp(param[9]))
  Betam1=((param[10]*100)/(sqrt(1+param[10]^2)))
  Betam2=((param[11]*100)/(sqrt(1+param[11]^2)))
  Betam3=((param[12]*100)/(sqrt(1+param[12]^2)))
  Betam4=((param[13]*100)/(sqrt(1+param[13]^2)))
  phi=diag(c(phi1,phi2,phi3,phi4),4,4)
  betam=c(Betam1,Betam2,Betam3,Betam4)
  sigw=diag(c(sigw1,sigw2,sigw3,sigw4),4,4)
  a<-(1.001)
  mu0=c(ols[1,1],ols[2,1],ols[3,1],ols[4,1])
  sigma0=diag(c(a,a,a,a),4,4)
  kf=kfilter1(n,rt,rm,mu0,sigma0,phi,betam,sigw,sigv)
  return(kf$like)
}
a<-(1.001)
init.par<-c(0.5,0.5,0.5,0.5,a,a,a,a,a,ols[1,1],ols[2,1],ols[3,1],ols[4,1])
###
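For anyone landing on this question, here is a minimal sketch of getting a numerical Hessian at the optimum. It does not use the Linn function above (kfilter1, ols, rt and rm are not available here); negll and obs are made-up stand-ins. nlminb() does not return a Hessian itself, so the sketch calls stats::optimHess() on the fitted parameters, and shows numDeriv::hessian(func, x) as the equivalent call when that package happens to be installed.

```r
# Toy negative log-likelihood (hypothetical stand-in for Linn):
# a normal sample parameterised by mean and log(sd).
set.seed(1)
obs <- rnorm(200, mean = 3, sd = 2)
negll <- function(param) {
  -sum(dnorm(obs, mean = param[1], sd = exp(param[2]), log = TRUE))
}

# Minimise with nlminb, as in the thread.
fit <- nlminb(start = c(0, 0), objective = negll)

# Numerical Hessian of the objective at the optimum, base R only.
H <- optimHess(fit$par, negll)

# Same idea with numDeriv, used only if the package is installed.
if (requireNamespace("numDeriv", quietly = TRUE)) {
  H_nd <- numDeriv::hessian(func = negll, x = fit$par)
}

# Approximate standard errors on the scale of param (mean, log sd).
se <- sqrt(diag(solve(H)))
```

Because Linn works on transformed parameters (squares, exp, and so on), standard errors obtained this way would refer to the transformed scale, not to phi or sigw directly.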
Re: [R] reading in a (very simple) list from a file
Brilliant! Thank you both, this works! Combined with the other suggestion of setting stringsAsFactors to FALSE when reading in the data frame, I now have the behaviour I wanted. I had been beginning to get the sense that one of the apply functions was the solution. I will now do some reading on split to understand precisely what I'm doing... Best wishes, Anne
Re: [R] column width in .dbf files using write.dbf ... to be continued
Old topic... An answer may be useful for someone else, though. Just do:

environment(write.dbfMODIF) <- environment(foreign::write.dbf)

and it should be good to go.

Cheers,
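Why the environment assignment helps can be sketched with a toy example. Everything here (pkg_env, pad_field, write_modified) is hypothetical and only imitates the situation: a modified copy of a package function that calls helpers which live in the package's namespace and are not visible from the global workspace.

```r
# Toy "namespace": an environment holding a helper that is not visible globally.
pkg_env <- new.env()
assign("pad_field",
       function(x, width) formatC(x, width = width, flag = "-"),
       envir = pkg_env)

# A function written at top level that relies on the hidden helper.
write_modified <- function(x) pad_field(x, width = 8)

# Before the reassignment the helper cannot be found.
before <- tryCatch(write_modified("abc"), error = function(e) "not found")

# Point the function at the environment where the helper lives.
environment(write_modified) <- pkg_env
after <- write_modified("abc")
```

With the real package, environment(foreign::write.dbf) is the foreign namespace, so the modified copy can see the same internal helpers as the original.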
[R] Exporting each row in the table as new table
Dear all,

I am new to R and familiar only with very basic stuff. I am trying to create tables in text format from each row of my table and export these tables, named by a specific attribute in the table. I tried after reading some forums but nothing worked. Can you please help me?

ex: dataGT

ID State Year Growth
1  IA    1999 25
2  IA    2000 27
3  KS    1999 35
4  KS    2000 31
5  KY    1999 14
6  KY    2000 18
7  NE    1999 34
8  NE    2000 38

I am trying to have each row of the table as a new table, and I need to export that table with the name of the ID. Please help me if possible.

Thank you
Kalyani
Re: [R] Exporting summary plm results to latex
Hi, I tried this function on an example dataset and it seems to be working. extract.plm - function(model) { if (!class(model)[1] == plm) { stop(Internal error: Incorrect model type! Should be a plm object!) } zz1-summary(model)$coef[,1:2] zz2-as.data.frame(apply(zz1,2,function(x) sprintf(%.3f,x))) zz2[]-sapply(zz2,function(x) as.numeric(as.character(x))) zz3-data.frame(Coefficient=row.names(zz1),zz2) zz3-melt(zz3,by=Coefficient) zz4-within(zz3,{Coefficient-as.character(Coefficient);variable-as.character(variable)}) zz5-ddply(zz4,.(Coefficient),function(x) x) zz5$value[zz5$variable==Estimate] zz5$value[zz5$variable==Std..Error] zz5$value[zz5$variable==Estimate]-ifelse(summary(model)$coef[,4]0.05 summary(model)$coef[,4]=0.01, gsub((.*),\\1*,zz5$value[zz5$variable==Estimate]),ifelse(summary(model)$coef[,4]0.01,gsub((.*),\\1**,zz5$value[zz5$variable==Estimate]),zz5$value[zz5$variable==Estimate])) zz5$value[zz5$variable==Std..Error]-gsub((.*),(\\1),zz5$value[zz5$variable==Std..Error]) res-zz5[,c(1,3)] res } data(Produc, package = plm) zz - plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c(state,year)) extract.plm(zz) #Using Coefficient as id variables # Coefficient value #1 log(emp) 0.768 #2 log(emp) (0.03) #3 log(pc) 0.292** #4 log(pc) (0.025) #5 log(pcap) -0.026** #6 log(pcap) (0.029) #7 unemp -0.005** #8 unemp (0.001) library(xtable) xtable(extract.plm(zz)) Using Coefficient as id variables % latex table generated in R 2.15.0 by xtable 1.7-0 package % Thu Oct 11 09:43:00 2012 \begin{table}[ht] \begin{center} \begin{tabular}{rll} \hline Coefficient value \\ \hline 1 log(emp) 0.768 \\ 2 log(emp) (0.03) \\ 3 log(pc) 0.292** \\ 4 log(pc) (0.025) \\ 5 log(pcap) -0.026** \\ 6 log(pcap) (0.029) \\ 7 unemp -0.005** \\ 8 unemp (0.001) \\ \hline \end{tabular} \end{center} \end{table} A.K. 
- Original Message - From: Sebastian Barfort sb3...@nyu.edu To: Duncan Mackay mac...@northnet.com.au Cc: r-help-r-project.org r-help@r-project.org Sent: Wednesday, October 10, 2012 7:45 PM Subject: Re: [R] Exporting summary plm results to latex I am also interested in the standard errors, but beneath not next to the point estimates which is standard in the xtable package. If you by any chance remember the name of the package or how to do it that would be much appreciated! Cheers, Sebastian On Oct 10, 2012, at 7:10 PM, Duncan Mackay mac...@northnet.com.au wrote: Hi If you just want the coefficients. xtable(summary(fe)$coef) % latex table generated in R 2.15.1 by xtable 1.7-0 package % Thu Oct 11 09:04:59 2012 \begin{table}[ht] \begin{center} \begin{tabular}{r} \hline Estimate Std. Error t-value Pr($$$|$t$|$) \\ \hline x 0.12 0.07 1.78 0.08 \\ \hline \end{tabular} \end{center} \end{table} There is another package whose name eludes me which may help for tables which have different outputs to the output of lm etc HTH Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au At 05:09 11/10/2012, you wrote: HI, May be you can use library(texreg): library(plm) #generating some data x - rnorm(270) y - rnorm(270) t - rep(1:3,30) i - rep(1:90, each=3) data - data.frame(i,t,x,y) fe - plm(y~x,data=data,model=within) summary(fe) library(texreg) fe1-extract.plm(fe) #extract the plm object library(xtable) xtable(do.call(rbind,lapply(fe1,function(x) data.frame(x % latex table generated in R 2.15.0 by xtable 1.7-0 package % Wed Oct 10 14:59:10 2012 \begin{table}[ht] \begin{center} \begin{tabular}{rr} \hline x \\ \hline Estimate -0.03 \\ Std. Error 0.08 \\ Pr($$$|$t$|$) 0.68 \\ R\$\verb|^|2\$ 0.00 \\ Adj. R\$\verb|^|2\$ 0.00 \\ Num. obs. 270.00 \\ \hline \end{tabular} \end{center} \end{table} #Another example. 
In this case, you can create two tables from the zz1 list data(Produc, package = plm) zz - plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c(state,year)) zz1-extract.plm(zz) lapply(lapply(zz1,function(x) data.frame(x)),xtable) [[1]] % latex table generated in R 2.15.0 by xtable 1.7-0 package % Wed Oct 10 15:08:02 2012 \begin{table}[ht] \begin{center} \begin{tabular}{} \hline Estimate Std..Error Pr...t.. \\ \hline log(pcap) -0.03 0.03 0.37 \\ log(pc) 0.29 0.03 0.00 \\ log(emp) 0.77 0.03 0.00 \\ unemp -0.01 0.00 0.00 \\ \hline \end{tabular} \end{center} \end{table} [[2]] % latex table generated in R 2.15.0 by xtable 1.7-0 package % Wed Oct 10 15:08:02 2012 \begin{table}[ht] \begin{center} \begin{tabular}{rr} \hline x \\ \hline R\$\verb|^|2\$ 0.94 \\ Adj. R\$\verb|^|2\$ 0.88 \\ Num. obs. 816.00 \\ \hline \end{tabular} \end{center} \end{table} Hope it helps. A.K.
Re: [R] performance analytics- package
On Thu, Oct 11, 2012 at 11:04 AM, sheenmaria sheenmar...@gmail.com wrote:
> In performance analytics - performance summary session , i cant run the code of - charts.PerformanceSummary(datafrom_table, rf = 0, main = NULL, method = "ModifiedVaR", width = 0, event.labels = NULL, ylog = FALSE, wealth.index = FALSE, gap = 12) it just return blank chart. datafrom_table - having a csv file. and the rest of the things are get from the site https://www.rmetrics.org/files/Meielisalp2007/Presentations/Peterson.pdf but i dont get the result - could u please help me.

charts.PerformanceSummary() is well tested, so you'll need to supply datafrom_table (or an approximation thereof) using the dput() function to make this problem reproducible. Note that dput(datafrom_table) will cause R to print a lot of what might seem to you gibberish, but it's important you copy and paste it directly into your reply to allow us to replicate your problem. If your dataset is large, use dput(head(datafrom_table, 30)) instead.

Finally, I note you're posting from Nabble. Please include context in your reply -- I don't believe Nabble does this automatically, so you'll need to manually include it. Most of the regular respondents on this list don't use Nabble -- it is a _mailing list_ after all -- so we don't get the forum view you do, only emails of the individual posts. Combine that with the high volume of posts, and it's quite difficult to trace a discussion if we all don't make sure to include context.

Cheers,
Michael
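What the dput() advice buys can be shown with a small round trip. This is only a sketch: the data frame below is invented, since the real datafrom_table was never posted, and its column names are assumptions.

```r
# Hypothetical stand-in for datafrom_table: a few dated returns read from a CSV.
datafrom_table <- data.frame(
  date  = as.Date("2012-01-31") + 0:2,
  fundA = c(0.012, -0.004, 0.021),
  fundB = c(0.003, 0.010, -0.007)
)

# dput() writes R code that recreates the object; capture it as text here.
txt <- capture.output(dput(head(datafrom_table, 30)))

# Anyone can paste that text back into R and get an equivalent object.
rebuilt <- eval(parse(text = paste(txt, collapse = "\n")))
same <- isTRUE(all.equal(rebuilt, datafrom_table))
```

The text in txt is what would be pasted into the reply; it carries column types and classes along with the values, which a printed table does not.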
Re: [R] Exporting each row in the table as new table
On Thu, Oct 11, 2012 at 2:04 PM, kallu kallu...@gmail.com wrote:
> Dear all, I am new to R and I am familiar with very basic stuff. I am trying to create tables in text format from each row of my table and export these tables with specific attribute in the table. I tried after reading some forums but nothing worked. Can you please help me.
>
> ex: dataGT
>
> ID State Year Growth
> 1  IA    1999 25
> 2  IA    2000 27
> 3  KS    1999 35
> 4  KS    2000 31
> 5  KY    1999 14
> 6  KY    2000 18
> 7  NE    1999 34
> 8  NE    2000 38
>
> I am trying to have each row of the table as new table and need to export that table with name of of the ID. Please help me if possible. Thank you Kalyani

Hi Kalyani,

I'm afraid I don't understand your question: what do you mean in this context by "table"? Data frames? CSV files? And in either case, why are you splitting into single-row objects? When you say "attribute", do you mean the formal programming construct that is key to many things in R, or something simpler? In short, could you elaborate further?

Finally, I note you're posting from Nabble. Please include context in your reply -- I don't believe Nabble does this automatically, so you'll need to manually include it. Most of the regular respondents on this list don't use Nabble -- it is a _mailing list_ after all -- so we don't get the forum view you do, only emails of the individual posts. Combine that with the high volume of posts, and it's quite difficult to trace a discussion if we all don't make sure to include context.

This might also be of help: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Cheers,
Michael
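The question was never clarified in this digest, so the following is only one possible reading: one tab-separated text file per row, named after the ID column. The output folder and the file naming are assumptions made for the sketch.

```r
# Data transcribed from the question.
dataGT <- data.frame(
  ID     = 1:8,
  State  = rep(c("IA", "KS", "KY", "NE"), each = 2),
  Year   = rep(c(1999, 2000), times = 4),
  Growth = c(25, 27, 35, 31, 14, 18, 34, 38)
)

out_dir <- file.path(tempdir(), "rows")   # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

# split() by ID gives a list of one-row data frames; write each to <ID>.txt.
pieces <- split(dataGT, dataGT$ID)
for (id in names(pieces)) {
  write.table(pieces[[id]],
              file = file.path(out_dir, paste0(id, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

files <- list.files(out_dir, pattern = "\\.txt$")
```

If the grouping should be by State rather than ID, splitting on dataGT$State instead would give one two-row file per state.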
Re: [R] multiple t-tests across similar variable names
HI Rui,

By running your code, I got the results as:

result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:

res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is a difference in signs.

A.K.

- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu
Cc: R help r-help@r-project.org
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem: with your data example my results are different. I have changed the names of two of the variables, to allow for 'pre' and 'post' to be first in the names.

# auxiliary functions
ifswap <- function(x) if(x[1] %in% c("pre", "post")) x[2:1] else x
getpair <- function(i, post) post[ which(vmat[post, 1] == vmat[i, 1]) ]
makeLine <- function(h) c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1], CIupper = h$conf.int[2], p.value = h$p.value)
doTests <- function(DF, Pairs){
  t.list <- lapply( seq_len(nrow(Pairs)), function(i) t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
  do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
  orange_post = sample(18:28,5,replace=TRUE),
  pre_banana = sample(25:35,5,replace=TRUE), # here
  apple_post = sample(20:30,5,replace=TRUE),
  post_banana = sample(40:50,5,replace=TRUE), # and here
  orange_pre = sample(5:10,5,replace=TRUE))

#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result

In your results I believe that the values for meandifference are the means of x[, 1], at least that's what I've got. Anyway, I'll look at both codes again, to try to see what's going on.

Hope this helps,
Rui Barradas

On 11-10-2012 05:31, arun wrote:

HI,

If you have a lot of variables and in no order, then it would be better to order the data by column names. For e.g.

set.seed(432)
dat2 <- data.frame(apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),banana_pre=sample(25:35,5,replace=TRUE),apple_post=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE))
dat3 <- dat2[order(colnames(dat2))] #order the columns
list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
res3 <- do.call(rbind,lapply(lapply(list3,function(x) t.test(x[,1],x[,2],paired=TRUE)),function(x) data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3),"_"),`[`,1)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

A.K.

- Original Message -
From: Nundy, Shantanu snu...@chicagobooth.edu
To: r-help@r-project.org r-help@r-project.org
Sent: Wednesday, October 10, 2012 7:09 PM
Subject: Re: [R] multiple t-tests across similar variable names

Hi everyone-

I have a dataset with multiple pre and post variables I want to compare. The variables are named apple_pre or pre_banana, with the corresponding post variables named apple_post or post_banana. The variables are in no particular order.

          apple_pre orange_pre orange_post pre_banana apple_post post_banana
person_1
person_2
person_3
...
person_x

How do I:

1. Run a series of paired t-tests for the apple_pre variables and pre_banana variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1 = mean difference, col 2 = 95% conf interval, col 3 = p-value.

Thank you kindly,
-Shantanu

Shantanu Nundy, M.D.
University of Chicago
Re: [R] Options to extend memory limit
jennifer.moeller-gulland at de.pwc.com writes:
> at the moment I am using R for calculations of large databases. Unfortunately, R only manages to complete certain operations at some times, and not at others. I usually get the error message "cannot allocate vector of size XX". I am using the 64-bit version with Windows 7. While my computer has 8 GB RAM, I do have a feeling that R cannot use all of it. Searching online, I found that you can increase the memory with the options --max-mem-size / --max-ppsize or change the environment variable R_MAX_MEM_SIZE "to allow deep recursion or large and complicated calculations to be done".

I believe this may be somewhat out of date (although I don't use Windows, so I'm a little rusty). If you are dealing with large databases you should almost certainly check out the High Performance Computing task view (you can google it), which recommends many approaches for dealing with Big Data.
[R] dotplot in .R with lattice latticeExtra: proper visualization
Dear everyone,

I'm trying to do a dotplot with the packages lattice and latticeExtra. However, R does not represent the values on the vertical y-axis properly. Instead of using the actual values of the numeric variable, R plots the rank of each value. That is, there are values [375, 500, 625, 750, ..., 3000] and R plots their ranks [1,2,3,4,...23] and chooses the scale accordingly. Has someone experienced a problem like this? How can I get a proper representation with ticks like (0, 500, 1000, 1500, ...) on the vertical y-scale?

Here's my data: https://www.dropbox.com/s/egy25cj00rhum40/data.csv

And here is the program code so far:

df.dose <- read.table("data.csv", sep=",", header=TRUE)
library(lattice); library(latticeExtra)
useOuterStrips(dotplot(z ~ sample.size | as.factor(effect.size)*as.factor(true.dose), groups=as.factor(type), data=df.dose, as.table=TRUE))

I'd be glad for any kind of help!

Andres
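No reply to this question appears in this digest. A possible explanation, going by the description of the horizontal argument on lattice's dotplot help page: when the x variable in y ~ x is not a factor or shingle, horizontal defaults to TRUE and the y variable is the one treated as categorical, which would put its distinct values at positions 1, 2, 3, ... The sketch below uses made-up data (the real data.csv is not available) and leaves out the conditioning variables and useOuterStrips(); it simply wraps the x variable in factor() so that z stays numeric.

```r
library(lattice)

# Made-up stand-in for df.dose; column names follow the post.
df.dose <- expand.grid(sample.size = c(10, 20, 40),
                       type = c("A", "B"))
df.dose$z <- c(375, 500, 625, 750, 1500, 3000)

# Making the x variable a factor leaves z as the numeric axis.
p_num <- dotplot(z ~ factor(sample.size), groups = type, data = df.dose,
                 xlab = "sample.size", ylab = "z")

# print(p_num)  # draws the plot on the active graphics device
```

If the conditioning is added back, the same factor() wrapper would go inside the original call; whether this resolves the poster's case cannot be confirmed without the data.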
Re: [R] own function: computing time
That's perfect, thanks a lot!
Tonja

Sent: Wednesday, 10 October 2012 at 21:37
From: William Dunlap wdun...@tibco.com
To: tonja.krue...@web.de, r-help@r-project.org
Subject: RE: [R] own function: computing time

Your original method would be the following function

f <- function (x, y) {
  xy <- cbind(x, y)
  outside <- function(z) { !any(x > z[1] & y > z[2]) }
  j <- apply(xy, 1, outside)
  which(j)
}

and the following one quickly computes the same thing as the above, as long as there are no repeated points (if there are repeated points it chooses one of them).

f1 <- function (x, y) {
  o <- order(x, decreasing = TRUE)
  yo <- y[o]
  j <- logical(length(y))
  j[o] <- yo == cummax(yo)
  which(j)
}

Think of the problem as finding the "ladder points" (Feller's term) of a sequence of points, the places where the sequence reaches a new high point.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
Sent: Wednesday, October 10, 2012 9:52 AM
To: tonja.krue...@web.de; r-help@r-project.org
Subject: Re: [R] own function: computing time

No, the desired points are not a subset of the convex hull. E.g., x=c(0,1:5), y=c(0,1/(1:5)).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: William Dunlap
Sent: Wednesday, October 10, 2012 9:46 AM
To: 'tonja.krue...@web.de'; r-help@r-project.org
Subject: RE: [R] own function: computing time

Are the points you are looking for (those data points with no other data points above or to the right of them) a subset of the convex hull of the data points? If so, chull(x,y) can quickly give you the points on the convex hull (typically a fairly small number) and you can look through them for the ones you want.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of tonja.krue...@web.de
Sent: Wednesday, October 10, 2012 3:16 AM
To: r-help@r-project.org
Subject: [R] own function: computing time

Hi all,

I wrote a function that actually does what I want it to do, but it tends to be very slow for large amounts of data. On my computer it takes 5.37 seconds for 16000 data points and 21.95 seconds for 32000 data points. As my real data consists of 1800 data points it would take ages to use the function as it is now. Could someone help me to speed up the calculation?

Thank you,
Tonja

system.time({
  x <- runif(32000)
  y <- runif(32000)
  xy <- cbind(x,y)
  outer <- function(z){ !any(x > z[1] & y > z[2])}
  j <- apply(xy,1, outer)
  plot(x,y)
  points(x[j],y[j],col="green")
})
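A quick check that the two approaches in this thread agree can be sketched as follows. The comparison operators in the brute-force version are an assumption here: they are chosen to match the "no other point above and to the right" description and the behaviour of the sort-based version. The last lines replay the convex-hull counterexample from the thread.

```r
# Brute force: keep point i if no other point is both to the right and above.
f <- function(x, y) {
  xy <- cbind(x, y)
  outside <- function(z) !any(x > z[1] & y > z[2])
  which(apply(xy, 1, outside))
}

# Sort-based: scan from the largest x and keep points that set a new
# running maximum of y (the "ladder points").
f1 <- function(x, y) {
  o <- order(x, decreasing = TRUE)
  yo <- y[o]
  j <- logical(length(y))
  j[o] <- yo == cummax(yo)
  which(j)
}

set.seed(42)
x <- runif(2000)
y <- runif(2000)
same <- identical(f(x, y), f1(x, y))

# Counterexample from the thread: the wanted points are not all hull vertices.
cx <- c(0, 1:5)
cy <- c(0, 1 / (1:5))
ladder <- f1(cx, cy)      # points 2..6
hull <- chull(cx, cy)     # hull vertices only
```

The brute-force version compares every point with every other point, so its cost grows quadratically; the sort-based one is dominated by a single call to order().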
Re: [R] multiple t-tests across similar variable names
Hello,

If that is the problem now, then change the variables' names. In what follows, the first line is just the example you gave. In the actual running code, uncomment the commented-out lines.

vars <- c("red_apple_pre", "post_banana_organic")
#vars <- names(dat)
vars <- gsub("_pre", "=pre", vars)
vars <- gsub("_post", "=post", vars)
vars <- gsub("pre_", "pre=", vars)
vars <- gsub("post_", "post=", vars)
vars <- gsub("_", "\\.", vars)
vars <- sub("=", "_", vars)
#names(dat) <- vars

Rui Barradas

On 11-10-2012 15:17, Nundy, Shantanu wrote:

Actually, I see now that part of the problem is that many of the names have multiple underscores, such as red_apple_pre or post_banana_organic. I think this is causing a problem for this line in your code:

vmat <- do.call(rbind, strsplit(vars, "_"))

Shantanu

From: Nundy, Shantanu
Sent: Thursday, October 11, 2012 9:07 AM
To: Rui Barradas
Subject: RE: [R] multiple t-tests across similar variable names

Rui,

Thank you so much for your solution. It is exactly what I was struggling with! One small question. When I ran the code on my actual dataset I got the error below:

vars <- names(master)
vmat <- do.call(rbind, strsplit(vars, "_"))
Warning message:
In function (..., deparse.level = 1) : number of columns of result is not a multiple of vector length (arg 1)

My guess is that the problem is that not all the variables have pre or post in them. Some of the variables are constants that I will not do a paired t-test on. What would be the easiest way to get around this, perhaps even by simply removing all of the variables that have neither pre nor post in them?

Thanks again,
Shantanu

From: arun [smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 8:50 AM
To: Rui Barradas
Cc: Nundy, Shantanu
Subject: Re: [R] multiple t-tests across similar variable names

HI Rui,

Thanks for testing the code. I will look into it later.

A.K.
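The two issues raised in this message (names with several underscores, and columns that are neither pre nor post) can also be handled with a regular expression instead of renaming. This is an alternative sketch, not the code from the thread; the column names below are hypothetical and the pattern assumes "pre" and "post" always appear as whole tokens delimited by underscores.

```r
# Hypothetical column names mixing the styles mentioned in the thread.
vars <- c("id", "red_apple_pre", "red_apple_post",
          "pre_banana_organic", "post_banana_organic", "age")

# Keep only names containing a standalone "pre" or "post" token.
is_pp <- grepl("(^|_)(pre|post)(_|$)", vars)
pp <- vars[is_pp]

# Which token is it, and what is left once the token is removed?
phase <- ifelse(grepl("(^|_)pre(_|$)", pp), "pre", "post")
stem <- gsub("(^|_)(pre|post)(_|$)", "\\1\\3", pp)   # drop the token
stem <- gsub("^_|_$", "", gsub("__", "_", stem))     # tidy leftover underscores

# One row per variable, giving the matching pre and post column names.
pairs <- data.frame(
  pre  = pp[phase == "pre"][order(stem[phase == "pre"])],
  post = pp[phase == "post"][order(stem[phase == "post"])],
  row.names = sort(unique(stem)),
  stringsAsFactors = FALSE
)
```

Each row of pairs could then feed t.test(dat[[pre]], dat[[post]], paired = TRUE); the sketch assumes every stem has exactly one pre and one post column.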
Re: [R] Options to extend memory limit
On Thu, 11 Oct 2012 14:45:16 +0200, jennifer.moeller-gull...@de.pwc.com wrote:
> Dear All, at the moment I am using R for calculations of large databases. Unfortunately, R only manages to complete certain operations at some times, and not at others. I usually get the error message "cannot allocate vector of size XX". I am using the 64-bit version with Windows 7. While my computer has 8 GB RAM, I do have a feeling that R cannot use all of it. Searching online, I found that you can increase the memory with the options --max-mem-size / --max-ppsize or change the environment variable R_MAX_MEM_SIZE "to allow deep recursion or large and complicated calculations to be done". Unfortunately, I am not very knowledgeable yet on how to use R and I did not quite manage to use the commands successfully. Could you please tell me whether these do make sense for my case and, if so, how (and at what stage of the process) I can use them?

Are you sure you're using the 64 bit R executable which comes with the R installation?

-- Seb
Re: [R] multiple t-tests across similar variable names
Hello, Em 11-10-2012 15:14, arun escreveu: HI Rui, By running your code, I got the results as: result # MeanDiff CIlowerCIupper p.value #apple -12.6 -16.68052 -8.519476 0.0010166626 #banana-15.0 -17.91196 -12.088040 0.0001388506 #orange-18.2 -22.79583 -13.604166 0.0003888560 From my code: res3 # meandifference CIlow CIhigh p.value #apple12.6 8.519476 16.68052 0.0010166626 #banana 15.0 12.088040 17.91196 0.0001388506 #orange 18.2 13.604166 22.79583 0.0003888560 There is difference in signs. Mistery solved. Rui Barradas A.K. - Original Message - From: Rui Barradas ruipbarra...@sapo.pt To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu Cc: R help r-help@r-project.org Sent: Thursday, October 11, 2012 9:25 AM Subject: Re: [R] multiple t-tests across similar variable names Hello, I have a problem, with your data example my results are different. I have changed the names of two of the variables, to allow for 'pre' and 'post' to be first in the names. # auxiliary functions ifswap - function(x) if(x[1] %in% c(pre, post)) x[2:1] else x getpair - function(i, post) post[ which(vmat[post, 1] == vmat[i, 1]) ] makeLine - function(h) c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1], CIupper = h$conf.int[2], p.value = h$p.value) doTests - function(DF, Pairs){ t.list - lapply( seq_len(nrow(Pairs)), function(i) t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) ) do.call(rbind, lapply(t.list, makeLine)) } # dataset set.seed(432) dat2 - data.frame(apple_pre = sample(10:20,5,replace=TRUE), orange_post = sample(18:28,5,replace=TRUE), pre_banana = sample(25:35,5,replace=TRUE), # here apple_post = sample(20:30,5,replace=TRUE), post_banana = sample(40:50,5,replace=TRUE), # and here orange_pre = sample(5:10,5,replace=TRUE)) # # start processing the data.frame # Make pairs of pre/post columns vars - names(dat2) vmat - do.call(rbind, strsplit(vars, _)) vmat - t(apply(vmat, 1, ifswap)) pre - which(vmat[, 2] == pre) post - which(vmat[, 2] == post) post - 
sapply(pre, getpair, post) pairs - matrix(c(pre, post), ncol = 2) # now the tests result - doTests(dat2, pairs) rownames(result) - vmat[pre, 1] result In your results I believe that the values for meandifference are the means of x[, 1], at least that's what I've got. Anyway, I'll see both codes again, to try to see what's going on. Hope this helps, Rui Barradas Em 11-10-2012 05:31, arun escreveu: HI, If you have a lot of variables and in no order, then it would be better to order the data by column names. For e.g. set.seed(432) dat2-data.frame(apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),banana_pre=sample(25:35,5,replace=TRUE),apple_post=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE)) dat3-dat2[order(colnames(dat2))] #order the columns list3-list(dat3[,1:2],dat3[,3:4],dat3[,5:6]) res3-do.call(rbind,lapply(lapply(list3,function(x) t.test(x[,1],x[,2],paired=TRUE)),function(x) data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value))) row.names(res3)-unlist(unique(lapply(strsplit(colnames(dat3),_),`[`,1))) res3 # meandifference CIlow CIhigh p.value #apple12.6 8.519476 16.68052 0.0010166626 #banana 15.0 12.088040 17.91196 0.0001388506 #orange 18.2 13.604166 22.79583 0.0003888560 A.K. - Original Message - From: Nundy, Shantanu snu...@chicagobooth.edu To: r-help@r-project.org r-help@r-project.org Cc: Sent: Wednesday, October 10, 2012 7:09 PM Subject: Re: [R] multiple t-tests across similar variable names Hi everyone- I have a dataset with multiple pre and post variables I want to compare. The variables are named apple_pre or pre_banana with the corresponding post variables named apple_post or post_banana. The variables are in no particular order. apple_pre orange_pre orange_post pre_banana apple_post post_banana person_1 person_2 person_3 ... person_x How do I: 1. 
Run a series of paired t-tests for the apple_pre variables and pre_banana variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*). 2. Print the results from these t-tests in a table with col 1=mean difference, col 2= 95% conf interval, col 3=p-value. Thank you kindly, -Shantanu Shantanu Nundy, M.D. University of Chicago [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
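One possible way to handle both naming styles (apple_pre and pre_banana) without typing each pair by hand is sketched below. It is only a sketch under stated assumptions: the helper name paired_tests and the small data frame are hypothetical, each variable is assumed to have exactly one pre and one post column, and the difference is taken as post minus pre (swap the two arguments of t.test to flip the signs).

```r
# Hypothetical helper: paired t-tests for every pre/post pair, whatever the
# position of the "pre"/"post" tag in the column name.
paired_tests <- function(df) {
  nm      <- names(df)
  is_pre  <- grepl("(^pre_)|(_pre$)", nm)
  is_post <- grepl("(^post_)|(_post$)", nm)
  # Strip the tag from either end to get the variable stem ("apple", ...)
  stem  <- sub("^(pre|post)_", "", sub("_(pre|post)$", "", nm))
  stems <- sort(intersect(stem[is_pre], stem[is_post]))
  rows <- lapply(stems, function(s) {
    pre  <- df[[nm[is_pre  & stem == s]]]
    post <- df[[nm[is_post & stem == s]]]
    h <- t.test(post, pre, paired = TRUE)   # difference is post - pre
    c(MeanDiff = unname(h$estimate),
      CIlower  = h$conf.int[1],
      CIupper  = h$conf.int[2],
      p.value  = h$p.value)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- stems
  out
}

# Made-up data, columns deliberately in no particular order
dat <- data.frame(
  apple_pre   = c(10, 12, 11, 13, 14),
  orange_post = c(20, 22, 25, 27, 26),
  pre_banana  = c(25, 30, 28, 27, 33),
  apple_post  = c(20, 25, 22, 24, 30),
  post_banana = c(40, 45, 41, 47, 50),
  orange_pre  = c(5, 6, 7, 8, 9)
)
res <- paired_tests(dat)
res
```

Because the pairing is done by name, the column order in the data frame does not matter and the same call works for 3 or 50 variables.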
Re: [R] Options to extend memory limit
On Oct 11, 2012, at 9:55 AM, Sebastian P. Luque splu...@gmail.com wrote:

On Thu, 11 Oct 2012 14:45:16 +0200, jennifer.moeller-gull...@de.pwc.com wrote:

Dear All, at the moment I am using R for calculations on large databases. Unfortunately, R only manages to complete certain operations at some times, and not at others. I usually get the error message: cannot allocate vector of size XX. I am using the 64-bit version with Windows 7. While my computer has 8 GB of RAM, I do have a feeling that R cannot use all of it. Searching online, I found that you can increase the memory with the options --max-mem-size / --max-ppsize or change the environment variable R_MAX_MEM_SIZE to allow deep recursion or large and complicated calculations to be done. Unfortunately, I am not very knowledgeable yet on how to use R and I did not quite manage to use the commands successfully. Could you please tell me whether these do make sense for my case and if so how (and at what stage of the process) I can use them?

Are you sure you're using the 64 bit R executable which comes with the R installation?

Sebastian hit on my initial thought here, though depending upon how much data you are dealing with, 8 GB may indeed not be enough, and some of your RAM may be used by other processes/applications, leaving less for R. A quick check to see which version you are running is to use:

.Machine$sizeof.pointer

If it returns 8, you are using the 64 bit version of R. If it comes back with 4, you are using the 32 bit version of R, which of course will be more limited in how much RAM it can access. If it returns 8, then as Ben noted, you may want to evaluate some of the Large Memory options on the HPC task view: http://cran.r-project.org/web/views/HighPerformanceComputing.html or of course install more RAM.
Regards, Marc Schwartz
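The check described above can be wrapped into a few lines, together with a rough estimate of how much memory a numeric vector needs. This is only a sketch: the helper est_gb is hypothetical and relies on the assumption that a double takes 8 bytes, ignoring the small per-object overhead.

```r
# 8 means a 64-bit build of R, 4 means a 32-bit build
pointer_bytes <- .Machine$sizeof.pointer
cat("Pointer size:", pointer_bytes, "bytes; architecture:", R.version$arch, "\n")

# Actual size of a numeric vector with one million elements
n <- 1e6
x <- numeric(n)
print(object.size(x), units = "Mb")

# Hypothetical helper: approximate size in GB of a numeric vector
# of a given length, at 8 bytes per element
est_gb <- function(n_elements) 8 * n_elements / 1024^3
est_gb(5e8)   # half a billion doubles
```

If the estimate for a single object is already close to the installed RAM, a different executable will not help, since R often needs room for temporary copies as well.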
Re: [R] dotplot in .R with lattice latticeExtra: proper visualization
On Oct 11, 2012, at 6:48 AM, Andres LaCortadora wrote:

Dear everyone, I'm trying to do a dotplot with the libraries lattice and latticeExtra. However, R does not represent the values on the vertical y-axis properly. Instead of using the actual values of the numeric variable, R plots the rank of the value. That is, there are values [375, 500, 625, 750, ..., 3000] and R plots their ranks [1,2,3,4,...23] and chooses the scale accordingly. Has someone experienced a problem like this? How can I manage to get a proper representation with ticks like (0, 500, 1000, 1500, ...) on the vertical y-scale?

I suspect it will be difficult with dotplot. It is expecting a factor variable for the y-value and appears to be coercing the LHS argument to one. Why not use xyplot if you are plotting numeric by numeric? If you want to add horizontal lines `ala` dotplot you could construct a panel that had the appropriate commands. -- David.

Here's my data: https://www.dropbox.com/s/egy25cj00rhum40/data.csv And here the program code so far:

df.dose <- read.table("data.csv", sep=",", header=TRUE)
library(lattice); library(latticeExtra)
useOuterStrips(dotplot(z ~ sample.size | as.factor(effect.size)*as.factor(true.dose), groups=as.factor(type), data=df.dose, as.table=TRUE))

I'd be glad for any kind of help! Andres

David Winsemius, MD Alameda, CA, USA
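The xyplot suggestion could look roughly like the sketch below. The data frame here is made up (the posted data.csv is not reproduced), only one conditioning variable is used, and latticeExtra is left out so that the example runs with lattice alone. The custom panel draws horizontal reference lines first and then the points, which is one way to imitate the dotplot look while keeping a numeric y scale.

```r
library(lattice)

# Hypothetical stand-in for the poster's data
df.dose <- expand.grid(sample.size = c(20, 40, 80),
                       effect.size = c(0.2, 0.5),
                       type        = c("A", "B"))
df.dose$z <- seq(375, by = 125, length.out = nrow(df.dose))

p <- xyplot(z ~ sample.size | factor(effect.size),
            groups = type, data = df.dose,
            as.table = TRUE, auto.key = TRUE,
            panel = function(...) {
              panel.grid(h = -1, v = 0)   # horizontal lines at the y tick marks
              panel.xyplot(...)           # points, split by 'groups'
            })
print(p)
```

With xyplot both axes stay numeric, so the y ticks follow the values of z rather than their ranks.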
Re: [R] optim and nlminb
It appears you are using the approach throw every method at a problem and select the answer you like. I use this quite a lot with optimx to see just what disasters I can create, but I do so to see if the software will return sensible error messages. You will have to provide a reproducible example if you want useful answers from this list (as per posting guide). Optimization tools are like F1 racing cars -- many controls and settings, with lots of power but difficulties in controlling it. Their users -- even if well-qualified in other areas -- are unfortunately often those who have trouble riding a bicycle with just one speed. There is a serious and quite involved learning curve. Previously you tried optimx, but seem to have misunderstood or disregarded the answers. It is quite likely the problem you are sending to the optimizers is ill-posed or plain wrong. Certainly it does not have a gradient function, which is almost always a good idea. If you prepare a reproducible example that can be run by readers of the list you will a) discover what is wrong as you prepare it, or b) be able to submit and very likely get useful help. Indeed in several years on the list, I've never seen a query with a short, testable case fail to get an answer very quickly. 
JN

On 10/11/2012 06:00 AM, r-help-requ...@r-project.org wrote:

Message: 92 Date: Wed, 10 Oct 2012 13:16:38 -0700 (PDT) From: nserdar snes1...@hotmail.com To: r-help@r-project.org Subject: [R] optim and nlminb Message-ID: 1349900198210-4645772.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii

#optim package
estimate <- optim(init.par, Linn, hessian=TRUE, method=c("L-BFGS-B"), control=list(trace=1, abstol=0.001), lower=c(0,0,0,0,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf), upper=c(1,1,1,1,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf))

#nlminb package
estimate <- nlminb(init.par, Linn, gr=NULL, hessian=TRUE, control=list(trace=1, factr=1), lower=c(0,0,0,0,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf), upper=c(1,1,1,1,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf))

I did not get the same results from the two calls above. The log-likelihood values are close but the parameter estimates are completely different. My expectation is very close to what nlminb gives. Do you have any idea or suggestion about the difference between the two? Regards, Serdar
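As an illustration of the kind of short, testable case asked for above, the sketch below runs both optimizers on a small made-up objective with an analytic gradient and the same box constraints. The function fn, its gradient gr, and the starting values are assumptions chosen for the example; they are not the poster's Linn likelihood. As far as the help pages go, factr and pgtol are the L-BFGS-B controls in optim, while nlminb uses controls such as rel.tol and abs.tol and expects functions (not TRUE) for its gradient and hessian arguments.

```r
# Made-up convex objective with minimum at (0.5, 2)
fn <- function(p) (p[1] - 0.5)^2 + 2 * (p[2] - 2)^2 + (p[1] - 0.5) * (p[2] - 2)
# Analytic gradient of fn
gr <- function(p) c(2 * (p[1] - 0.5) + (p[2] - 2),
                    4 * (p[2] - 2) + (p[1] - 0.5))

start <- c(0.9, 0)
lower <- c(0, -Inf)
upper <- c(1,  Inf)

fit_optim  <- optim(start, fn, gr, method = "L-BFGS-B",
                    lower = lower, upper = upper)
fit_nlminb <- nlminb(start, fn, gradient = gr,
                     lower = lower, upper = upper)

# Side-by-side comparison of estimates and objective values
rbind(optim  = c(fit_optim$par,  value = fit_optim$value),
      nlminb = c(fit_nlminb$par, value = fit_nlminb$objective))
```

On a well-posed problem like this one the two sets of estimates agree closely; similar objective values with very different parameters, as reported in the question, usually point to a flat or poorly identified likelihood rather than to one optimizer being right.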
[R] Formatting data for bootstrapping for confidence intervals
Hi all, New to R, so this may be obvious to some. I've been trying to figure this out for a while. I have a dataset "events" that looks something like this:

Area  NAME  DATE     X  Xn  Y
1     X     1/10/10  1  1   0
1     Y     1/11/10  0  0   1
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
2     X     2/12/10  1  1   0
2     X     2/12/10  1  0   0
2     Y     2/12/10  0  0   1
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  1   0
2     X     2/14/10  1  0   0
3     X     7/27/11  1  0   0
3     X     7/27/11  1  1   0
3     X     7/27/11  1  0   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  0   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  1   0

X and Y are events. Every row represents a single event happening, with a 1 indicating which one happens at that time. Xn indicates X happening at night. I want to bootstrap these events over days but I think I need to summarize them first, i.e. get something that looks like this:

Area  DATE     X  Xn  Y
1     1/10/10  1  1   0
1     1/11/10  0  0   1
1     1/12/10  3  0   0
2     2/12/10  2  1   1
etc.

and then for each Area, bootstrap the data over the days. Any ideas? I've tried using the 'reshape' package but I don't know how to sum over parts of the columns as defined by the DATE values... Many thanks ahead!

-- View this message in context: http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860.html
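One way to get the per-day summary is base R's aggregate(), summing the event columns within each Area and DATE. The sketch below uses only the first eight rows of the posted data; the bootstrap part is a deliberately simple illustration (resampling days within one area and taking the mean daily count of X), and the helper name boot_mean_X is hypothetical. The statistic of real interest would replace the mean.

```r
# First eight rows of the posted data
events <- data.frame(
  Area = c(1, 1, 1, 1, 1, 2, 2, 2),
  NAME = c("X", "Y", "X", "X", "X", "X", "X", "Y"),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X    = c(1, 0, 1, 1, 1, 1, 1, 0),
  Xn   = c(1, 0, 0, 0, 0, 1, 0, 0),
  Y    = c(0, 1, 0, 0, 0, 0, 0, 1)
)

# Sum the event indicators within each Area/DATE combination
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)
daily <- daily[order(daily$Area, daily$DATE), ]
daily

# Hypothetical helper: resample the days of one area with replacement
# and record the mean daily count of X for each resample
boot_mean_X <- function(d, R = 1000) {
  replicate(R, mean(d$X[sample(nrow(d), replace = TRUE)]))
}

set.seed(1)
area1 <- daily[daily$Area == 1, ]
quantile(boot_mean_X(area1), c(0.025, 0.975))   # percentile interval
```

Looping over split(daily, daily$Area) would give one interval per area; the boot package offers more complete bootstrap intervals if the percentile version is too crude.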
Re: [R] multiple t-tests across similar variable names
Hi Shantanu, I guess the below code should solve both the issues: set.seed(432) dat2-data.frame(apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),pre_banana=sample(25:35,5,replace=TRUE),post_apple=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE)) colnames(dat2)-gsub(^pre\\_(.*),\\1_pre,gsub(^post\\_(.*),\\1_post,colnames(dat2))) dat3-t(dat2[order(colnames(dat2))]) dat3-data.frame(varName=gsub((.*)\\_.*,\\1,row.names(dat3)),dat3) list3-lapply(split(dat3,dat3$varName),function(x) t(x[-1])) res3-do.call(rbind,lapply(lapply(list3,function(x) t.test(x[,1],x[,2],paired=TRUE)),function(x) data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value))) res3 # meandifference CIlow CIhigh p.value #apple 12.6 8.519476 16.68052 0.0010166626 #banana 15.0 12.088040 17.91196 0.0001388506 #orange 18.2 13.604166 22.79583 0.0003888560 A.K. - Original Message - From: Nundy, Shantanu snu...@chicagobooth.edu To: arun smartpink...@yahoo.com Cc: Sent: Thursday, October 11, 2012 10:22 AM Subject: RE: [R] multiple t-tests across similar variable names hi Arun, This is very helpful thanks. I'm running into a couple issues: 1. Since some of the variables start with pre_apple and others apple_post sorting the variables doesn't completely put pre-post variables next to each other. 2. 
I have about 50 variables so typing this line is a bit cumbersome: list3-list(dat3[,1:2],dat3[,3:4],dat3[,5:6]) Thanks, Shantanu From: arun [smartpink...@yahoo.com] Sent: Thursday, October 11, 2012 9:14 AM To: Rui Barradas Cc: Nundy, Shantanu; R help Subject: Re: [R] multiple t-tests across similar variable names HI Rui, By running your code, I got the results as: result # MeanDiff CIlower CIupper p.value #apple -12.6 -16.68052 -8.519476 0.0010166626 #banana -15.0 -17.91196 -12.088040 0.0001388506 #orange -18.2 -22.79583 -13.604166 0.0003888560 From my code: res3 # meandifference CIlow CIhigh p.value #apple 12.6 8.519476 16.68052 0.0010166626 #banana 15.0 12.088040 17.91196 0.0001388506 #orange 18.2 13.604166 22.79583 0.0003888560 There is difference in signs. A.K. - Original Message - From: Rui Barradas ruipbarra...@sapo.pt To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu Cc: R help r-help@r-project.org Sent: Thursday, October 11, 2012 9:25 AM Subject: Re: [R] multiple t-tests across similar variable names Hello, I have a problem, with your data example my results are different. I have changed the names of two of the variables, to allow for 'pre' and 'post' to be first in the names. 
# auxiliary functions ifswap - function(x) if(x[1] %in% c(pre, post)) x[2:1] else x getpair - function(i, post) post[ which(vmat[post, 1] == vmat[i, 1]) ] makeLine - function(h) c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1], CIupper = h$conf.int[2], p.value = h$p.value) doTests - function(DF, Pairs){ t.list - lapply( seq_len(nrow(Pairs)), function(i) t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) ) do.call(rbind, lapply(t.list, makeLine)) } # dataset set.seed(432) dat2 - data.frame(apple_pre = sample(10:20,5,replace=TRUE), orange_post = sample(18:28,5,replace=TRUE), pre_banana = sample(25:35,5,replace=TRUE), # here apple_post = sample(20:30,5,replace=TRUE), post_banana = sample(40:50,5,replace=TRUE), # and here orange_pre = sample(5:10,5,replace=TRUE)) # # start processing the data.frame # Make pairs of pre/post columns vars - names(dat2) vmat - do.call(rbind, strsplit(vars, _)) vmat - t(apply(vmat, 1, ifswap)) pre - which(vmat[, 2] == pre) post - which(vmat[, 2] == post) post - sapply(pre, getpair, post) pairs - matrix(c(pre, post), ncol = 2) # now the tests result - doTests(dat2, pairs) rownames(result) - vmat[pre, 1] result In your results I believe that the values for meandifference are the means of x[, 1], at least that's what I've got. Anyway, I'll see both codes again, to try to see what's going on. Hope this helps, Rui Barradas Em 11-10-2012 05:31, arun escreveu: HI, If you have a lot of variables and in no order, then it would be better to order the data by column names. For e.g. set.seed(432) dat2-data.frame(apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),banana_pre=sample(25:35,5,replace=TRUE),apple_post=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE)) dat3-dat2[order(colnames(dat2))] #order the columns list3-list(dat3[,1:2],dat3[,3:4],dat3[,5:6]) res3-do.call(rbind,lapply(lapply(list3,function(x)
[R] plots for presentation
Dear users, I am preparing a presentation in LaTeX (beamer). I would like to show parts of my plots per click. For example, consider two time series x and y:

x <- ts(rnorm(100), start=1900, end=1999)
y <- ts(rnorm(100), start=1900, end=1999)
plot(x)
lines(y, col=2)

Then I imported this plot into LaTeX as an .eps file. My question is, how can I show the plot of each time series separately in sequence (one after the other)? I also want to show parts of the plots at different time segments in my presentation. To be honest, I don't know if these features are in R or in LaTeX. Thanks in advance, M.
[R] Sorting a data frame by specifying a vector
Hello all, I cannot seem to figure out this seemingly simple procedure. I want to sort a data frame by a specified character vector. So for:

df.. <- data.frame(Season=rep(c("Summer","Fall","Winter","Spring"),4), Obs=runif(length(rep(c("Summer","Fall","Winter","Spring"),4))))

I want to sort the data frame by the seasons, but in the order I specify, since sorting alphabetically would not put the seasons in sequential order. I tried the following and a few other things but no dice. It looks like I will have to convert to factors. Any thoughts? Thanks

df.. <- df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

Josh

-- View this message in context: http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html
Re: [R] optim and nlminb
a fortune? On 10/11/2012 9:56 AM, John C Nash wrote: snip Indeed in several years on the list, I've never seen a query with a short, testable case fail to get an answer very quickly. JN -- Spencer Graves, PE, PhD President and Chief Technology Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567 web: www.structuremonitoring.com
Re: [R] Sorting a data frame by specifying a vector
?order

df[order(yourcolumn), ]

-- Bert

On Thu, Oct 11, 2012 at 10:08 AM, LCOG1 jr...@lcog.org wrote:

Hello all, I cannot seem to figure out this seemingly simple procedure. I want to sort a data frame by a specified character vector. So for:

df.. <- data.frame(Season=rep(c("Summer","Fall","Winter","Spring"),4), Obs=runif(length(rep(c("Summer","Fall","Winter","Spring"),4))))

I want to sort the data frame by the seasons, but in the order I specify, since sorting alphabetically would not put the seasons in sequential order. I tried the following and a few other things but no dice. It looks like I will have to convert to factors. Any thoughts? Thanks

df.. <- df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

Josh

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
[R] Help on probability distribution question
Dear All, I have a question I would like to ask, and wonder if you have any thoughts on how to make it work in R.

1. I work in the field of medicine where physiologic variables are often simulated, and they cannot have negative values. Most often the assumption is made to simulate these parameters with a normal distribution in the log domain, to avoid generating negative values. Since the expected mean and SD are usually known from the normal domain, I use the methods described in the Wikipedia article section "Arithmetic moments" to generate μ and σ and simulate with rlnorm(). At times though the following issue comes up: I have the mean and SD for the parameters available from the normal domain, and the covariance matrix from the normal domain. Then I would like to simulate the values, but to avoid negative values being generated I have to fall back on rlnorm in {compositions}. My issue is that my covariance matrix represents the covariance of the parameters in the normal domain, as opposed to the lognormal domain. Any thoughts on how to work around this?

Appreciate the help, Andras
Re: [R] Expected number of events, Andersen-Gill model fit via coxph in package survival
Thank you, Dr. Therneau, that was very helpful. Best regards, Omar.

On Mon, Oct 8, 2012 at 9:58 AM, Terry Therneau thern...@mayo.edu wrote:

I am interested in producing the expected number of events, in a recurring events setting. I am using the Andersen-Gill model, as fit by the function coxph in the package survival. I need to produce expected numbers of events for a cohort, cumulatively, at several fixed times. My ultimate goal is: To fit an AG model to a reference sample, then use that fitted model to generate expected numbers of events for a new cohort; then, comparing the expected vs. the observed numbers of events would give us some idea of whether the new cohort differs from the reference one. From my reading of the documentation and the text by Therneau and Grambsch, it seems that the function survexp is what I need. But using it I am not able to obtain expected numbers of events that match reasonably well the observed numbers *even for the same reference population.* So, I think I am misunderstanding something quite badly.

You've hit a common confusion. Observed versus expected events computations are done on a cumulative hazard scale H, not the survival scale S; S = exp(-H). Relating this back to simple Poisson models, H(t) would be the expected number of events by time t and S(t) the probability of no events before time t. G. Berry (Biometrics 1983) has a classic and readable article on this (especially if you ignore the proofs).
Using your example:

cphfit <- coxph(Surv(start,stop,event) ~ rx + number + size + cluster(id), data=bladder2)
zz <- predict(cphfit, type='expected')
c(sum(zz), sum(bladder2$event))
[1] 112 112

tdata <- bladder2[1:10, ]   # new data set (lazy way)
predict(cphfit, type='expected', newdata=tdata)
[1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
[8] 0.8864808 0.2932915 0.5190647

You can also do this using survexp and the cohort=FALSE argument, which would return S(t) for each subject, and we would then use -log(result) to get H. This is how it was done when I wrote the book, but the newer predict function is easier.

Terry Therneau
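Building on the example above, the observed and expected counts can also be tabulated by subgroup, which is closer to the cohort comparison described in the question. The sketch below reuses the same model on the bladder2 data from the survival package; the grouping by treatment arm (rx) and the name obs_exp are only illustrations, and a genuinely new cohort would instead be passed through the newdata argument with its own start, stop and event columns.

```r
library(survival)

# Andersen-Gill model on the reference data
cphfit <- coxph(Surv(start, stop, event) ~ rx + number + size + cluster(id),
                data = bladder2)

# Expected number of events for each (start, stop] interval, on the
# cumulative hazard scale
expected <- predict(cphfit, type = "expected")

# Overall totals
c(expected = sum(expected), observed = sum(bladder2$event))

# Observed versus expected by treatment arm
obs_exp <- data.frame(
  observed = tapply(bladder2$event, bladder2$rx, sum),
  expected = tapply(expected, bladder2$rx, sum)
)
obs_exp
```

For the reference sample the totals agree by construction; it is the comparison on a new cohort that carries information.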
Re: [R] Sorting a data frame by specifying a vector
Hi, In your dataset, it seems like it is already ordered in the way you wanted to. df.. - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 #Suppose the order you want is: vec2-c(Summer,Winter,Fall,Spring) df1-df..[match(df..$Season,vec2),] row.names(df1)-1:nrow(df1) df1 # Season Obs #1 Summer 0.2141001 #2 Winter 0.9318599 #3 Fall 0.6722337 #4 Spring 0.1927715 #5 Summer 0.2141001 #6 Winter 0.9318599 #7 Fall 0.6722337 #8 Spring 0.1927715 #9 Summer 0.2141001 #10 Winter 0.9318599 #11 Fall 0.6722337 #12 Spring 0.1927715 #13 Summer 0.2141001 #14 Winter 0.9318599 #15 Fall 0.6722337 #16 Spring 0.1927715 A.K. - Original Message - From: LCOG1 jr...@lcog.org To: r-help@r-project.org Cc: Sent: Thursday, October 11, 2012 1:08 PM Subject: [R] Sorting a data frame by specifying a vector Hello all, I cannot seem to figure out this seemingly simple procedure. I want to sort a data frame by a specified character vector. So for : df.. - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 I want to sort the data frame by the seasons but in the order I specify since alphapetically would not put the season in sequential order I tried the following and a few other things but no dice. It looks like I will have to convert to factors. Any thoughts? Thanks df.. - df..[sort(as.factor(Df..$Season,levels=c(Summer,Fall,Winter,Spring))),] Josh -- View this message in context: http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Help on probability distribution question
On 11-Oct-2012 17:22:44 Andras Farkas wrote:

Dear All, I have a question I would like to ask, and wonder if you have any thoughts on how to make it work in R. 1. I work in the field of medicine where physiologic variables are often simulated, and they cannot have negative values. Most often the assumption is made to simulate these parameters with a normal distribution in the log domain, to avoid generating negative values. Since the expected mean and SD are usually known from the normal domain, I use the methods described in the Wikipedia article section "Arithmetic moments" to generate μ and σ and simulate with rlnorm(). At times though the following issue comes up: I have the mean and SD for the parameters available from the normal domain, and the covariance matrix from the normal domain. Then I would like to simulate the values, but to avoid negative values being generated I have to fall back on rlnorm in {compositions}. My issue is that my covariance matrix represents the covariance of the parameters in the normal domain, as opposed to the lognormal domain. Any thoughts on how to work around this? Appreciate the help, Andras

If I understand your question correctly, if Y is the variable being simulated then you know the mean (M, say) and the variance (V, say) of log(Y). So you can simulate X from a normal distribution with mean M and variance V = S^2 (S = SD of X), and then Y = exp(X):

Y <- exp(rnorm(n,M,S))

where n is the number of sampled values you want. When Y is multivariate, with M the vector of means and V the covariance matrix of log(Y), then use a similar approach with the function mvrnorm() from the MASS package:

library(MASS)
Y <- exp(mvrnorm(n,M,V))

Does this help? Ted.
- E-Mail: (Ted Harding) ted.hard...@wlandres.net Date: 11-Oct-2012 Time: 18:51:47 This message was sent by XFMail
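If the mean vector and covariance matrix are known on the original (arithmetic) scale rather than on the log scale, one option is to convert them by moment matching before simulating. For a multivariate lognormal Y = exp(X) with X normal, E[Y_i] = exp(mu_i + Sigma_ii/2) and Cov(Y_i, Y_j) = E[Y_i] E[Y_j] (exp(Sigma_ij) - 1), which can be inverted as in the sketch below. The numbers in m and C are hypothetical, and the resulting Sigma is not guaranteed to be positive definite for every choice of C, so it should be checked.

```r
library(MASS)

# Hypothetical arithmetic-scale mean vector and covariance matrix
m <- c(10, 4)
C <- matrix(c(4,   1.2,
              1.2, 1), nrow = 2, byrow = TRUE)

# Moment matching: log-scale covariance and mean
Sigma <- log(1 + C / outer(m, m))
mu    <- log(m) - diag(Sigma) / 2

# Simulate on the log scale, then exponentiate; all values are positive
set.seed(42)
Y <- exp(mvrnorm(10000, mu, Sigma))

colMeans(Y)   # should be near m
cov(Y)        # should be near C
```

The sample moments only approximate m and C, and the marginal distributions are lognormal (right-skewed), which may or may not be acceptable for the physiologic variables in question.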
Re: [R] replacing ugly for loops
Sorry, you **did** supply data and my solution **does** work (except I left off 1 closing ) . sq.n - seq_len(nrow(data.df)) tapply(sq.n,data.df$seq,function(x)with(data.df[x,], + sort(unique(do.call(c,mapply(seq,from=startNo,length=len,SIMPLIFY=FALSE)) $`1` [1] 3 4 5 6 10 11 $`2` [1] 3 4 5 6 7 15 16 17 Cheers, Bert On Wed, Oct 10, 2012 at 10:59 PM, Bert Gunter bgun...@gene.com wrote: I am not sure you have expressed what you wanjt to do correctly. See inline: On Wed, Oct 10, 2012 at 9:10 PM, andrewH ahoer...@rprogress.org wrote: I have a couple of hundred American Community Survey Summary Files files containing rectangular arrays of data, mainly though not exclusively numeric. Each file is referred to as a sequence (henceforth seq). -- so 1 seq (terrible identifier -- see below for why) = 1 file From these files I am trying to extract particular subsets (tables) consisting of a sets of columns. These tables are defined by three numbers (now in columns in a data frame): 1. a file identifier (seq) 2. first column position numbers (startNo) 3. length of table (len) So your data frame, call it yourframe, has columns named: seq startNo len so the columns to select for one triple would consist of startNo:(startNo+length-1). I am trying to create for each sequence a vector of all the column numbers for tables in that sequence. So for each seq id you want to find all the column numbers, right? sq.n - seq_len(nrow(yourframe)) ## Just to make it easier to read colms - tapply(sq.n, yourframe$seq,function(x) with(yourframe[x,], sort(unique(do.call(c, mapply(seq, from=startNo, length=len,SIMPLIFY = FALSE) ## Comments In the mapply call, seq is the R function, ?seq. That's why using it as a name for a file id is terrible -- it causes confusion. In the absence of data, this is untested -- and probably not quite right. But it should be close, I hope. The key idea is the use of mapply to get the sequence of columns for each row in all the rows for each seq id. 
The SIMPLIFY = FALSE guarantees that this yields a list of vectors of column indices, which are then glopped together and cleaned up by the sort(unique(do.call( ... stuff. colms should then be a list giving the sorted column numbers to choose for each seq id. I do not know whether (once cleaned up,) this is either more elegant or more efficient than what you proposed. And I wouldn't be surprised if someone like Bill Dunlap comes up with a lot better way, either. But it is different -- and perhaps amusing. ... If I have properly understood what you wanted. If not, ignore all. Cheers, Bert Obviously I could do this with nested for loops,e.g.. seq - c(1,1,2,2) startNo - c(3, 10, 3, 15) len - c(4, 2, 5, 3) data.df - data.frame(seq, startNo, len) seq.f - factor(data.df$seq) data.l - split(data.df, seq.f) selectColsList- vector(list, length(levels(seq.f))) for (i in seq_along(levels(seq.f))){ selectCols - numeric() for (j in seq_along(data.l[[i]]$startNo)){ selectCols - c(selectCols, data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] data.l[[i]]$len[j]-1)) } selectColsList[[i]] - selectCols } selectColsList [[1]] [1] 3 4 5 6 10 11 [[2]] [1] 3 4 5 6 7 15 16 17 But this code strikes me as inelegant and verbose. It seems to me that there ought to be a way to make the outer loop, (indexed with i) into a tapply function (which is why I started with a split()), and the inner loop (indexed with j) into some cute recursive function, but I was not able to do so. If anyone could suggest some nicer (e.g. shorter, or faster, or just more sophisticated) way to do this instead, I would be most grateful. Sincerely, andrewH -- View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html Sent from the R help mailing list archive at Nabble.com. 
-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
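For reference, the mapply idea from this thread can be written out in a cleaned-up form as below. This is a sketch of the same approach rather than the exact code posted: the file identifier column is renamed to file_id (a hypothetical name) to avoid the clash with the seq() function, and split() plus lapply() stand in for tapply().

```r
# Example data from the thread, with the id column renamed
data.df <- data.frame(file_id = c(1, 1, 2, 2),
                      startNo = c(3, 10, 3, 15),
                      len     = c(4, 2, 5, 3))

# One vector of column numbers per row: startNo, startNo + 1, ...
cols_per_row <- mapply(seq, from = data.df$startNo, length.out = data.df$len,
                       SIMPLIFY = FALSE)

# Combine the vectors belonging to the same file id
colms <- lapply(split(cols_per_row, data.df$file_id),
                function(v) sort(unique(unlist(v))))
colms
```

The result is a list named by file id, matching the output of the nested for loops in the original question.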
Re: [R] Sorting a data frame by specifying a vector
Sorry if I wasn't clear but the result I am looking for is as follows # Season Obs #1 Summer 0.2141001 #5 Summer 0.2141001 #9 Summer 0.2141001 #13 Summer 0.2141001 #3 Fall 0.6722337 #7 Fall 0.6722337 #11 Fall 0.6722337 #15 Fall 0.6722337 #2 Winter 0.9318599 #6 Winter 0.9318599 #10 Winter 0.9318599 #14 Winter 0.9318599 #4 Spring 0.1927715 #8 Spring 0.1927715 #12 Spring 0.1927715 #16 Spring 0.1927715 The process you describe does not get me there Any other recommendations? -Original Message- From: arun [mailto:smartpink...@yahoo.com] Sent: Thursday, October 11, 2012 10:33 AM To: ROLL Josh F Cc: R help Subject: Re: [R] Sorting a data frame by specifying a vector Hi, In your dataset, it seems like it is already ordered in the way you wanted to. df.. - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 #Suppose the order you want is: vec2-c(Summer,Winter,Fall,Spring) df1-df..[match(df..$Season,vec2),] row.names(df1)-1:nrow(df1) df1 # Season Obs #1 Summer 0.2141001 #2 Winter 0.9318599 #3 Fall 0.6722337 #4 Spring 0.1927715 #5 Summer 0.2141001 #6 Winter 0.9318599 #7 Fall 0.6722337 #8 Spring 0.1927715 #9 Summer 0.2141001 #10 Winter 0.9318599 #11 Fall 0.6722337 #12 Spring 0.1927715 #13 Summer 0.2141001 #14 Winter 0.9318599 #15 Fall 0.6722337 #16 Spring 0.1927715 A.K. - Original Message - From: LCOG1 jr...@lcog.org To: r-help@r-project.org Cc: Sent: Thursday, October 11, 2012 1:08 PM Subject: [R] Sorting a data frame by specifying a vector Hello all, I cannot seem to figure out this seemingly simple procedure. I want to sort a data frame by a specified character vector. So for : df.. - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 I want to sort the data frame by the seasons but in the order I specify since alphapetically would not put the season in sequential order I tried the following and a few other things but no dice. 
It looks like I will have to convert to factors. Any thoughts? Thanks

df.. <- df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

Josh

-- View this message in context: http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plots for presentation
On 11/10/2012 1:08 PM, mamush bukana wrote:

Dear users, I am preparing a presentation in LaTeX (beamer). I would like to show parts of my plots per click. For example, consider two time series x and y:

x <- ts(rnorm(100), start=1900, end=1999)
y <- ts(rnorm(100), start=1900, end=1999)
plot(x)
lines(y, col=2)

Then I imported this plot into LaTeX as an .eps file. My question is: how can I show the plot of each time series separately in sequence (one after the other)? I also want to show parts of the plots at different time segments in my presentation. To be honest, I don't know if these features are in R or in LaTeX.

Mostly LaTeX/Beamer. Draw the two versions of the plot, and tell beamer to show the first one only on overlay 1, the second only on overlay 2. This is particularly easy using Sweave, because you can save the code that drew the first plot and re-use it in the second.

Duncan Murdoch

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
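A minimal sketch of the "draw two versions" idea, assuming plain PDF output rather than Sweave. The file names, the helper draw_first() and the output folder are made up for illustration. On the beamer side, something like \only<1>{\includegraphics{ts-overlay-1}} and \only<2>{\includegraphics{ts-overlay-2}} would then show one file per overlay.

```r
set.seed(1)  # only so the sketch is reproducible
x <- ts(rnorm(100), start = 1900, end = 1999)
y <- ts(rnorm(100), start = 1900, end = 1999)

# Code that draws the first version, kept in a function so the second
# version can reuse it. The fixed ylim keeps both versions on the same axes.
draw_first <- function() plot(x, ylim = range(x, y), ylab = "value")

out_dir <- tempdir()  # hypothetical; point this at the presentation folder
file1 <- file.path(out_dir, "ts-overlay-1.pdf")
file2 <- file.path(out_dir, "ts-overlay-2.pdf")

# Version 1: x only
pdf(file1, width = 6, height = 4)
draw_first()
dev.off()

# Version 2: x with y added on top
pdf(file2, width = 6, height = 4)
draw_first()
lines(y, col = 2)
dev.off()

# A time segment of a series can be cut out with window() before plotting
x_segment <- window(x, start = 1950, end = 1970)
```

The same pattern extends to time segments: draw window(x, ...) into a third file and give it its own overlay.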
[R] survey package question
Hello, I have got a cluster sample using an election dataset where I already had the final results of a county-specific election. I am trying to figure out what would be the best sampling design for my data.

The structure of the dataset is:
1) polling station (in general schools where people vote; a county has, for example, 15 polling stations)
2) inside each polling station, there are voting units, where people actually vote (on average about 40 voting units per polling station)
3) for each voting unit I have the total votes by candidate (e.g., candidate 1 = 322, candidate 2 = 122, candidate 3 = 89)

The initial sampling design is:
1) selection of 5 polling stations PPS (based on number of voters)
2) selection of 10 voting units (SRS)

I am interested in estimating the proportion of votes by candidate (let's assume we have 3 candidates). My naive estimate would be: votes for candidate 1 / all valid votes = proportion, e.g.

candidate 1 = 2132 / 10874 = .1961
candidate 2 = 5323 / 10874 = .4895
candidate 3 = 3419 / 10874 = .3144

In this case, the unit of analysis is voters (or votes). If I specify the sampling design using the survey package in this way...

design <- svydesign(id=~station + unit, fpc=~probstation + probunit, data=sample, pps="brewer")
svyciprop(~I(candidate1/totalVotes), design)

... I am assuming that the unit of analysis is the voting unit, right? And I am estimating an average among voting units? Should I expand my database to the individual level (voters), or do I just have to include a unit weight according to the number of voters per voting unit? In other words, is there a way to estimate, for instance, votes for candidate 1 / all valid votes = proportion, directly from the survey package, or do I have to expand the database to the level of people (voters) and then estimate the proportion using svymean and the respective design?

I would appreciate any advice or help.

Sebastian

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
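The distinction the question draws, between a pooled proportion (votes for a candidate over all valid votes) and an average of per-unit proportions, can be illustrated in base R. This is only a sketch: the three voting units and their counts are made up (the first row reuses the 322/122/89 example), and sampling weights are ignored entirely.

```r
# Hypothetical per-unit counts; not data from the post
votes <- data.frame(
  unit       = 1:3,
  candidate1 = c(322, 40, 150),
  candidate2 = c(122, 60, 300),
  candidate3 = c(89, 20, 50)
)
votes$totalVotes <- with(votes, candidate1 + candidate2 + candidate3)

# Pooled proportion: the unit of analysis is the vote
pooled <- sum(votes$candidate1) / sum(votes$totalVotes)

# Unweighted mean of unit proportions: the unit of analysis is the voting unit
mean_of_units <- mean(votes$candidate1 / votes$totalVotes)

# Weighting each unit's proportion by its number of votes recovers the pooled value
weighted <- weighted.mean(votes$candidate1 / votes$totalVotes, w = votes$totalVotes)

c(pooled = pooled, mean_of_units = mean_of_units, weighted = weighted)
```

Since the pooled proportion is a ratio of two totals, svyratio() in the survey package (for example svyratio(~candidate1, ~totalVotes, design)) may be closer to what is wanted than expanding the data to voter level. That is a suggestion to check against the package documentation, not something stated in the thread.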
Re: [R] Sorting a data frame by specifying a vector
On Thu, Oct 11, 2012 at 10:43 AM, ROLL Josh F jr...@lcog.org wrote: Sorry if I wasn't clear.

Actually, my bad -- I didn't read carefully enough. But the answer is still essentially correct -- just change the ordering of the levels of Season, which, by default, is alphabetic.

df$Season <- factor(df$Season, levels = c("Summer","Fall","Winter","Spring"))
df <- df[order(df$Season),]

Learn about factors (read the Intro to R tutorial if you haven't already). They are very handy (and much despised by some).

-- Bert

The result I am looking for would be something like:

#    Season       Obs
#1   Summer 0.2141001
#5   Summer 0.2141001
#9   Summer 0.2141001
#13  Summer 0.2141001
#3     Fall 0.6722337
#7     Fall 0.6722337
#11    Fall 0.6722337
#15    Fall 0.6722337
#2   Winter 0.9318599
#6   Winter 0.9318599
#10  Winter 0.9318599
#14  Winter 0.9318599
#4   Spring 0.1927715
#8   Spring 0.1927715
#12  Spring 0.1927715
#16  Spring 0.1927715

Any other thoughts? JR

-Original Message- From: Bert Gunter [mailto:gunter.ber...@gene.com] Sent: Thursday, October 11, 2012 10:19 AM To: ROLL Josh F Cc: r-help@r-project.org Subject: Re: [R] Sorting a data frame by specifying a vector

?order
df[order(yourcolumn), ]

-- Bert

On Thu, Oct 11, 2012 at 10:08 AM, LCOG1 jr...@lcog.org wrote:

Hello all, I cannot seem to figure out this seemingly simple procedure. I want to sort a data frame by a specified character vector. So for:

df.. <- data.frame(Season=rep(c("Summer","Fall","Winter","Spring"),4),Obs= runif(length(rep(c("Summer","Fall","Winter","Spring"),4))))

I want to sort the data frame by the seasons but in the order I specify, since alphabetical order would not put the seasons in sequential order. I tried the following and a few other things but no dice. It looks like I will have to convert to factors. Any thoughts? Thanks

df.. <- df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

Josh

-- View this message in context: http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html Sent from the R help mailing list archive at Nabble.com.

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
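Bert's factor-plus-order() answer can be tried on a small reproducible example. This is a sketch: the seed and the names seasons and sorted are chosen here for illustration, and the Obs values will not match the ones printed in the thread.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
seasons <- c("Summer", "Fall", "Winter", "Spring")

df <- data.frame(Season = rep(seasons, 4),
                 Obs    = runif(16),
                 stringsAsFactors = FALSE)

# Give the factor its levels in the wanted order instead of the default
# alphabetical order ...
df$Season <- factor(df$Season, levels = seasons)

# ... so that order() sorts by level position. Ties keep their original
# relative order, so rows 1, 5, 9, 13 (Summer) come first.
sorted <- df[order(df$Season), ]
sorted
```

The original row names are kept, which is why the result shows 1, 5, 9, 13, then 2, 6, 10, 14 (the Fall rows in this example), and so on.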
Re: [R] Sorting a data frame by specifying a vector
HI, In this case, specifying the factor levels would be easier. Try this: set.seed(1) df - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 df1-within(df,{Season-factor(Season,levels=c(Summer,Fall,Winter,Spring))}) library(plyr) df2-ddply(df1,.(Season),function(x) x) df2 # Season Obs #1 Summer 0.26550866 #2 Summer 0.20168193 #3 Summer 0.62911404 #4 Summer 0.68702285 #5 Fall 0.37212390 #6 Fall 0.89838968 #7 Fall 0.06178627 #8 Fall 0.38410372 #9 Winter 0.57285336 #10 Winter 0.94467527 #11 Winter 0.20597457 #12 Winter 0.76984142 #13 Spring 0.90820779 #14 Spring 0.66079779 #15 Spring 0.17655675 #16 Spring 0.49769924 Just curious, in your reply, the Obs column has only 4 values. Do you want to get the means??? A.K. - Original Message - From: ROLL Josh F jr...@lcog.org To: 'arun' smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Thursday, October 11, 2012 1:42 PM Subject: RE: [R] Sorting a data frame by specifying a vector Sorry if I wasn't clear but the result I am looking for is as follows # Season Obs #1 Summer 0.2141001 #5 Summer 0.2141001 #9 Summer 0.2141001 #13 Summer 0.2141001 #3 Fall 0.6722337 #7 Fall 0.6722337 #11 Fall 0.6722337 #15 Fall 0.6722337 #2 Winter 0.9318599 #6 Winter 0.9318599 #10 Winter 0.9318599 #14 Winter 0.9318599 #4 Spring 0.1927715 #8 Spring 0.1927715 #12 Spring 0.1927715 #16 Spring 0.1927715 The process you describe does not get me there Any other recommendations? -Original Message- From: arun [mailto:smartpink...@yahoo.com] Sent: Thursday, October 11, 2012 10:33 AM To: ROLL Josh F Cc: R help Subject: Re: [R] Sorting a data frame by specifying a vector Hi, In your dataset, it seems like it is already ordered in the way you wanted to. df.. 
- data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 #Suppose the order you want is: vec2-c(Summer,Winter,Fall,Spring) df1-df..[match(df..$Season,vec2),] row.names(df1)-1:nrow(df1) df1 # Season Obs #1 Summer 0.2141001 #2 Winter 0.9318599 #3 Fall 0.6722337 #4 Spring 0.1927715 #5 Summer 0.2141001 #6 Winter 0.9318599 #7 Fall 0.6722337 #8 Spring 0.1927715 #9 Summer 0.2141001 #10 Winter 0.9318599 #11 Fall 0.6722337 #12 Spring 0.1927715 #13 Summer 0.2141001 #14 Winter 0.9318599 #15 Fall 0.6722337 #16 Spring 0.1927715 A.K. - Original Message - From: LCOG1 jr...@lcog.org To: r-help@r-project.org Cc: Sent: Thursday, October 11, 2012 1:08 PM Subject: [R] Sorting a data frame by specifying a vector Hello all, I cannot seem to figure out this seemingly simple procedure. I want to sort a data frame by a specified character vector. So for : df.. - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 I want to sort the data frame by the seasons but in the order I specify since alphapetically would not put the season in sequential order I tried the following and a few other things but no dice. It looks like I will have to convert to factors. Any thoughts? Thanks df.. - df..[sort(as.factor(Df..$Season,levels=c(Summer,Fall,Winter,Spring))),] Josh -- View this message in context: http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
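A note on the match() approach quoted above, as a sketch rather than anything proposed in the thread: match(df..$Season, vec2) returns each row's position in vec2 (values 1 to 4), and using those positions as row indices just picks rows 1 to 4 over and over. That would explain why the Obs column in the printed result has only four distinct values. Passing the positions to order() instead gives a grouped sort without converting to a factor. The names below (seasons, pos, picked, sorted) are made up for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
seasons <- c("Summer", "Fall", "Winter", "Spring")

df <- data.frame(Season = rep(seasons, 4),
                 Obs    = runif(16),
                 stringsAsFactors = FALSE)

# Position of each row's season in the wanted order: 1 2 3 4 1 2 3 4 ...
pos <- match(df$Season, seasons)

# Indexing with the positions themselves only ever selects rows 1 to 4
picked <- df[pos, ]

# Ordering by the positions sorts all 16 rows into the wanted season order
sorted <- df[order(pos), ]
```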
Re: [R] Sorting a data frame by specifying a vector
I'm pretty sure you were already given the answer: order() in conjunction with a factor with the level in an order you specify. mydf$Season - factor(mydf$Season, levels=c(Summer,Fall,Winter,Spring)) mydf[order(mydf$Season),] Thanks for making sure to include the context in your replies. Sarah On Thu, Oct 11, 2012 at 1:42 PM, ROLL Josh F jr...@lcog.org wrote: Sorry if I wasn't clear but the result I am looking for is as follows # Season Obs #1 Summer 0.2141001 #5 Summer 0.2141001 #9 Summer 0.2141001 #13 Summer 0.2141001 #3Fall 0.6722337 #7Fall 0.6722337 #11 Fall 0.6722337 #15 Fall 0.6722337 #2 Winter 0.9318599 #6 Winter 0.9318599 #10 Winter 0.9318599 #14 Winter 0.9318599 #4 Spring 0.1927715 #8 Spring 0.1927715 #12 Spring 0.1927715 #16 Spring 0.1927715 The process you describe does not get me there Any other recommendations? -Original Message- From: arun [mailto:smartpink...@yahoo.com] Sent: Thursday, October 11, 2012 10:33 AM To: ROLL Josh F Cc: R help Subject: Re: [R] Sorting a data frame by specifying a vector Hi, In your dataset, it seems like it is already ordered in the way you wanted to. df.. - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 #Suppose the order you want is: vec2-c(Summer,Winter,Fall,Spring) df1-df..[match(df..$Season,vec2),] row.names(df1)-1:nrow(df1) df1 # Season Obs #1 Summer 0.2141001 #2 Winter 0.9318599 #3Fall 0.6722337 #4 Spring 0.1927715 #5 Summer 0.2141001 #6 Winter 0.9318599 #7Fall 0.6722337 #8 Spring 0.1927715 #9 Summer 0.2141001 #10 Winter 0.9318599 #11 Fall 0.6722337 #12 Spring 0.1927715 #13 Summer 0.2141001 #14 Winter 0.9318599 #15 Fall 0.6722337 #16 Spring 0.1927715 A.K. - Original Message - From: LCOG1 jr...@lcog.org To: r-help@r-project.org Cc: Sent: Thursday, October 11, 2012 1:08 PM Subject: [R] Sorting a data frame by specifying a vector Hello all, I cannot seem to figure out this seemingly simple procedure. I want to sort a data frame by a specified character vector. 
So for : df.. - data.frame(Season=rep(c(Summer,Fall,Winter,Spring),4),Obs= runif(length(rep(c(Summer,Fall,Winter,Spring),4 I want to sort the data frame by the seasons but in the order I specify since alphapetically would not put the season in sequential order I tried the following and a few other things but no dice. It looks like I will have to convert to factors. Any thoughts? Thanks df.. - df..[sort(as.factor(Df..$Season,levels=c(Summer,Fall,Winter,Spring))),] Josh -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Friedman test for replicated blocked data
It looks like friedman in the agricolae package handles replicates by averaging and then doing the unreplicated Friedman analysis. Any pointers to the fully replicated analysis, given, for example, in Conover, Practical Nonparametric Statistics (3rd Edn.), pp 383f? Thanks, John -- View this message in context: http://r.789695.n4.nabble.com/Friedman-test-for-replicated-blocked-data-tp798293p4645875.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] bug tracker broken
Hi, I get a 404 page not found on the root of the bug tracker. There is no webmaster link on r-project.org that I can see. Whom should I contact? Thanks Antonio PS: Yes, I was trying to report my first bug. It's a conspiracy with p < 0.01. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Repeating a series of commands
I'm trying to figure out how to repeat a series of commands in R and have the outputs added to a data frame after each iteration. My code starts this way...

a <- read.csv("File1.csv")
b <- read.csv("File2.csv")
a$Z <- ifelse(a$Z=="L",sample(1:4,length(a$Z),replace=TRUE),ifelse(a$Z=="M",sample(5:8,length(a$Z),replace=TRUE),ifelse(a$Z=="U",sample(9:10,length(a$Z),replace=TRUE),"")))
a$Z <- as.numeric(a$Z)
b$Z <- ifelse(b$Z=="L",sample(1:4,length(b$Z),replace=TRUE),ifelse(b$Z=="M",sample(5:8,length(b$Z),replace=TRUE),ifelse(b$Z=="U",sample(9:10,length(b$Z),replace=TRUE),"")))
b$Z <- as.numeric(b$Z)

This basically starts off with a new and partially random data set every time, which then goes through a bunch of other commands (not shown) and ends with the following outputs saved: Output1, Output2, Output3, Output4, where each of these is just a single number.

My question is:

1. How do I repeat the entire series of commands x number of times and save each of the outputs into a structure like this:

            Output1 Output2 Output3 Output4
Iteration 1
Iteration 2
Iteration 3
etc.

Not even sure where to start. Are loops the answer? Thanks,

-- View this message in context: http://r.789695.n4.nabble.com/Repeating-a-series-of-commands-tp4645881.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Formatting data for bootstrapping for confidence intervals
Hello, To aggregate the data use, yes, it's exists, function aggregate. with(dat, aggregate(cbind(X, Xn, Y), list(Area, DATE), FUN = sum)) # output Group.1 Group.2 X Xn Y 1 1 1/10/10 1 1 0 2 1 1/11/10 0 0 1 3 1 1/12/10 3 0 0 4 2 2/12/10 2 1 1 5 2 2/13/10 3 0 0 6 2 2/14/10 4 1 0 7 3 7/27/11 3 1 0 8 3 7/28/11 7 2 2 9 3 7/29/11 3 1 0 And take a look at package boot. Maybe you'll find something there. Hope this helps, Rui Barradas Em 11-10-2012 16:55, Paul Wennekes escreveu: Hi all, New to R, so this may be obvious to some. I've been trying to figure this out for a while, I have a dataset events that looks something like this: AreaNAMEDATEX Xn Y 1 X 1/10/10 1 1 0 1 Y 1/11/10 0 0 1 1 X 1/12/10 1 0 0 1 X 1/12/10 1 0 0 1 X 1/12/10 1 0 0 2 X 2/12/10 1 1 0 2 X 2/12/10 1 0 0 2 Y 2/12/10 0 0 1 2 X 2/13/10 1 0 0 2 X 2/13/10 1 0 0 2 X 2/13/10 1 0 0 2 X 2/14/10 1 0 0 2 X 2/14/10 1 0 0 2 X 2/14/10 1 1 0 2 X 2/14/10 1 0 0 3 X 7/27/11 1 0 0 3 X 7/27/11 1 1 0 3 X 7/27/11 1 0 0 3 X 7/28/11 1 0 0 3 X 7/28/11 1 1 0 3 X 7/28/11 1 0 0 3 X 7/28/11 1 0 0 3 Y 7/28/11 0 0 1 3 X 7/28/11 1 0 0 3 X 7/28/11 1 1 0 3 Y 7/28/11 0 0 1 3 X 7/28/11 1 0 0 3 X 7/29/11 1 0 0 3 X 7/29/11 1 0 0 3 X 7/29/11 1 1 0 X and Y are events. Every row represents a single event happening, with a 1 indicating which one happens at that time. Xn indicates X happening at night. I want to bootstrap these events over days but I think I need to summarize them first, ie. get something that looks like this: AreaDATEX Xn Y 1 1/10/10 1 1 0 1 1/11/10 0 0 1 1 1/12/10 3 0 0 2 2/12/10 2 1 1 etc. and then for each Area, bootstrap the data over the days. Any ideas? I've tried using the 'reshape' package but I don't know how to sum over parts of the columns as defined by the DATE values... Many thanks ahead! -- View this message in context: http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
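The aggregate() step from Rui's reply, followed by a very simple resampling of days within one area, can be sketched as below. The data frame holds only the first eight rows of the posted example, the formula interface is used instead of with(), and the percentile interval is just one possible choice of bootstrap interval. The names events, daily, area1, boot_means and ci are made up here.

```r
# First eight rows of the example data from the post
events <- data.frame(
  Area = c(1, 1, 1, 1, 1, 2, 2, 2),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X    = c(1, 0, 1, 1, 1, 1, 1, 0),
  Xn   = c(1, 0, 0, 0, 0, 1, 0, 0),
  Y    = c(0, 1, 0, 0, 0, 0, 0, 1)
)

# One row per Area and DATE, with the event counts summed
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)
daily <- daily[order(daily$Area, daily$DATE), ]

# Resample the days of Area 1 with replacement and recompute the mean
# daily count of X each time
set.seed(1)
area1 <- daily[daily$Area == 1, ]
boot_means <- replicate(1000, mean(sample(area1$X, replace = TRUE)))

# Percentile interval from the resampled means
ci <- quantile(boot_means, c(0.025, 0.975))
```

With only three days in the area the interval means little. The boot package mentioned in the reply offers boot() and boot.ci() for a fuller treatment.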
Re: [R] Repeating a series of commands
encapsulate them into a function and call the function ?? -- Bert On Thu, Oct 11, 2012 at 11:09 AM, KoopaTrooper ncoop...@tulane.edu wrote: I'm trying to figure out how to repeat a series of commands in R and have the outputs added to a dataframe after each iteration. My code starts this way... a-read.csv(File1.csv) b-read.csv(File2.csv) a$Z-ifelse(a$Z==L,sample(1:4,length(a$Z),replace=TRUE),ifelse(a$Z==M,sample(5:8,length(a$Z),replace=TRUE),ifelse(a$Z==U,sample(9:10,length(a$Z),replace=TRUE),))) a$Z-as.numeric(a$Z) b$Z-ifelse(b$Z==L,sample(1:4,length(b$Z),replace=TRUE),ifelse(b$Z==M,sample(5:8,length(b$Z),replace=TRUE),ifelse(b$Z==U,sample(9:10,length(b$Z),replace=TRUE),))) b$Z-as.numeric(b$Z) This is basically just starting off with a new and partially random data set every time that then goes through a bunch of other commands (not shown) and ends with the following outputs saved. Output1, Output2, Output3, Output4 where each of these is just a single number. My questions is: 1. How do I repeat the entire series of commands x number of times and save each of the outputs into a structure like this: Output1 Output2 Output3 Output4 Iteration 1 Iteration 2 Iteration 3 etc. Not even sure where to start. Are loops the answer? Thanks, -- View this message in context: http://r.789695.n4.nabble.com/Repeating-a-series-of-commands-tp4645881.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
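Bert's suggestion (wrap the commands in a function, then call it repeatedly) might look like the sketch below. run_once() and the four summaries it returns are placeholders, since the actual analysis is not shown in the post. In real use its body would read the files, randomise Z and run the other commands.

```r
# Placeholder for one full pass through the commands; returns the four
# single-number outputs as a named vector
run_once <- function() {
  z <- sample(1:10, 20, replace = TRUE)   # stand-in for the randomised data
  c(Output1 = mean(z), Output2 = sd(z), Output3 = min(z), Output4 = max(z))
}

set.seed(42)   # arbitrary seed, only for reproducibility
n_iter <- 5    # the "x number of times" from the question

# replicate() returns a 4 x n_iter matrix; transpose so that each
# iteration becomes a row
results <- as.data.frame(t(replicate(n_iter, run_once())))
rownames(results) <- paste("Iteration", seq_len(n_iter))
results
```

An explicit for loop that fills a preallocated matrix row by row would work as well. replicate() simply hides the bookkeeping.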
Re: [R] Help on probability distribution question
Ted, thanks for the answer. I actually think I have it the other way around. Let me give you an example:

1. I know the mean parameter value of a variable (V), let's call it M with a value of 5, and I also know the SD, let us call it SD with a value of 3:

#V
M <- 5
SD <- 3

2. Usually, when there is no known covariance with another parameter, and in order to avoid generating negative values, I would calculate mu and sigma:

mu <- log(M)-0.5*log(1+SD^2/(M^2))
sigma <- sqrt(log(1+SD^2/(M^2)))

3. Then I would simulate:

Y <- rlnorm(5000,mu,sigma)

then do

mean(Y)
sd(Y)

with resulting values of 4.968 for the mean and 2.923 for the SD, which I am reasonably happy with.

At times though I have a multivariate situation on my hands where I know V with M and SD from above, and additionally V1 with M1 and SD1, and V2 with M2 and SD2, for example:

#V
M <- 5
SD <- 3
#V1
M1 <- 8
SD1 <- 4
#V2
M2 <- 12
SD2 <- 6

In addition to knowing this information I also have a covariance matrix available for these 3 parameters. Based on my previous experience with using mvrnorm, if I do what is suggested below then I will generate negative values, which is not good news for me. Meanwhile, simply calculating mu and sigma again and then simulating all 3 variables independently as above would not be appropriate, because that would not take into consideration the known covariance between parameters. I hope this example makes my question clearer, and any thoughts would be appreciated.

thanks, Andras

From: ted.hard...@wlandres.net ted.hard...@wlandres.net To: r-help@r-project.org r-help@r-project.org Sent: Thursday, October 11, 2012 1:51 PM Subject: RE: [R] Help on probability distribution question

On 11-Oct-2012 17:22:44 Andras Farkas wrote:

Dear All, I have a question I would like to ask about and wonder if you have any thoughts to make it work in R.

1. I work in the field of medicine where physiologic variables are often simulated, and they can not have negative values. Most often the assumption is made to simulate these parameters with a normal distribution but in the log-domain, to avoid negative values being generated. Since the expected mean and SD are usually known from the normal domain, using the methods described in the Wikipedia article section "Arithmetic moments" I generate μ and σ and simulate with rlnorm(). At times though the following issue comes up: I have the mean and SD for the parameters available from the normal domain, and the covariance matrix from the normal domain. Then I would like to simulate the values, but to avoid negative values being generated I have to fall back on rlnorm in {compositions}. My issue is though that my covariance matrix represents the covariance of the parameters in the normal domain, as opposed to the lognormal domain. Any thoughts on how to work around this?

appreciate the help, Andras

If I understand your question correctly, if Y is the variable being simulated then you know the mean (M, say) and the variance (V, say) of log(Y). So you can simulate X from a normal distribution with mean M and variance V = S^2 (S = SD of X), and then Y = exp(X):

Y <- exp(rnorm(n,M,S))

where n is the number of sampled values you want. When Y is multivariate, with M the vector of means and V the covariance matrix of log(Y), then use a similar approach with the function mvrnorm() from the MASS package:

library(MASS)
Y <- mvrnorm(n,M,V)

Does this help? Ted.

- E-Mail: (Ted Harding) ted.hard...@wlandres.net Date: 11-Oct-2012 Time: 18:51:47 This message was sent by XFMail - [[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help on probability distribution question
(I made a slip with the multivariate case below: see at [***])

On 11-Oct-2012 17:51:51 Ted Harding wrote:

On 11-Oct-2012 17:22:44 Andras Farkas wrote:

Dear All, I have a question I would like to ask about and wonder if you have any thoughts to make it work in R.

1. I work in the field of medicine where physiologic variables are often simulated, and they can not have negative values. Most often the assumption is made to simulate these parameters with a normal distribution but in the log-domain, to avoid negative values being generated. Since the expected mean and SD are usually known from the normal domain, using the methods described in the Wikipedia article section "Arithmetic moments" I generate μ and σ and simulate with rlnorm(). At times though the following issue comes up: I have the mean and SD for the parameters available from the normal domain, and the covariance matrix from the normal domain. Then I would like to simulate the values, but to avoid negative values being generated I have to fall back on rlnorm in {compositions}. My issue is though that my covariance matrix represents the covariance of the parameters in the normal domain, as opposed to the lognormal domain. Any thoughts on how to work around this?

appreciate the help, Andras

If I understand your question correctly, if Y is the variable being simulated then you know the mean (M, say) and the variance (V, say) of log(Y). So you can simulate X from a normal distribution with mean M and variance V = S^2 (S = SD of X), and then Y = exp(X):

Y <- exp(rnorm(n,M,S))

where n is the number of sampled values you want. When Y is multivariate, with M the vector of means and V the covariance matrix of log(Y), then use a similar approach with the function mvrnorm() from the MASS package:

[***]
## library(MASS)
## Y <- mvrnorm(n,M,V)
library(MASS)
Y <- exp(mvrnorm(n,M,V))

Does this help? Ted.
- E-Mail: (Ted Harding) ted.hard...@wlandres.net Date: 11-Oct-2012 Time: 18:51:47 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - E-Mail: (Ted Harding) ted.hard...@wlandres.net Date: 11-Oct-2012 Time: 19:44:02 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
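One possible way to connect the two messages, offered as a sketch and not as anything stated in the thread: if Y = exp(X) with X multivariate normal, the standard lognormal moment relations are E[Y_i] = exp(mu_i + Sigma_ii/2) and Cov(Y_i, Y_j) = E[Y_i] E[Y_j] (exp(Sigma_ij) - 1). Inverting them turns normal-domain means and covariances into log-domain parameters, which generalises the univariate mu and sigma formulas in Andras's message. The correlation of 0.5 below is made up, since the actual covariance matrix was not posted, and base R's chol() is used in place of MASS::mvrnorm() only to keep the sketch free of package dependencies.

```r
# Known normal-domain means and SDs from the example in the thread
M  <- c(V = 5, V1 = 8, V2 = 12)
SD <- c(3, 4, 6)

# Hypothetical normal-domain correlation; the real covariance matrix
# was not given in the post
R <- matrix(0.5, 3, 3)
diag(R) <- 1
Cov <- outer(SD, SD) * R

# Invert the lognormal moment relations to get log-domain parameters
Sigma_log <- log(1 + Cov / outer(M, M))
mu_log    <- log(M) - 0.5 * diag(Sigma_log)

# Simulate X ~ N(mu_log, Sigma_log) via the Cholesky factor, then exponentiate
set.seed(123)
n <- 50000
Z <- matrix(rnorm(n * length(M)), nrow = n)
X <- sweep(Z %*% chol(Sigma_log), 2, mu_log, "+")
Y <- exp(X)

colMeans(Y)        # should be close to M
apply(Y, 2, sd)    # should be close to SD
cor(Y)             # should be close to R
```

With MASS loaded, exp(mvrnorm(n, mu_log, Sigma_log)) follows the pattern of Ted's corrected line and should give the same distribution. Note that Sigma_log is only usable if it turns out positive definite, which is not guaranteed for every normal-domain covariance matrix.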
[R] Course: Data exploration, regression, GLM GAM with R introduction
We would like to announce the following statistics course: Data exploration, regression, GLM GAM. With introduction to R When: 4 - 8 February 2013. Where: Coimbra, Portugal. For details, see: http://www.highstat.com/statscourse.htm Course flyer: http://www.highstat.com/Courses/Flyer2013FebCoimbra.pdf Kind regards, Alain Zuur -- Dr. Alain F. Zuur First author of: 1. Analysing Ecological Data (2007). Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p. URL: www.springer.com/0-387-45967-7 2. Mixed effects models and extensions in ecology with R. (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer. http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9 3. A Beginner's Guide to R (2009). Zuur, AF, Ieno, EN, Meesters, EHWG. Springer http://www.springer.com/statistics/computational/book/978-0-387-93836-3 4. Zero Inflated Models and Generalized Linear Mixed Models with R. (2012) Zuur, Saveliev, Ieno. http://www.highstat.com/book4.htm Other books: http://www.highstat.com/books.htm Statistical consultancy, courses, data analysis and software Highland Statistics Ltd. 6 Laverock road UK - AB41 6FN Newburgh Tel: 0044 1358 788177 Email: highs...@highstat.com URL: www.highstat.com URL: www.brodgar.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multiple t-tests across similar variable names
HI Shantanu, I saw your reply to Rui regarding multiple underscores in Nabble: (Actually, I see now that part of the problem is that many of the names have multiple underscores such as red_apple_pre or post_banana_organic. I think this is causing a problem for this line in your code:) I wasn't aware of that problem. In that case, try this:

set.seed(432)
dat2 <- data.frame(red_apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),pre_banana_organic=sample(25:35,5,replace=TRUE),post_apple=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE))
nam1 <- c("apple","orange","banana")
nam2 <- c("pre","post")
colnames(dat2) <- unlist(lapply(lapply(strsplit(colnames(dat2),"_"),function(x) x[x%in%nam1|x%in%nam2]),function(x) paste(x[1],x[2],sep="_")))
colnames(dat2) <- gsub("^pre\\_(.*)","\\1_pre",gsub("^post\\_(.*)","\\1_post",colnames(dat2)))
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName=gsub("(.*)\\_.*","\\1",row.names(dat3)),dat3)
list3 <- lapply(split(dat3,dat3$varName),function(x) t(x[-1]))
res3 <- do.call(rbind,lapply(lapply(list3,function(x) t.test(x[,1],x[,2],paired=TRUE)),function(x) data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

I hope this works. A.K.

- Original Message - From: Nundy, Shantanu snu...@chicagobooth.edu To: arun smartpink...@yahoo.com Cc: Sent: Thursday, October 11, 2012 10:22 AM Subject: RE: [R] multiple t-tests across similar variable names

hi Arun, This is very helpful, thanks. I'm running into a couple of issues:

1. Since some of the variables start with pre_apple and others apple_post, sorting the variables doesn't completely put pre-post variables next to each other.
2. I have about 50 variables so typing this line is a bit cumbersome:

list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])

Thanks, Shantanu

From: arun [smartpink...@yahoo.com] Sent: Thursday, October 11, 2012 9:14 AM To: Rui Barradas Cc: Nundy, Shantanu; R help Subject: Re: [R] multiple t-tests across similar variable names

HI Rui, By running your code, I got the results as:

result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:

res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is a difference in signs. A.K.

- Original Message - From: Rui Barradas ruipbarra...@sapo.pt To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu Cc: R help r-help@r-project.org Sent: Thursday, October 11, 2012 9:25 AM Subject: Re: [R] multiple t-tests across similar variable names

Hello, I have a problem, with your data example my results are different. I have changed the names of two of the variables, to allow for 'pre' and 'post' to be first in the names.
# auxiliary functions
ifswap <- function(x) if(x[1] %in% c("pre", "post")) x[2:1] else x
getpair <- function(i, post) post[ which(vmat[post, 1] == vmat[i, 1]) ]
makeLine <- function(h)
    c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1], CIupper = h$conf.int[2], p.value = h$p.value)
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
                   orange_post = sample(18:28,5,replace=TRUE),
                   pre_banana = sample(25:35,5,replace=TRUE),   # here
                   apple_post = sample(20:30,5,replace=TRUE),
                   post_banana = sample(40:50,5,replace=TRUE),  # and here
                   orange_pre = sample(5:10,5,replace=TRUE))

#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result

In your results I believe that the values for meandifference are the means of x[, 1]; at least that's what I've got. Anyway, I'll look at both codes again, to try to see what's going on.

Hope this helps,
Rui Barradas

On 11-10-2012 05:31, arun wrote:
> HI, If you have a lot of variables and in no order, then it would
Re: [R] bug tracker broken
On Oct 11, 2012, at 19:56, Antonio Piccolboni wrote:

> Hi,
> I get a 404 page not found on the root. There is no webmaster link on r-project.org that I can see. Whom should I contact?
> Thanks

The machine hosting the bug tracker is having some issues. Just wait for the dust to settle...

- Peter D.

> Antonio
>
> PS: Yes, I was trying to report my first bug. It's a conspiracy with p < 0.01.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] bug tracker broken
On 11.10.2012 19:56, Antonio Piccolboni wrote:

> Hi,
> I get a 404 page not found on the root. There is no webmaster link on r-project.org that I can see. Whom should I contact?
> Thanks

Thanks for the note. The servers hosting the bug tracker are experiencing known problems, and people are working on it.

Best,
Uwe Ligges

> Antonio
>
> PS: Yes, I was trying to report my first bug. It's a conspiracy with p < 0.01.
[R] struggling with R2wd or SWord? Try rtf!
I have been looking for a way to write R-generated reports to Microsoft Word documents. In the past, I used the package R2wd, but for some reason I haven't been able to get it to work on my current setup:

R version 2.15.0 (64-bit)
Windows 7 Enterprise - Service Pack 1
Microsoft Office Professional Plus 2010 - Word version 14.0.6123.5001 (32-bit)

I gave the package SWord a try, too. Also no luck. But I recently ran across the package rtf, and it serves my needs quite well. Since some of you may find yourselves in a similar situation, I thought I'd spread the word (ha!) about rtf. Below is some introductory code based on the examples in
http://cran.r-project.org/web/packages/rtf/vignettes/rtf.pdf

Give it a try. You may like it!

Jean

`·.,, (((º `·.,, (((º `·.,, (((º

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409 USA
http://www.glsc.usgs.gov

library(rtf)
rtf <- RTF("rtf_vignette.doc", width=8.5, height=11, font.size=10, omi=c(1, 1, 1, 1))

addHeader(rtf, title="This text was added with the addHeader() function.", subtitle="So was this.")
addParagraph(rtf, "This text was added with the addParagraph() function. It is a new self-contained paragraph. When &Alpha; is greater than &beta;, then &gamma; is equal to zero.\n")

startParagraph(rtf)
addText(rtf, "This text was added with the startParagraph() and addText() functions. You can insert ")
addText(rtf, "styled ", bold=TRUE, italic=TRUE)
addText(rtf, "text this way. But, you must end the paragraph manually with the endParagraph() function.\n")
endParagraph(rtf)

increaseIndent(rtf)
addParagraph(rtf, paste(rep("You can indent text with the increaseIndent() function.", 4), collapse=" "))
addNewLine(rtf)

decreaseIndent(rtf)
addParagraph(rtf, paste(rep("And remove the indent with the decreaseIndent() function.", 4), collapse=" "))
addNewLine(rtf)
addNewLine(rtf)

addParagraph(rtf, "Table 1. Table of the iris data using the addTable() function.\n")
tab <- table(iris$Species, floor(iris$Sepal.Length))
names(dimnames(tab)) <- c("Species", "Sepal Length")
addTable(rtf, tab, font.size=10, row.names=TRUE, NA.string="-", col.widths=c(1, 0.5, 0.5, 0.5, 0.5))

newPlot <- function() {
    par(pty="s", cex=0.7)
    plot(iris[, 1], iris[, 2])
    abline(h=2.5, v=6.0, lty=2)
}
addPageBreak(rtf)
addPlot(rtf, plot.fun=newPlot, width=5, height=5, res=300)
addNewLine(rtf)
addParagraph(rtf, "Figure 1. Plot of the iris data using the addPlot() function.\n")
addNewLine(rtf)
addNewLine(rtf)

addSessionInfo(rtf)
done(rtf)
Re: [R] survey package question
On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza sebastian.d...@gmail.com wrote:

> Hello,
>
> I have got a cluster sample using an election dataset where I already had the final results of a county-specific election. I am trying to figure out what would be the best sampling design for my data.
>
> The structure of the dataset is:
> 1) polling stations (in general, schools where people vote; a county has, for example, 15 polling stations)
> 2) inside each polling station there are voting units, where people actually vote (on average about 40 voting units per polling station)
> 3) for each voting unit I have the total votes by candidate (e.g., candidate 1 = 322, candidate 2 = 122, candidate 3 = 89)
>
> The initial sampling design is:
> 1) selection of 5 polling stations, PPS (based on number of voters)
> 2) selection of 10 voting units (SRS)
>
> I am interested in estimating the proportion of votes by candidate (let's assume we have 3 candidates). My naive estimate would be:
>
> votes for candidate 1 / all valid votes = proportion
> e.g.
> candidate 1 = 2132 / 10874 = .1961
> candidate 2 = 5323 / 10874 = .4895
> candidate 3 = 3419 / 10874 = .3144
>
> In this case, the unit of analysis is voters (or votes). If I specify the sampling design using the survey package in this way...
>
> design <- svydesign(id=~station + unit, fpc=~probstation + probunit, data=sample, pps="brewer")
> svyciprop(~I(candidate1/totalVotes), design)
>
> ... I am assuming that the unit of analysis is the voting unit, right? And I am estimating an average among voting units?

You want a ratio estimator:

svyratio(~candidate1, ~totalVotes, design)

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
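The difference between the two quantities discussed above can be sketched in base R. This is only an unweighted illustration with made-up voting-unit counts (the data frame `units` and its numbers are hypothetical, not from the thread); it ignores the sampling design and weights that svyratio() would take into account.

```r
# Hypothetical voting-unit counts (illustrative numbers only)
units <- data.frame(
  candidate1 = c(322, 40, 150, 10),
  totalVotes = c(533, 400, 300, 200)
)

# Average of per-unit proportions: the unit of analysis is the voting unit
mean_of_props <- mean(units$candidate1 / units$totalVotes)

# Ratio of totals: the unit of analysis is the vote, as in the naive estimate
ratio_of_totals <- sum(units$candidate1) / sum(units$totalVotes)

print(c(mean_of_props = mean_of_props, ratio_of_totals = ratio_of_totals))
```

The two numbers differ whenever voting units differ in size, which is presumably why a ratio estimator is suggested for a vote-level proportion.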
[R] epiR//Incidence rate//beginner question on syntax
Hi,

I am new to R and would like to start by calculating an incidence rate. My data is imported into R from a tab-delimited txt file, as shown below:

    ID DATE_BIRTH   DATE_UNT EVENT TIME_EV
1 4867 08/02/1959 19/10/2001     1      31
2   52 15/07/1941 08/02/1999     1       6
3   63 02/01/1946 11/02/1999     1       6
4  710 21/10/1965 23/03/1999     0      10
5 1808 07/05/1952 18/06/1999     1       7
6  554 19/08/1947 15/03/1999     0      10
...

event (EVENT=1)
censoring (EVENT=0)

How do I calculate the incidence rate in R for different strata of age?

1) number of events (EVENT) / person-months to event or censoring (TIME_EV)
2) number of events (EVENT) / 12 person-months to event or censoring (TIME_EV)

I am lost here. I did not manage to understand epiR and the option to create a matrix for simple incidence rates rather than incidence rate ratios. If anyone could help me out with some easy-to-understand syntax to get further, I would appreciate it very much. Thanks in advance!

nb

--
View this message in context: http://r.789695.n4.nabble.com/epiR-Incidence-rate-beginner-question-on-syntax-tp4645896.html
Sent from the R help mailing list archive at Nabble.com.
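A minimal base-R sketch of the calculation asked about above; it is not a reply from the thread and does not use epiR. It makes several assumptions: the six rows are retyped from the post (rows 4 and 5 are a best reading of the garbled listing), DATE_UNT is taken as the date at which age is evaluated, and the age bands and the names `age_group` and `rates` are hypothetical.

```r
# Six rows retyped from the post (rows 4 and 5 are a best reading)
dat <- data.frame(
  ID         = c(4867, 52, 63, 710, 1808, 554),
  DATE_BIRTH = c("08/02/1959", "15/07/1941", "02/01/1946",
                 "21/10/1965", "07/05/1952", "19/08/1947"),
  DATE_UNT   = c("19/10/2001", "08/02/1999", "11/02/1999",
                 "23/03/1999", "18/06/1999", "15/03/1999"),
  EVENT      = c(1, 1, 1, 0, 1, 0),
  TIME_EV    = c(31, 6, 6, 10, 7, 10)
)

# Age in years at DATE_UNT (assumption: this is the relevant reference date)
birth <- as.Date(dat$DATE_BIRTH, format = "%d/%m/%Y")
exam  <- as.Date(dat$DATE_UNT, format = "%d/%m/%Y")
dat$age <- as.numeric(exam - birth) / 365.25

# Hypothetical age strata
dat$age_group <- cut(dat$age, breaks = c(0, 40, 50, Inf), right = FALSE,
                     labels = c("<40", "40-49", "50+"))

# Events and person-months per stratum
events  <- tapply(dat$EVENT, dat$age_group, sum)
pmonths <- tapply(dat$TIME_EV, dat$age_group, sum)

rates <- data.frame(
  age_group     = names(events),
  events        = as.vector(events),
  person_months = as.vector(pmonths)
)
rates$rate_per_person_month <- rates$events / rates$person_months      # option 1
rates$rate_per_person_year  <- 12 * rates$rate_per_person_month        # option 2
print(rates)
```

Each subject is assigned to a single stratum here; splitting follow-up time across age bands as subjects age would need a more careful approach.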
[R] Problems with getURL (RCurl) to obtain list files of an ftp directory
Dear all,

I have a problem with the command 'getURL' from the RCurl package, which I have been using to obtain an FTP directory listing for the MOD16 (ET, DSI) products and then to download them (part of the script is by Tomislav Hengl, spatial-analyst). Instead of the list of files (from the FTP server), I am getting the complete HTML code. Does anyone know why this might happen?

These are the steps I have been following:

MOD16A2.doy <- 'ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/'
items <- strsplit(getURL(MOD16A2.doy, .opts=curlOptions(ftplistonly=TRUE)), "\n")[[1]]
items

#results
[1] !DOCTYPE HTML PUBLIC \-//W3C//DTD HTML 4.01 Transitional//EN\ \ http://www.w3.org/TR/html4/loose.dtd\;\n!-- HTML listing generated by Squid 2.7.STABLE9 --\n!-- Wed, 10 Oct 2012 13:43:53 GMT --\nHTMLHEADTITLE\nFTP Directory: ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\n/TITLE\nSTYLE type=\text/css\!--BODY{background-color:#ff;font-family:verdana,sans-serif}--/STYLE\n/HEADBODY\nH2\nFTP Directory: A HREF=\/\ftp://ftp.ntsg.umt.edu/A/A HREF=\/pub/\pub/A/A HREF=\/pub/MODIS/\MODIS/A/A HREF=\/pub/MODIS/Mirror/\Mirror/A/A HREF=\/pub/MODIS/Mirror/MOD16/\MOD16/A/A HREF=\/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\MOD16A2.105_MERRAGMAO/A//H2\nPRE\nA HREF=\../\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dirup.gif\; ALT=\[DIRUP]\/A A HREF=\../\Parent Directory/A \nA HREF=\GEOTIFF_0.05degree/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\GEOTIFF_0.05degree/\GEOTIFF_0.05degree/A . . . . . . . Jun 3 18:00\nA HREF=\GEOTIFF_0.5degree/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\GEOTIFF_0.5degree/\GEOTIFF_0.5degree/A. . . . . . . . Jun 3 18:01\nA HREF=\Y2000/\IMG border=\0\ SRC=\http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2000/\Y2000/A. . . . . . . . . . . . . . 
Dec 23 2010\nA HREF=\Y2001/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2001/\Y2001/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2002/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2002/\Y2002/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2003/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2003/\Y2003/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2004/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2004/\Y2004/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2005/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2005/\Y2005/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2006/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2006/\Y2006/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2007/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2007/\Y2007/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2008/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2008/\Y2008/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2009/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2009/\Y2009/A. . . . . . . . . . . . . . Dec 23 2010\nA HREF=\Y2010/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2010/\Y2010/A. . . . . . . . . . . . . . Feb 20 2011\nA HREF=\Y2011/\IMG border=\0\ SRC=\ http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\; ALT=\[DIR] \/A A HREF=\Y2011/\Y2011/A. . . . . . . . . . . . . 
. Mar 12 2012\n/PRE\nHR noshade size=\1px\\nADDRESS\nGenerated Wed, 10 Oct 2012 13:43:53 GMT by localhost (squid/2.7.STABLE9)\n/ADDRESS/BODY/HTML\n

The curious thing is that getURL was working well until something changed, and I don't know what. Using the same command on Windows works fine. sessionInfo() gives me the following:

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] MODIS_0.5-8 maptools_0.8-16 lattice_0.20-0
[R] Selecting n observation
Hello R help,

I have a question similar to one posted by someone before. My problem is that instead of the last assessment, I want to choose the last two. The earlier question and answer were:

> I have a data set with several time assessments for each participant. I want to select the last assessment for each participant. My dataset looks like this:
>
> ID week outcome
>  1    2      14
>  1    4      28
>  1    6      42
>  4    2      14
>  4    6      46
>  4    9      64
>  4    9      71
>  4   12      85
>  9    2      14
>  9    4      28
>  9    6      51
>  9    9      66
>  9   12      84
>
> Here is one solution for choosing the last assessment:
>
> do.call(rbind, by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
>   ID week outcome
> 1  1    6      42
> 4  4   12      85
> 9  9   12      84
[R] simple parsing question?
I am using the getQuote function in the quantmod package to retrieve the % change for a stock as follows:

getQuote("aapl", what=yahooQF(c("Change Percent (Real-time)")))
              Trade Time %Change (RT)
aapl 2012-10-11 03:41:00 N/A - -1.67%

How can I extract the numeric % change, which is being returned as a factor, so that I can use it in other calculations? Thanks.
Re: [R] Exporting summary plm results to latex
Hi Sebastian,

Sorry, I found an error in my solution (the values and coefficients got mixed up in sorting). Try this:

library(reshape)
extract.plm <- function(model) {
  if (!class(model)[1] == "plm") {
    stop("Internal error: Incorrect model type! Should be a plm object!")
  }
  zz1 <- summary(model)$coef[, c(1, 2, 4)]
  zz2 <- as.data.frame(apply(zz1, 2, function(x) sprintf("%.3f", x)))
  zz2[] <- sapply(zz2, function(x) as.numeric(as.character(x)))
  zz3 <- data.frame(Coefficient = row.names(zz1), zz2)
  zz3 <- melt(zz3, by = "Coefficient")
  zz4 <- within(zz3, {Coefficient <- as.character(Coefficient); variable <- as.character(variable)})
  zz5 <- ddply(zz4, .(Coefficient), function(x) x)
  # one star for 0.01 <= p < 0.05, two stars for p < 0.01
  zz5$value[zz5$variable == "Estimate"] <- ifelse(zz5$value[zz5$variable == "Pr...t.."] < 0.05 & zz5$value[zz5$variable == "Pr...t.."] >= 0.01, gsub("(.*)", "\\1*", zz5$value[zz5$variable == "Estimate"]), ifelse(zz5$value[zz5$variable == "Pr...t.."] < 0.01, gsub("(.*)", "\\1**", zz5$value[zz5$variable == "Estimate"]), zz5$value[zz5$variable == "Estimate"]))
  # standard errors in parentheses
  zz5$value[zz5$variable == "Std..Error"] <- gsub("(.*)", "(\\1)", zz5$value[zz5$variable == "Std..Error"])
  zz6 <- zz5[!zz5$variable == "Pr...t..", ]
  rownames(zz6) <- 1:nrow(zz6)
  res <- zz6[, c(1, 3)]
  res
}

library(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state", "year"))
extract.plm(zz)
#Using Coefficient as id variables
#  Coefficient    value
#1    log(emp)  0.768**
#2    log(emp)   (0.03)
#3     log(pc)  0.292**
#4     log(pc)  (0.025)
#5   log(pcap)   -0.026
#6   log(pcap)  (0.029)
#7       unemp -0.005**
#8       unemp  (0.001)

summary(zz)$coef
#              Estimate   Std. Error    t-value      Pr(>|t|)
#log(pcap) -0.026149654 0.0290015755 -0.9016632  3.675200e-01
#log(pc)    0.292006925 0.0251196728 11.6246309  7.075069e-29
#log(emp)   0.768159473 0.0300917394 25.5272539 2.021455e-104
#unemp     -0.005297741 0.0009887257 -5.3581508  1.113946e-07

library(xtable)
xtable(extract.plm(zz))
Using Coefficient as id variables
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Thu Oct 11 15:28:12 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rll}
  \hline
 & Coefficient & value \\
  \hline
1 & log(emp) & 0.768** \\
2 & log(emp) & (0.03) \\
3 & log(pc) & 0.292** \\
4 & log(pc) & (0.025) \\
5 & log(pcap) & -0.026 \\
6 & log(pcap) & (0.029) \\
7 & unemp & -0.005** \\
8 & unemp & (0.001) \\
   \hline
\end{tabular}
\end{center}
\end{table}

I used this example because your example is a bit restricted, in the sense that there was only one independent variable. In that case, some adjustments need to be made in the function:

#With your example dataset
x <- rnorm(270)
y <- rnorm(270)
t <- rep(1:3, 30)
i <- rep(1:90, each = 3)
data <- data.frame(i, t, x, y)
fe <- plm(y ~ x, data = data, model = "within")

extract.plm <- function(model) {
  if (!class(model)[1] == "plm") {
    stop("Internal error: Incorrect model type! Should be a plm object!")
  }
  tab1 <- summary(model)$coef[, 1:2]
  tab1[1] <- ifelse(summary(model)$coef[, 4] < 0.05 & summary(model)$coef[, 4] >= 0.01, gsub("(.*)", "\\1*", tab1[1]), ifelse(summary(model)$coef[, 4] < 0.01, gsub("(.*)", "\\1**", tab1[1]), tab1[1]))
  tab2 <- melt(tab1)
  row.names(tab2)[2] <- ""
  tab2 <- within(tab2, {value = as.character(value)})
  tab2[2, 1] <- gsub("(.*)", "(\\1)", sprintf("%.3f", as.numeric(as.character(tab2[2, 1]))))
  tab2
}
extract.plm(fe)
xtable(extract.plm(fe))
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Thu Oct 11 15:56:20 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rl}
  \hline
 & value \\
  \hline
Estimate & -0.154513026282509* \\
   & (0.074) \\
   \hline
\end{tabular}
\end{center}
\end{table}

I hope this helps.
A.K.
----- Original Message -----
From: Sebastian Barfort sb3...@nyu.edu
To: Duncan Mackay mac...@northnet.com.au
Cc: r-help-r-project.org r-help@r-project.org
Sent: Wednesday, October 10, 2012 7:45 PM
Subject: Re: [R] Exporting summary plm results to latex

I am also interested in the standard errors, but beneath rather than next to the point estimates, which is standard in the xtable package. If you by any chance remember the name of the package or how to do it, that would be much appreciated!

Cheers,
Sebastian

On Oct 10, 2012, at 7:10 PM, Duncan Mackay mac...@northnet.com.au wrote:

> Hi
>
> If you just want the coefficients:
>
> xtable(summary(fe)$coef)
> % latex table generated in R 2.15.1 by xtable 1.7-0 package
> % Thu Oct 11 09:04:59 2012
> \begin{table}[ht]
> \begin{center}
> \begin{tabular}{rrrrr}
>   \hline
>  & Estimate & Std. Error & t-value & Pr($>$$|$t$|$) \\
>   \hline
> x & 0.12 & 0.07 & 1.78 & 0.08 \\
>    \hline
> \end{tabular}
> \end{center}
> \end{table}
>
> There is another package, whose name eludes me, which may help for tables that have different outputs from the output of lm etc.
>
> HTH
> Duncan
>
> Duncan Mackay
> Department of Agronomy and Soil Science
> University of New England
> Armidale NSW 2351
> Email: home: mac...@northnet.com.au
>
> At 05:09 11/10/2012, you wrote:
>> HI, Maybe you can use library(texreg):
>> library(plm)
>> #generating some data
>> x <- rnorm(270)
>> y <- rnorm(270)
>> t <- rep(1:3,30)
>> i <- rep(1:90, each=3)
>> data <-
Re: [R] Selecting n observation
On 2012-10-11 12:48, bibek sharma wrote:

> Hello R help,
> I have a question similar to one posted by someone before. My problem is that instead of the last assessment, I want to choose the last two.
>
> ID week outcome
>  1    2      14
>  1    4      28
>  1    6      42
>  4    2      14
>  4    6      46
>  4    9      64
>  4    9      71
>  4   12      85
>  9    2      14
>  9    4      28
>  9    6      51
>  9    9      66
>  9   12      84
>
> Here is one solution for choosing the last assessment:
>
> do.call(rbind, by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
>   ID week outcome
> 1  1    6      42
> 4  4   12      85
> 9  9   12      84

With the plyr package:

library(plyr)
ddply(df, .(ID), function(x) tail(x, 2))

or, slightly simpler:

ddply(df, .(ID), tail, 2)

Peter Ehlers
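For readers without plyr, the same idea can be sketched in base R. This is an alternative under stated assumptions, not part of the reply above: the data frame is retyped from the post, the name `last_two` is hypothetical, and the rows are sorted by ID and week first so that "last" means "latest week".

```r
# Data retyped from the question
df <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 4, 4, 4, 9, 9, 9, 9, 9),
  week    = c(2, 4, 6, 2, 6, 9, 9, 12, 2, 4, 6, 9, 12),
  outcome = c(14, 28, 42, 14, 46, 64, 71, 85, 14, 28, 51, 66, 84)
)

# Sort so that the last rows within each ID are the latest assessments
df <- df[order(df$ID, df$week), ]

# Split by participant, keep the last two rows of each piece, and recombine
last_two <- do.call(rbind, lapply(split(df, df$ID), tail, n = 2))
rownames(last_two) <- NULL
print(last_two)
```

Participants with fewer than two assessments would simply contribute the rows they have, since tail() returns at most n rows.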
Re: [R] optim and nlminb
I have already tried optimx, but I got this error message. How can I solve it?

fn is Linn
Function has 10 arguments
par[ 1 ]: 0 <? 0.5 <? 1 In Bounds
par[ 2 ]: 0 <? 0.5 <? 1 In Bounds In Bounds
par[ 3 ]: 0 <? 0.5 <? 1 In Bounds In Bounds In Bounds
par[ 4 ]: -Inf <? 1 <? Inf In Bounds In Bounds In Bounds In Bounds
par[ 5 ]: -Inf <? 1 <? Inf In Bounds In Bounds In Bounds In Bounds In Bounds
par[ 6 ]: -Inf <? 1 <? Inf In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds
par[ 7 ]: -Inf <? 1 <? Inf In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds
par[ 8 ]: -Inf <? 1 <? Inf In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds
par[ 9 ]: -Inf <? 1 <? Inf In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds
par[ 10 ]: -Inf <? 1 <? Inf In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds In Bounds
Error in optimx(init.par, Linn, gr = NULL, method = "L-BFGS-B", hessian = TRUE, :
  Function provided is not returning a scalar number

Regards,
Serdar

--
View this message in context: http://r.789695.n4.nabble.com/optim-and-nlminb-tp4645772p4645907.html
Sent from the R help mailing list archive at Nabble.com.
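The error text suggests checking what the objective returns at the starting values. The thread does not show Linn, so the sketch below uses a hypothetical likelihood (`per_obs_nll`, `nll`, and the simulated `obs` are all made up) and base R's optim() rather than optimx. It illustrates one common cause, a function that returns a vector instead of a single number, and is not a diagnosis of the poster's code.

```r
set.seed(1)
obs <- rnorm(50, mean = 2, sd = 1.5)   # simulated data, illustration only

# Returns one value per observation: not usable as an objective as-is
per_obs_nll <- function(par) -dnorm(obs, mean = par[1], sd = par[2], log = TRUE)

# Scalar objective: sum the terms and guard against non-finite values
nll <- function(par) {
  val <- sum(per_obs_nll(par))
  if (!is.finite(val)) val <- 1e10
  val
}

init.par <- c(0, 1)
# Check the return value at the starting point before optimising
print(length(per_obs_nll(init.par)))   # one value per observation
print(length(nll(init.par)))           # a single value

fit <- optim(init.par, nll, method = "L-BFGS-B", lower = c(-Inf, 1e-6))
print(fit$par)
```

Calling the objective once by hand at the initial parameters, as above, is usually the quickest way to see whether it yields a vector, NA, or NULL.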
Re: [R] simple parsing question?
qs <- getQuote(c("aapl","tibx","gm","badWolf"), what=yahooQF(c("Change Percent (Real-time)")))
qs
                 Trade Time %Change (RT)
aapl    2012-10-11 04:00:00 N/A - -2.00%
tibx    2012-10-11 04:00:00 N/A - -0.85%
gm      2012-10-11 04:00:00 N/A - +1.77%
badWolf                <NA>  N/A - 0.00%
as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(qs[[2]])))
[1] -2.00 -0.85  1.77  0.00

The "\\1" in the replacement argument to sub() means the text matched by the first parenthesized subpattern in the pattern argument.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Fuchs Ira
Sent: Thursday, October 11, 2012 12:58 PM
To: r-help@r-project.org
Subject: [R] simple parsing question?

> I am using the getQuote function in the quantmod package to retrieve the % change for a stock as follows:
>
> getQuote("aapl", what=yahooQF(c("Change Percent (Real-time)")))
>               Trade Time %Change (RT)
> aapl 2012-10-11 03:41:00 N/A - -1.67%
>
> How can I extract the numeric % change, which is being returned as a factor, so that I can use it in other calculations? Thanks.
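The extraction step above can be tried without a network connection. The sketch below applies the same sub() pattern to hard-coded strings copied from the output shown in the reply; the variable names `pct`, `pattern`, and `bad` are made up for illustration.

```r
# Strings copied from the %Change (RT) column shown above, stored as a factor
pct <- factor(c("N/A - -2.00%", "N/A - -0.85%", "N/A - +1.77%", "N/A - 0.00%"))

# Same pattern as in the reply: keep only the signed number before the final "%"
pattern <- "^.* ([-+]?[[:digit:].]+)%$"
num <- as.numeric(sub(pattern, "\\1", as.character(pct)))
print(num)

# A string that does not match is returned unchanged by sub(),
# so as.numeric() turns it into NA (with a warning, suppressed here)
bad <- suppressWarnings(as.numeric(sub(pattern, "\\1", "N/A")))
print(bad)
```

Converting the factor with as.character() first matters: as.numeric() on a factor would return the internal level codes rather than the displayed values.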
Re: [R] simple parsing question?
I'm glad I asked, as I would have thought that this was a common requirement and that quantmod itself or a simple R function would have done the conversion. You saved me from having to master R's sub function.

One remaining thing: when I use your snippet for AAPL, I get:

aapl = getQuote("aapl", what=yahooQF(c("Change Percent (Real-time)")))
as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]])))
[1] -2

not the -2.00 that you got. Do I have a setting that is causing it not to show the significant digits? Thanks.

On Oct 11, 2012, at 4:27 PM, William Dunlap wrote:
> [...]
Re: [R] simple parsing question?
But I thought the intention was to turn the string into a number, not into another string.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 1:54 PM
To: Fuchs Ira
Cc: R help; William Dunlap
Subject: Re: [R] simple parsing question?

> HI,
> Try this:
> sprintf("%.2f", as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]]))))
> #[1] "-2.00"
> A.K.
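The distinction drawn in this reply, a numeric value versus its printed representation, can be sketched in base R. The variable names are made up; the point is only that -2 and -2.00 are the same number, and that sprintf() or format() produce character strings meant for display.

```r
x <- as.numeric("-2.00")   # the parsed value is the number -2
print(x)                   # trailing zeros are not shown when printing a single number

# Display-only versions: these are character strings, not numbers
shown_sprintf <- sprintf("%.2f", x)
shown_format  <- format(x, nsmall = 2)
print(shown_sprintf)
print(is.character(shown_sprintf))

# Arithmetic works on the numeric value, so keep x for calculations
print(x * 100)
```

When several values are printed together as one vector, R pads them to a common number of decimals, which is presumably why the multi-ticker example earlier in the thread showed -2.00.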
Re: [R] simple parsing question?
HI,

Try this:

sprintf("%.2f", as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]]))))
#[1] "-2.00"

A.K.

----- Original Message -----
From: Fuchs Ira irafu...@gmail.com
To: r-help@r-project.org
Cc:
Sent: Thursday, October 11, 2012 4:45 PM
Subject: Re: [R] simple parsing question?

> One remaining thing: when I use your snippet for AAPL, I get:
>
> aapl = getQuote("aapl", what=yahooQF(c("Change Percent (Real-time)")))
> as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]])))
> [1] -2
>
> not the -2.00 that you got. Do I have a setting that is causing it not to show the significant digits? Thanks.
Re: [R] simple parsing question?
Yes, in my case it would be re-learning regular expressions. Unlike riding a bicycle, this is something I have managed to forget (except for the simplest cases). I even have an old O'Reilly book on the subject which I can dust off. I was thinking (hoping?) that quantmod had functions to manipulate the information returned by Yahoo, but I guess that is not the case. Anyway, thanks to everyone's help, I now know how to proceed.

Best,
Ira

On Oct 11, 2012, at 4:59 PM, Bert Gunter wrote:

Just a comment.

On Thu, Oct 11, 2012 at 1:45 PM, Fuchs Ira irafu...@gmail.com wrote:

I'm glad I asked as I would have thought that this was a common requirement and quantmod itself or a simple R function would have done the conversion. **You saved me from having to master R's sub function.**

Actually, it's not R's sub function, it's regular expressions, which are independent of R and used in many other languages for text processing. They also have an interesting history in computer science. You might wish to have a look at Wikipedia's or another source's page on regular expressions to get some background. Depending on the nature of your work, you may also wish to reconsider your avoidance of learning the regular expression syntax, which is, however, a chore.

Best,
Bert

One remaining thing…when I use your snippet for AAPL, I get:

    aapl=getQuote("aapl",what=yahooQF(c("Change Percent (Real-time)")))
    as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]])))
    [1] -2

not the -2.00 that you got. Do I have a setting that is causing it to not show the significant digits?

Thanks.
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Selecting n observation
On Oct 11, 2012, at 12:48 PM, bibek sharma wrote:

> Hello R help,
> I have a question similar to what is posted by someone before. My problem is that instead of the last assessment, I want to choose the last two.
> I have a data set with several time assessments for each participant. I want to select the last assessment for each participant. My dataset looks like this:
>
> ID week outcome
>  1    2      14
>  1    4      28
>  1    6      42
>  4    2      14
>  4    6      46
>  4    9      64
>  4    9      71
>  4   12      85
>  9    2      14
>  9    4      28
>  9    6      51
>  9    9      66
>  9   12      84
>
> Here is one solution for choosing last assessment
> do.call(rbind, by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))

Why wouldn't the solution be something along the lines of:

    do.call(rbind, by(df, INDICES=df$ID, FUN=function(DF) tail(DF, 2)))

>   ID week outcome
> 1  1    6      42
> 4  4   12      85
> 9  9   12      84

David Winsemius, MD
Alameda, CA, USA
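As a rough check of the tail() idea, here is a minimal sketch on the data from the question. It assumes that "last" means the highest week within each ID, so the rows are sorted first; `last_two` is just an illustrative name, and split()/lapply() is used in place of by(), which should behave the same way here.

```r
df <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 4, 4, 4, 9, 9, 9, 9, 9),
  week    = c(2, 4, 6, 2, 6, 9, 9, 12, 2, 4, 6, 9, 12),
  outcome = c(14, 28, 42, 14, 46, 64, 71, 85, 14, 28, 51, 66, 84)
)

# Sort so that the last rows of each group are the latest weeks
df <- df[order(df$ID, df$week), ]

# Split by participant, keep the final two rows of each piece, re-stack
last_two <- do.call(rbind, lapply(split(df, df$ID), tail, n = 2))
print(last_two)
```

A participant with a single assessment would simply contribute one row, since tail() returns whatever is available.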
Re: [R] replacing ugly for loops
Dear Bert--

I tried your function on the data that I provided (data.df) and it worked beautifully (after I added a missing final parenthesis), producing exactly the same output as my function.

This is an excellent example of what I was looking for, because it is (a) 50% shorter than mine, (b) fully vectorized, and (c) uses three functions that I have never used before: with, unique, and do.call.

I am going to spend a happy afternoon working through this command by command, and at the end I am confident that I will have learned some valuable new (to me) tricks.

Thanks!

Warmest Regards,
AndrewH

--
View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821p4645914.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] replacing ugly for loops
I hate to decline such praise, but honesty demands that I must.

In fact, my solution is **not** fully vectorized at all! The tapply() and mapply() calls are, in fact, in some sense hidden loops at the interpreted level. They do have the virtue of being true to R's functional paradigm, but they are loops, nevertheless. For this reason, they may not be more efficient than the explicit loops you've written. But I hope the code is more transparent.

And I did send a follow-up note to the list both acknowledging my erroneous accusation that you did not provide data and confirming that my proposed solution worked with the example you did, in fact, provide.

But thanks for the kind words anyway.

-- Bert

On Thu, Oct 11, 2012 at 2:16 PM, andrewH ahoer...@rprogress.org wrote:

Dear Bert--

I tried your function on the data that I provided (data.df) and it worked beautifully (after I added a missing final parenthesis), producing exactly the same output as my function.

This is an excellent example of what I was looking for, because it is (a) 50% shorter than mine, (b) fully vectorized, and (c) uses three functions that I have never used before: with, unique, and do.call.

I am going to spend a happy afternoon working through this command by command, and at the end I am confident that I will have learned some valuable new (to me) tricks.

Thanks!

Warmest Regards,
AndrewH

--
View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821p4645914.html
Sent from the R help mailing list archive at Nabble.com.
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
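For readers following this thread, the tapply()/mapply() approach can be sketched on the example data from the original post. This is a reconstruction with the missing closing parentheses added, not necessarily character for character what was posted. One deliberate change, following the remark that `seq` is a confusing column name: the file id column is renamed to the hypothetical `seq_id`, so that `seq` inside with() unambiguously refers to the R function.

```r
yourframe <- data.frame(
  seq_id  = c(1, 1, 2, 2),      # file identifier (renamed from "seq")
  startNo = c(3, 10, 3, 15),    # first column of each table
  len     = c(4, 2, 5, 3)       # number of columns in each table
)

sq.n <- seq_len(nrow(yourframe))

# For each file id: expand every (startNo, len) pair into its column
# numbers with mapply(), then combine, de-duplicate and sort them
colms <- tapply(sq.n, yourframe$seq_id, function(x)
  with(yourframe[x, ],
       sort(unique(do.call(c,
         mapply(seq, from = startNo, length.out = len,
                SIMPLIFY = FALSE))))))

print(colms)
```

As noted in the reply above, tapply() and mapply() still loop at the interpreted level, so the gain here is in readability rather than guaranteed speed.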
[R] Fonts in *.Rd files.
I wanted to put a certain string in sans serif font in an *.Rd file that I was writing. I tried {\sf ...} and \textsf{...} but both resulted in the warning "unknown macro".

The manual on Writing R Extensions seems to me to imply that one should be able to invoke such LaTeX macros (section 2.3): "Each of the above commands takes LaTeX-like input, so other macros may be used within text."

Is there something else I need to do to get this to work? Or some other way to get sans serif? I would appreciate any pointers.

cheers,
Rolf Turner
[R] Error in file(file, "rt") : cannot open the connection
Hi,

I am using the R package QT, which runs along with SAS. I get this error:

    Error in file(file, "rt") : cannot open the connection

I have tried using setwd or running R directly from that directory but still get the same error. Any help would be appreciated.

    setwd(C:\\Documents and Settings\\\\two)
    data= read.csv(data.csv, header=T)
    head(data)
    info - list( saspath=\C:/Program Files/SAS/SASFoundation/9.2, output=C:\\Documents and Settings\\...\\two,device=tiff, ... )

Thanks
--
Navin Goyal
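This error is commonly raised when the file cannot be found or opened at the path R is actually using, so a first diagnostic step is to build the full path explicitly and test it before reading. The sketch below is only an illustration: it writes a small demo file into tempdir(), and `csv_path` is a hypothetical name that would be replaced by the real directory and file.

```r
# Hypothetical location; substitute the real directory and file name
csv_path <- file.path(tempdir(), "data.csv")

# Demo file so that the sketch is self-contained
write.csv(data.frame(x = 1:3), csv_path, row.names = FALSE)

# Check the path first, and report the resolved path if it is missing
if (file.exists(csv_path)) {
  dat <- read.csv(csv_path, header = TRUE)
} else {
  stop("File not found: ", normalizePath(csv_path, mustWork = FALSE))
}

print(head(dat))
```

Printing getwd() and list.files() just before the failing call is another quick way to see whether the working directory is the one that was intended.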
Re: [R] Formatting data for bootstrapping for confidence intervals
Hi,

Try this:

    dat1 <- read.table(text="
    Area NAME DATE X Xn Y
    1 X 1/10/10 1 1 0
    1 Y 1/11/10 0 0 1
    1 X 1/12/10 1 0 0
    1 X 1/12/10 1 0 0
    1 X 1/12/10 1 0 0
    2 X 2/12/10 1 1 0
    2 X 2/12/10 1 0 0
    2 Y 2/12/10 0 0 1
    2 X 2/13/10 1 0 0
    2 X 2/13/10 1 0 0
    2 X 2/13/10 1 0 0
    2 X 2/14/10 1 0 0
    2 X 2/14/10 1 0 0
    2 X 2/14/10 1 1 0
    2 X 2/14/10 1 0 0
    3 X 7/27/11 1 0 0
    3 X 7/27/11 1 1 0
    3 X 7/27/11 1 0 0
    3 X 7/28/11 1 0 0
    3 X 7/28/11 1 1 0
    3 X 7/28/11 1 0 0
    3 X 7/28/11 1 0 0
    3 Y 7/28/11 0 0 1
    3 X 7/28/11 1 0 0
    3 X 7/28/11 1 1 0
    3 Y 7/28/11 0 0 1
    3 X 7/28/11 1 0 0
    3 X 7/29/11 1 0 0
    3 X 7/29/11 1 0 0
    3 X 7/29/11 1 1 0
    ",sep="",header=TRUE,stringsAsFactors=FALSE)

    # You can use aggregate(), ddply() from library(plyr), or library(data.table)
    library(data.table)
    dat2 <- data.table(dat1)
    dat2[,list(X=sum(X),Xn=sum(Xn),Y=sum(Y)),list(Area,DATE)]
    #   Area    DATE X Xn Y
    #1:    1 1/10/10 1  1 0
    #2:    1 1/11/10 0  0 1
    #3:    1 1/12/10 3  0 0
    #4:    2 2/12/10 2  1 1
    #5:    2 2/13/10 3  0 0
    #6:    2 2/14/10 4  1 0
    #7:    3 7/27/11 3  1 0
    #8:    3 7/28/11 7  2 2
    #9:    3 7/29/11 3  1 0

    library(plyr)
    ddply(dat1,.(Area,DATE),colwise(sum,c("X","Xn","Y")))
    #  Area    DATE X Xn Y
    #1    1 1/10/10 1  1 0
    #2    1 1/11/10 0  0 1
    #3    1 1/12/10 3  0 0
    #4    2 2/12/10 2  1 1
    #5    2 2/13/10 3  0 0
    #6    2 2/14/10 4  1 0
    #7    3 7/27/11 3  1 0
    #8    3 7/28/11 7  2 2
    #9    3 7/29/11 3  1 0

A.K.

----- Original Message -----
From: Paul Wennekes paul.wenne...@evobio.eu
To: r-help@r-project.org
Sent: Thursday, October 11, 2012 11:55 AM
Subject: [R] Formatting data for bootstrapping for confidence intervals

Hi all,

New to R, so this may be obvious to some.
I've been trying to figure this out for a while, I have a dataset events that looks something like this: Area NAME DATE X Xn Y 1 X 1/10/10 1 1 0 1 Y 1/11/10 0 0 1 1 X 1/12/10 1 0 0 1 X 1/12/10 1 0 0 1 X 1/12/10 1 0 0 2 X 2/12/10 1 1 0 2 X 2/12/10 1 0 0 2 Y 2/12/10 0 0 1 2 X 2/13/10 1 0 0 2 X 2/13/10 1 0 0 2 X 2/13/10 1 0 0 2 X 2/14/10 1 0 0 2 X 2/14/10 1 0 0 2 X 2/14/10 1 1 0 2 X 2/14/10 1 0 0 3 X 7/27/11 1 0 0 3 X 7/27/11 1 1 0 3 X 7/27/11 1 0 0 3 X 7/28/11 1 0 0 3 X 7/28/11 1 1 0 3 X 7/28/11 1 0 0 3 X 7/28/11 1 0 0 3 Y 7/28/11 0 0 1 3 X 7/28/11 1 0 0 3 X 7/28/11 1 1 0 3 Y 7/28/11 0 0 1 3 X 7/28/11 1 0 0 3 X 7/29/11 1 0 0 3 X 7/29/11 1 0 0 3 X 7/29/11 1 1 0 X and Y are events. Every row represents a single event happening, with a 1 indicating which one happens at that time. Xn indicates X happening at night. I want to bootstrap these events over days but I think I need to summarize them first, ie. get something that looks like this: Area DATE X Xn Y 1 1/10/10 1 1 0 1 1/11/10 0 0 1 1 1/12/10 3 0 0 2 2/12/10 2 1 1 etc. and then for each Area, bootstrap the data over the days. Any ideas? I've tried using the 'reshape' package but I don't know how to sum over parts of the columns as defined by the DATE values... Many thanks ahead! -- View this message in context:
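The reply above mentions aggregate() as an alternative without showing it. A minimal base R sketch of the same per-day summary follows; it uses only the first few rows of the posted data, and the names `events` and `daily` are illustrative.

```r
# First seven rows of the posted data (the NAME column is not needed here)
events <- data.frame(
  Area = c(1, 1, 1, 1, 2, 2, 2),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X    = c(1, 0, 1, 1, 1, 1, 0),
  Xn   = c(1, 0, 0, 0, 1, 0, 0),
  Y    = c(0, 1, 0, 0, 0, 0, 1)
)

# Sum each event column within every Area/DATE combination
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)

# aggregate() orders by the grouping variables in reverse; re-sort for reading
daily <- daily[order(daily$Area, daily$DATE), ]
print(daily)
```

With one row per Area and day, the days within each Area can then be resampled for the bootstrap, for example with sample() on the row indices of that Area.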
[R] model selection with spg and AIC (or, convert list to fitted model object)
Adam,

See the attached R code that solves your problem and beyond.

One important issue is that you are enforcing constraints only indirectly. You need to make sure that P1, P2, and P3 (which are functions of the original parameters and time) are all between 0 and 1. It is not enough to impose constraints on the original parameters p1, p2, mu1 and mu2.

I also show you how to do a likelihood ratio test with the results from spg. You can also do the same for nlminb or Rvmmin.

Finally, I also show you how to use optimx to compare different algorithms. This shows you that in addition to spg, you also get very good results (as seen from objective function values and KKT conditions) with a number of other algorithms (e.g., Rvmmin, nlminb), many of which are much faster than spg. This example illustrates the utility of optimx(). I am always surprised that more R users doing optimization are not using optimx. This is a very powerful tool for benchmarking unconstrained and box-constrained optimization problems. It deserves to be used widely, in my biased, but correct, opinion.

Ravi

Ravi Varadhan, Ph.D.
Assistant Professor
The Center on Aging and Health
Division of Geriatric Medicine & Gerontology
Johns Hopkins University
rvarad...@jhmi.edu
410-502-2619
[R] a question
Dear R-helpers,

I need to read some data from the output of garchFit in fGarch. My model is garch(1,1) and I want to read the coefficients (omega, alpha, beta), the time series (x) and the conditional SD (s), because I need to use them in another formula, for example: omega+x[1]+s[3]. I may have several simulations, so I need a general way to read them, rather than reading, for example, the value of omega by eye and then substituting it into the formula.

Best.
M.Izadi
[R] In vegan package: running adonis (or similar) on a distance matrix
Hi,

Using the vegan package, I was wondering if there's a way to use a distance matrix as an input for adonis (or any of the other similar hypothesis testing functions) instead of the usual species by sample table.

Working in the field of microbial ecology, what I'm trying to do is to overcome the problem of having to use discrete units such as species or OTUs, which are problematic in microbial ecology (if not outright theoretically false). What I have instead is a phylometric distance matrix between all my samples based on a phylogenetic tree.

Some people have apparently made such a Python implementation (http://qiime.org/scripts/compare_categories.html), but I'd rather use R.

Thanks in advance,
Roey
[R] Changing NA to 0 in selected columns of a dataframe
I've been beating my head on the table for hours now and don't understand why this doesn't work. I have a dataframe in which I want to change NAs to 0 for some of the columns and not others. Consider this...

    #create dataframe
    A = c(1:5)
    B = c(6, 7, NA, NA, NA)
    C = c(NA, NA, 13, 14, 15)
    D = c(16:20)
    E = c(21, NA, NA, NA, 25)
    data = as.data.frame ( cbind ( A, B, C, D, E ) )

    #convert NAs in columns B and C to 0
    data [ is.na ( data [ , 2:3] ) ] = 0
    Error in `[<-.data.frame`(`*tmp*`, is.na(data[, 2:3]), value = 0) :
      only logical matrix subscripts are allowed in replacement

I only want to change NA in columns B and C. When I run this I get this error. Why can't I designate rows using is.na()?

--
View this message in context: http://r.789695.n4.nabble.com/Changing-NA-to-0-in-selected-columns-of-a-dataframe-tp4645917.html
Sent from the R help mailing list archive at Nabble.com.
[R] characters, mathematical expressions and computed values
Hello,

I have to add Age (bar(x)=14.3) as a title on a chart. I am unable to get this working. I have tried bquote, substitute and expression, but they are only doing a part of the job.

    new <- c(14.3, 18.5, 18.1, 17.7, 18, 15.9, 19.6, 17.3, 17.8, 17.5, 15.4, 16.3, 15, 17.1, 17.1, 16.4, 15.2, 16.7, 16.7, 16.9, 14.5, 16.6, 15.8, 15.2, 16.2, 15.6, 15, 17.1, 16.7, 15.6, 15, 15.8, 16.8, 17, 15.2, 15.8, 15.7, 14.7, 17.3, 14.9, 16.8, 14.6, 19.3, 15.3, 14.7, 13.3, 16.5, 16, 14.2, 16.1, 15.2, 13.4, 17.7, 15.5, 14.5, 15.7, 13.6, 14.1, 20, 17.2, 16.5, 14.3, 13.7, 14.7, 15.4, 13.6, 17, 17.3, 15.4, 15.5, 16.6, 15.8, 15.7, 14.7, 14.2, 14.2, 14, 14.2, 19.1, 17.2, 18.3, 13.9, 16, 15.9, 14.9, 14.6, 15.9, 12.2, 14.1, 12, 12.8, 17.1, 17, 15, 15.8, 15.9, 16.1, 18, 14.7, 18.9 )
    hist(new, xlab='30-day Death Rate',xlim=c(7,22),main=expression("Heart Attack(" * bar(X) * ")=" * mean(new)))

I would appreciate any pointers on getting this correct.

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/characters-mathematical-expressions-and-computed-values-tp4645916.html
Sent from the R help mailing list archive at Nabble.com.
[R] Question on survival
Hi,

I'm going crazy trying to plot a quite simple graph. I need to plot the estimated hazard rate from a Cox model. Suppose the model is like this:

    coxPhMod=coxph(Surv(TIME, EV) ~ AGE+A+B+strata(C), data=data)

with 4 levels for C. How can I obtain a graph with 4 estimated (preferably smoothed) hazard curves (baseline hazard + 3 proportional) to highlight the effect of C?

Thanks!!
laudan
Re: [R] Changing NA to 0 in selected columns of a dataframe
Actually, what does "only logical matrix subscripts are allowed in replacement" mean? I can designate columns using is.na.

--
View this message in context: http://r.789695.n4.nabble.com/Changing-NA-to-0-in-selected-columns-of-a-dataframe-tp4645917p4645918.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Formatting data for bootstrapping for confidence intervals
Thank you! That had me stuck for quite a while and this worked like a charm!

--
View this message in context: http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860p4645920.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Changing NA to 0 in selected columns of a dataframe
Hi,

Try this:

    dat1 = as.data.frame ( cbind ( A, B, C, D, E ) )
    dat1$B[is.na(dat1$B)] <- 0
    dat1$C[is.na(dat1$C)] <- 0
    dat1
    #  A B  C  D  E
    #1 1 6  0 16 21
    #2 2 7  0 17 NA
    #3 3 0 13 18 NA
    #4 4 0 14 19 NA
    #5 5 0 15 20 25

A.K.

----- Original Message -----
From: scoyoc sco...@gmail.com
To: r-help@r-project.org
Sent: Thursday, October 11, 2012 6:05 PM
Subject: [R] Changing NA to 0 in selected columns of a dataframe

I've been beating my head on the table for hours now and don't understand why this doesn't work. I have a dataframe in which I want to change NAs to 0 for some of the columns and not others. Consider this...

    #create dataframe
    A = c(1:5)
    B = c(6, 7, NA, NA, NA)
    C = c(NA, NA, 13, 14, 15)
    D = c(16:20)
    E = c(21, NA, NA, NA, 25)
    data = as.data.frame ( cbind ( A, B, C, D, E ) )

    #convert NAs in columns B and C to 0
    data [ is.na ( data [ , 2:3] ) ] = 0
    Error in `[<-.data.frame`(`*tmp*`, is.na(data[, 2:3]), value = 0) :
      only logical matrix subscripts are allowed in replacement

I only want to change NA in columns B and C. When I run this I get this error. Why can't I designate rows using is.na()?

--
View this message in context: http://r.789695.n4.nabble.com/Changing-NA-to-0-in-selected-columns-of-a-dataframe-tp4645917.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Changing NA to 0 in selected columns of a dataframe
On Thu, Oct 11, 2012 at 11:58 PM, arun smartpink...@yahoo.com wrote:
> Hi,
> Try this:
> dat1 = as.data.frame ( cbind ( A, B, C, D, E ) )

No. Do not try this.

It is a Very Bad Thing to use as.data.frame(cbind(...)) instead of data.frame(...) for reasons I've mentioned before on this list. In short, cbind() forces all its arguments to a single mode, thereby missing the entire point of a data frame.

Cheers,
Michael
Re: [R] a question
Hello,

What a terribly asked question. Let me rephrase it. You have a time series 'x' simulated with garchSim from package fGarch and have fitted a model using garchFit.

1. You want to extract the coefficients.

    coef(fit)

2. You want the series of observations (simulated) and of conditional sd.

    x$garch
    x$sigma

Hope this helps,

Rui Barradas

On 11-10-2012 21:29, mina izadi wrote:

Dear R-helpers,

I need to read some data from output of garchFit in fGarch. my model is garch(1,1) and i want to read coefficients(omega,alpha,beta) and timeseries(x) and conditional SD(s). because i need them to use in other formula. for example :omega+x[1]+s[3] and maybe i have several simulation then i need a general way to read them, not to read with my eyes for example the quantity of omega then subsitute in formula.

Best.
M.Izadi
Re: [R] Changing NA to 0 in selected columns of a dataframe
Hi Michael,

Sorry! You are right. That was the OP's code, which I cut and pasted without noticing it.

To Scoyoc: You can also try this:

    dat1 <- data.frame(A, B, C, D, E )
    dat1new <- dat1[,2:3]
    dat1new[is.na(dat1new)] <- 0
    dat1[,2:3] <- dat1new
    dat1
    #  A B  C  D  E
    #1 1 6  0 16 21
    #2 2 7  0 17 NA
    #3 3 0 13 18 NA
    #4 4 0 14 19 NA
    #5 5 0 15 20 25

A.K.

----- Original Message -----
From: R. Michael Weylandt michael.weyla...@gmail.com
To: arun smartpink...@yahoo.com
Cc: scoyoc sco...@gmail.com; R help r-help@r-project.org
Sent: Thursday, October 11, 2012 7:04 PM
Subject: Re: [R] Changing NA to 0 in selected columns of a dataframe

On Thu, Oct 11, 2012 at 11:58 PM, arun smartpink...@yahoo.com wrote:
> Hi,
> Try this:
> dat1 = as.data.frame ( cbind ( A, B, C, D, E ) )

No. Do not try this.

It is a Very Bad Thing to use as.data.frame(cbind(...)) instead of data.frame(...) for reasons I've mentioned before on this list. In short, cbind() forces all its arguments to a single mode, thereby missing the entire point of a data frame.

Cheers,
Michael
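The original error most likely comes from a shape mismatch: is.na(data[, 2:3]) is a 5 x 2 logical matrix, while a logical matrix used to index the whole data frame has to match its full 5 x 5 dimensions. A compact variant of the approach in this thread, sketched below under that reading, applies the mask to the same column subset that produced it. The name `cols` is illustrative, and data.frame() is used directly as recommended above.

```r
dat <- data.frame(
  A = 1:5,
  B = c(6, 7, NA, NA, NA),
  C = c(NA, NA, 13, 14, 15),
  D = 16:20,
  E = c(21, NA, NA, NA, 25)
)

cols <- c("B", "C")

# The logical matrix now has the same shape as the subset it indexes
dat[cols][is.na(dat[cols])] <- 0

print(dat)
```

Column E keeps its NAs because it is not listed in `cols`.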
Re: [R] characters, mathematical expressions and computed values
I think that bquote, with its .() operator, suffices for [almost?] any single title; don't bother fiddling with expression(), substitute(), or parse(). (You can make those work in many situations, but if you stick with just bquote then you can spend your time on the title itself.)

E.g.,

    hist(new, main=bquote("Heart Attack (" * bar(X)==.(mean(new)) * ")"))

or, if you want to limit the number of digits after the decimal point,

    hist(new, main=bquote("Heart Attack (" * bar(X)==.(round(mean(new),1)) * ")"))

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of 1Rnwb
Sent: Thursday, October 11, 2012 2:32 PM
To: r-help@r-project.org
Subject: [R] characters, mathematical expressions and computed values

Hello,

I have to add Age (bar(x)=14.3) as a title on a chart. I am unable to get this working. I have tried bquote, substitute and expression, but they are only doing a part of the job.

    new <- c(14.3, 18.5, 18.1, 17.7, 18, 15.9, 19.6, 17.3, 17.8, 17.5, 15.4, 16.3, 15, 17.1, 17.1, 16.4, 15.2, 16.7, 16.7, 16.9, 14.5, 16.6, 15.8, 15.2, 16.2, 15.6, 15, 17.1, 16.7, 15.6, 15, 15.8, 16.8, 17, 15.2, 15.8, 15.7, 14.7, 17.3, 14.9, 16.8, 14.6, 19.3, 15.3, 14.7, 13.3, 16.5, 16, 14.2, 16.1, 15.2, 13.4, 17.7, 15.5, 14.5, 15.7, 13.6, 14.1, 20, 17.2, 16.5, 14.3, 13.7, 14.7, 15.4, 13.6, 17, 17.3, 15.4, 15.5, 16.6, 15.8, 15.7, 14.7, 14.2, 14.2, 14, 14.2, 19.1, 17.2, 18.3, 13.9, 16, 15.9, 14.9, 14.6, 15.9, 12.2, 14.1, 12, 12.8, 17.1, 17, 15, 15.8, 15.9, 16.1, 18, 14.7, 18.9 )
    hist(new, xlab='30-day Death Rate',xlim=c(7,22),main=expression("Heart Attack(" * bar(X) * ")=" * mean(new)))

I would appreciate any pointers on getting this correct.

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/characters-mathematical-expressions-and-computed-values-tp4645916.html
Sent from the R help mailing list archive at Nabble.com.
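To see what bquote() actually builds, it can help to construct the title separately and inspect it before plotting. The sketch below uses only the first five values from the post so that it stays short; `ttl` is an illustrative name, and the plot itself is drawn only in an interactive session.

```r
new <- c(14.3, 18.5, 18.1, 17.7, 18.0)   # first few values from the post

# .() is evaluated immediately; everything else stays as plotmath syntax
ttl <- bquote("Heart Attack (" * bar(X) == .(round(mean(new), 1)) * ")")

# The computed mean is already substituted into the unevaluated call
print(ttl)

if (interactive()) {
  hist(new, xlab = "30-day Death Rate", main = ttl)
}
```

With expression() the call mean(new) would be left unevaluated and rendered as text, which is why the original attempt showed the formula rather than the number.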
Re: [R] Fonts in *.Rd files.
On 12-10-11 6:05 PM, Rolf Turner wrote:
> I wanted to put a certain string in sans serif font in an *.Rd file that I was writing. I tried {\sf ...} and \textsf{...} but both resulted in the warning "unknown macro".
>
> The manual on Writing R Extensions seems to me to imply that one should be able to invoke such LaTeX macros (section 2.3): "Each of the above commands takes LaTeX-like input, so other macros may be used within text."
>
> Is there something else I need to do to get this to work? Or some other way to get sans serif? I would appreciate any pointers.

"LaTeX-like" refers to the way the parser works; it doesn't imply that all LaTeX macros are supported.

I think you have two ways to get what you want. You can look through the available macros (listed in the Writing R Extensions manual, section 2.3) for something. Most of the macros describe the type of text, rather than the formatting. \code{} or \verb{} might be what you're after.

The other choice, which works only for LaTeX output, is to say you want some actual LaTeX macros. These would only be output when producing a PDF of the help page. Then you really do have all of LaTeX at your disposal. Doing that is described in 2.11 on conditional text.

Duncan Murdoch
[R] error msg using na.approx x and index must have the same length
Below I have written out some simplified data from my dataset. My goal is to interpolate Price based on timestamp, so that the closer a Price is in time to another price, the more like that price it will be. I want the interpolations within each St and not across St (St is a factor with levels A, B, and C). Unfortunately, I get error messages from the code I wrote. In the end only IDs 10 and 14 will receive interpolated values because all other NAs occur at the beginning of a level. My code is given below the dataset.

ID is int
St is factor with 3 levels
timestamp is POSIXlt
Price is num
Data.frame name is portfolio

    ID St timestamp               Price
     1 A  2012-01-01 12:50:24.760    NA
     2 A  2012-01-01 12:51:25.860 72.09
     3 A  2012-01-01 12:52:21.613 72.09
     4 A  2012-01-01 12:52:42.010 75.30
     5 A  2012-01-01 12:52:42.113 75.30
     6 B  2012-01-01 12:56:20.893    NA
     7 B  2012-01-01 12:56:46.023 67.70
     8 B  2012-01-01 12:57:19.300 76.06
     9 B  2012-01-01 12:58:20.750 77.85
    10 B  2012-01-01 12:58:20.797    NA
    11 B  2012-01-01 12:59:19.527 79.57
    12 C  2012-01-01 13:00:21.847 81.53
    13 C  2012-01-01 13:00:21.860 81.53
    14 C  2012-01-01 13:00:21.873    NA
    15 C  2012-01-01 13:00:43.493 84.69
    16 D  2012-01-01 12:01:21.520 24.63
    17 D  2012-01-01 12:02:18.880 21.13

I tried the following using na.approx from the zoo package:

    interpolatedPrice <- unlist(tapply(portfolio$Price, portfolio$St, na.approx, portfolio$timestamp, na.rm=FALSE))

but keep getting the error

    Error in na.approx.default(X[[1L]], ...) : x and index must have the same length

I checked the length of every variable in the formula and they all have the same length, so I am not sure why I get the error message.

Jay
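A likely cause of the error is that tapply() splits only Price by St, while the full-length timestamp vector is passed unchanged to every group, so the two lengths differ inside each call. One way around it is to split the whole data frame so that prices and times stay together. The sketch below is not the zoo solution: it swaps in base R's approx() instead of na.approx(), uses made-up numbers, and represents time as plain seconds in a hypothetical `secs` column (a POSIXlt column would first be converted, for example with as.numeric(as.POSIXct(...))).

```r
# Made-up data in the same shape as the post; secs stands in for timestamp
portfolio <- data.frame(
  St    = c("A", "A", "A", "B", "B", "B", "B"),
  secs  = c(0, 60, 120, 0, 30, 90, 120),
  Price = c(NA, 72, 75, 60, NA, 66, 70)
)

# Interpolate one group's prices linearly in time; rule = 1 leaves NA
# for points outside the range of the observed prices
interp_one <- function(d) {
  ok <- !is.na(d$Price)
  d$Price <- approx(d$secs[ok], d$Price[ok], xout = d$secs, rule = 1)$y
  d
}

# Split the whole data frame by St so time and price are split together
res <- do.call(rbind, lapply(split(portfolio, portfolio$St), interp_one))
print(res)
```

In this sketch the leading NA in group A stays NA, matching the expectation in the post that NAs at the start of a level are not filled.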
Re: [R] survey package question
Hello Thomas, I use both svymean (with the expanded sample = people), and svyratio (voting unit level), using the same design: design -svydesign(id=~station + unit, fpc=~probstation+probunits, data=sample, pps=brewer) I got different results using the same sample: svyratio (voting unit) Ratio 2.5% 97.5% Result Cand1 0.05252871 0.04537301 0.05968441 0.05181146 Cand20.47226973 0.45215097 0.49238849 0.49041590 Cand3 0.47520156 0.45460831 0.49579482 0.45777264 svymean (expanded sample, individuals or votes) Mean SE 2.5 % 97.5 %Results Cand1 0.0528433 0.004562755 0.04390047 0.06178614 0.05181146 Cand2 0.4717504 0.010201398 0.45175605 0.49174480 0.49041590 Cand30.4754063 0.010429222 0.45496538 0.49584718 0.45777264 Point estimators are different, and confidence intervals are more narrow using svyratio. Could you give me any clue about what is going on? Thank you in advance. Sebastian On Thu, Oct 11, 2012 at 7:50 PM, Sebastián Daza sebastian.d...@gmail.com wrote: Hello Thomas, I use both svymean (with the expanded sample = people), and svyratio (voting unit level), using the same design: design -svydesign(id=~station + unit, fpc=~probstation+probunits, data=sample, pps=brewer) I got different results using the same sample: svyratio (voting unit) Ratio 2.5% 97.5% Result Cand1 0.05252871 0.04537301 0.05968441 0.05181146 Cand20.47226973 0.45215097 0.49238849 0.49041590 Cand3 0.47520156 0.45460831 0.49579482 0.45777264 svymean (expanded sample, individuals or votes) Mean SE 2.5 % 97.5 %Results Cand1 0.0528433 0.004562755 0.04390047 0.06178614 0.05181146 Cand2 0.4717504 0.010201398 0.45175605 0.49174480 0.49041590 Cand30.4754063 0.010429222 0.45496538 0.49584718 0.45777264 Point estimators are different, and confidence intervals are more narrow using svyratio. Could you give me any clue about what is going on? Thank you in advance. Sebastian On Thu, Oct 11, 2012 at 3:56 PM, Sebastián Daza sebastian.d...@gmail.com wrote: Thank you Thomas! 
On Thu, Oct 11, 2012 at 2:33 PM, Thomas Lumley tlum...@uw.edu wrote:

On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza sebastian.d...@gmail.com wrote:

Hello,

I have drawn a cluster sample from an election dataset for which I already have the final results of a county-specific election. I am trying to figure out what would be the best sampling design for my data. The structure of the dataset is:

1) polling stations (in general, schools where people vote; a county has, for example, 15 polling stations)
2) inside each polling station there are voting units, where people actually vote (on average about 40 voting units per polling station)
3) for each voting unit I have the total votes by candidate (e.g., candidate 1 = 322, candidate 2 = 122, candidate 3 = 89)

The initial sampling design is:

1) selection of 5 polling stations, PPS (based on number of voters)
2) selection of 10 voting units (SRS)

I am interested in estimating the proportion of votes by candidate (let's assume we have 3 candidates). My naive estimate would be: votes for candidate 1 / all valid votes = proportion, e.g.

    candidate 1 = 2132 / 10874 = .1961
    candidate 2 = 5323 / 10874 = .4895
    candidate 3 = 3419 / 10874 = .3144

In this case, the unit of analysis is voters (or votes). If I specify the sampling design using the survey package in this way...

    design <- svydesign(id=~station + unit, fpc=~probstation + probunit, data=sample, pps="brewer")
    svyciprop(~I(candidate1/totalVotes), design)

... I am assuming that the unit of analysis is the voting unit, right? And am I then estimating an average among voting units?
You want a ratio estimator:

    svyratio(~candidate1, ~totalVotes, design)

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

--
Sebastián Daza
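To make the distinction in this exchange concrete, here is a minimal sketch in base R of why a ratio estimator and an average of unit-level proportions can disagree. It deliberately ignores the sampling weights and the design, so it is not a substitute for svyratio(). The first voting unit uses the counts quoted in the thread (322 votes out of 322 + 122 + 89 = 533); the other two units are invented for illustration.

```r
# Hypothetical vote counts for three sampled voting units.
# Unit 1 comes from the thread; units 2 and 3 are made up.
candidate1 <- c(322, 50, 10)
totalVotes <- c(533, 200, 20)

# Mean of unit-level proportions: every voting unit counts equally,
# no matter how many people voted there.
mean_of_ratios <- mean(candidate1 / totalVotes)

# Ratio of totals: every vote counts equally, so large units dominate.
ratio_of_totals <- sum(candidate1) / sum(totalVotes)

print(round(c(mean_of_ratios = mean_of_ratios,
              ratio_of_totals = ratio_of_totals), 4))
```

The two numbers coincide only when all units have the same proportion or the same size, which is why the share of votes calls for the ratio form.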
[R] Columns and rows
Hi,

Could you please advise an easy way to do the following for a data frame (header=F) with unequal column and row lengths?

1. Combine/stack/join contents from
   a) multiple rows into one column.
   b) multiple columns into one row.
2. Stack contents from multiple columns (or rows) into one column (or row).

Thank you.

Cheers,
Santana
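No data were posted with this question, so the following is only a sketch under assumptions: the data frame df and its values are hypothetical, and "unequal lengths" is taken to mean that shorter columns are padded with NA, as read.table(..., fill = TRUE) would produce. It uses base R only.

```r
# Hypothetical ragged data: shorter columns are padded with NA.
df <- data.frame(V1 = c(1, 2, 3),
                 V2 = c(4, 5, NA),
                 V3 = c(6, NA, NA))

# Stack all columns into one vector, column by column, dropping the padding.
col_stack <- unlist(df, use.names = FALSE)
col_stack <- col_stack[!is.na(col_stack)]

# Stack all rows into one vector, row by row, dropping the padding.
row_stack <- as.vector(t(as.matrix(df)))
row_stack <- row_stack[!is.na(row_stack)]

# The same values shaped as a single column or a single row.
one_column <- matrix(col_stack, ncol = 1)
one_row    <- matrix(col_stack, nrow = 1)

print(col_stack)
print(row_stack)
```

as.matrix() assumes all columns share one type; with mixed types everything would be converted to character.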
[R] Error with cForest
All --

I have been trying to work with the 'party' package using R v2.15.1 and have cobbled together (somewhat) functioning code from examples on the web. I need to run a series of unbiased, conditional cforest tests on several subsets of data, which I have put into a loop. The results ideally will be saved to an output file in matrix form. I have two questions about the script (given below):

1) After cforest prints the initial results, the error below is displayed:

    Random Forest using Conditional Inference Trees
    Number of trees: 500
    Response: Light
    Inputs: FormH, FormV, Uratio, Void, Transmis
    Number of observations: 660

    FormH FormV Uratio Void Transmis
    2259311332 713202692 4250413991 50551193145 57138

    Error in print.default(occupied$Fan, predicted) : invalid 'digits' argument

This error only occurs when I change the dependent variable name from Fan (the variable I used to develop and test the script) to any of the other dependent variables I need to test. All variables being tested are either continuous or categorical. Could anyone provide me with more information about this error and possibly its source in the code?

2) The results are saved successfully to a file as a list. However, I wish to save the data into a matrix that resembles:

                     Subset 1   Subset 2   Subset n
    Var Importance:  VI.1       VI.2       VI.n
    mse:             mse.1      mse.2      mse.n
    rsq:             rsq.1      rsq.2      rsq.n
    IV-1:            x.1        x.2        x.n
    IV-2:            y.1        y.2        y.n
    IV-n:            n.1        n.2        n.n

How could I create output that appends each sequential result as a new column in the file, as opposed to being in list form?
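For the second question, one common pattern is to keep each subset's results in memory as a named vector, bind them into columns after the loop, and write the file once, since appending to a text file adds rows rather than columns. The sketch below illustrates this with invented numbers: fake_result() is a hypothetical stand-in for the statistics the real loop computes (mse, rsq and the named vector returned by varimp()), and the output goes to a temporary file.

```r
# Hypothetical stand-in for one pass of the loop: returns a named vector
# holding mse, rsq and the variable importances for subset i.
fake_result <- function(i) {
  imp <- c(FormH = 0.1 * i, FormV = 0.2 * i, Uratio = 0.3 * i)  # invented values
  c(mse = 1 / i, rsq = 1 - 1 / (i + 1), imp)
}

# Collect one named vector per subset instead of writing inside the loop.
cols <- vector("list", 4)
for (i in 1:4) {
  cols[[i]] <- fake_result(i)
}
names(cols) <- paste("Climate", 1:4)

# One column per subset, one row per statistic.
result_matrix <- do.call(cbind, cols)
print(result_matrix)

# Write the whole matrix once, after the loop.
out_file <- tempfile(fileext = ".csv")
write.csv(result_matrix, out_file)
```

This relies on every subset returning the same statistics in the same order, which holds when the model formula is the same for each subset.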
Your comments are appreciated --

Jay

Script in question:

    library(party)
    rm(list=ls())
    Dynamic <- read.csv(file="Dynamic_DATA.csv")
    set.seed(1851)
    ctrl <- cforest_unbiased(ntree=500, mtry=5)
    for (i in 1:4){
      ## Climate subset
      occupied <- subset(Dynamic, WDOccupancy == 1 & Climate == i, select = c(DataSet:DGI))
      Dynamic.cf <- cforest(Fan ~ FormH + FormV + Uratio + Void + Transmis, data = occupied, control = ctrl)
      print(Dynamic.cf)
      ## round(varimp(Dynamic.cf), 4)
      ## Standard importance values
      imp=varimp(Dynamic.cf, conditional = TRUE) #use varimp defaults
      ## plot(imp)
      print(imp)

      ## predict variables
      predicted=predict(Dynamic.cf,OOB = TRUE)
      print(occupied$Fan,predicted)

      residual=occupied$Fan-predicted
      mse=mean(residual^2)
      rsq=1-mse/var(occupied$Fan)

      ## Correlation between fitted values and original values:
      correl <- paste(cor(occupied$Fan,predicted))
      Correlation <- paste("MSE:",mse, "Rsq:",rsq, "Correlation between fitted values and original values:",correl)
      print(Correlation)

      ## combine results for output
      nam <- paste("Climate =",i, sep=" ")
      assign(nam, 1:i)
      results <- rbind(nam, mse, rsq, correl)

      ## Writing data to csv file
      write.table(results, file = "variable importance3.csv", append = TRUE, quote = FALSE, sep = ",", col.names = TRUE, row.names = TRUE)
      write.table(imp, file = "variable importance3.csv", append = TRUE, quote = FALSE, sep = ",", eol = "\r", na = "N/A", row.names = TRUE, col.names = TRUE, qmethod = "double")
    }
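On the 'digits' error: in the script, print(occupied$Fan, predicted) passes predicted as the second positional argument of print(), and for a plain numeric vector that argument is digits. A plausible reading, not verified against the poster's data, is that the call only appears to work when the first predicted value happens to be a small number that is acceptable as a digits setting, and fails once the response takes larger values. The sketch below reproduces the failure with made-up numbers; observed, predicted, bad and comparison are hypothetical names.

```r
observed  <- c(150.2, 148.9, 151.7)   # hypothetical response values
predicted <- c(149.8, 149.1, 151.0)   # hypothetical predictions

# 'predicted' lands in the 'digits' slot of print.default(), and a value
# near 150 is not a valid number of significant digits, so this fails.
bad <- tryCatch(print(observed, predicted), error = function(e) e)
print(inherits(bad, "error"))

# Printing the two vectors side by side avoids the problem.
comparison <- data.frame(observed = observed, predicted = predicted)
print(comparison)
```

If the aim was only to inspect the fit, the data frame form also makes the residuals easy to add as a third column.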
[R] mclogit
Hello,

I am new to R and am trying to fit a mixed conditional logistic regression. There are two issues that I am currently having:

1. I am not sure how to insert the random-effects variable into the equation. My current equation is

    model <- mclogit(Presence ~ AllWet + AllAg + strata(Pair))

where Presence is a binary value (present or absent), AllWet and AllAg give the proportion of each location polygon covered by each habitat type, and Pair identifies the paired used and random polygons. The random effect that I want to control for is Bird ID (the same bird at multiple locations). Does anyone know how to write the formula properly to include the random effects?

2. When I enter the formula I keep getting

    Error: could not find function "mclogit"

When I was using the clogit function I had to load the survival package to perform the analysis. What package do I have to load for mclogit?

Any assistance on this subject would be greatly appreciated.

Thank you,
Katelyn

--
Katelyn Weaver
M.Sc. Candidate
Long Point Waterfowl
Western University
Cell: 519-619-4472
Email: kwea...@uwo.ca
www.longpointwaterfowl.org
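On the second issue, "could not find function" generally means that no attached package provides that name. The sketch below assumes the function is the one in the CRAN package named mclogit; that is an assumption about which mclogit is meant, and the variable has_mclogit is hypothetical. It only checks whether the package is installed and attaches it if so; it does not attempt the model itself.

```r
# Assumption: mclogit() comes from the CRAN package of the same name.
has_mclogit <- requireNamespace("mclogit", quietly = TRUE)

if (has_mclogit) {
  library(mclogit)                        # attach so mclogit() can be found
  print(exists("mclogit", mode = "function"))
} else {
  message('Package "mclogit" is not installed; ',
          'install.packages("mclogit") would be the usual next step.')
}
```

The package's own help page (?mclogit, once attached) is the place to check how the response and any random effects are expected to be specified, since its formula interface may differ from clogit's strata() style.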