[R] How to import the large dataset into R
Hi R users, I have run into the following problem: I was trying to import one table from an .accdb database through my ODBC database so that it could be read into R. The table contains 1,021,965 records. I always get the following error message, even when I change the drive:

The query cannot be completed. Either the size of the query result is larger than the maximum size of a database (2 GB), or there is not enough storage space on the disk to store the query result.

Do you have any hints as to why this happens? Is there another way to get all the data in that table into R? Thanks a lot. Kind regards, Tammy

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
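A workaround that sidesteps the 2 GB Access/ODBC limit entirely (a sketch, assuming the table can first be exported from Access to a CSV file; the file name and function name below are made up for illustration) is to read the export in chunks with base R:

```r
# Sketch: read a very large table in chunks with base R, assuming the
# Access table has first been exported to CSV.
read_csv_chunked <- function(file, chunk_size = 1e5) {
  con <- file(file, open = "r")
  on.exit(close(con))
  # Read the header line once; later chunks reuse its column names.
  cols <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])
  chunks <- list()
  repeat {
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, col.names = cols),
      error = function(e) NULL)   # an error signals end of file
    if (is.null(chunk)) break
    chunks[[length(chunks) + 1L]] <- chunk
    if (nrow(chunk) < chunk_size) break
  }
  do.call(rbind, chunks)
}

# Small demonstration with a temporary file standing in for the export:
tf <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:250, x = runif(250)), tf, row.names = FALSE)
big <- read_csv_chunked(tf, chunk_size = 100)
nrow(big)  # 250
```

Setting colClasses in read.csv also speeds up large reads considerably, since R no longer has to guess column types.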
[R] Beta-coefficients for ZINB model
Dear users, Does anyone have any idea how to generate standardised beta coefficients for a ZINB model in R to compare which explanatory variables are having the greatest impact on the dependent variable? Thanks, Jeremy
[R] [R-pkgs] PST version 0.81
Dear R users, The PST package (version 0.81) is now available on CRAN. It enables the modelling and analysis of categorical sequences with probabilistic suffix trees (PST). The package fits variable-length Markov chain (VLMC) models and stores them in PSTs. It is specifically adapted to the social sciences: it allows VLMC models to be learned from sets of individual sequences that may contain missing values, and it can account for case weights. Fitted PSTs can be used, among other things, for sequence prediction (pattern mining), imputation and the generation of artificial sequences. The package also includes original tools for visualizing both the model and the outcomes of sequence prediction. Alexis Gabadinho NCCR LIVES Institute for demographic and life course studies University of Geneva Switzerland
Re: [R] [R-sig-hpc] recursion depth limitations
On 14 December 2012 12:13, Suzen, Mehmet su...@acm.org wrote: On 13 December 2012 23:21, Rui Barradas ruipbarra...@sapo.pt wrote: But it does: each recursive call will load another copy of the function, and another copy of the variables used. In fact, the cost can become quite large, since everything is loaded into memory again. Hope this helps,

Many thanks for the replies. What about tail recursion? I have seen that there were discussions about this: https://stat.ethz.ch/pipermail/r-help/2006-April/102873.html Since that was 6 years ago, maybe things are a little different now. Isn't it logical to translate any recursive function to a tail-recursive one internally? The tail-recursive version returns a value, but the operation is essentially the same. I don't know how difficult it would be to do this generically, but maybe there is an advantage to keeping recursion as it is. What would that advantage be? For example, the tail-recursive version below runs, but the recursive version does not:

options(expressions=50)
f <- function(x) if(x > 0) f(x-1); system.time(f(1000))
Error: C stack usage is too close to the limit
Error: C stack usage is too close to the limit

f1 <- function(x) x-1
f <- function(x) f1(x); system.time(f(1000))
user system elapsed 0 0 0
Re: [R] [R-sig-hpc] recursion depth limitations
On Dec 14, 2012, at 6:25 AM, Suzen, Mehmet wrote: On 14 December 2012 12:13, Suzen, Mehmet su...@acm.org wrote: On 13 December 2012 23:21, Rui Barradas ruipbarra...@sapo.pt wrote: But it does: each recursive call will load another copy of the function, and another copy of the variables used. In fact, the cost can become quite large, since everything is loaded into memory again. Hope this helps,

Many thanks for the replies. What about tail recursion? I have seen that there were discussions about this: https://stat.ethz.ch/pipermail/r-help/2006-April/102873.html Since that was 6 years ago, maybe things are a little different now. Isn't it logical to translate any recursive function to a tail-recursive one internally? The tail-recursive version returns a value, but the operation is essentially the same. I don't know how difficult it would be to do this generically, but maybe there is an advantage to keeping recursion as it is. What would that advantage be? For example, the tail-recursive version below runs, but the recursive version does not:

options(expressions=50)
f <- function(x) if(x > 0) f(x-1); system.time(f(1000))
Error: C stack usage is too close to the limit
Error: C stack usage is too close to the limit

f1 <- function(x) x-1
f <- function(x) f1(x); system.time(f(1000))
user system elapsed 0 0 0

You may be a bit misinformed about what tail recursion means - it still needs to evaluate the function for each recursion step; the only difference is that in such a special case there is no additional information that needs to be stored -- and you have also proven why it's not as simple as you think:

f <- function(x) if(x > 0) f(x-1)
(f(100))
NULL
f1 <- function(x) x-1
f <- function(x) f1(x)
(f(100))
[1] 99

Your latter version doesn't actually do anything anywhere close to the recursion you defined. R doesn't have tail recursion elimination, and as Thomas pointed out in the cited post it would break some existing things if it did (call frame introspection etc.).
[Whether it would be OK to break it for the sake of optimization is another question.] Note that in all such cases you can easily write the problem in an iterative fashion (admittedly it is less elegant -- but if you are dealing with such deep structures, chances are that you care about efficiency and will want to go to native code anyway). Cheers, Simon
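Simon's point about iterative rewrites can be made concrete; a minimal sketch (the function names are made up for illustration):

```r
# Recursive form: one R stack frame per step, so deep inputs hit the
# limits discussed above.
count_down_rec <- function(x) if (x > 0) count_down_rec(x - 1) else x

# Iterative form: constant stack usage, works at any depth.
count_down_iter <- function(x) {
  while (x > 0) x <- x - 1
  x
}

count_down_rec(100)    # fine: 0
count_down_iter(1e6)   # also fine: 0, no recursion depth involved
```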
[R] Problem loading .Rdata file
Dear all, I'm having trouble migrating a large matrix from one system to another.

# system 1: Ubuntu 12.04, 64-bit, running R 2.15.2
# do some simulations
# save the simulations
save(Output, file = "Simulations.Rdata")
# Output is a numeric matrix with 6 columns and about 2M rows.

Use ftp to transfer the Simulations.Rdata file to system 2.

# system 2: Windows XP, 32-bit, running R 2.15.2
# in a clean R session
load("Simulations.Rdata")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file 'Simulations.Rdata' has magic number ''
Use of save versions prior to 2 is deprecated

The load() call works correctly on system 1. Any ideas how to solve this problem? Best regards, Thierry

ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Re: [R] Problem loading .Rdata file
On 12-12-14 8:00 AM, ONKELINX, Thierry wrote:
Dear all, I'm having trouble migrating a large matrix from one system to another.
save(Output, file = "Simulations.Rdata")
Use ftp to transfer the Simulations.Rdata file to system 2.
load("Simulations.Rdata")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
The load() call works correctly on system 1. Any ideas how to solve this problem?

I suspect you really did get a corrupted transfer. Try computing an md5 checksum before and after the transfer, using tools::md5sum. Duncan Murdoch
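Duncan's checksum suggestion in code form (a sketch; run the same tools::md5sum() call on both machines and compare the strings). Note also that a classic cause of this symptom is an FTP transfer in ASCII mode, which corrupts binary files such as .Rdata; make sure the transfer uses binary mode.

```r
# Save a matrix, then checksum the file; recompute tools::md5sum() on the
# destination machine after the transfer and compare the two strings.
Output <- matrix(rnorm(60), ncol = 6)
f <- tempfile(fileext = ".Rdata")
save(Output, file = f)
print(tools::md5sum(f))  # 32-character hex string

# Sanity check that the file restores correctly on this machine:
e <- new.env()
load(f, envir = e)
identical(e$Output, Output)  # TRUE
```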
Re: [R] Rprof causing R to crash
Uwe, I am unfortunately not able to upgrade to R 2.15.2 right now, but I have seen a similar problem with several older R versions. If you want to test with a shorter script, you can try the lines below. These provoke a crash from a fresh R session on my machine (R 2.15.1, Windows 7):

Rprof()
z = 1
for (i in 1:1e8) z = z + 1/i

This runs without problems without the call to Rprof(). I was far from using all my memory with this call, but you could also check with a higher number of loop iterations, in case you can run these lines and it is somehow memory-related. Just for comparison, I tried 1e9 iterations after a call to Rprof() on one of our Linux servers (R 2.14.0) without any problems. Jon

On 12-Dec-12 17:06, Uwe Ligges wrote: On 12.12.2012 00:05, Marian Talbert wrote: I'm trying to use Rprof() to identify bottlenecks and speed up a particularly slow section of code which reads in a portion of a tif file and compares each of the values to values of predictors used for model fitting. I've written up an example that anyone can run. Generally temp would be a section of a tif read into a data.frame and used later for other processing. The first portion, which just records the time, runs in about 6 seconds; the second part causes RGui to immediately close with no error or warning. Any advice on how to get Rprof to work, or how to speed up this code, would be greatly appreciated. I'm using Windows 7 (which might be my problem) and R version 2.15.0.

The problem is rather the R version: I cannot reproduce errors with a recent R.
Uwe Ligges

CalcMESS <- function(tiff.entry, pred.vect){
  f <- sum(pred.vect < tiff.entry)/length(pred.vect)*100
  if(is.na(f)) return(NA)
  if(f==0) return((tiff.entry-min(pred.vect))/(max(pred.vect)-min(pred.vect))*100)
  if(0<f & f<=50) return(2*f)
  if(50<=f & f<100) return(2*(100-f))
  if(f==100) return((max(pred.vect)-tiff.entry)/(max(pred.vect)-min(pred.vect))*100)
  else return(NA)
}

train.dat <- data.frame(a=runif(200), b=runif(200), c=runif(200), d=runif(200))
temp <- data.frame(a=runif(13), b=runif(13), c=runif(13), d=runif(13))
pred.rng <- temp
vnames.final.mod <- names(train.dat)
nvars.final <- length(vnames.final.mod)

start.time <- Sys.time()
for(k in 1:nvars.final){
  pred.range <- train.dat[, match(vnames.final.mod[k], names(train.dat))]
  pred.rng[,k] <- mapply(CalcMESS, tiff.entry=temp[, match(vnames.final.mod[k], names(temp))], MoreArgs=list(pred.vect=pred.range))
}
Sys.time() - start.time

Rprof("C:\\temp\\mapply.out")
for(k in 1:nvars.final){
  pred.range <- train.dat[, match(vnames.final.mod[k], names(train.dat))]
  pred.rng[,k] <- mapply(CalcMESS, tiff.entry=temp[, match(vnames.final.mod[k], names(temp))], MoreArgs=list(pred.vect=pred.range))
}
Rprof(NULL)
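For reference, a minimal Rprof pattern that runs cleanly on current R (a sketch; the profiling output goes to a temporary file rather than a hard-coded path):

```r
# Profile a simple loop, then summarize which functions used the most time.
out <- tempfile(fileext = ".out")
Rprof(out, interval = 0.01)
z <- 0
for (i in 1:5e6) z <- z + 1/i
Rprof(NULL)                       # stop profiling before reading the file
head(summaryRprof(out)$by.self)
```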
[R] winBUGS with R2WinBUGS on a mac – path issue
Hi, I am trying to execute a WinBUGS model within R on a Mac. I use Wine and the R2WinBUGS package. Now I have a small problem with the path variables: the path to the bugs directory includes parentheses, and because of that it won't run. Error message:

sh: -c: line 0: syntax error near unexpected token `('

Is there any way to get WinBUGS to run even with a path that includes parentheses? Thanks for any suggestions!
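Two hedged workarounds, assuming the failure comes from the unquoted path being handed to the shell: quote the path with shQuote(), or point R2WinBUGS at a parenthesis-free directory via its bugs.directory argument (the path below is made up for illustration):

```r
# shQuote() protects shell-special characters such as parentheses:
p <- "~/.wine/drive_c/Program Files (x86)/WinBUGS14"
shQuote(p, type = "sh")
# [1] "'~/.wine/drive_c/Program Files (x86)/WinBUGS14'"

# Alternatively, install WinBUGS in a path without parentheses and tell
# R2WinBUGS where it is, e.g.:
# bugs(data, inits, parameters, model.file,
#      bugs.directory = "c:/WinBUGS14", useWINE = TRUE)
```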
Re: [R] Beta-coefficients for ZINB model
At 08:33 14/12/2012, Jeremy Goss wrote: Dear users, Does anyone have any idea how to generate standardised beta coefficients for a ZINB model in R to compare which explanatory variables are having the greatest impact on the dependent variable? Thanks, Jeremy

You might like to look at Ulrike Grömping's work on relative importance in regression. Not sure how directly this translates into the non-linear case though. J Stat Soft October 2006 vol 17 number 1

Michael Dewey i...@aghmed.fsnet.co.uk http://www.aghmed.fsnet.co.uk/home.html
Re: [R] More efficient use of reshape?
I think David was pointing out that reshape() is not a reshape2 function. It is in the stats package. I am not sure exactly what you are doing, but perhaps something along the lines of

library(reshape2)
mm <- melt(clim.data, id = Cs(yr_frac, yr_mn, AMO, NINO34, SSTA))

is a start? I also don't think that the more recent versions of ggplot2 automatically load reshape2, so it may be that you are working with a relatively old installation of ggplot and reshape?

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8
[4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.2.0 directlabels_2.9 RColorBrewer_1.0-5 gridExtra_0.9.1 stringr_0.6.2
[6] scales_0.2.3 plyr_1.8 reshape2_1.2.1 ggplot2_0.9.3
loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4 digest_0.6.0 gtable_0.1.2 labeling_0.1
[6] MASS_7.3-22 munsell_0.4 proto_0.3-9.2 tools_2.15.2

John Kane Kingston ON Canada

-----Original Message----- From: natemille...@gmail.com Sent: Thu, 13 Dec 2012 09:58:34 -0800 To: dwinsem...@comcast.net Subject: Re: [R] More efficient use of reshape?

Sorry David, In my attempt to simplify the example and just include the code I felt was necessary, I left out the loading of ggplot2, which then imports reshape2 and which was actually used in the code I provided. Sorry for the mistake and my misunderstanding of where the reshape function was coming from. Should have checked that more carefully.
Thanks, Nate

On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius dwinsem...@comcast.net wrote: On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote: Hi all, I have played a bit with the reshape package and function along with melt and cast, but I feel I still don't have a good handle on how to use them efficiently. Below I have included an application of reshape that is rather clunky and I'm hoping someone can offer advice on how to use reshape (or melt/cast) more efficiently.

You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package? -- David.

#For this example I am using climate change data available on-line
file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
clim.data <- read.csv(file, header=TRUE)
library(lubridate)
library(reshape)

#I've been playing with the lubridate package a bit to work with dates, but as the climate dataset only uses year and month I have
#added a day to each entry in the yr_mn column and then used dym from lubridate to generate the POSIXlt formatted dates in
#a new column clim.data$date
clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep="")
clim.data$date <- dym(clim.data$yr_mn)

#Now to the reshape. The dataframe is in a wide format. The columns GISS, HAD, NOAA, RSS, and UAH are all different sources
#from which the global temperature anomaly has been calculated since 1880 (actually only 1978 for RSS and UAH). What I would like to
#do is plot the temperature anomaly vs date and use ggplot to facet by the different data source (GISS, HAD, etc.). Thus I need the
#data in long format with a date column, a temperature anomaly column, and a data source column. The code below works, but its
#really very clunky and I'm sure I am not using these tools as efficiently as I can.

#The varying=list(3:7) specifies the columns in the dataframe that correspond to the sources (GISS, etc.), though then in the resulting
#reshaped dataframe the sources are numbered 1-5, so I have to reassign their names. In addition, the original dataframe has
#additional data columns I do not want, and so after reshaping I create another! dataframe with just the columns I need, and
#then I have to rename them so that I can keep track of what everything is. Whew! Not the most elegant of code.

d <- reshape(clim.data, varying=list(3:7), idvar="date", v.names="anomaly", direction="long")
d$time <- ifelse(d$time==1, "GISS", d$time)
d$time <- ifelse(d$time==2, "HAD", d$time)
d$time <- ifelse(d$time==3, "NOAA", d$time)
d$time <- ifelse(d$time==4, "RSS", d$time)
d$time <- ifelse(d$time==5, "UAH", d$time)
new.data <- data.frame(d$date, d$time, d$anomaly)
names(new.data) <- c("date", "source", "anomaly")

I realize this is a mess, though it works. I think with just some help on how better to work this
[R] Hello R User
Hello R users, In the sample data given below, time is recorded for each ID sequentially. For the analysis, for each ID, I would like to set the 1st recorded time to zero and thereafter take the difference from the previous time, i.e. for ID == 1 I would like to see Time = 0, 3, 1, 3, 6. This needs to be applied to a big data set. Any suggestions are much appreciated! Thanks, Bibek

ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30
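One base-R approach (a sketch using the sample data above; it assumes the rows are already sorted by ID and Time):

```r
dat <- data.frame(ID   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                  Time = c(3, 6, 7, 10, 16, 12, 18, 19, 25, 28, 30))

# Within each ID: the first record becomes 0, later records become the
# difference from the previous record.
dat$Diff <- ave(dat$Time, dat$ID, FUN = function(x) c(0, diff(x)))
dat$Diff[dat$ID == 1]  # 0 3 1 3 6, as requested
```

ave() applies the function group-by-group and puts the results back in the original row order, so this scales to many IDs without an explicit loop.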
Re: [R] Beta-coefficients for ZINB model
I always liked Johan Bring's (1994. How to standardize regression coefficients. Am. Stat. 48(3):209-213.) arguments for standardizing the beta estimates in a linear model by their partial standard deviations, which equates to comparing ratios of t-statistics between variables to determine their relative importance. It seems to me that this approach ought to work reasonably well for any of the generalized linear models. Brian

On Fri, Dec 14, 2012 at 8:00 AM, Michael Dewey i...@aghmed.fsnet.co.uk wrote: At 08:33 14/12/2012, Jeremy Goss wrote: Dear users, Does anyone have any idea how to generate standardised beta coefficients for a ZINB model in R to compare which explanatory variables are having the greatest impact on the dependent variable?

You might like to look at Ulrike Grömping's work on relative importance in regression. Not sure how directly this translates into the non-linear case though. J Stat Soft October 2006 vol 17 number 1

-- Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326
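A pragmatic complement to the references above is to z-standardize the predictors before fitting, so each coefficient reflects a one-SD change and magnitudes become comparable. A sketch with simulated data and a Poisson glm standing in for the actual ZINB fit (pscl::zeroinfl() takes the same kind of formula/data interface, but is not assumed to be installed here):

```r
set.seed(1)
d <- data.frame(x1 = rnorm(200, mean = 50, sd = 10),
                x2 = rnorm(200, mean = 5,  sd = 2))
d$y <- rpois(200, exp(0.02 * d$x1 + 0.30 * d$x2 - 1))

# Standardize predictors: coefficients become per-SD effects.
d_std <- d
d_std[c("x1", "x2")] <- scale(d[c("x1", "x2")])
fit <- glm(y ~ x1 + x2, family = poisson, data = d_std)
coef(fit)  # slopes now on a comparable per-SD scale
```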
[R] Define a custom-built loss function and combine it with nls()
Dear R helpers, For an allometric analysis (allometric equation y = a*x^b) I would like to apply a non-linear regression instead of using log-log transformations of the measured parameters x and y and a Model II linear regression. Since both of the variables x and y are random, I would like to apply a Model II non-linear analog of either Reduced Major Axis or Major Axis Regression. The allometric equation for the non-linear regression model would be y = a*x^b, with y the dimension of a body part (dependent variable) and x a dimension of body size (independent variable). By applying a non-linear regression I would like to get estimates for the parameters a and b. Fortunately, I already have a recommended loss function, proposed by Ebert & Russell 1994, J. Theor. Biol.:

loss = abs(y*abs(x-(y/a)^(1/b))-a/(b+1)*abs(x^(b+1)-(y/a)^((b+1)/b)))

How may I define the loss function above and combine it with a non-linear regression via nls() in R? Example data frame with the measured variables x and y:

x <- 1:10
y <- c(0.3,0.7,0.9,1.3,2.0,2.4,3.3,3.8,5.0,5.8)
d <- data.frame(x,y)

Non-linear regression with the built-in loss function (sum of squared residuals):

nlmodel <- nls(y ~ a*x^b, data=d, start=list(a=1, b=0.1), trace=TRUE)

Many thanks!!! Christiana

-- Christiana Anagnostou * Department of Evolutionary Ecology and Genetics Zoological Institute Christian-Albrechts-University of Kiel Am Botanischen Garten 9 24118 Kiel Germany Tel: +49-431-8804145 (work) Tel: +49-431-5709649 (home) Fax: +49-431-8802403 Email: canagnos...@zoologie.uni-kiel.de Web: www.uni-kiel.de/zoologie/evoecogen/
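nls() is wired to least squares, so a custom loss like the one above is more naturally minimized with optim(); a sketch using the posted data (the positivity guard is my addition, to keep the fractional powers defined while the search explores parameter space):

```r
x <- 1:10
y <- c(0.3, 0.7, 0.9, 1.3, 2.0, 2.4, 3.3, 3.8, 5.0, 5.8)

# Ebert & Russell (1994) loss, summed over observations.
er_loss <- function(par, x, y) {
  a <- par[1]; b <- par[2]
  if (a <= 0 || b <= 0) return(Inf)  # keep (y/a)^(1/b) etc. defined
  sum(abs(y * abs(x - (y / a)^(1 / b)) -
          a / (b + 1) * abs(x^(b + 1) - (y / a)^((b + 1) / b))))
}

fit <- optim(c(a = 1, b = 1), er_loss, x = x, y = y)
fit$par  # estimates of a and b for y = a * x^b
```

optim() passes the extra x and y arguments straight through to the loss function; the default Nelder-Mead method needs no derivatives, which suits this absolute-value loss.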
Re: [R] More efficient use of reshape?
Thanks John, I appreciate your help. With the help of Dennis Murphy, code along the lines of this

climData_melt <- melt(clim.data, id = 'date', measure = c("GISS", "HAD", "NOAA", "RSS", "UAH"))

actually gets the data into the correct form very simply, and from there it's easy to plot with faceting in ggplot2. Thanks for helping clear up the reshape and reshape2 issues. I try to make sure I have those types of things figured out before posting to the list. Didn't mean to confuse the issue by incorrectly referring to the packages/functions or their locations. Thanks, Nate

On Fri, Dec 14, 2012 at 7:38 AM, John Kane jrkrid...@inbox.com wrote: I think David was pointing out that reshape() is not a reshape2 function. It is in the stats package. I am not sure exactly what you are doing, but perhaps something along the lines of

library(reshape2)
mm <- melt(clim.data, id = Cs(yr_frac, yr_mn, AMO, NINO34, SSTA))

is a start? I also don't think that the more recent versions of ggplot2 automatically load reshape2, so it may be that you are working with a relatively old installation of ggplot and reshape? [remainder of quoted thread snipped]
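For the record, the numbered-sources problem in the earlier reshape() call can also be avoided in base R by supplying a times argument; a self-contained sketch with made-up anomaly values standing in for the downloaded climate file:

```r
clim <- data.frame(date = as.Date("2000-01-01") + 0:2,
                   GISS = c(0.10, 0.20, 0.30),
                   HAD  = c(0.20, 0.10, 0.40))

# times labels the long-format rows directly, so no ifelse() renaming
# is needed afterwards.
long <- reshape(clim, varying = list(c("GISS", "HAD")),
                times = c("GISS", "HAD"), timevar = "source",
                v.names = "anomaly", idvar = "date", direction = "long")
long[, c("date", "source", "anomaly")]
```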
[R] Plotting from a matrix, where column 1 is your x value
Hello, I'm very new to R, and have managed to plot xy graphs, but can't seem to plot a matrix properly. The first column contains my time points, and columns 2-4 are the replicates of my experiment. I've tried using row.names=1 to make the first column the value for that row (whereas before I had a 1-31 sequence and then my 1st column), but I can't work out how to tell it that column 1 of my dataset is the x value for the data in that row. It's probably really simple, but I can't seem to figure it out! Many thanks, any help appreciated! Laura
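matplot() does exactly this: it plots every remaining column as a series against the column you give as x. A sketch with made-up replicate data (the null pdf device is only there so the example runs headless):

```r
dat <- cbind(time = 1:31,
             rep1 = rnorm(31), rep2 = rnorm(31), rep3 = rnorm(31))

pdf(file = NULL)                      # headless device for the example
matplot(dat[, 1], dat[, -1], type = "b", pch = 1:3,
        xlab = "time", ylab = "response")
dev.off()
```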
[R] Problems with displaying graphics in Linux
Hi R users, I previously updated my R installation to R 2.15.2 on my 64-bit Arch Linux machine. I hadn't used it for a while after updating, but now that I'm using it again, I find that the graphics device is not functioning properly. Any plot I create shows a blank window. Upon clicking the close button of the window, the plot that it's supposed to show flickers very briefly and then the window closes. Running dev.list() while the window is visible (without any plots) shows this output:

X11cairo
       2

It changes to NULL, as expected, if I run it after closing the window. Does anyone have any pointers on how to fix this? Thanks in advance, Bow
[R] COZIGAM removed from CRAN
Hi All, Another quick question - I noticed that COZIGAM has been removed from CRAN, and that you are referred to the archive for previous versions (last updated 23 July 2012). Is this package still ok to use? Or is there an alternative which might also fit zero-inflated GAMs? Thanks, Raeanne
[R] plotting log regression
I want to plot the regression line of a log regression into a plot with my normal, non-log, data. For example:
x <- (1,2,3,4,5)
y <- (6,7,8,9,10)
plot(x, y)
I tried a log-regression by
a <- glm(log(y) ~ log(x))
and then I tried to insert the answer into my graph, where the standard values are shown:
abline(a, col="red")
Unfortunately it does not work. -- View this message in context: http://r.789695.n4.nabble.com/plotting-log-regression-tp4653087.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] [R-sig-hpc] recursion depth limitations
On 13 December 2012 23:21, Rui Barradas ruipbarra...@sapo.pt wrote: But it does, each recursive call will load another copy of the function, and another copy of the variables used. In fact, the cost can become quite large since everything is loaded in memory again. Hope this helps. | Many thanks for the replies. What about tail recursion? I have seen that there were discussions about this: https://stat.ethz.ch/pipermail/r-help/2006-April/102873.html Since that was 6 years ago, maybe things are a little different now. Best, Mehmet
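To illustrate the tail-recursion point: R's evaluator does not eliminate tail calls, so even a tail-recursive function grows the stack on every call, and the practical workaround remains an explicit loop. A small sketch:

```r
# Tail-recursive factorial: the recursive call is in tail position, but R
# still pushes a new frame per call, so deep inputs hit
# "evaluation nested too deeply" (see ?options, "expressions", ?Cstack_info).
fact_rec <- function(n, acc = 1) {
  if (n <= 1) return(acc)
  fact_rec(n - 1, acc * n)
}

# Iterative rewrite: constant stack depth, works for arbitrarily large n.
fact_iter <- function(n) {
  acc <- 1
  while (n > 1) {
    acc <- acc * n
    n <- n - 1
  }
  acc
}

fact_rec(10)   # 3628800
fact_iter(10)  # 3628800
```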
Re: [R] Combined Marimekko/heatmap
Neal- Perhaps the following code is a start for what you want. -tgs
par(mar=c(1,1,1,1), oma=c(0,0,0,0), mgp=c(1.5,.2,0), tcl=0, cex.axis=.75, col.axis="black", pch=16)
Z <- textConnection("country A1 A2 A3
A 3 4 5
B 6 9 8
C 6 9 5")
ddd <- read.table(Z, header=TRUE)
close(Z)
CountryPcts <- rowSums(ddd[,-1]) / sum(ddd[,-1])
plot.new()
plot.window(ylim=c(0,1), xlim=c(0,1), xaxs='i', yaxs='i', las=1)
box()
abline(h = cumsum(CountryPcts), lwd=2)
labxats <- NULL
vlines <- ddd[,-1] / sum(ddd[,-1]) / CountryPcts
vlines <- t(apply(vlines, 1, cumsum))
yyy <- c(0, rep(cumsum(CountryPcts), each=2))
yyy <- head(yyy, -1)
for(i in 1:nrow(ddd)){
  xxx <- rep(vlines[,i], each=2)
  lines(xxx, yyy, col="red", lwd=3)
  labxats[i] <- rev(xxx)[1]
}
labxats <- (labxats + c(0, head(labxats, -1)))/2
labyats <- (cumsum(CountryPcts) + c(0, head(cumsum(CountryPcts), -1)))/2
axis(2, at=labyats, labels=ddd[,1], las=1)
axis(3, at=labxats, labels=colnames(ddd)[-1], las=1)
On Thu, Dec 13, 2012 at 6:09 PM, Neal Humphrey nhumph...@clasponline.org wrote: Hi all, I'm trying to figure out a way to create a data graphic that I haven't ever seen an example of before, but hopefully there's an R package out there for it. The idea is to essentially create a heatmap, but to allow each column and/or row to be a different width, rather than having uniform column and row height. This is sort of like a Marimekko chart in appearance, except that rather than use a single color to represent the category, the color represents a value and all the y-axis heights in each column line up with each other. That way color represents one variable, while the area of the cell represents another. In my application, my heatmap has discrete categorical data rather than continuous. Rows are countries, columns are appliances, and I want to scale the width and height of each column to be the fraction of global energy consumed by the country and the fraction of energy use consumed by that appliance type. 
The color coding would then indicate whether or not that appliance is regulated in that country. Any ideas how to make such a chart, or even what it might be called? Neal Humphrey nhumph...@clasponline.org
Re: [R] Hello R User
HI, Try this:
dat1 <- read.table(text="ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30", sep="", header=TRUE)
dat1$Time1 <- ave(dat1$Time, dat1$ID, FUN=function(x) c(0, diff(x)))
head(dat1, 3)
#  ID Time Time1
#1  1    3     0
#2  1    6     3
#3  1    7     1
#or
dat2 <- unsplit(lapply(split(dat1, dat1$ID), function(x) {x$Time <- c(0, diff(x[,2])); return(x)}), dat1$ID)
head(dat2, 3)
#  ID Time
#1  1    0
#2  1    3
#3  1    1
A.K.
- Original Message - From: bibek sharma mbhpat...@gmail.com To: R help r-help@r-project.org Cc: Sent: Friday, December 14, 2012 10:51 AM Subject: [R] Hello R User
Hello R User, In the sample data given below, time is recorded for each id subsequently. For the analysis, for each id, I would like to set 1st recorded time to zero and thereafter find the difference from previous time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs to be implemented to big data set. Any suggestions are much appreciated! Thanks, Bibek
ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30
Re: [R] Hello R User
dataset <- data.frame(id=c(1,1,2,3,3,3), time=c(3,5,1,2,4,6))
dataset
#  id time
#1  1    3
#2  1    5
#3  2    1
#4  3    2
#5  3    4
#6  3    6
ids <- unique(dataset$id)
for(id in ids){
  dataset$time[dataset$id==id] <- c(0, diff(dataset$time[dataset$id==id]))
}
dataset
#  id time
#1  1    0
#2  1    2
#3  2    0
#4  3    0
#5  3    2
#6  3    2
might not be the fastest though. On 14.12.2012, at 16:51, bibek sharma wrote: Hello R User, In the sample data given below, time is recorded for each id subsequently. For the analysis, for each id, I would like to set 1st recorded time to zero and thereafter find the difference from previous time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs to be implemented to big data set. Any suggestions are much appreciated! Thanks, Bibek
ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30
Re: [R] Hello R User
Hi Bibek, how about this?
dta <- read.table(textConnection("ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30"), header=TRUE)
dta$delta <- with(dta, ave(Time, ID, FUN=function(x) c(0, diff(x))))
dta
hth.
On 14.12.2012 16:51, bibek sharma wrote: Hello R User, In the sample data given below, time is recorded for each id subsequently. For the analysis, for each id, I would like to set 1st recorded time to zero and thereafter find the difference from previous time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs to be implemented to big data set. Any suggestions are much appreciated! Thanks, Bibek
ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30
Re: [R] Plotting from a matrix, where column 1 is your x value
Hello, There are several ways of doing this; perhaps the easiest is with ?matplot. You should provide a data example, like the posting guide says. Since you haven't, I've made up some.
y <- replicate(3, cumsum(rnorm(10)))
x <- matrix(c(1:10, y), ncol = 4)
matplot(x[, 1], x[, -1], type = "l")
Hope this helps, Rui Barradas
On 14-12-2012 11:28, masepot wrote: Hello, I'm very new to R, and have managed to plot xy graphs, but can't seem to plot a matrix properly. The first column is my time points, and the following columns 2-4 are the replicates of my experiment. I've tried using row.names=1 to make the first column the value for that row (whereas before I had a 1-31 sequence and then my 1st column), but I can't work out how to tell it that column 1 of my dataset is the x value for the data in that row. It's probably really simple, but I can't seem to figure it out! Many thanks, any help appreciated! Laura -- View this message in context: http://r.789695.n4.nabble.com/Plotting-from-a-matrix-where-column-1-is-your-x-value-tp4653076.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] plotting log regression
Hello, How can you expect to see the fit line if you are plotting x and y values, not their logarithms? And your definitions of x and y are wrong, they should use c().
x <- c(1,2,3,4,5)
y <- c(6,7,8,9,10)
plot(log(x), log(y))
fit <- lm(log(y) ~ log(x)) # Same as glm
abline(fit)
Please read R-intro, in the doc folder of your installation of R. Hope this helps, Rui Barradas
On 14-12-2012 13:58, mrkooper wrote: I want to plot the regression line of a log regression into a plot with my normal, non-log, data. For example:
x <- (1,2,3,4,5)
y <- (6,7,8,9,10)
plot(x, y)
I tried a log-regression by
a <- glm(log(y) ~ log(x))
and then I tried to insert the answer into my graph, where the standard values are shown:
abline(a, col="red")
Unfortunately it does not work. -- View this message in context: http://r.789695.n4.nabble.com/plotting-log-regression-tp4653087.html Sent from the R help mailing list archive at Nabble.com.
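If the fit line is wanted over the original, non-log data (as the OP asked), one way is to back-transform predictions from the log-log fit rather than using abline(); a sketch:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)
fit <- lm(log(y) ~ log(x))

plot(x, y)                                   # original, non-log scale
xx <- seq(min(x), max(x), length.out = 100)  # fine grid for a smooth curve
yy <- exp(predict(fit, newdata = data.frame(x = xx)))  # back-transform
lines(xx, yy, col = "red")                   # fitted curve y = exp(a) * x^b
```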
Re: [R] Define a custom-built loss function and combine it with nls()
Inline. Note: My comments subject to confirmation by true experts. On Thu, Dec 13, 2012 at 11:09 PM, Christiana Anagnostou canagnos...@zoologie.uni-kiel.de wrote: Dear R helpers, For an allometric analysis (allometric equation y = a*x^b) I would like to apply a non-linear regression instead of using log-log transformations of the measured parameters x and y and a Model II linear regression. Since both of the variables x and y are random, I would like to apply a Model II non-linear analog of either Reduced Major Axis or Major Axis Regression. The allometric equation for the non-linear regression model would be y = a*x^b, with y the dimension of a body part (dependent variable) and x a dimension of body size (independent variable). With the non-linear regression I would like to get estimates for the parameters a and b. Fortunately, I already have a recommended loss function, proposed by Ebert & Russel (1994), J. Theor. Biol.:
loss = abs(y*abs(x-(y/a)^(1/b))-a/(b+1)*abs(x^(b+1)-(y/a)^((b+1)/b)))
How may I define the loss function above and combine it with a non-linear regression nls() in R? -- Doubt that you can. You'll need to use an explicit optimization function. See ?optim and the CRAN Optimization task view. Note that you cannot use a gradient-based optimizer as your function is not continuously differentiable (due to abs()). -- Bert
Example for a data frame with the measured variables x and y:
x <- 1:10
y <- c(0.3,0.7,0.9,1.3,2.0,2.4,3.3,3.8,5.0,5.8)
d <- data.frame(x, y)
Non-linear regression with the built-in loss function (sum of squared residuals):
nlmodel <- nls(y ~ a*x^b, data=d, start=list(a=1, b=0.1), trace=TRUE)
Many thanks!!! 
Christiana -- Christiana Anagnostou * Department of Evolutionary Ecology and Genetics, Zoological Institute, Christian-Albrechts-University of Kiel, Am Botanischen Garten 9, 24118 Kiel, Germany. Email: canagnos...@zoologie.uni-kiel.de Web: www.uni-kiel.de/zoologie/evoecogen/ -- Bert Gunter Genentech Nonclinical Biostatistics
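Following Bert's pointer, a sketch of how the Ebert & Russel loss could be minimized directly with optim() and the derivative-free Nelder-Mead method (the positivity guard and the starting values are my assumptions, not part of the original proposal):

```r
x <- 1:10
y <- c(0.3, 0.7, 0.9, 1.3, 2.0, 2.4, 3.3, 3.8, 5.0, 5.8)

# Total loss from Ebert & Russel (1994), summed over observations.
loss_fn <- function(par, x, y) {
  a <- par[1]; b <- par[2]
  if (a <= 0 || b <= 0) return(Inf)  # keep (y/a)^(1/b) well defined
  sum(abs(y * abs(x - (y / a)^(1 / b)) -
          a / (b + 1) * abs(x^(b + 1) - (y / a)^((b + 1) / b))))
}

# Nelder-Mead needs no gradient, so abs() in the loss is not a problem.
fit <- optim(c(a = 1, b = 1), loss_fn, x = x, y = y, method = "Nelder-Mead")
fit$par    # estimates of a and b under this loss
fit$value  # attained loss
```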
Re: [R] COZIGAM removed from CRAN
On 14.12.2012 16:55, Raeanne Miller wrote: Hi All, Another quick question - I noticed that COZIGAM has been removed from CRAN, and that you are referred to the archive for previous versions (last updated 23 July 2012). Is this package still ok to use? Or is there an alternative which might also fit zero-inflated GAMs? Please ask the package maintainer for details. The CRAN team decides to archive packages on maintainer request, if the package appears to be unmaintained (no reaction to CRAN requests), if problems in the package are not fixed after given deadlines have passed, or possibly also for other reasons. Best, Uwe Ligges Thanks, Raeanne
Re: [R] Combined Marimekko/heatmap
Thomas, This is a big help for getting me started. Brand new to R, so I'm unfamiliar with how to 'manually' draw graphs (instead of using packages). The graph your code makes is more like a Marimekko chart. What I'm thinking of is like a heatmap, but each row has a different height, and each column has a different width. But, for any particular column, the width is the same all the way down. I used your code to figure out how to draw lines on a chart the way I need this to look. Now what I need to figure out is how to add color coding to the respective squares. In my example it's binary data (yes/no), but a more robust approach would allow for a true heatmap. Any help on getting this to the next step of color coding would be much appreciated! The code below will draw a grid that is scaled per the sample data on the x and y axes. Perhaps what is needed is to draw boxes rather than lines to make the grid? I'm not sure. Neal
par(mar=c(1,1,1,1), oma=c(0,0,0,0), mgp=c(1.5,.2,0), tcl=0, cex.axis=.75, col.axis="black", pch=16)
#Input the sample data---
Z <- textConnection("country TWh
Australia 244
Canada 522
China 3503
EU 2500
India 689.5
Japan 997
USA 3960")
CountryEnergy <- read.table(Z, header=TRUE)
close(Z)
Z <- textConnection("Product TWh
Freezer 72.4
Refrigerator 379
HVAC 466
Lighting 123
Television 152")
ProductEnergy <- read.table(Z, header=TRUE)
close(Z)
#-Binary data indicating whether that country/product combination has a standard
#-Rows correspond to countries (in the same order as the CountryEnergy matrix)--
#-Columns correspond to products (in the same order as the ProductEnergy matrix)
Z <- textConnection("0 1 0 0 1
0 1 1 1 0
1 0 1 0 0
1 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 0 0")
ddd <- read.table(Z, header=FALSE)
close(Z)
#---rewrite the data table so that the vector is numbers only, and label the rows--
row.names(CountryEnergy) <- CountryEnergy$country
CountryEnergy <- CountryEnergy[, 2:2]
row.names(ProductEnergy) <- ProductEnergy$Product
ProductEnergy <- 
ProductEnergy[, 2:2]
#---plot the grid
plot.new()
plot.window(ylim=c(0, sum(CountryEnergy)), xlim=c(0, sum(ProductEnergy)), xaxs='i', yaxs='i', las=1)
box()
abline(h = cumsum(CountryEnergy), lwd=2, col="gray60") #lwd - line width
abline(v = cumsum(ProductEnergy), lwd=2, col="gray60")
labxats <- NULL
#--Use ddd data to code the cells as yes/no for having a standard-
# # # this is the part I need help with # # #
#--
Neal Humphrey nhumph...@clasponline.org
From: Thomas Stewart [mailto:t...@live.unc.edu] Sent: Friday, December 14, 2012 10:36 AM To: Neal Humphrey Cc: r-help@r-project.org Subject: Re: [R] Combined Marimekko/heatmap
Neal- Perhaps the following code is a start for what you want. -tgs
On Thu, Dec 13, 2012 at 6:09 PM, Neal Humphrey nhumph...@clasponline.org wrote: Hi all, I'm trying to figure out a way to create a data graphic that I haven't ever seen an example of before, but hopefully there's an R package out there for it. The idea is to essentially create a heatmap, but to allow each column and/or row to be a different width, rather than having uniform column and row height. 
This is sort of like a Marimekko chart in appearance, except that rather than use a single color to represent the category, the color represents a value and all the y-axis heights in each column line up with each other. That way color represents one variable, while the area of the cell represents another. In my application, my heatmap has discrete categorical data rather than continuous. Rows are countries, columns are appliances, and I want to scale the width and height of each column to be the fraction of global energy consumed by the country and the fraction of energy use consumed by that appliance type. The color coding would then indicate whether or not that appliance is regulated in that
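For the color-coding step Neal asks about above, rect() can draw each cell with a fill taken from the 0/1 standards matrix. A self-contained sketch with made-up numbers shaped like the sample data (names and values are illustrative only):

```r
# Cumulative energy totals give the cell edges; the binary matrix picks
# the fill colour (row i = country, column j = product).
CountryEnergy <- c(Australia = 244, Canada = 522, USA = 3960)
ProductEnergy <- c(Freezer = 72.4, Refrigerator = 379)
ddd <- matrix(c(0, 1,
                1, 0,
                1, 1), nrow = 3, byrow = TRUE)

plot.new()
plot.window(xlim = c(0, sum(ProductEnergy)), ylim = c(0, sum(CountryEnergy)),
            xaxs = "i", yaxs = "i")
xr <- c(0, cumsum(ProductEnergy))  # vertical cell edges
yb <- c(0, cumsum(CountryEnergy))  # horizontal cell edges
cols <- c("white", "steelblue")    # 0 = no standard, 1 = standard

for (i in seq_along(CountryEnergy)) {
  for (j in seq_along(ProductEnergy)) {
    rect(xr[j], yb[i], xr[j + 1], yb[i + 1],
         col = cols[ddd[i, j] + 1], border = "gray60")
  }
}
box()
```

For a continuous heatmap rather than yes/no, the cols lookup could be replaced with a palette indexed by the cell value.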
Re: [R] Rprof causing R to crash
On 14.12.2012 14:22, Jon Olav Skoien wrote: Uwe, I am unfortunately not able to upgrade to R 2.15.2 right now, but I have seen a similar problem with several older R versions. If you want to test with a shorter script, you can try the lines below. These provoke a crash from a fresh R session on my machine (R 2.15.1, Windows 7):
Rprof()
z = 1
for (i in 1:1e8) z = z + 1/i
This runs without problems without the call to Rprof(). I was far from using all my memory with this call, but you could also check with a higher number of loops in case you can run these lines and it is somehow memory related. Just for comparison, I tried 1e9 loops after a call to Rprof() on one of our Linux servers (R 2.14.0) without any problems. Works for me. Uwe Jon On 12-Dec-12 17:06, Uwe Ligges wrote: On 12.12.2012 00:05, Marian Talbert wrote: I'm trying to use Rprof() to identify bottlenecks and speed up a particularly slow section of code which reads in a portion of a tif file and compares each of the values to values of predictors used for model fitting. I've written up an example that anyone can run. Generally temp would be a section of a tif read into a data.frame and used later for other processing. The first portion, which just records the time, works in about 6 seconds; the second part causes RGui to immediately close with no error or warning. Any advice on how to get Rprof to work or how to speed up this code would be greatly appreciated. I'm using Windows 7 (which might be my problem) and R version 2.15.0. The problem is rather the R version: I cannot reproduce errors with a recent R. 
Uwe Ligges
CalcMESS <- function(tiff.entry, pred.vect){
  f <- sum(pred.vect < tiff.entry)/length(pred.vect)*100
  if(is.na(f)) return(NA)
  if(f == 0) return((tiff.entry - min(pred.vect))/(max(pred.vect) - min(pred.vect))*100)
  if(0 < f && f <= 50) return(2*f)
  if(50 <= f && f < 100) return(2*(100 - f))
  if(f == 100) return((max(pred.vect) - tiff.entry)/(max(pred.vect) - min(pred.vect))*100)
  else return(NA)
}
train.dat <- data.frame(a=runif(200), b=runif(200), c=runif(200), d=runif(200))
temp <- data.frame(a=runif(13), b=runif(13), c=runif(13), d=runif(13))
pred.rng <- temp
vnames.final.mod <- names(train.dat)
nvars.final <- length(vnames.final.mod)
start.time <- Sys.time()
for(k in 1:nvars.final){
  pred.range <- train.dat[, match(vnames.final.mod[k], names(train.dat))]
  pred.rng[,k] <- mapply(CalcMESS, tiff.entry=temp[, match(vnames.final.mod[k], names(temp))], MoreArgs=list(pred.vect=pred.range))
}
Sys.time() - start.time
Rprof("C:\\temp\\mapply.out")
for(k in 1:nvars.final){
  pred.range <- train.dat[, match(vnames.final.mod[k], names(train.dat))]
  pred.rng[,k] <- mapply(CalcMESS, tiff.entry=temp[, match(vnames.final.mod[k], names(temp))], MoreArgs=list(pred.vect=pred.range))
}
Rprof(NULL)
-- View this message in context: http://r.789695.n4.nabble.com/Rprof-causing-R-to-crash-tp4652846.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Rprof causing R to crash
On 14/12/2012 13:22, Jon Olav Skoien wrote: Uwe, I am unfortunately not able to upgrade to R 2.15.2 right now, but I have | Why not? Note that is part of the R-help contract: we only offer any support for the current version of R (see the posting guide). The posting guide also asks for 'at a minimum' information: we do not know (amongst other things) if this is 32- or 64-bit R. (Your example works for me in both.) | seen a similar problem with several older R versions. If you want to test with a shorter script, you can try the lines below. These provoke a crash from a fresh R session on my machine (R 2.15.1, Windows 7):
Rprof()
z = 1
for (i in 1:1e8) z = z + 1/i
This runs without problems without the call to Rprof(). I was far from using all my memory with this call, but you could also check with a higher number of loops in case you can run these lines and it is somehow memory related. Just for comparison, I tried 1e9 loops after a call to Rprof() on one of our Linux servers (R 2.14.0) without any problems. | Which is irrelevant: a different version of R and a completely different implementation of Rprof. | Jon On 12-Dec-12 17:06, Uwe Ligges wrote: On 12.12.2012 00:05, Marian Talbert wrote: I'm trying to use Rprof() to identify bottlenecks and speed up a particularly slow section of code which reads in a portion of a tif file and compares each of the values to values of predictors used for model fitting. I've written up an example that anyone can run. Generally temp would be a section of a tif read into a data.frame and used later for other processing. The first portion, which just records the time, works in about 6 seconds; the second part causes RGui to immediately close with no error or warning. Any advice on how to get Rprof to work or how to speed up this code would be greatly appreciated. I'm using Windows 7 (which might be my problem) and R version 2.15.0. The problem is rather the R version: I cannot reproduce errors with a recent R. 
Uwe Ligges -- Brian D. 
Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595, 1 South Parks Road, Oxford OX1 3TG, UK
[R] A question on list and lapply
Dear all, let's say I have the following list:
Dat <- vector("list", length = 26)
names(Dat) <- LETTERS
My_Function <- function(x) return(rnorm(5))
Dat1 <- lapply(Dat, My_Function)
However I want to apply my function 'My_Function' for all elements of 'Dat' except the elements having names(Dat) == "P". Here I have specified the name "P" just for illustration; in practice it will be some name specified by the user. Is there any direct way to achieve this, using 'lapply'? Thanks for your help.
Re: [R] A question on list and lapply
What about:
lapply(Dat[names(Dat) != "P"], My_Function)
You could use %in% if you actually want to match a longer set of names. Sarah
On Fri, Dec 14, 2012 at 1:58 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote: Dear all, let's say I have the following list:
Dat <- vector("list", length = 26)
names(Dat) <- LETTERS
My_Function <- function(x) return(rnorm(5))
Dat1 <- lapply(Dat, My_Function)
However I want to apply my function 'My_Function' for all elements of 'Dat' except the elements having names(Dat) == "P". Here I have specified the name "P" just for illustration; in practice it will be some name specified by the user. Is there any direct way to achieve this, using 'lapply'? Thanks for your help. -- Sarah Goslee http://www.functionaldiversity.org
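Since the names to skip will come from the user, Sarah's %in% remark generalizes directly; a small sketch:

```r
# Drop a user-supplied vector of element names before applying the function.
Dat <- setNames(vector("list", 26), LETTERS)
skip <- c("P", "Q")  # e.g. supplied by the user

Dat1 <- lapply(Dat[!(names(Dat) %in% skip)], function(x) rnorm(5))
length(Dat1)  # 24: everything except P and Q
```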
Re: [R] A question on list and lapply
How about
Dat1 <- lapply(subset(Dat, names(Dat) != "P"), My_Function)
--Mark
From: Christofer Bogaso bogaso.christo...@gmail.com To: r-help@r-project.org Sent: Friday, December 14, 2012 1:58 PM Subject: [R] A question on list and lapply
Dear all, let's say I have the following list:
Dat <- vector("list", length = 26)
names(Dat) <- LETTERS
My_Function <- function(x) return(rnorm(5))
Dat1 <- lapply(Dat, My_Function)
However I want to apply my function 'My_Function' for all elements of 'Dat' except the elements having names(Dat) == "P". Here I have specified the name "P" just for illustration; in practice it will be some name specified by the user. Is there any direct way to achieve this, using 'lapply'? Thanks for your help.
Re: [R] Hello R User
Hi, you could also use library(data.table) to do this faster.

dat1 <- read.table(text="
ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30
", sep="", header=TRUE)
library(data.table)
dat2 <- data.table(dat1)
res <- dat2[, Time1 := c(0, diff(Time)), by = ID]
head(res, 3)
#   ID Time Time1
#1:  1    3     0
#2:  1    6     3
#3:  1    7     1

# Comparing different approaches:
set.seed(55)
dat3 <- data.frame(ID = rep(1:1000, each = 500), Value = sample(1:800, 5e5, replace = TRUE))
dat4 <- data.table(dat3)
system.time(dat3$Value1 <- ave(dat3$Value, dat3$ID, FUN = function(x) c(0, diff(x))))
#  user  system elapsed
# 0.312   0.000   0.313
ids <- unique(dat3$ID)
system.time({
  for (id in ids) {
    dat3$Value[dat3$ID == id] <- c(0, diff(dat3$Value[dat3$ID == id]))
  }
})
#   user  system elapsed
# 36.938   0.868  37.873
system.time(dat5 <- dat4[, Value1 := c(0, diff(Value)), by = ID])
#  user  system elapsed
# 0.036   0.000   0.037
head(dat5)
#   ID Value Value1
#1:  1   439      0
#2:  1   175   -264
#3:  1    28   -147
#4:  1   634    606
#5:  1   449   -185
#6:  1    60   -389
head(dat3)
#  ID Value Value1
#1  1     0      0
#2  1  -264   -264
#3  1  -147   -147
#4  1   606    606
#5  1  -185   -185
#6  1  -389   -389

A.K.

- Original Message -
From: bibek sharma mbhpat...@gmail.com
To: R help r-help@r-project.org
Sent: Friday, December 14, 2012 10:51 AM
Subject: [R] Hello R User

Hello R User, in the sample data given below, time is recorded sequentially for each ID. For the analysis, for each ID, I would like to set the first recorded time to zero and thereafter find the difference from the previous time, i.e. for ID==1 I would like to see Time = 0, 3, 1, 3, 6. This needs to be applied to a big data set. Any suggestions are much appreciated! Thanks, Bibek

ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
2 19
2 25
2 28
2 30
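For completeness, the same per-ID differencing can also be done in base R with ave(), the approach used in the benchmark above; a self-contained sketch on a subset of the posted data:

```r
# A subset of the poster's example data
dat1 <- read.table(text = "
ID Time
1 3
1 6
1 7
1 10
1 16
2 12
2 18
", header = TRUE)

# First record per ID becomes 0, then successive differences
dat1$Time1 <- ave(dat1$Time, dat1$ID, FUN = function(x) c(0, diff(x)))
dat1$Time1  # 0 3 1 3 6 0 6
```

This is slower than data.table on large inputs, as the timings above show, but needs no extra package.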
Re: [R] Hello R User
HI, try:

dat2[, Time1 := c(0, diff(Time)), by = ID]
dat2[, CumSTime1 := cumsum(Time1), by = ID]
head(dat2, 4)
#   ID Time Time1 CumSTime1
#1:  1    3     0         0
#2:  1    6     3         3
#3:  1    7     1         4
#4:  1   10     3         7

A.K.

- Original Message -
From: bibek sharma mbhpat...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Friday, December 14, 2012 12:56 PM
Subject: Re: [R] Hello R User

Hi Arun, great! Once we get Time1, I now want to add each value to its previous one, for example to get 0, 3, 4, etc. Can you suggest something?
Re: [R] [R-sig-hpc] recursion depth limitations
On 14 December 2012 13:46, Simon Urbanek simon.urba...@r-project.org wrote: You may be a bit misinformed about what tail recursion means - it still needs to evaluate the function for each recursion step; the only difference is that in such a special case there is no additional information that needs to be stored -- and you have also proven why it's not as simple as you think:

Thank you Simon for your time and patient reply. I was thinking calling another function over again is called indirect recursion.

f <- function(x) if (x > 0) f(x - 1)
(f(100))
NULL
f1 <- function(x) x - 1
f <- function(x) f1(x)
(f(100))
[1] 99

Your latter version doesn't actually do anything anywhere close to the recursion you defined.

I see what you mean; I was confused. My intention was to have 100 calls in the second one with f1 as well. Maybe I can make the point with another example via mutual recursion, 'fibRping/pong' below. I was thinking the 'fibRping' version would need less memory than the plain recursion, but I now understand that I was mistaken. Every call counts. I had the impression that calling the same function more than once, as in fibRping/pong, would not create an additional memory requirement... (Probably the issue has nothing to do with R's internals though.)
Using Dirk's blog entry (http://dirk.eddelbuettel.com/blog/2011/09/08/):

## R implementation of recursive Fibonacci sequence
fibR <- function(n) {
  if (n == 0) return(0)
  if (n == 1) return(1)
  return(fibR(n - 1) + fibR(n - 2))
}

# Now mutual recursion
fibRping <- function(n) {
  if (n == 0) return(0)
  if (n == 1) return(1)
  return(fibRpong(n - 1) + fibRpong(n - 2))
}
fibRpong <- function(n) {
  if (n == 0) return(0)
  if (n == 1) return(1)
  return(fibRping(n - 1) + fibRping(n - 2))
}

options(expressions = 50)
fibR(5)
options(expressions = 50)
fibR(5)
Error: C stack usage is too close to the limit
fibRping(5)
Error: C stack usage is too close to the limit
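Since every recursive call here consumes stack, one way around the limit (not discussed in the thread) is an iterative rewrite, which keeps stack depth constant no matter how large n gets. A sketch:

```r
# Iterative Fibonacci: a loop instead of recursion, so stack depth stays constant
fibIter <- function(n) {
  if (n < 2) return(n)
  a <- 0; b <- 1
  for (i in 2:n) {
    s <- a + b
    a <- b
    b <- s
  }
  b
}

fibIter(5)   # 5, the same value fibR(5) returns
fibIter(30)  # 832040, with no risk of "C stack usage" errors
```

The recursive fibR() also makes exponentially many calls (fibR(30) makes over a million), so the loop is dramatically faster as well.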
[R] format.pval () and printCoefmat ()
Hi List,

My goal is to force R not to print in scientific notation in the sixth column (p_mean, the p-value) of my data frame (not a matrix). I have used the format.pval() and printCoefmat() functions on the data frame; the R script is appended below. The issue is that using format.pval() and printCoefmat() on the data frame gives me the desired p-value formatting, but coerces the two character variables (contrast_level1, contrast_level2) into NAs, because my object is a data frame, not a matrix. Please see the first output below. Is there a way I could avoid printing the NAs in the character fields when using format.pval() and printCoefmat() on the data frame? I would appreciate receiving your help.

Thanks, Pradip

setwd("F:/PR1/R_PR1")
load(file = "sigtests_overall_withid.rdata")
#format.pval(tt$p.value, eps=0.0001)
# keep only selected columns from the above data frame
keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1", "mean_level2", "rel_diff", "p_mean", "cohens_d")
# subset the data frame
y0410_1825_mf_alc <- subset(sigtests_overall_withid, years=="0410" & age_group=="1825" & gender_group=="all" & drug=="alc" & contrast_level1=="wh", select=keep_cols1)
# change the row.names
row.names(y0410_1825_mf_alc) <- 1:dim(y0410_1825_mf_alc)[1]
# format the p-values
format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)
# print the observations from the sub-data frame
options(width=120, digits=3)
#y0410_1825_mf_alc
printCoefmat(y0410_1825_mf_alc, has.Pvalue=TRUE, eps.Pvalue=0.0001)

### When format.pval() and printCoefmat() are used
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
1               NA              NA      18.744      11.911    0.574   0.00    0.175
2               NA              NA      18.744      14.455    0.297   0.00    0.110
3               NA              NA      18.744      13.540    0.384   0.00    0.133
4               NA              NA      18.744       6.002    2.123   0.00    0.333
5               NA              NA      18.744       5.834    2.213   0.00    0.349
6               NA              NA      18.744       7.933    1.363   0.00    0.279
7               NA              NA      18.744      10.849    0.728   0.00    0.203
8               NA              NA      18.744       7.130    1.629   0.00    0.298
9               NA              NA      18.744       9.720    0.928   0.00    0.242
10              NA              NA      18.744       9.600    0.952   0.00    0.242
11              NA              NA      18.744      16.135    0.162   0.17    0.067 .
12              NA              NA      18.744          NA       NA     NA       NA
13              NA              NA      18.744      10.465    0.791   0.00    0.213
14              NA              NA      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion

### When format.pval() and printCoefmat() are not used
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071, Rockville, MD 20857
Tel: 240-276-1070 Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
Re: [R] format.pval () and printCoefmat ()
On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

Hi List, My goal is to force R not to print in scientific notation in the sixth column (p_mean, the p-value) of my data frame (not a matrix). I have used the format.pval() and printCoefmat() functions on the data frame. The R script is appended below. The issue is that using format.pval() and printCoefmat() on the data frame gives me the desired results, but coerces the character strings into NAs for the two character variables (contrast_level1, contrast_level2), because my object is a data frame, not a matrix. Please see the first output below. Is there a way I could avoid printing the NAs in the character fields

They are probably factor columns.

when using format.pval() and printCoefmat() on the data frame? I would appreciate receiving your help. Thanks, Pradip

setwd("F:/PR1/R_PR1")
load(file = "sigtests_overall_withid.rdata")
#format.pval(tt$p.value, eps=0.0001)
# keep only selected columns from the above data frame
keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1", "mean_level2", "rel_diff", "p_mean", "cohens_d")
# subset the data frame
y0410_1825_mf_alc <- subset(sigtests_overall_withid, years=="0410" & age_group=="1825" & gender_group=="all" & drug=="alc" & contrast_level1=="wh", select=keep_cols1)
# change the row.names
row.names(y0410_1825_mf_alc) <- 1:dim(y0410_1825_mf_alc)[1]
format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

Presumably that call will produce desired results since it is on only one column. (I'm not sure why you think format.pval contributed to your NA output.)

# print the observations from the sub-data frame
options(width=120, digits=3)
#y0410_1825_mf_alc
printCoefmat(y0410_1825_mf_alc, has.Pvalue=TRUE, eps.Pvalue=0.0001)

Why not use `cbind.data.frame` rather than trying to get `printCoefmat` to do something it (apparently) wasn't designed to do?

cbind(y0410_1825_mf_alc[1:2],
      printCoefmat(y0410_1825_mf_alc[-(1:2)], has.Pvalue=TRUE, eps.Pvalue=0.0001))

-- David.
Re: [R] A question on list and lapply
Hi, if you want the list element "P" to be present as NULL in the result you could use this:

set.seed(51)
lapply(lapply(Dat, My_Function), function(x) {if (names(Dat)[match.call()[[2]][[3]]] %in% "P") NULL else x})

A.K.

- Original Message -
From: Christofer Bogaso bogaso.christo...@gmail.com
To: r-help@r-project.org
Sent: Friday, December 14, 2012 1:58 PM
Subject: [R] A question on list and lapply
Re: [R] A question on list and lapply
lapply(lapply(Dat, My_Function), function(x) {if (names(Dat)[match.call()[[2]][[3]]] %in% "P") NULL else x})

match.call()[[2]][[3]] -- gack!

In lapply(X, FUN), FUN is applied to X[[i]], which has lost the names attribute that X may have had. X[i] retains its part of the names attribute (since it is a sublist of X, not an element of X). Hence FUN can look at the name associated with X[i] with code like the following:

lapply(seq_along(X), FUN = function(i) { Xi <- X[i]; names(Xi) })

E.g., to apply one sort of processing to elements named "P" and another sort to those not named "P" you can do:

bases <- list(O="Oak Harbor", P="Pensicola", Q="Quonset Point")
lapply(seq_along(bases), function(i) { base <- bases[i]; if (names(base) != "P") paste0("(", base, ")") else tolower(base) })
[[1]]
[1] "(Oak Harbor)"
[[2]]
[1] "pensicola"
[[3]]
[1] "(Quonset Point)"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun
Sent: Friday, December 14, 2012 12:11 PM
To: Christofer Bogaso
Cc: R help
Subject: Re: [R] A question on list and lapply
Re: [R] A question on list and lapply
Yes, my example should have either applied lapply to structure(seq_along(bases), names=names(bases)) instead of just seq_along(bases), or added the names(bases) to the output of lapply (assuming one wanted the names of 'bases' on the output).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: arun [mailto:smartpink...@yahoo.com]
Sent: Friday, December 14, 2012 1:37 PM
To: William Dunlap
Cc: R help
Subject: Re: [R] A question on list and lapply

HI, by applying:

bases
#$O
#[1] "Oak Harbor"
#
#$P
#[1] "Pensicola"
#
#$Q
#[1] "Quonset Point"

res2 <- lapply(bases, function(x) {if (names(bases)[match.call()[[2]][[3]]] %in% "P") tolower(x) else paste0("(", x, ")")})
res2
#$O
#[1] "(Oak Harbor)"
#
#$P
#[1] "pensicola"
#
#$Q
#[1] "(Quonset Point)"

# the names of the list elements are not lost unless the result is named again.
res1 <- lapply(seq_along(bases), function(i) { base <- bases[i]; if (names(base) != "P") paste0("(", base, ")") else tolower(base) })
names(res1)
#NULL
names(res2)
#[1] "O" "P" "Q"

A.K.
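Bill's fix can be spelled out: lapply() copies the names of its first argument to its result, so iterating over a *named* index vector keeps the names without a separate renaming step. A sketch using his example data:

```r
bases <- list(O = "Oak Harbor", P = "Pensicola", Q = "Quonset Point")

# Named index vector: lapply() transfers these names to its output
idx <- structure(seq_along(bases), names = names(bases))

res <- lapply(idx, function(i) {
  if (names(bases)[i] != "P") paste0("(", bases[[i]], ")") else tolower(bases[[i]])
})
names(res)  # "O" "P" "Q"
res$P       # "pensicola"
```

This avoids both the match.call() introspection and the post-hoc names() assignment.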
Re: [R] format.pval () and printCoefmat ()
Hi David,

Thank you so much for helping me with the code. Your suggested code gives me the following results; please see below. I don't understand why I am getting two blocks of printed output (5 columns, and then 7 columns), with some columns repeated.

Regards, Pradip

cbind(y0410_1825_mf_alc[1:2],
+   printCoefmat(y0410_1825_mf_alc[-(1:2)], has.Pvalue=TRUE, eps.Pvalue=0.0001))
   mean_level1 mean_level2 rel_diff p_mean cohens_d
1       18.744      11.911    0.574   0.00    0.175
2       18.744      14.455    0.297   0.00    0.110
3       18.744      13.540    0.384   0.00    0.133
4       18.744       6.002    2.123   0.00    0.333
5       18.744       5.834    2.213   0.00    0.349
6       18.744       7.933    1.363   0.00    0.279
7       18.744      10.849    0.728   0.00    0.203
8       18.744       7.130    1.629   0.00    0.298
9       18.744       9.720    0.928   0.00    0.242
10      18.744       9.600    0.952   0.00    0.242
11      18.744      16.135    0.162   0.17    0.067 .
12      18.744          NA       NA     NA       NA
13      18.744      10.465    0.791   0.00    0.213
14      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071, Rockville, MD 20857
Tel: 240-276-1070 Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
The Center for Behavioral Health Statistics and Quality values your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Friday, December 14, 2012 3:22 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] format.pval () and printCoefmat ()
Re: [R] Combined Marimekko/heatmap
Here is a pretty simple function. -tgs

ColorBlocks <- function(rows, columns, ColorMatrix, RowNames, ColumnNames, TextMatrix = NULL){
  # This function takes the following inputs:
  # rows        : a numeric vector denoting the relative heights of rows
  # columns     : a numeric vector denoting the relative widths of columns
  # ColorMatrix : a numeric matrix of integers representing colors
  # RowNames    : a character vector of row names
  # ColumnNames : a character vector of column names
  # TextMatrix  : optional, a character matrix of text to print in the center of each rectangle
  # Colors are selected from palette(). To change colors, see help(palette).
  rows <- rows / sum(rows)
  columns <- columns / sum(columns)
  old.pars <- par(no.readonly=TRUE)
  LeftMargin <- max(strwidth(RowNames, "inches")) * 1.04
  TopMargin <- max(strheight(ColumnNames, "inches")) * 1.4
  par(tcl = 0, mgp = c(1, .2, 0), mai = c(0.04, LeftMargin, TopMargin, 0.04), cex.axis = .8)
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(-1, 0), xaxs = 'i', yaxs = 'i')
  rrr <- -c(0, cumsum(rows))
  ccc <- c(0, cumsum(columns))
  for (i in 1:nrow(ColorMatrix)) {
    for (j in 1:ncol(ColorMatrix)) {
      rect(ccc[j], rrr[i+1], ccc[j+1], rrr[i], col = ColorMatrix[i, j], border = "white")
      if (!is.null(TextMatrix[i, j])) text(mean(ccc[j:(j+1)]), mean(rrr[i:(i+1)]), TextMatrix[i, j])
    }
  }
  axis(2, at = -(cumsum(rows) + head(-rrr, -1))/2, labels = RowNames, line = 0, las = 1)
  axis(3, at = (cumsum(columns) + head(ccc, -1))/2, labels = ColumnNames)
  box()
  par(old.pars)
}

Z <- textConnection("country TWh
Australia 244
Canada 522
China 3503
EU 2500
India 689.5
Japan 997
USA 3960")
CountryEnergy <- read.table(Z, header=TRUE)
close(Z)

Z <- textConnection("Product TWh
Freezer 72.4
Refrigerator 379
HVAC 466
Lighting 123
Television 152")
ProductEnergy <- read.table(Z, header=TRUE)
close(Z)

# Generate random color matrix
set.seed(325034)
RandomColors <- matrix(sample(1:100, 35), nrow=7, ncol=5)
# Choose the entire palette of colors
palette(colors())

ColorBlocks(rows = CountryEnergy[,2],
            columns = ProductEnergy[,2],
            ColorMatrix = RandomColors,
            RowNames = CountryEnergy[,1],
            ColumnNames = ProductEnergy[,1])

On Fri, Dec 14, 2012 at 4:12 PM, Neal Humphrey nhumph...@clasponline.org wrote:

You're right, the way I've proposed the graphic it simplifies the story. In this graph, the size of the box is approximately how much energy that country uses for that appliance, but not exactly. For most appliances it should be an okay estimate, although air conditioners in Russia, for example, would be an exception.

I'm proposing this format mostly for readability when there are 20-30 countries and appliances; I'm concerned about misaligning the appliances and their labels. The main story I'm trying to tell with the graphic is how much standards coverage there is and for which appliances, while the amount of energy is a secondary variable. It's also secondary because the actual energy savings potential from a standard will vary widely based on how strict the standard is.

I was planning to complement this with a treemap graphic (http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/), where you trade off readability of the secondary category for an accurate representation of per-country, per-appliance energy use.

Talking through it now, maybe you're right that just going with the Marimekko-like plot, with some careful formatting cleanup in Illustrator, is a better representation? I'd need to make some assumptions to fill holes in my data (it's hard to get per-country, per-appliance data at the level of granularity I'm aiming for).

I'm open to critiques or suggestions about the approach. Either way, I still need to figure out how to do the true/false color coding, so any pointers on that are welcome.

Neal Humphrey
Tel: +1 202.662.7241 | Skype: neal.s.humphrey | nhumph...@clasponline.org
www.clasponline.org

From: tgstew...@gmail.com [mailto:tgstew...@gmail.com] On Behalf Of Thomas Stewart
Sent: Friday, December 14, 2012 3:18 PM
To: Neal Humphrey
Subject: Re: [R] Combined Marimekko/heatmap

Neal- Help me understand something: in my mind, the size of each box---say country A, appliance A1---should communicate the percentage of (total) energy consumption in country A using appliance A1. The way you've set up your plot does not do that. I guess my question is: what information does the size of the box communicate in your plot? I'd be happy to help once I understand. -tgs

On Fri, Dec 14, 2012 at 1:03 PM, Neal Humphrey nhumph...@clasponline.org wrote:

Thomas, this is a big help for getting me started. I'm brand new to R, so I'm unfamiliar with how to 'manually' draw graphs (instead of using packages). The graph your code makes is more like a Marimekko chart. What I'm thinking of is like a heatmap, but each row has a different width, and each column
Re: [R] A question on list and lapply
HI Bill,

bases
#$O
#[1] "Oak Harbor"
#
#$P
#[1] "Pensicola"
#
#$Q
#[1] "Quonset Point"

res2 <- lapply(bases, function(x) {if (names(bases)[match.call()[[2]][[3]]] %in% "P") tolower(x) else paste0("(", x, ")")})
res2
#$O
#[1] "(Oak Harbor)"
#
#$P
#[1] "pensicola"
#
#$Q
#[1] "(Quonset Point)"

# the names of the list elements are not lost here, while in your solution they are missing and need to be added again.
res1 <- lapply(seq_along(bases), function(i) { base <- bases[i]; if (names(base) != "P") paste0("(", base, ")") else tolower(base) })
names(res1)
#NULL
names(res2)
#[1] "O" "P" "Q"

A.K.

- Original Message -
From: William Dunlap wdun...@tibco.com
To: arun smartpink...@yahoo.com; Christofer Bogaso bogaso.christo...@gmail.com
Cc: R help r-help@r-project.org
Sent: Friday, December 14, 2012 3:59 PM
Subject: RE: [R] A question on list and lapply
Re: [R] format.pval () and printCoefmat ()
Hi Pradip,

Maybe this helps:

dat1 <- read.table(text="
contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
1 wh 2+hi 18.7 11.91 0.574 1.64e-05 0.1753
2 wh 2+rc 18.7 14.46 0.297 9.24e-06 0.1101
3 wh aian 18.7 13.54 0.384 9.01e-05 0.1335
4 wh asan 18.7 6.00 2.123 2.20e-119 0.3326
5 wh blck 18.7 5.83 2.213 0.00e+00 0.3490
6 wh csam 18.7 7.93 1.363 1.27e-47 0.2793
7 wh cub 18.7 10.85 0.728 6.12e-08 0.2025
8 wh dmcn 18.7 7.13 1.629 1.59e-15 0.2981
9 wh hisp 18.7 9.72 0.928 3.27e-125 0.2420
10 wh mex 18.7 9.60 0.952 8.81e-103 0.2420
11 wh nhpi 18.7 16.14 0.162 1.74e-01 0.0669
12 wh othh 18.7 NA NA NA NA
13 wh pr 18.7 10.47 0.791 3.64e-23 0.2131
14 wh spn 18.7 15.15 0.237 1.58e-02 0.0922
", sep="", header=TRUE, stringsAsFactors=FALSE)

Lines1 <- capture.output(printCoefmat(dat1[, -c(1:2)], has.Pvalue=TRUE, eps.Pvalue=0.001))
Lines2 <- gsub("\\s+$", "", gsub("\\.$", "", Lines1[1:15]))
res <- data.frame(dat1[, 1:2], read.table(text=Lines2, header=TRUE))
# or
# res <- cbind(dat1[, 1:2], read.table(text=Lines2, header=TRUE))
res
#  contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean
#1              wh            2+hi        18.7       11.91    0.574 0.
#2              wh            2+rc        18.7       14.46    0.297 0.
#3              wh            aian        18.7       13.54    0.384 0.0001
# ...
#  cohens_d
#1   0.1753
#2   0.1101
#3   0.1335
# ...
str(res)
#'data.frame': 14 obs. of 7 variables:
# $ contrast_level1: chr "wh" "wh" "wh" "wh" ...
# $ contrast_level2: chr "2+hi" "2+rc" "aian" "asan" ...
# $ mean_level1 : num 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 ...
# $ mean_level2 : num 11.91 14.46 13.54 6 5.83 ...
# $ rel_diff : num 0.574 0.297 0.384 2.123 2.213 ...
# $ p_mean : num 0e+00 0e+00 1e-04 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
# $ cohens_d : num 0.175 0.11 0.134 0.333 0.349 ...

A.K.
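[Editor's note] When the goal is only to suppress scientific notation in the p-value column, formatting just that column with `format.pval()` sidesteps the NA-coercion of the character columns entirely, since `printCoefmat()` never sees them. A minimal sketch (an editorial illustration using a hypothetical three-row stand-in, not the poster's data):

```r
# Hypothetical stand-in for the poster's data frame: two character
# columns plus a numeric p-value column
df <- data.frame(
  contrast_level1 = c("wh", "wh", "wh"),
  contrast_level2 = c("2+hi", "asan", "nhpi"),
  p_mean          = c(1.64e-05, 2.20e-119, 1.74e-01),
  stringsAsFactors = FALSE
)

# Format only the p-value column: values below eps become a "<eps"
# string instead of scientific notation; the character columns are
# never touched, so no NAs are introduced
df$p_mean <- format.pval(df$p_mean, eps = 1e-04, digits = 4)
df
```

The trade-off is that `p_mean` becomes a character column after formatting, so any further arithmetic on it would need the original numeric vector.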
- Original Message -
From: Muhuri, Pradip (SAMHSA/CBHSQ) <pradip.muh...@samhsa.hhs.gov>
To: 'David Winsemius' <dwinsem...@comcast.net>
Cc: R help <r-help@r-project.org>
Sent: Friday, December 14, 2012 5:18 PM
Subject: Re: [R] format.pval () and printCoefmat ()

Hi David,

Thank you so much for helping me with the code. Your suggested code gives me the following results; please see below. I don't understand why I am getting two blocks of prints (5 columns, and then 7 columns), with some columns repeated.

Regards,
Pradip

# cbind( y0410_1825_mf_alc[ 1:2],
+   printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, eps.Pvalue=0.0001)
+ )
   mean_level1 mean_level2 rel_diff p_mean cohens_d
1       18.744      11.911    0.574   0.00    0.175
2       18.744      14.455    0.297   0.00    0.110
3       18.744      13.540    0.384   0.00    0.133
4       18.744       6.002    2.123   0.00    0.333
5       18.744       5.834    2.213   0.00    0.349
6       18.744       7.933    1.363   0.00    0.279
7       18.744      10.849    0.728   0.00    0.203
8       18.744       7.130    1.629   0.00    0.298
9       18.744       9.720    0.928   0.00    0.242
10      18.744       9.600    0.952   0.00    0.242
11      18.744      16.135    0.162   0.17    0.067 .
12      18.744          NA       NA     NA       NA
13      18.744      10.465    0.791   0.00    0.213
14      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8
Re: [R] format.pval () and printCoefmat ()
On Dec 14, 2012, at 2:18 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi David,
> Thank you so much for helping me with the code. Your suggested code gives me the following results. Please see below. I don't understand why I am getting two blocks of prints (5 columns, and then 7 columns), with some columns repeated.

I am not sure why you think we will be able to answer this question. Your data is on your drive and you have not provided it in a form that would allow one to assist any further than the apparently inaccurate guesses already offered.

You should consult this function:

?dput  # use this with: y0410_1825_mf_alc

-- David.

> Regards,
> Pradip
>
> # cbind( y0410_1825_mf_alc[ 1:2],
> +   printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, eps.Pvalue=0.0001)
> + )
>    mean_level1 mean_level2 rel_diff p_mean cohens_d
> 1       18.744      11.911    0.574   0.00    0.175
> 2       18.744      14.455    0.297   0.00    0.110
> 3       18.744      13.540    0.384   0.00    0.133
> 4       18.744       6.002    2.123   0.00    0.333
> 5       18.744       5.834    2.213   0.00    0.349
> 6       18.744       7.933    1.363   0.00    0.279
> 7       18.744      10.849    0.728   0.00    0.203
> 8       18.744       7.130    1.629   0.00    0.298
> 9       18.744       9.720    0.928   0.00    0.242
> 10      18.744       9.600    0.952   0.00    0.242
> 11      18.744      16.135    0.162   0.17    0.067 .
> 12      18.744          NA       NA     NA       NA
> 13      18.744      10.465    0.791   0.00    0.213
> 14      18.744      15.149    0.237   0.02    0.092 .
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>    contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
> 1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
> 2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
> 3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
> 4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
> 5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
> 6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
> 7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
> 8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
> 9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
> 10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
> 11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
> 12              wh            othh        18.7          NA       NA        NA       NA
> 13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
> 14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922
>
> Pradip K.
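[Editor's note] David's ?dput suggestion is the standard way to share data reproducibly on the list: dput() prints an expression that reconstructs the object exactly. A small self-contained sketch (an editorial illustration on a hypothetical stand-in, since the real y0410_1825_mf_alc exists only on the poster's machine):

```r
# Hypothetical stand-in for the poster's data frame
y <- data.frame(
  contrast_level2 = c("2+hi", "asan"),
  p_mean          = c(1.64e-05, 2.20e-119),
  stringsAsFactors = FALSE
)

# dput(y) prints a structure(...) call; deparse() returns the same
# text as a character vector we can capture here
txt <- paste(deparse(y), collapse = "\n")

# Parsing and evaluating that text rebuilds the object exactly,
# which is why pasting dput() output into a mail makes an example
# reproducible for other readers
y2 <- eval(parse(text = txt))
identical(y, y2)  # TRUE
```

Posting the dput() text (rather than a console printout of the data) preserves column classes, names, and attributes that a printed table loses.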
Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality values your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Friday, December 14, 2012 3:22 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] format.pval () and printCoefmat ()

On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi List,
>
> My goal is to force R not to print in scientific notation in the sixth column (rel_diff - for the p-value) of my data frame (not a matrix). I have used the format.pval() and printCoefmat() functions on the data frame. The R script is appended below.
>
> The issue is that the format.pval() and printCoefmat() functions give me the desired results on the data frame, but coerce the character strings in the two character variables into NAs, because my object is a data frame, not a matrix. Please see the first output below (contrast_level1, contrast_level2). Is there a way I could avoid printing the NAs in the character fields

They are probably factor columns.

> when using format.pval() and printCoefmat() on the data frame? I would appreciate receiving your help.
>
> Thanks,
> Pradip
>
> setwd("F:/PR1/R_PR1")
> load(file = "sigtests_overall_withid.rdata")
> # format.pval(tt$p.value, eps=0.0001)
> # keep only selected columns from the above data frame
> keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1", "mean_level2",