Re: [R] PERMANOVA+ and adonis in vegan package
On Sat, 2011-07-09 at 09:35 -0700, VG wrote:

Hi, I was wondering if someone can tell me what the difference is between the strata argument (function adonis in the vegan package) and using random effects in the PERMANOVA+ add-on package for PRIMER 6 when doing permutational MANOVAs? Is the way permutations are done the same? Thank you very much in advance, Vesna

By the looks of the PRIMER page on this new add-on, there are substantial differences between adonis() and PERMANOVA+. The current way that adonis() permutes data is either to shuffle the data totally at random, or at random *within* the levels of `strata`. This conditions the permutations on the clustering of samples, and reflects a null hypothesis under which samples are not fully freely exchangeable (independent), but exchangeable only within the groups/clusters of samples.

It appears PERMANOVA+ has facilities for more complex permutations than this. I have implemented some of these restricted permutations in my package `permute` - on R-Forge within the vegan stable of packages - which will hopefully be used within vegan for all its permutations, perhaps as soon as the end of the summer break.

As PERMANOVA+ is closed source and I have not seen any works from Marti Anderson describing the newer features of her software that are included in the new PRIMER add-on, I would suggest you enquire with the PRIMER people as to the similarities/differences between adonis() and PERMANOVA+ and report back your findings.

HTH

G

--
Dr. Gavin Simpson
ECRC, UCL Geography, Pearson Building, Gower Street, London WC1E 6BT, UK.
[e] gavin.simpsonATNOSPAMucl.ac.uk
[w] http://www.ucl.ac.uk/~ucfagls/
[w] http://www.freshwaters.org.uk

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
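The "shuffle at random *within* the levels of `strata`" idea described above can be illustrated in a few lines of base R. This is a minimal sketch of the permutation scheme, not vegan's actual implementation:

```r
## Permute row indices only within each stratum: each sample may swap
## places with samples from its own group, never with another group's.
set.seed(1)
strata <- rep(c("A", "B"), each = 4)
idx <- seq_along(strata)               # 1..8
perm <- ave(idx, strata, FUN = sample) # within-group shuffle, positions kept

## Every permuted index still belongs to its original stratum:
all(strata[perm] == strata)            # TRUE
```

`ave()` applies `sample()` separately to the indices of each group while leaving them in place, which is exactly the restricted exchangeability that `strata` encodes.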
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
Prof Brian Ripley (ripley at stats.ox.ac.uk) writes:

On Mon, 11 Jul 2011, Tomaz wrote: I upgraded R on Windows XP from 2.12.2 to 2.13.1 and now I cannot process Rnw files with Windows cp1250 encoding. Sweave complains: "file.Rnw declares an encoding that Sweave does not know about". What can I do besides downgrading R? When will Sweave support more encodings? Has anybody found a solution?

Which is of course not an ISO standard encoding. One way out is to use the ISO encoding latin2, which is supported.

It is a great pity that you did not raise this during the alpha/beta/RC period of R 2.13.0, and that you did not give the 'at a minimum' information asked for in the posting guide, so we do not know your locale. Nor do we have the 'commented, minimal, self-contained, reproducible' example we asked for. These things happen because of lack of cooperation from users with unusual requirements. Note that in several months no one else has reported a need for cp1250.

The short answer is that, now that 2.13.1 was missed, it will be 2.14.0 and 2.13.1-patched: please do try a version of the latter dated tomorrow or later (r56361 or later), since there is no way we can test whether the change made is adequate without your example.

Next time you wish to request an enhancement to R for a very unusual use case, please write to R-devel with full details and an example. Very much preferably, do so during the pre-release testing period.

I don't think that the Windows-1250 (a.k.a. cp1250) encoding is an unusual use case, as it is the default native encoding for Rgui on Windows XP in Slovenia, Slovakia, Hungary, the Czech Republic, Romania, Albania, Croatia, Bosnia and Serbia. I'm providing a minimal example; it's a modified example from the Sweave installation.
%%% file: example-1.Rnw
\documentclass[a4paper]{article}
\usepackage[slovene]{babel}
\usepackage[cp1250]{inputenc}
\usepackage[T1]{fontenc}
\title{Sweave Example 1}
\author{Friedrich Leisch}
\begin{document}
\maketitle

In this example we embed parts of the examples from the \texttt{kruskal.test} help page into a \LaTeX{} document:

<<>>=
data(airquality, package = "datasets")
library(stats)
kruskal.test(Ozone ~ Month, data = airquality)
@

which shows that the location parameter of the Ozone distribution varies significantly from month to month. Finally we include a boxplot of the data:

\begin{center}
<<fig=TRUE, echo=FALSE>>=
library(graphics)
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}

\end{document}
%%%

Excerpt from R session:

Sweave("example-1.Rnw")
Error: 'example-1.Rnw' declares an encoding that Sweave does not know about

sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Slovenian_Slovenia.1250  LC_CTYPE=Slovenian_Slovenia.1250
[3] LC_MONETARY=Slovenian_Slovenia.1250 LC_NUMERIC=C
[5] LC_TIME=Slovenian_Slovenia.1250

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

Best regards, Tomaz
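The workaround suggested above (use latin2, an ISO standard encoding, instead of cp1250) amounts to re-saving the Rnw file in ISO 8859-2 and changing one preamble line. A minimal sketch, assuming the file has actually been re-encoded:

```latex
%% example-1.Rnw re-saved in ISO 8859-2 (latin2), which Sweave recognises
\usepackage[slovene]{babel}
\usepackage[latin2]{inputenc}  % instead of [cp1250]
\usepackage[T1]{fontenc}
```

Most editors (and tools such as iconv) can convert cp1250 text to ISO 8859-2; the two encodings cover the same Central European languages.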
Re: [R] fitdistr() Error
Any NA values or values outside the support region of your distribution?

Uwe

On 11.07.2011 23:21, Peter Maclean wrote:

I am trying to estimate a gamma distribution using real data and I am getting the following error messages. When I set a lower limit, the error message is:

  L-BFGS-B needs finite values of 'fn'

For other methods the error message is:

  Error in optim(x = c(0.1052867, 0.3472275, 2.057625, 0.329675, :
    non-finite finite-difference value [1]

The code works fine for simulated data (see below); I am using the same code.

#Grouped vector
n <- c(1:100)
yr <- c(1:100)
ny <- list(yr = yr, n = n)
require(utils)
ny <- expand.grid(ny)
y <- rgamma(1000, shape = 1.5, scale = 2)
Gdata <- cbind(ny, y)

#MLE estimation of gamma distribution parameters
library(MASS)
#Generate starting values
y <- as.numeric(Gdata$y)
me <- mean(y)
sde <- sd(y)
sh <- sqrt(me/sde)
sc <- sqrt(sde)/me
Gdata <- split(Gdata, Gdata$n)
parm <- lapply(Gdata, function(x) {
  y <- as.numeric(x$y)
  fitdistr(y, "gamma", list(shape = sh, scale = sc),
           #method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN"),
           lower = 0, method = "CG", control = list(maxit = 1))
})
parmss <- lapply(parm, function(x) x$estimate)
parmss <- t(as.data.frame(parmss))  #Estimates
parmsd <- lapply(parm, function(x) x$sd)
parmsd <- t(as.data.frame(parmsd))  #Standard errors
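Uwe's question can be turned into a quick pre-flight check before calling fitdistr(). A minimal sketch (the helper name `check_gamma_input` is invented for illustration): count the NAs and the values outside the gamma support, since the gamma density requires strictly positive observations and either problem makes the log-likelihood non-finite.

```r
## Hypothetical helper: count values fitdistr() cannot handle for a gamma fit.
check_gamma_input <- function(y) {
  c(n_na     = sum(is.na(y)),              # NAs give a non-finite likelihood
    n_nonpos = sum(y <= 0, na.rm = TRUE))  # outside the gamma support (y > 0)
}

check_gamma_input(c(0.105, 0.347, NA, -1, 0))
#     n_na n_nonpos
#        1        2
```

Running this on each group before the lapply() loop pinpoints which groups contain the offending values in the real data.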
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
On 12.07.2011 09:01, Tomaz wrote:

Prof Brian Ripley (ripley at stats.ox.ac.uk) writes: Which is of course not an ISO standard encoding. One way out is to use the ISO encoding latin2, which is supported.

Have you read this and tried latin2?

Uwe Ligges
Re: [R] problem with 'predict'
flags <- c(rep(1, length(patient_indices)), rep(0, length(control_indices)))
# dataset is a data.frame and param the parameter to be analysed:
data1 <- dataset[, param][c(patient_indices, control_indices)]
fit1 <- glm(flags ~ data1, family = binomial)
new.data <- seq(0, 300, 10)
new.p <- predict(fit1, data.frame(newdata = new.data), type = "response")

Should (probably) have been ... the names of the RHS variables need to match exactly:

new.p <- predict(fit1, newdata = data.frame(data1 = new.data), type = "response")

Thanks, David and Dennis. That's the thing. I tried so many alterations, overlooking that I had mangled the old and new variable names, unable to see the obvious. Well, again a nice demonstration of the perils of copy and paste.

Thanks, Christian
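The fix above can be checked with a small self-contained sketch (simulated data, since the original `dataset` is not available): when the newdata column carries the model term's name, predict() returns one fitted probability per new value instead of silently falling back to the training data.

```r
## Simulated stand-in for the thread's data: a binary outcome modelled
## on a single numeric predictor named 'data1'.
set.seed(7)
data1 <- runif(50, 0, 300)
flags <- rbinom(50, 1, plogis((data1 - 150) / 50))
fit1  <- glm(flags ~ data1, family = binomial)

new.data <- seq(0, 300, 10)
## newdata column name matches the RHS variable 'data1' exactly:
new.p <- predict(fit1, newdata = data.frame(data1 = new.data),
                 type = "response")
length(new.p)   # 31: one prediction per new value
```

With a mismatched name (e.g. `data.frame(newdata = new.data)`), predict() would instead return 50 values, the fitted probabilities for the training rows.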
Re: [R] Spectral Coherence
On 12/07/11 09:04, Joseph Park wrote:

Greetings, I would like to estimate the spectral coherence between two time series. The stats spectrum() function returns a coh matrix which estimates (squared) coherence. A basic test from which I expect near-zero coherence:

x = rnorm(500)
y = rnorm(500)
xts = ts(x, frequency = 10)
yts = ts(y, frequency = 10)
gxy = spectrum(cbind(xts, yts))
plot(gxy$freq, gxy$coh)

yields a white spectrum of 1. Clearly I'm not using this correctly... or I misinterpret coh as a cross-spectral density estimate of coherence |Gxy|^2 / (Gxx Gyy). Thanks in advance!

By default spectrum() calls spec.pgram() with spans = NULL. The result is that it calculates the coherence between x and y as

  |I_{xy}(omega)|^2 / (I_{xx}(omega) * I_{yy}(omega))

where I_{xy}() is the cross-periodogram and I_{xx}() and I_{yy}() are the respective periodograms. This quantity will indeed be identically 1 --- see equation 10.1 in Bloomfield, second ed., page 203. It would be nice if the help on spectrum mentioned this.

According to Bloomfield, what is needed is not the periodograms but rather estimated spectra, i.e. smoothed versions of the periodograms, s_{xy}() etc. Equation 10.4 in Bloomfield, second ed., page 206 indicates that the spectral estimates satisfy an *inequality*

  |s_{xy}(omega)|^2 <= s_{xx}(omega) * s_{yy}(omega)

whence the coherence is always between 0 and 1.

To resolve the problem you need to specify the spans argument in the call to spectrum, e.g.

  gxy <- spectrum(cbind(xts, yts), spans = c(5, 7))

HTH

cheers, Rolf Turner
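Rolf's point can be verified directly: without smoothing the estimated coherence is identically 1, while with `spans` it spreads out over [0, 1] for independent white-noise series. A minimal sketch:

```r
## Two independent white-noise series, as in the original post.
set.seed(123)
xts <- ts(rnorm(500), frequency = 10)
yts <- ts(rnorm(500), frequency = 10)

## Raw periodograms (spans = NULL): coherence is identically 1.
raw <- spectrum(cbind(xts, yts), plot = FALSE)
range(raw$coh)       # essentially c(1, 1) at every frequency

## Smoothed spectral estimates: coherence now lies properly in [0, 1].
smoothed <- spectrum(cbind(xts, yts), spans = c(5, 7), plot = FALSE)
range(smoothed$coh)  # values spread out below 1, mostly near 0
```

The smoothing spans trade frequency resolution for a meaningful coherence estimate; larger spans give a smoother (and lower-variance) estimate.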
Re: [R] Named numeric vectors with the same value but different names return different results when used as thresholds for calculating true positives
Hi,

On 11.07.2011 22:57, Lyndon Estes wrote:

ctch[ctch$threshold == 3.5, ]
# [1] threshold val tp fp tn fn tpr fpr tnr fnr
# <0 rows> (or 0-length row.names)

this is the very effective FAQ 7.31 trap. http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Welcome to the first circle of Patrick Burns' R Inferno!

Also, unname() is a more intuitive way of removing names. And I think your code is quite inefficient, because you calculate quantiles many times, which involves repeated ordering of x, and you may be using an inefficient bin size (either too small, and therefore calculating the same split many times, or too large, and then missing some splits). I'm a bit puzzled about what x and y are in your code, so any further advice is vague, but you might have a look at any package that calculates ROC curves, such as ROCR or pROC (and many more).

Hth

--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf
Martinistr. 52, 20246 Hamburg
T ++49/40/7410-58243  F ++49/40/7410-57790
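The FAQ 7.31 trap in miniature: a threshold built up by floating-point accumulation need not be bitwise equal to the literal you type, so `==` silently matches nothing. A small sketch of the trap and the tolerance-based fix:

```r
## Thresholds built by accumulation: 0.1, 0.2, 0.3, ... in double precision.
thresholds <- cumsum(rep(0.1, 10))

thresholds[3] == 0.3        # FALSE: 0.1 + 0.1 + 0.1 is not bitwise 0.3

## The fix for comparisons like ctch$threshold == 3.5: match with a
## tolerance instead of exact equality.
sel <- abs(thresholds - 0.3) < 1e-8
which(sel)                  # 3
```

`all.equal()` offers the same idea with a default relative tolerance: `isTRUE(all.equal(thresholds[3], 0.3))` is TRUE.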
[R] apply (or similar preferred) for multiple columns
Dear all,

I would like to use apply or a similar function from this family, but applying a function not to each single column (or row) but to each group of q columns. For example, I would like to apply a function FUN to the first q columns of matrix X, then to columns q+1 through 2*q, and so on. If I do apply(X, 2, FUN) it applies FUN to each column, not to every q columns. Is this possible with any similar function?

Thank you

Dimitris
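One possible approach (a sketch, not taken from the thread): split the column indices into groups of q and iterate over the groups with sapply(), passing each q-column block to FUN.

```r
## Apply FUN to successive blocks of q columns of a matrix X.
X <- matrix(1:24, nrow = 4)   # 6 columns, filled column-wise
q <- 2

## Group column indices: 1:2, 3:4, 5:6.
blocks <- split(seq_len(ncol(X)), ceiling(seq_len(ncol(X)) / q))

## FUN = sum on each q-column block; any FUN taking a submatrix works.
sapply(blocks, function(j) sum(X[, j]))
#   1   2   3
#  36 100 164
```

If FUN returns more than a scalar per block, replace sapply() with lapply() to keep the per-block results in a list.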
[R] Creating a zero matrix when a condition doesn't get it
Hi all,

I first create a matrix/data frame called da2 if another matrix, dacc2, satisfies some restrictions:

da2 <- da1[colSums(dacc2) < 9, ]
da2 <- da2[da2[, 13] <= 24, ]
write.csv(da2, file = paste('hggi', i, '.csv', sep = ''))

The thing is, if da2 does not pass the filters, no csv can be written because no row meets the conditions. How can I create anyway a csv with zeros, of one row and n columns (n being the number of columns of da2)? Do I need a loop?
[R] Brier score for extended Cox model
Dear all,

I would like to obtain the Brier score prediction error at different times t for an extended Cox model. Previously I have used the 'pec' function (pec{pec}) to obtain prediction error curves for standard Cox PH models, but now I have data in the counting process format (I have a covariate with a time-varying effect) and it seems that the pec function does not support the counting process format, or am I doing something wrong? Here's a (tiny) example of what I'm trying to do:

# Original survival data set:
dat
  time status x1
1  169      1  2
2  149      1 11
3  207      1 22
4  192      1 27
5  200      1 10

# Split original data at cutpoint 190. New data will be in counting process format:
dat.x = survSplit(dat, cut = 190, end = "time", event = "status", start = "start")

# New data set:
dat.x
   time status x1 start
1   169      1  2     0
2   149      1 11     0
3   190      0 22     0
4   190      0 27     0
5   190      0 10     0
8   207      1 22   190
9   192      1 27   190
10  200      1 10   190

# Load pec and fit Cox model:
library(pec)
models = list("Cox" = coxph(Surv(start, time, status) ~ x1, data = dat.x))

# Compute the apparent prediction error:
predError = pec(object = models, formula = Surv(start, time, status) ~ x1,
                data = dat.x, exact = TRUE, cens.model = "marginal",
                replan = "none", B = 0, verbose = TRUE)

Error in pec.list(object = models, formula = Surv(start, time, status) ~ :
  Survival response must at least consist of two columns: time and status.

Am I doing something wrong here, or is it not possible to apply the pec function to counting process data? If I can't use pec, perhaps someone knows of some other function I could use instead to get the Brier score at different times t when using the counting process approach. Any guidance on these questions is much appreciated.

Thanks, Ulf
[R] as.numeric
Dear R users,

After I imported data (csv format) into R, I printed it out, but it is in non-numeric format. I then used the as.numeric function; however, the output is really awful!

PE[1, 90:99]
        V90        V91        V92        V93        V94        V95        V96        V97        V98        V99
1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887

as.numeric(PE[1, 90:99])
[1] 11 10 11 10 11  9 10  9  9  8

How can I solve the above problem? Thanks so much!

Jessica
[R] Running R on a Computer Cluster in the Cloud - cloudnumbers.com
Dear R community,

cloudnumbers.com provides researchers and companies with the resources to perform high-performance calculations in the cloud. As cloudnumbers.com's community manager, I would like to invite you to register and test R on a computer cluster in the cloud for free: http://my.cloudnumbers.com/register

Our aim is to change the way research collaboration is done today by bringing together scientists and businesses from all over the world on a single platform. cloudnumbers.com is a Berlin (Germany) based international high-tech startup striving to enable everyone to benefit from the High Performance Computing advantages of the cloud. We provide easy access to applications running on any kind of computer hardware: from single-core high-memory machines up to 1000-core computer clusters. Our platform provides several advantages:

* Turn fixed into variable costs and pay only for the capacity you need. Watch our latest saving-costs video: http://www.youtube.com/watch?v=ln_BSVigUhg&feature=player_embedded
* Enter the cloud using an intuitive and user-friendly platform. Watch our latest cloudnumbers.com-in-a-nutshell video: http://www.youtube.com/watch?v=0ZNEpR_ElV0&feature=player_embedded
* Be released from ongoing technological obsolescence and continuous maintenance costs (e.g. linking to libraries or system dependencies).
* Accelerate your R, C, C++, Fortran, Python, ... calculations through parallel processing and great computing capacity - more than 1000 cores are available, and GPUs are coming soon.
* Share your results worldwide (coming soon).
* Get high-speed access to public databases.
* We have developed a security architecture that meets high requirements of data security and privacy. Read our security white paper: http://d1372nki7bx5yg.cloudfront.net/wp-content/uploads/2011/06/cloudnumberscom-security.whitepaper.pdf

This is only a selection of our top features. To get more information, check out our web page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications (with R): http://cloudnumbers.com/blog

Register and test for free now at cloudnumbers.com: http://my.cloudnumbers.com/register

We are looking forward to your feedback and consumer insights.

Best

Markus

--
Dr. rer. nat. Markus Schmidberger
Senior Community Manager
Cloudnumbers.com GmbH
Chausseestraße 6, 10119 Berlin
www.cloudnumbers.com
E-Mail: markus.schmidber...@cloudnumbers.com
Amtsgericht München, HRB 191138
Geschäftsführer: Erik Muttersbach, Markus Fensterer, Moritz v. Petersdorff-Campen
Re: [R] Creating a zero matrix when a condition doesn't get it
Hi,

You don't provide us with a reproducible example, so I can't provide you with actual code. But two approaches come to mind:

1. Create da2 with one row and n columns, then change the appropriate elements, if any, based on your conditions.

2. Do the conditional parts, then check to see whether da2 is empty. If it is, then replace the empty data frame with a data frame of one row and n columns.

Sarah

On Tue, Jul 12, 2011 at 3:51 AM, Trying To learn again tryingtolearnag...@gmail.com wrote:

Hi all, I first create a matrix/data frame called da2 if another matrix, dacc2, satisfies some restrictions:

da2 <- da1[colSums(dacc2) < 9, ]
da2 <- da2[da2[, 13] <= 24, ]
write.csv(da2, file = paste('hggi', i, '.csv', sep = ''))

The thing is, if da2 does not pass the filters, no csv can be written because no row meets the conditions. How can I create anyway a csv with zeros, of one row and n columns (n being the number of columns of da2)? Do I need a loop?

Rarely.

--
Sarah Goslee
http://www.functionaldiversity.org
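Sarah's second approach can be sketched in a few lines (names such as `da2` follow the thread; the column names here are invented for illustration): after filtering, test whether the data frame is empty and, if so, replace it with a single all-zero row before writing the csv.

```r
## Pretend the filters removed every row: a 0-row data frame remains.
da2 <- data.frame(a = numeric(0), b = numeric(0), c = numeric(0))

## Approach 2: if empty, substitute one row of zeros with the same columns.
if (nrow(da2) == 0) {
  da2 <- as.data.frame(matrix(0, nrow = 1, ncol = ncol(da2),
                              dimnames = list(NULL, names(da2))))
}

nrow(da2)   # 1: write.csv(da2, ...) now always has something to write
```

No loop is needed; the `if` guard runs once per file, right before the write.csv() call.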
Re: [R] as.numeric
Jessica, This would be easier to solve if you gave us more information, like str(PE). However, my guess is that your data somewhere has a nonnumeric value in that column, so the entire column is being imported as a factor. It's not really awful - R is converting those factor values to their numeric levels, just as you asked. The best solution is to find and deal with the nonnumeric value before you import your data (something else you did not tell us about). Failing that, you may find this useful: as.numeric(as.character(PE[1, 90:99])) Sarah On Tue, Jul 12, 2011 at 4:38 AM, Jessica Lam ma_lk...@yahoo.com.hk wrote: Dear R user, After I imported data (csv format) in R, I called it out. But it is in non-numeric format. Then using as.numeric function. However, the output is really awful ! PE[1,90:99] V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887 as.numeric(PE[1,90:99]) [1] 11 10 11 10 11 9 10 9 9 8 How can I solve the above problem?? Thanks so much! Jessica -- -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] as.numeric
On 07/12/2011 06:38 PM, Jessica Lam wrote: Dear R user, After I imported data (csv format) in R, I called it out. But it is in non-numeric format. Then using as.numeric function. However, the output is really awful ! PE[1,90:99] V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887 as.numeric(PE[1,90:99]) [1] 11 10 11 10 11 9 10 9 9 8 How can I solve the above problem?? Hi Jessica, Try as.numeric(as.character(PE[1,90:99])) If that works, your variable has probably managed to become a factor. Jim
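The factor trap both replies describe can be shown in miniature: as.numeric() on a factor returns the internal level codes, while going through as.character() first recovers the printed values.

```r
## A numeric-looking column that was imported as a factor (e.g. because
## of a stray non-numeric entry elsewhere in the column).
f <- factor(c("16.8", "17.6", "19.7"))

as.numeric(f)                # 1 2 3  -- the level codes, not the values
as.numeric(as.character(f))  # 16.8 17.6 19.7 -- the actual values
```

This is exactly why Jessica's output collapsed to small integers: R converted the factor levels to their codes, just as asked.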
Re: [R] problem finding p-value for entropy in reldist package
Have you noticed that you sent your message to the R-help list only and forgot to include the original poster? You also forgot to cite the original question (and any other former parts of the thread, as far as there were any). Please do so when sending messages to a mailing list such as R-help.

Thanks, Uwe Ligges

On 11.07.2011 19:24, VictorDelgado wrote:

Hi Amy Wesolowski, I don't have a straightforward answer to your question. I have been working with reldist too, and the 'rpy' and 'rpluy' functions described by "Applying Relative Distribution Methods in R" are also not working here in my 2.9.1 R version. I think it's because they are reldist internal functions, so maybe it's possible that they only work with previous objects and set-ups...

But if you look at the internal parameters of reldist, you can set ci = TRUE; this constructs the confidence interval for entropy by the proportion of the original cohort. It is still unhelpful for understanding how this interval is constructed, and it also does not show the overall interval, but you can see the values with $ci.

From the Handcock and Morris (1998) paper it is possible to intuit that they are comparing the 0.00 entropy with the 95% confidence interval around the estimate. For example, in this article, page 74, they reach an overall entropy of 0.125; the lower 95% CI is 0.092. The 0.00 comparison is far below this lower bound, so it is reasonable to think the p-value is really 0.000. But this is only a clue to approximate the true p-value. We still need to see: 1) how this interval is constructed (I have no idea what distribution the entropy should have, and whether it changes with the data) and 2) knowing the first point, how to set alpha values.

Good luck,

Victor Delgado
cedeplar.ufmg.br Ph.D. student
www.fjp.mg.gov.br researcher
Re: [R] Gaussian low-pass filter
gfcoeffs <- function(s, n) {
  t <- seq(-n, n, 1)  ## assuming 2*n+1 taps
  exp(-(t^2 / (2 * s^2))) / sqrt(2 * pi * s^2)
}

2011/6/29 Martin Wilkes m.wil...@worc.ac.uk:

I want to filter my time series with a low-pass filter using a Gaussian smoothing function defined as:

  w(t) = (2*pi*sigma^2)^(-1/2) * exp(-t^2 / (2*sigma^2))

I was hoping to use an existing function to filter my data, but help.search and RSiteSearch produced no useful results. Can anyone tell me if there is an existing function that will do the job? If not, how would I begin to go about building such a filter?

Thanks

Martin Wilkes
University of Worcester

--
Cheers, jcb!
http://twitter.com/jcborras
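A usage sketch for the coefficients above (my own illustration, not part of the thread): normalise the weights so the filter has unit gain, then smooth a series with stats::filter() as a two-sided weighted moving average.

```r
## Gaussian low-pass coefficients, as posted above.
gfcoeffs <- function(s, n) {
  t <- seq(-n, n, 1)                        # 2*n + 1 taps
  exp(-(t^2 / (2 * s^2))) / sqrt(2 * pi * s^2)
}

w <- gfcoeffs(s = 2, n = 6)
w <- w / sum(w)                             # unit gain at zero frequency

## A noisy sine wave; the Gaussian kernel suppresses the high-frequency noise.
x <- sin(seq(0, 4 * pi, length.out = 200)) + rnorm(200, sd = 0.3)
x.smooth <- stats::filter(x, w, sides = 2)  # NA-padded at both ends
```

Without the normalisation, truncating the Gaussian to 2*n+1 taps would slightly attenuate the whole series, since the raw weights sum to a little less than 1.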
Re: [R] Running R on a Computer Cluster in the Cloud - cloudnumbers.com
On Tue, Jul 12, 2011 at 7:15 AM, Markus Schmidberger schmi...@in.tum.de wrote: This is only a selection of our top features. To get more information check out our web-page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications (with R): http://cloudnumbers.com/blog Register and test for free now at cloudnumbers.com: http://my.cloudnumbers.com/register We are looking forward to get your feedback and consumer insights. Spam? Anyway, I quite like Dogbert's insights: http://dilbert.com/strips/comic/2011-01-07/ Barry
[R] timezones - any practical solution?
Hello all,

Could someone help me with time zones in an understandable, practical way? I am completely stuck with this. I have googled for a while and read the manuals, but without finding a solution...

When data are imported from Excel 2007 into R (2.13), all time variables, depending on the date (summer or winter), get (unasked for!) a time zone addition: CEST (for summer dates) or CET (for winter dates).

Dataset
        Start       End1       End2  days2End1.from.Excel  days2End2.from.Excel  days2End1.in.R  days2End2.in.R
1  2010-01-01 2011-01-01 2012-01-01                   365                   730        365 days   730.0000 days
2  2010-02-01 2011-02-01 2012-01-01                   365                   699        365 days   699.0000 days
3  2010-03-01 2011-03-01 2012-01-01                   365                   671        365 days   671.0000 days
4  2010-04-01 2011-04-01 2012-01-01                   365                   640        365 days   640.0417 days
5  2010-05-01 2011-05-01 2012-01-01                   365                   610        365 days   610.0417 days
6  2010-06-01 2011-06-01 2012-01-01                   365                   579        365 days   579.0417 days
7  2010-07-01 2011-07-01 2012-01-01                   365                   549        365 days   549.0417 days
8  2010-08-01 2011-08-01 2012-01-01                   365                   518        365 days   518.0417 days
9  2010-09-01 2011-09-01 2012-01-01                   365                   487        365 days   487.0417 days
10 2010-10-01 2011-10-01 2012-01-01                   365                   457        365 days   457.0417 days
11 2010-11-01 2011-11-01 2012-01-01                   365                   426        365 days   426.0000 days
12 2010-12-01 2011-12-01 2012-01-01                   365                   396        365 days   396.0000 days

The variables 'days2End1.from.Excel' and 'days2End2.from.Excel' were calculated in Excel. The same calculation (with the same outcome!) I would like to be able to perform with R. The variables 'days2End1.in.R' and 'days2End2.in.R' were calculated with R:

Dataset$days2End1.from.Excel
[1] 365 365 365 365 365 365 365 365 365 365 365 365

Dataset$days2End1.in.R <- with(Dataset, End1 - Start)
Dataset$days2End1.in.R
Time differences in days
[1] 365 365 365 365 365 365 365 365 365 365 365 365
attr(,"tzone")
[1] ""

Dataset$days2End2.from.Excel
[1] 730 699 671 640 610 579 549 518 487 457 426 396

Dataset$days2End2.in.R <- with(Dataset, End2 - Start)
Dataset$days2End2.in.R
Time differences in days
[1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

Question 1: As you can see, 'Dataset$days2End2.in.R' gives a wrong 'day' calculation for the time period April until October, when CEST (summer) times are recorded: 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417, giving decimals where round days are expected (640 610 579 549 518 487 457). Can someone explain how to deal with this in R? What is the best way to calculate days in R and get correct results?

Question 2: As I only need to work with dates, without times and without time zones, I would be happy to remove them if possible. I already tried the trunc() function, but without success; the result doesn't change:

Dataset$days2End2.in.R.TRUNC <- with(Dataset, trunc(End2) - trunc(Start))
Dataset$days2End2.in.R.TRUNC
Time differences in days
[1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

I would be happy if someone could shed light on this. Many thanks in advance!

Laura
[R] time zone - any practical solution?
Hello all, Could someone help me with time zones in an understandable, practical way? I am completely stuck with this. I have googled for a while and read the manuals, but found no solution...

When data are imported from Excel 2007 into R (2.13), all time variables get an unasked-for time zone suffix depending on the date: CEST (for summer dates) or CET (for winter dates).

> Dataset
   Start      End1       End2       days2End1.from.Excel days2End2.from.Excel days2End1.in.R days2End2.in.R
1  2010-01-01 2011-01-01 2012-01-01 365 730 365 days 730.0000 days
2  2010-02-01 2011-02-01 2012-01-01 365 699 365 days 699.0000 days
3  2010-03-01 2011-03-01 2012-01-01 365 671 365 days 671.0000 days
4  2010-04-01 2011-04-01 2012-01-01 365 640 365 days 640.0417 days
5  2010-05-01 2011-05-01 2012-01-01 365 610 365 days 610.0417 days
6  2010-06-01 2011-06-01 2012-01-01 365 579 365 days 579.0417 days
7  2010-07-01 2011-07-01 2012-01-01 365 549 365 days 549.0417 days
8  2010-08-01 2011-08-01 2012-01-01 365 518 365 days 518.0417 days
9  2010-09-01 2011-09-01 2012-01-01 365 487 365 days 487.0417 days
10 2010-10-01 2011-10-01 2012-01-01 365 457 365 days 457.0417 days
11 2010-11-01 2011-11-01 2012-01-01 365 426 365 days 426.0000 days
12 2010-12-01 2011-12-01 2012-01-01 365 396 365 days 396.0000 days

> Dataset$Start
 [1] "2010-01-01 CET"  "2010-02-01 CET"  "2010-03-01 CET"  "2010-04-01 CEST" "2010-05-01 CEST" "2010-06-01 CEST" "2010-07-01 CEST" "2010-08-01 CEST" "2010-09-01 CEST" "2010-10-01 CEST" "2010-11-01 CET"  "2010-12-01 CET"
> Dataset$End1
 [1] "2011-01-01 CET"  "2011-02-01 CET"  "2011-03-01 CET"  "2011-04-01 CEST" "2011-05-01 CEST" "2011-06-01 CEST" "2011-07-01 CEST" "2011-08-01 CEST" "2011-09-01 CEST" "2011-10-01 CEST" "2011-11-01 CET"  "2011-12-01 CET"
> Dataset$End2
 [1] "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET"

Variables 'days2End1.from.Excel' and 'days2End2.from.Excel' are calculated in Excel. I would like to be able to perform the same calculation (with the same outcome!) in R. Variables 'days2End1.in.R' and 'days2End2.in.R' are calculated with R.

> Dataset$days2End1.from.Excel
 [1] 365 365 365 365 365 365 365 365 365 365 365 365
> Dataset$days2End1.in.R <- with(Dataset, End1 - Start)
> Dataset$days2End1.in.R
Time differences in days
 [1] 365 365 365 365 365 365 365 365 365 365 365 365
attr(,"tzone")
[1] ""
> Dataset$days2End2.from.Excel
 [1] 730 699 671 640 610 579 549 518 487 457 426 396
> Dataset$days2End2.in.R <- with(Dataset, End2 - Start)
> Dataset$days2End2.in.R
Time differences in days
 [1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

Question 1: As you can see, 'Dataset$days2End2.in.R' gives a wrong 'day' calculation for the period April until October, when CEST (summer) times are recorded: 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417, i.e. decimals on days where round days are expected (640 610 579 549 518 487 457). Can someone explain to me how to deal with this in R? What is the best way to calculate days in R and get correct results?

Question 2: As I only need to work with dates, without times and without time zones, I would be happy to remove them if possible. I already tried the trunc() function, but without success. The result doesn't change.

> Dataset$days2End2.in.R.TRUNC <- with(Dataset, trunc(End2) - trunc(Start))
> Dataset$days2End2.in.R.TRUNC
Time differences in days
 [1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

I would be happy if someone could shed light on this. Many thanks in advance! Laura
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
Uwe Ligges <ligges at statistik.tu-dortmund.de> writes: On 12.07.2011 09:01, Tomaz wrote: Prof Brian Ripley <ripley at stats.ox.ac.uk> writes: On Mon, 11 Jul 2011, Tomaz wrote: I upgraded R on Windows XP from 2.12.2 to 2.13.1 and now I cannot process Rnw files with the Windows cp1250 encoding. Sweave complains: Which is of course not an ISO standard encoding. One way out is to use the ISO encoding latin2, which is supported. Have you read this and tried latin2? Uwe Ligges Windows XP doesn't have a latin2 locale, and I think that Sweave should support all locales (encodings) that are supported by the LaTeX package inputenc. If someone can show me how to set Rgui/Rterm, Emacs/ESS and Sweave/LaTeX to use UTF-8, that would be helpful. My best current setup was to set all tools to use cp1250. Best regards, Tomaz
Re: [R] Spectral Coherence
Thank you! I should have realized that without explicitly engaging some form of averaging (which raises a windowing question) the coh is always 1. On 7/12/2011 4:48 AM, Rolf Turner wrote: On 12/07/11 09:04, Joseph Park wrote: Greetings, I would like to estimate the spectral coherence between two time series. stats::spectrum() returns a coh component which estimates the (squared) coherence. A basic test from which I expect near-zero coherence:

x = rnorm(500)
y = rnorm(500)
xts = ts(x, frequency = 10)
yts = ts(y, frequency = 10)
gxy = spectrum(cbind(xts, yts))
plot(gxy$freq, gxy$coh)

yields a white spectrum of 1. Clearly I'm not using this correctly... or I misinterpret the coh as a cross-spectral density estimate of coherence |Gxy|^2 / (Gxx Gyy). Thanks in advance! By default spectrum() calls spec.pgram() with spans=NULL. The result is that it calculates the coherence between x and y as

|I_{xy}(omega)|^2 / (I_{xx}(omega) * I_{yy}(omega))

where I_{xy}() is the cross periodogram and I_{xx}() and I_{yy}() are the respective periodograms. This quantity will indeed be identically 1 --- see equation 10.1 in Bloomfield, second ed., page 203. It would be nice if the help on spectrum mentioned this. According to Bloomfield, what is needed is not the periodograms but rather estimated spectra, smoothed versions of the periodograms, s_{xy}() etc. Equation 10.4 in Bloomfield, second ed., page 206 indicates that the spectral estimates satisfy an *inequality*

|s_{xy}(omega)|^2 <= s_{xx}(omega) * s_{yy}(omega)

whence the coherence is always between 0 and 1. To resolve the problem you need to specify the spans argument in the call to spectrum, e.g.

gxy <- spectrum(cbind(xts, yts), spans = c(5, 7))

HTH cheers, Rolf Turner
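For the archive, a minimal runnable sketch of the point above (object names are my own invention, not from the thread):

```r
## Coherence of two independent white-noise series: the raw
## periodogram gives coh == 1 at every frequency, while a smoothed
## spectral estimate (via spans) gives values between 0 and 1.
set.seed(1)
xts <- ts(rnorm(500), frequency = 10)
yts <- ts(rnorm(500), frequency = 10)

g.raw <- spectrum(cbind(xts, yts), plot = FALSE)
range(g.raw$coh)   # essentially 1 everywhere

g.sm <- spectrum(cbind(xts, yts), spans = c(5, 7), plot = FALSE)
plot(g.sm$freq, g.sm$coh, type = "l",
     xlab = "frequency", ylab = "squared coherence")
```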
Re: [R] R-help Digest, Vol 101, Issue 12
I will be out of the office from 7 July till 14 July with no access to my emails. In urgent cases please contact Ms. Edit Kárpáti (karpati.e...@gyemszi.hu). With regards, Peter Mihalicza
Re: [R] Delete row takes ages. alternative?!
Thanks to both of you for your help! This is my conclusion based on Rolf's and David's suggestions (x = a data.frame): when looping over a data.frame and deleting certain rows with x = x[-i,], it is better to collect all rows which need to be deleted in a vector and do one final delete step:

collecting: vector = append(vector, i)
then: x = x[-vector,]

cheers, sven -- View this message in context: http://r.789695.n4.nabble.com/Delete-row-takes-ages-alternative-tp3656949p3661979.html Sent from the R help mailing list archive at Nabble.com.
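A small sketch of the difference for the archive (object names and data are mine, not from the thread):

```r
## Deleting rows one at a time copies the data frame on every
## iteration; collecting the indices first does a single subset.
x <- data.frame(a = 1:5000, b = rnorm(5000))

slow <- x
for (i in rev(which(slow$b < 0))) slow <- slow[-i, ]  # one copy per delete

idx  <- which(x$b < 0)   # collect all offending rows first
fast <- x[-idx, ]        # one final delete step

## or simply keep the rows you want, fully vectorised:
keep <- x[x$b >= 0, ]
```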
Re: [R] Help in error removal
Dear all, I am new to programming in R. I deal with microarray data, which is a data frame object type. I need to carry out a few statistical procedures on this, one of them being the Pearson correlation. I need to do this between each row, which is a gene. So the desired result is a square matrix with the Pearson correlation value between each pair of rows. So the first column would be (1,1)=0, (1,2), (1,3) and so on. I uploaded the data frame as a:

a <- read.csv("a.csv", header = TRUE, row.names = 1)

and then I started the script:

pearson <- function(a){
  r <- matrix[x,y]
  for(x in as.vector(a[,1], mode="double")){
    x++{
    for(y in as.vector(a[2,], mode="double")){
      y <- x+1
      x++
      {
      r <- (cor.test(as.vector(as.matrix(a)[x,], mode="double"),
                     as.vector(as.matrix(a)[y,], mode="double"))$p.value)
      }
    }
    }
    r[x,y]==r[y,x]
  }
  return(r)
}

However whenever I run it, I get the error:

> pearson(a)
Error in matrix[x, y] : object of type 'closure' is not subsettable

Please help! Best Regards Sumona Mitra
[R] MC-Simulation with foreach: Some cores finish early
Dear R-Users, I run an MC simulation using the packages foreach and doMC on a PowerMac with 24 cores. There are roughly a hundred parameter sets, and I parallelized the program in a way that each core computes one of these parameter sets completely. The problem is that some parameter sets take a lot longer to compute than others. After a while only a quarter of the cores are still computing (their first parameter set), while others are already finished. But some parameter sets are still untouched. I have thought about changing my parameter file so that every combination takes roughly the same time (longer computations are offset with fewer repetitions), but maybe there is a more elegant solution. Is it somehow possible to wake the finished cores while there is still work to do? ;-) Sincerely, H. Bumann -- View this message in context: http://r.789695.n4.nabble.com/MC-Simulation-with-foreach-Some-cores-finish-early-tp3661998p3661998.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Named numeric vectors with the same value but different names return different results when used as thresholds for calculating true positives
Also note that the statistical method you are using does not seem in line with decision theory, and you are assuming that the threshold actually exists. It is seldom the case that the relationship of a predictor with the response is flat on at least one side of the threshold. A smooth prediction model may be in order. Frank

Eik Vettorazzi wrote: Hi, Am 11.07.2011 22:57, schrieb Lyndon Estes:

> ctch[ctch$threshold == 3.5, ]
# [1] threshold val tp fp tn fn tpr fpr tnr fnr
# <0 rows> (or 0-length row.names)

this is the very effective FAQ 7.31 trap: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Welcome to the first circle of Patrick Burns' R Inferno! Also, unname() is a more intuitive way of removing names. And I think your code is quite inefficient, because you calculate quantiles many times, which involves repeated ordering of x, and you may use an inefficient bin size (either too small, and therefore calculating the same split many times, or too large, and then missing some splits). I'm a bit puzzled what x and y are in your code, so any further advice is vague, but you might have a look at any package that calculates ROC curves, such as ROCR or pROC (and many more). Hth -- Eik Vettorazzi, Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, T ++49/40/7410-58243, F ++49/40/7410-57790

- Frank Harrell, Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Named-numeric-vectors-with-the-same-value-but-different-names-return-different-results-when-used-as-s-tp3660833p3662030.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] time zone - any practical solution?
Hi Jim, by dropping them down that way it gives 1 day less than it should, for both time zone notations, CEST and CET.

> start
 [1] "2002-09-04 CEST" "2000-07-27 CEST" "2003-01-04 CET"  "2001-06-29 CEST" "2005-01-12 CET"  "2000-05-28 CEST" "2002-06-01 CEST" "2000-06-02 CEST" "2000-02-27 CET"  "2000-09-29 CEST" "2003-10-22 CEST" "2002-06-03 CEST"
[13] "2004-12-30 CET"  "2000-04-07 CEST" "2006-02-03 CET"  "2003-06-12 CEST" "2004-07-15 CEST" "2000-04-29 CEST" "2000-05-06 CEST" "2004-10-27 CEST"
> start <- format(as.Date(start, "%Y-%m-%d"), "%Y-%m-%d")
> start
 [1] "2002-09-03" "2000-07-26" "2003-01-03" "2001-06-28" "2005-01-11" "2000-05-27" "2002-05-31" "2000-06-01" "2000-02-26" "2000-09-28" "2003-10-21" "2002-06-02" "2004-12-29" "2000-04-06" "2006-02-02" "2003-06-11" "2004-07-14"
[18] "2000-04-28" "2000-05-05" "2004-10-26"

2011/7/12 Jim Lemon <j...@bitwrit.com.au>: On 07/12/2011 08:58 PM, B Laura wrote: [original message quoted in full -- snipped]
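For the archive, a sketch of what seems to be happening here (the time zone name is my assumption about the poster's locale): as.Date() on a POSIXct value converts via UTC by default, so a midnight CEST stamp slips back one calendar day unless a tz is passed.

```r
## midnight CEST is 22:00 UTC of the previous day
x <- as.POSIXct("2002-09-04", tz = "Europe/Berlin")

as.Date(x)                        # 2002-09-03, via the default tz = "UTC"
as.Date(x, tz = "Europe/Berlin")  # 2002-09-04, the calendar date wanted
```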
Re: [R] time zone - any practical solution?
On Tue, Jul 12, 2011 at 6:58 AM, B Laura gm.spam2...@gmail.com wrote: Hello all, Could someone help me with time zones in an understandable, practical way? I am completely stuck with this. I have googled for a while and read the manuals, but found no solution... When data are imported from Excel 2007 into R (2.13), all time variables get an unasked-for time zone suffix depending on the date: CEST (for summer dates) or CET (for winter dates). Read http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windowss=excel which gives many ways of reading Excel into R, and read R News 4/1, which discusses appropriate R classes to use (you would be best to use Date, not POSIXct, in which case you could not have time zone problems in the first place) and the internal representations of R vs. Excel. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
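A minimal sketch of this suggestion (the inline CSV stands in for the poster's Excel export):

```r
## Reading the columns as class Date (no time, no time zone) makes
## day differences come out as whole numbers.
txt <- "Start,End1,End2
2010-04-01,2011-04-01,2012-01-01
2010-05-01,2011-05-01,2012-01-01"
Dataset <- read.csv(text = txt, colClasses = "Date")

Dataset$End1 - Dataset$Start  # 365 365 days
Dataset$End2 - Dataset$Start  # 640 610 days -- no stray 0.0417 fractions
```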
Re: [R] as.numeric
It may be helpful to make sure that, in the dialog that pops up when saving a spreadsheet to CSV, the option "Save cell content as shown" is checked - that would leave numbers as numbers, not wrapped in quotes. That has helped me at least in a similar situation! Rgds, Rainer

On Tuesday 12 July 2011 06:09:18 Sarah Goslee wrote: Jessica, This would be easier to solve if you gave us more information, like str(PE). However, my guess is that your data somewhere has a non-numeric value in that column, so the entire column is being imported as a factor. It's not really awful - R is converting those factor values to their numeric levels, just as you asked. The best solution is to find and deal with the non-numeric value before you import your data (something else you did not tell us about). Failing that, you may find this useful:

as.numeric(as.character(PE[1, 90:99]))

Sarah On Tue, Jul 12, 2011 at 4:38 AM, Jessica Lam ma_lk...@yahoo.com.hk wrote: Dear R user, After I imported data (csv format) into R, I called it out. But it is in non-numeric format. I then used the as.numeric function. However, the output is really awful!

> PE[1,90:99]
        V90       V91       V92       V93       V94       V95       V96       V97       V98       V99
1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887
> as.numeric(PE[1,90:99])
 [1] 11 10 11 10 11  9 10  9  9  8

How can I solve the above problem?? Thanks so much! Jessica
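The factor trap in miniature (a made-up vector, not Jessica's data):

```r
## as.numeric() on a factor returns the internal level codes,
## not the printed values.
f <- factor(c("16.85", "17.59", "19.74"))
as.numeric(f)                # 1 2 3 -- the level codes
as.numeric(as.character(f))  # 16.85 17.59 19.74 -- the actual values
as.numeric(levels(f))[f]     # same values, converting each level only once
```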
Re: [R] MC-Simulation with foreach: Some cores finish early
peter_petersen <henning.bumann at gmail.com> writes: [question quoted in full -- snipped] It sounds to me like this would require writing an entire batch scheduling system within R -- i.e., the system would have to maintain a queue and track which cores were finished. I'd love to know if someone's written it, but I sort of doubt it ...
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
On 11-07-12 6:42 AM, Tomaz wrote: Uwe Ligges <ligges at statistik.tu-dortmund.de> writes: [earlier exchange quoted in full -- snipped] Windows XP doesn't have a latin2 locale, and I think that Sweave should support all locales (encodings) that are supported by the LaTeX package inputenc. If someone can show me how to set Rgui/Rterm, Emacs/ESS and Sweave/LaTeX to use UTF-8, that would be helpful. My best current setup was to set all tools to use cp1250. Have you tried following Brian's advice, and testing the new version? It works for me. Duncan Murdoch
[R] matplot with dates/times on horizontal axis
matplot(timestamp, xymatrix, type = 'l')

where timestamp is a vector filled with POSIXct objects and xymatrix is a numeric 2x2 matrix, plots, but the horizontal axis labels are raw unformatted timestamps. I would like to format these with any of the available codes for strftime, for instance format = "%H:%M". Passing a vector of formatted strings does not work. Any obvious other ways fail on me as well. Any ideas to make this work? Thanks in advance, Alex van der Spek
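For the archive, one approach that should work (data invented; axis.POSIXct() is the relevant base-graphics helper): suppress the default x axis, then draw a formatted POSIXct axis.

```r
## hourly timestamps and two series to overlay
timestamp <- as.POSIXct("2011-07-12 00:00", tz = "UTC") + 3600 * (0:23)
xymatrix  <- cbind(sin((0:23) / 4), cos((0:23) / 4))

matplot(timestamp, xymatrix, type = "l", xaxt = "n", xlab = "time")
axis.POSIXct(1, x = timestamp, format = "%H:%M")  # labels as HH:MM
```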
Re: [R] time zone - any practical solution?
Dear Gabor, http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windowss=excel doesn't describe handling dates with daylight saving time issues. The R class Date can remove the time and time zone, but when calculating day differences between two such converted variables the same problem appears as when handling them without Date. R News 4/1 doesn't provide a solution to this either. I have read and struggled with this stuff for 3 days. Anyone else who could help with this? Regards, Laura.

2011/7/12 Gabor Grothendieck ggrothendi...@gmail.com: [previous reply quoted in full -- snipped]
[R] applying function to multiple columns of a matrix
Hi, I want to apply a function to a matrix, taking the columns 3 by 3. I could use a for loop:

for(i in 1:3){ # here I assume my data matrix has 9 columns
  j = i*3
  set = my.data[, c(j-2, j-1, j)]
  my.function(set)
}

which looks cumbersome and possibly slow. I was hoping there is some function in the apply()/lapply() families that could take 3 columns at a time. I thought of turning my.data into a list, then using lapply():

new.data = list(my.data[,1:3], my.data[,4:6], my.data[,7:9])
lapply(new.data, my.function)

but that might incur too much memory penalty, and it does have the issue of requiring a for loop to create the list (not all my data is conveniently of 9 columns only). Any suggestion would be much appreciated. Bw Federico -- Federico C. F. Calboli, Department of Epidemiology and Biostatistics, Imperial College, St. Mary's Campus, Norfolk Place, London W2 1PG, Tel +44 (0)20 75941602, Fax +44 (0)20 75943193, f.calboli [.a.t] imperial.ac.uk, f.calboli [.a.t] gmail.com
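One apply-family idiom that avoids both the loop and the hand-built list (a sketch; my.function is stood in by colMeans):

```r
## split the column indices into consecutive groups of 3, then
## lapply over the groups -- works for any multiple-of-3 width
my.data <- matrix(rnorm(45), ncol = 9)
my.function <- colMeans   # stand-in for the real function

groups <- split(seq_len(ncol(my.data)),
                (seq_len(ncol(my.data)) - 1) %/% 3)
results <- lapply(groups, function(idx) my.function(my.data[, idx, drop = FALSE]))
length(results)   # 3 groups of columns: 1:3, 4:6, 7:9
```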
[R] fixed effects Tobit, Honore style?
Hi all, Is there any code to run fixed effects Tobit models in the style of Honore (1992) in R? (The original Honore article is here: http://www.jstor.org/sici?sici=0012-9682%28199205%2960%3A3%3C533%3ATLALSE%3E2.0.CO%3B2-2) Cheers David
[R] Print file updated/created date to console?
Hello, Are there any built in or user defined functions for printing the date created or date updated for a given file? Ideally a function that works across operating systems. Thanks! Scott Chamberlain
Re: [R] Help in error removal
On Jul 12, 2011, at 7:27 AM, Mitra, Sumona wrote: Dear all, I am new to programming in R.

You seem to think there is a ++ operation in R. That is not so.

I deal with microarray data, which is a data frame object type. I need to carry out a few statistical procedures on this, one of them being the Pearson correlation. I need to do this between each row, which is a gene. So the desired result is a square matrix with the Pearson correlation value between each row. So the first column would be (1,1)=0, (1,2), (1,3) and so on.

I do not understand what that means. You should offer either a minimal dataset or at the very least the results of str(a).

I uploaded the data frame as a:

If by that you mean you made a failed effort at attaching the data in a file, then you need to read the Posting Guide for what the server will accept as a file type.

a <- read.csv("a.csv", header = TRUE, row.names = 1)

and then I started the script:

pearson <- function(a){
  r <- matrix[x,y]
  for(x in as.vector(a[,1], mode="double")){

I do not see a need for as.vector here or at any point later. a[,1] would already be a vector, and if it is not numeric to begin with, then you are going to get junk.

[remainder of the quoted function and error message snipped]

Please help! Best Regards Sumona Mitra
David Winsemius, MD
West Hartford, CT
Re: [R] Print file updated/created date to console?
Hi, file.info() does that. Cheers

Am 12.07.2011 15:29, schrieb Scott Chamberlain: [question quoted in full -- snipped]

-- Eik Vettorazzi, Institut für Medizinische Biometrie und Epidemiologie, Universitätsklinikum Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, T ++49/40/7410-58243, F ++49/40/7410-57790
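For the archive, a short sketch of file.info() in use (temp file invented):

```r
f <- tempfile()
writeLines("hello", f)

info <- file.info(f)
info$mtime   # last modification time (POSIXct)
info$ctime   # "creation" time on Windows; last status change on Unix
info$atime   # last access time
```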
Re: [R] Help in error removal
On 12 July 2011 12:27, Mitra, Sumona sumona.mi...@kcl.ac.uk wrote: Dear all, I am new to programming in R.

You sure are ;-)

I deal with microarray data, which is a data frame object type. I need to carry out a few statistical procedures on this, one of them being the Pearson correlation. I need to do this between each row, which is a gene. So the desired result is a square matrix with the Pearson correlation value between each row. So the first column would be (1,1)=0, (1,2), (1,3) and so on. I uploaded the data frame as a:

a <- read.csv("a.csv", header = TRUE, row.names = 1)

and then I started the script:

pearson <- function(a){
  r <- matrix[x,y]

I bet the problem you are getting is here. You want r to be an x by y matrix. To do this, try r <- matrix(nrow=x, ncol=y). But you haven't defined x and y, unless we are missing that part of your code. As I understand it, you want the correlation matrix between all the rows of your matrix. If so, then look at the help file for cor (i.e. type ?cor). You will find that it automatically computes the correlations between all columns of a matrix. So, once your data is correctly read in, you should be able to just do:

cor(t(a))

  for(x in as.vector(a[,1], mode="double")){
    x++{
    for(y in as.vector(a[2,], mode="double")){
      y <- x+1
      x++

This code looks like a horrible mess. It's almost never right to loop through your vectors. In addition, there is no such thing as ++, as somebody mentioned.

[remainder of the quoted function and error message snipped]
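The cor(t(a)) suggestion in the reply above, on a toy gene-by-sample data frame: transposing makes the genes the columns, so cor() returns the gene-by-gene Pearson correlation matrix in one call.

```r
## Toy stand-in for the poster's microarray data: 3 genes x 3 samples.
a <- data.frame(s1 = c(1, 4, 2), s2 = c(2, 5, 1), s3 = c(3, 7, 5),
                row.names = c("g1", "g2", "g3"))
r <- cor(t(a))   # 3 x 3 symmetric matrix of row-wise correlations
diag(r)          # all 1: each gene correlated with itself
```

No loops are needed; cor() handles all pairs of columns (here, rows of the original data) at once.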
[R] RES: applying function to multiple columns of a matrix
Hi Federico. I would keep the data as it is, create two small vectors referring to the ranges and use mapply (like sapply, but over multiple argument vectors) for the function. Hope the example below is helpful, although as usual someone out there will have a better solution for it. dta <- c() for (i in 1:12) dta <- cbind(dta, matrix(i,5,1)) dta [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,] 1 2 3 4 5 6 7 8 9 10 11 12 [2,] 1 2 3 4 5 6 7 8 9 10 11 12 [3,] 1 2 3 4 5 6 7 8 9 10 11 12 [4,] 1 2 3 4 5 6 7 8 9 10 11 12 [5,] 1 2 3 4 5 6 7 8 9 10 11 12 rng.a <- seq(1,10,by=3) rng.b <- seq(3,12,by=3) rng.a [1] 1 4 7 10 rng.b [1] 3 6 9 12 mapply(x=rng.a, y=rng.b, function(x,y) sum(dta[,c(x:y)])) [1] 30 75 120 165 Cheers, Filipe -Original message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Federico Calboli Sent: Tuesday, 12 July 2011 10:07 To: r-help Subject: [R] applying function to multiple columns of a matrix Hi, I want to apply a function to a matrix, taking the columns 3 by 3. I could use a for loop: for(i in 1:3){ # here I assume my data matrix has 9 columns j = i*3 set = my.data[,c(j-2,j-1,j)] my.function(set) } which looks cumbersome and possibly slow. I was hoping there is some function in the apply()/lapply() families that could take 3 columns at a time. I thought of turning my.data into a list, then using lapply() new.data = list(my.data[,1:3], my.data[,4:6], my.data[,7:9]) lapply(new.data, my.function) but that might incur too much memory penalty and does have the issue of requiring a for loop to create the list (not all my data is conveniently of 9 columns only). Any suggestion would be much appreciated. Bw Federico -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St.
Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This message and its attachments may contain confidential and/or privileged information. If you are not the addressee, please, advise the sender immediately by replying to the e-mail and delete this message. Este mensaje y sus anexos pueden contener información confidencial o privilegiada. Si ha recibido este e-mail por error por favor bórrelo y envíe un mensaje al remitente. Esta mensagem e seus anexos podem conter informação confidencial ou privilegiada. Caso não seja o destinatário, solicitamos a imediata notificação ao remetente e exclusão da mensagem. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Print file updated/created date to console?
Eik, Thanks very much! Scott On Tuesday, July 12, 2011 at 8:34 AM, Eik Vettorazzi wrote: Hi, file.info() does that. Cheers On 12.07.2011 15:29, Scott Chamberlain wrote: Hello, Are there any built-in or user-defined functions for printing the date created or date updated for a given file? Ideally a function that works across operating systems. Thanks! Scott Chamberlain [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: lasso regression
Hi, I am trying to do a lasso regression using the lars package with the following data (see attached): FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 116.90 0.14 0.14 0.29 4.43 3.29 117.56 117.77 5.00 116.23 0.29 0.43 0.14 6.14 2.14 116.84 116.80 2.00 116.41 0.00 0.14 0.29 5.71 3.71 117.24 117.17 4.00 115.80 0.57 0.00 0.29 2.14 2.57 116.21 116.53 6.00 117.76 0.14 0.14 0.43 5.43 3.57 118.57 118.87 3.00 117.69 0.14 0.14 0.00 4.71 4.00 118.69 118.60 6.00 116.46 0.14 0.00 0.00 5.14 5.00 118.50 118.97 5.00 119.77 0.00 0.00 0.14 4.57 4.14 120.74 121.03 4.00 116.81 0.14 0.29 0.00 4.86 3.57 117.63 117.40 5.00 117.66 0.14 0.14 0.14 4.57 4.71 119.19 120.57 7.00 #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv, na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RES: applying function to multiple columns of a matrix
I just realised that: apply(matrix(1:dim(my.data)[2], nrow =3), 2, function(x){my.function(my.data[,x])}) is the simplest possible method. bw F On 12 Jul 2011, at 14:44, Filipe Leme Botelho wrote: Hi Frederico. I would keep the data as it is, create two small vectors referring to the ranges and use a mapply (as a sapply but with multiple variables) for the function. Hope the example below is helpful, although as usual someone out there will have a better solution for it. dta - c() for (i in 1:12) dta - cbind(dta,matrix(i,5,1)) dta [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,]123456789101112 [2,]123456789101112 [3,]123456789101112 [4,]123456789101112 [5,]123456789101112 rng.a - seq(1,10,by=3) rng.b - seq(3,12,by=3) rng.a [1] 1 4 7 10 rng.b [1] 3 6 9 12 mapply(x=rng.a, y=rng.b, function(x,y) sum(dta[,c(x:y)])) [1] 30 75 120 165 Cheers, Filipe -Mensagem original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Em nome de Federico Calboli Enviada em: terça-feira, 12 de julho de 2011 10:07 Para: r-help Assunto: [R] applying function to multiple columns of a matrix Hi, I want to apply a function to a matrix, taking the columns 3 by 3. I could use a for loop: for(i in 1:3){ # here I assume my data matrix has 9 columns j = i*3 set = my.data[,c(j-2,j-1,j)] my.function(set) } which looks cumbersome and possibly slow. I was hoping there is some function in the apply()/lapply() families that could take 3 columns at a time. I though of turning mydata in a list, then using lapply() new.data = list(my.data[,1:3], my.data[,4:6], my.data[,7:9]) lapply(new.data, my.function) but that might incur in too much memory penalty and does have the issue of requiring a for loop to create the list (not all my data is conveniently of 9 columns only). Any suggestion would be much appreciated. Bw Federico -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St. 
Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This message and its attachments may contain confidential and/or privileged information. If you are not the addressee, please, advise the sender immediately by replying to the e-mail and delete this message. Este mensaje y sus anexos pueden contener información confidencial o privilegiada. Si ha recibido este e-mail por error por favor bórrelo y envíe un mensaje al remitente. Esta mensagem e seus anexos podem conter informação confidencial ou privilegiada. Caso não seja o destinatário, solicitamos a imediata notificação ao remetente e exclusão da mensagem. -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St. Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
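Federico's one-liner, tried on a toy 5 x 9 matrix with sum() standing in for my.function(): matrix(1:9, nrow = 3) groups the column indices into blocks of three, and apply() hands each block of columns to the function.

```r
## Toy data: 5 rows, 9 columns filled 1..45 column-wise.
my.data <- matrix(1:45, nrow = 5)
## One call of the function per block of 3 columns.
apply(matrix(1:ncol(my.data), nrow = 3), 2,
      function(idx) sum(my.data[, idx]))
## [1] 120 345 570
```

The same pattern works for any column count divisible by the block size, since the index matrix is built from ncol(my.data).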
[R] FW: lasso regression
Hi, I am trying to do a lasso regression using the lars package with the following data: FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 116.90 0.14 0.14 0.29 4.43 3.29 117.56 117.77 5.00 116.23 0.29 0.43 0.14 6.14 2.14 116.84 116.80 2.00 116.41 0.00 0.14 0.29 5.71 3.71 117.24 117.17 4.00 115.80 0.57 0.00 0.29 2.14 2.57 116.21 116.53 6.00 #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv, na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot means ?
* David Winsemius qjvafrz...@pbzpnfg.arg [2011-07-11 18:16:25 -0400]: What is the point of offering this code? To illustrate what I was talking about (code is its own specification). I hoped that there was already a package doing that (and more in that direction). It seems to be doing what you want yes. Are you trying to use someone else's code no, I wrote it myself. I hoped someone would comment on it to help me improve it. I actually now use findInterval which you suggested. (who by the way appears to have been a former SAS programmer The last time I used SAS was more than 10 years ago. I am a Lisper (I also know C/C++/Perl c). the totally unnecessary semi-colons) then why are they accepted? optional syntax elements suck... thanks for your help. -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://truepeace.org http://pmw.org.il http://camera.org http://jihadwatch.org http://www.PetitionOnline.com/tap12009/ http://iris.org.il http://memri.org Lisp: it's here to save your butt. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] FW: lasso regression
On Jul 12, 2011, at 9:53 AM, Heiman, Thomas J. wrote: Hi, I am trying to do a lasso regression using the lars package with the following data (see attached): Nothing attached. (And now you have also sent an exact duplicate.) snipped failed attempt to include data inline that was sabotaged by using HTML mail #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv , na.strings=c(,, NA, , ?), encoding=UTF-8) This looks wrong. Your data had no commas in it and you are also setting na.strings to include commas. If I am wrong then you should provide dput on crs instead of ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: lasso regression
Hi, Hopefully I got the formatting down.. I am trying to do a lasso regression using the lars package with the following data (the data file is in .csv format): V1 V2 V3 V4 V5 V6 V7 V8 V9 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 2 116.9 0.14285715 0.14285715 0.2857143 4.428571 3.2857144 117.557144 117.76667 5.0 3 116.22857 0.2857143 0.42857143 0.14285715 6.142857 2.142857 116.84286 116.8 2.0 4 116.41428 0.0 0.14285715 0.2857143 5.714286 3.7142856 117.24286 117.14 4.0 5 115.8 0.5714286 0.0 0.2857143 2.142857 2.5714285 116.21429 116.5 6.0 #load Data crs <- read.csv("file:///C:/temp/Horse/horseracing.csv", na.strings=c(",", "NA", "", "?"), encoding="UTF-8") ## define x and y x= x<-crs[,9] #predictor variables y= y<-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE, se = TRUE, type="lasso") and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to find out whether a string is a factor?
I have two data frames: str(ysmd) 'data.frame': 8325 obs. of 6 variables: $ X.stock : Factor w/ 8325 levels "A","AA","AA-",..: 2702 6547 4118 7664 7587 6350 3341 5640 5107 7589 ... $ market.cap : num -1.00 2.97e+10 3.54e+08 3.46e+08 -1.00 ... $ X52.week.low : num 40.2 22.5 27.5 12.2 20.7 ... $ X52.week.high: num 43.3 38.2 35.1 19.2 32.7 ... $ X3.month.average.daily.volume: num 154 7862250 16330 205784 14697 ... $ X50.day.moving.average.price : num 41.8 36.3 30.5 15.2 29.9 ... str(top1000) 'data.frame': 1000 obs. of 1 variable: $ V1: Factor w/ 1000 levels "AA","AAI","AAP",..: 146 96 341 814 382 977 66 1 737 595 ... I want to split ysmd into two new data frames, ysmd.top1000 and ysmd.rest, so that ysmd.top1000$X.stock only contains factors from top1000$V1 and ysmd.rest$X.stock contains all the other factors. I should be able to just write ysmd.top1000 <- ysmd[ysmd$X.stock is in top1000$V1,] ysmd.rest <- ysmd[ysmd$X.stock not in top1000$V1,] but how do I check whether a string is a member of a factor? -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://mideasttruth.com http://truepeace.org http://camera.org http://thereligionofpeace.com http://pmw.org.il Professionalism is being dispassionate about your work. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] High density scatter plot with logarithmic binning
How can I perform logarithmic binning in a scatterplot? I could only take the log of the variables and plot them, but I am sure that is not the way. I have very large data, and would want to plot those high-density scatterplots and code them with different colors for the bins/density. -- View this message in context: http://r.789695.n4.nabble.com/High-density-scatter-plot-with-logarithmic-binning-tp3662226p3662226.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
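A sketch of one way to do this in base R (the hexbin package and smoothScatter() are alternatives worth a look): bin the plane with cut()/table() and colour the tiles by log10 of the counts, so sparse and dense regions are both visible. The bin count and palette here are arbitrary choices, not recommendations.

```r
## 2-D histogram of a large point cloud, coloured on a log scale.
set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5)
nb <- 100                                  # bins per axis (arbitrary)
counts <- table(cut(x, nb), cut(y, nb))    # nb x nb matrix of counts
image(log10(unclass(counts) + 1),          # +1 avoids log10(0)
      col = rev(heat.colors(32)),
      xlab = "x (binned)", ylab = "y (binned)")
```

The log10 transform is the "logarithmic binning" of the colour scale; the axis bins themselves stay linear, which is usually what is wanted for a density scatterplot.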
Re: [R] MC-Simulation with foreach: Some cores finish early
If you switch directly to the multicore package you can use the mclapply() function. There, check the parameter mc.preschedule=T / F. You can use this parameter to improve the load balancing. I do not know a parameter to tune foreach in this way. Best Markus On Tuesday, 12.07.2011, 04:31 -0700, peter_petersen wrote: Dear R-Users, I run a MC simulation using the packages foreach and doMC on a PowerMac with 24 cores. There are roughly a hundred parameter sets and I parallelized the program in a way that each core computes one of these parameter sets completely. The problem is that some parameter sets take a lot longer to compute than others. After a while only a quarter of the cores are still computing (their first parameter set), while others are already finished. But some parameter sets are still untouched. I have thought about changing my parameter file so that every combination takes roughly the same time (longer computations are offset with fewer repetitions), but maybe there is a more elegant solution. Is it somehow possible to wake the finished cores while there is still work to do? ;-) Sincerely, H. Bumann -- View this message in context: http://r.789695.n4.nabble.com/MC-Simulation-with-foreach-Some-cores-finish-early-tp3661998p3661998.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
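Markus's suggestion in code form, with a dummy job and parameter list standing in for the poster's simulation: with mc.preschedule = FALSE, jobs are handed out one at a time as cores free up, instead of being split into mc.cores chunks up front, which helps when job times vary a lot.

```r
## Dynamic load balancing with mclapply (not available on Windows).
library(parallel)   # mclapply lived in the multicore package before R 2.14
run.one.set <- function(p) sum(rnorm(p * 1e4))   # stand-in for one parameter set
param.list  <- as.list(1:8)                      # stand-in parameter sets
res <- mclapply(param.list, run.one.set,
                mc.cores = 2, mc.preschedule = FALSE)
```

With mc.preschedule = TRUE (the default) the list is split into mc.cores pieces before any work starts, which is cheaper for many small, uniform jobs but leaves cores idle in exactly the situation the poster describes.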
Re: [R] Creating a zero matrix when a condition isn't met
Many thanks!!! I will try tonight; I have read this and I think it could help me. I'm aware the question was badly formulated; I will try to explain better next time!!! On a side note: apply always accesses the function you use at least once. If the input is a dataframe without any rows but with defined variables, it sends FALSE as an argument to the function. If the dataframe is completely empty, it sends a logical(0) to the function. x <- data.frame(a=numeric(0)) str(x) 'data.frame': 0 obs. of 1 variable: $ a: num y <- apply(x, MARGIN=1, FUN=function(x){print(x)}) [1] FALSE x <- data.frame() str(x) 'data.frame': 0 obs. of 0 variables y <- apply(x, MARGIN=1, FUN=function(x){print(x)}) logical(0) 2011/7/12 Sarah Goslee sarah.gos...@gmail.com Hi, You don't provide us with a reproducible example, so I can't provide you with actual code. But two approaches come to mind: 1. Create da2 with one row and n columns, then change the appropriate elements, if any, based on your conditions. 2. Do the conditional parts, then check to see whether da2 is empty. If it is, then replace the empty data frame with a data frame of one row and n columns. Sarah On Tue, Jul 12, 2011 at 3:51 AM, Trying To learn again tryingtolearnag...@gmail.com wrote: Hi all, I first create a matrix/data frame called d2 if another matrix accomplishes some restrictions dacc2 da2 <- da1[colSums(dacc2)9,] da2 <- da2[(da2[,13]=24),] write.csv(da2, file = paste('hggi', i, '.csv', sep = '')) The thing is, if da2 finally cannot pass the filters, it cannot write a csv because there is no true condition. How can I create anyway a csv with zeros of one row and n columns (n being the number of columns of da2)? I need a loop? Rarely.
-- Sarah Goslee http://www.functionaldiversity.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
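Sarah's second option, sketched on a toy filter that matches nothing (da1 and da2 mirror the poster's objects; the file name is illustrative):

```r
## Replace an empty filtered result with one row of zeros before writing.
da1 <- data.frame(a = 1:3, b = 4:6)
da2 <- da1[da1$a > 10, ]            # empty: no row passes the filter
if (nrow(da2) == 0) {
  da2 <- as.data.frame(matrix(0, nrow = 1, ncol = ncol(da1)))
  names(da2) <- names(da1)          # keep the original column names
}
write.csv(da2, file = "hggi1.csv")  # now always writes at least one row
```

The nrow() check is the whole trick: it runs after the conditional subsetting, so no loop is needed.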
[R] Avoiding loops to detect number of coincidences
Hi all, I have this information on a file ht.txt; imagine it is a data frame without labels: 1 1 1 8 1 1 6 4 1 3 1 3 3 And in another table called pru.txt I have sequences similar to this: 4 1 1 8 1 1 6 4 1 3 1 3 3 1 6 1 8 1 1 6 4 1 3 1 3 3 1 1 1 8 1 1 6 4 1 3 1 3 3 6 6 6 8 1 1 6 4 1 3 1 3 3 I want to know how many positions are identical between each row in pru compared with ht. n and m are the rows and columns of pru (m is the same number in pru and ht) I tried this with loops: n <- nrow(pru) m <- ncol(pru) dacc2 <- mat.or.vec(n, m) for (g in 1:n){ for (j in 1:m){ if(pru[g,j]-ht[1,j]!=0) dacc2[g,j]=0 else {dacc2[g,j]=1} } } So when I have dacc2 I can filter this: dar2 <- pru[colSums(dacc2) > 2 & colSums(dacc2) < 10,] Is there some way to avoid loops? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reorganize data frame
Hi, I have a data frame of about 700 000 rows which looks something like this: Date Temperature Category 2007102 16 A 2007102 17 B 2007102 18 C but need it to be: Date TemperatureA TemperatureB TemperatureC 2007102 16 17 18 Any suggestions? /Angelica -- View this message in context: http://r.789695.n4.nabble.com/Reorganize-data-fram-tp3662123p3662123.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Cross K Ripley's function and spatio-temporal interaction power
Dear All, I have a collection of spatial data. I have to analyze pairs of these point patterns to test their spatial interaction. I was moving towards the cross-K Ripley function. The problems, however, are the following: 1) What is the best way to get a single real value that represents the interaction power? 2) How to obtain a value that even allows me to rank the pairwise point patterns according to their interaction power? PS: I have to perform the same analysis for temporal interaction and spatio-temporal interaction. Thanks in advance for your help Best Regards Massimiliano Ruocco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
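One concrete route, offered as an assumption since the poster names no package: spatstat's Kcross() computes the cross-type K function, and a crude scalar summary for ranking pairs of patterns is the mean deviation from the Poisson benchmark. Better summaries exist (e.g. maximum deviation, or a Monte Carlo test statistic); this is only a sketch.

```r
## Cross-type Ripley K with spatstat, reduced to one number.
library(spatstat)
data(amacrine)                       # bivariate point pattern shipped with spatstat
K <- Kcross(amacrine, "on", "off")   # cross K between the two mark types
## Mean deviation of the isotropic-corrected estimate from the
## theoretical Poisson curve: > 0 suggests attraction, < 0 repulsion.
score <- mean(K$iso - K$theo, na.rm = TRUE)
```

Computing the same score for every pair of patterns gives a comparable value per pair, which answers the ranking question at least roughly; significance would still need an envelope or permutation test.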
Re: [R] how to find out whether a string is a factor?
On Jul 12, 2011, at 10:12 AM, Sam Steingold wrote: I have two data frames: str(ysmd) 'data.frame': 8325 obs. of 6 variables: $ X.stock : Factor w/ 8325 levels A,AA,AA-,..: 2702 6547 4118 7664 7587 6350 3341 5640 5107 7589 ... $ market.cap : num -1.00 2.97e+10 3.54e+08 3.46e +08 -1.00 ... $ X52.week.low : num 40.2 22.5 27.5 12.2 20.7 ... $ X52.week.high: num 43.3 38.2 35.1 19.2 32.7 ... $ X3.month.average.daily.volume: num 154 7862250 16330 205784 14697 ... $ X50.day.moving.average.price : num 41.8 36.3 30.5 15.2 29.9 ... str(top1000) 'data.frame': 1000 obs. of 1 variable: $ V1: Factor w/ 1000 levels AA,AAI,AAP,..: 146 96 341 814 382 977 66 1 737 595 ... I want to split ysmd into two new data frames: ysmd.top1000 and ysmd.rest so that ysmd.top1000$X.stock only contains factors from top1000$V1 and ysmd.rest$X.stock contains all the other factors. I should be able to just write ysmd.top1000 - ysmd[ysmd$X.stock is in top1000$V1,] ysmd.rest - ysmd[ysmd$X.stock not in top1000$V1,] but how so I check whether a string is a member of a factor? ?%in% -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://mideasttruth.com http://truepeace.org http://camera.org http://thereligionofpeace.com http://pmw.org.il Professionalism is being dispassionate about your work. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
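David's %in% pointer, spelled out on toy versions of the two data frames (factor values compare by their underlying character values, so no conversion is needed):

```r
## Split one data frame by membership of a column in another.
ysmd    <- data.frame(X.stock = factor(c("A", "AA", "IBM")),
                      market.cap = 1:3)
top1000 <- data.frame(V1 = factor(c("AA", "IBM")))
keep <- ysmd$X.stock %in% top1000$V1
ysmd.top1000 <- ysmd[keep, ]    # rows whose stock is in top1000
ysmd.rest    <- ysmd[!keep, ]   # all the other rows
```

One logical vector drives both subsets, so the two results partition the original rows exactly.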
Re: [R] FW: lasso regression
On Tue, 2011-07-12 at 10:12 -0400, Heiman, Thomas J. wrote: Hi, Hopefully I got the formatting down.. I am trying to do a lasso regression using the lars package with the following data (the data files is in .csv format): V1 V2 V3 V4 V5 V6 V7 V8 V9 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverageFinishAverage Time7AverageTime3AverageFinish 2 116.9 0.14285715 0.14285715 0.2857143 4.4285713.2857144 117.557144 117.76667 5.0 3 116.22857 0.2857143 0.42857143 0.14285715 6.1428572.142857116.84286 116.8 2.0 4 116.41428 0.0 0.14285715 0.2857143 5.7142863.7142856 117.24286 117.14 4.0 5 115.8 0.5714286 0.0 0.2857143 2.1428572.5714285 116.21429 116.5 6.0 #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv, na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Row 1 contains character data, the variable names. Are you missing a `header = TRUE` (this is the default in `read.csv()`), or do you have several header lines? I also think you have the response/predictors back to front there; otherwise, why would you need to shrink the coefficient and select from a model with a single predictor? HTH G Sincerely, tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. 
[w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
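Following Gavin's point about the response and predictors being reversed: with Finish (column 9) as the response and the first eight columns as a numeric predictor matrix, the call the poster probably wanted looks like the sketch below. A random stand-in data frame is used here, since the real crs never made it to the list cleanly.

```r
## Lasso via lars with x as a numeric matrix, y as the response.
library(lars)
set.seed(1)
crs <- as.data.frame(matrix(rnorm(9 * 40), ncol = 9))  # stand-in data
x <- as.matrix(crs[, 1:8])   # predictors: must be a numeric matrix
y <- crs[, 9]                # response: Finish
cv.lars(x, y, K = 10, type = "lasso")
```

The as.matrix() step is what the original code was missing; a data frame fails the numeric-matrix check inside lars and produces the "%*%" error quoted above.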
Re: [R] Avoiding loops to detect number of coincidences
Hi Trying, It would be helpful if you provided reproducible examples. It would also be polite to sign a name so that we have something by which to address you. On Tue, Jul 12, 2011 at 8:00 AM, Trying To learn again tryingtolearnag...@gmail.com wrote: Hi all, I have this information on a file ht.txt, imagine it is a data frame without labels: 1 1 1 8 1 1 6 4 1 3 1 3 3 And on other table called pru.txt I have sequences similar this 4 1 1 8 1 1 6 4 1 3 1 3 3 1 6 1 8 1 1 6 4 1 3 1 3 3 1 1 1 8 1 1 6 4 1 3 1 3 3 6 6 6 8 1 1 6 4 1 3 1 3 3 I want to now how many positions are identical between each row in pru compared with ht. I have no idea what you are trying to do with the loops below, but if you are trying to count matches by row: a reproducible example dput(ht) c(1, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3) dput(pru) structure(list(V1 = c(4L, 1L, 1L, 6L), V2 = c(1L, 6L, 1L, 6L), V3 = c(1L, 1L, 1L, 6L), V4 = c(8L, 8L, 8L, 8L), V5 = c(1L, 1L, 1L, 1L), V6 = c(1L, 1L, 1L, 1L), V7 = c(6L, 6L, 6L, 6L ), V8 = c(4L, 4L, 4L, 4L), V9 = c(1L, 1L, 1L, 1L), V10 = c(3L, 3L, 3L, 3L), V11 = c(1L, 1L, 1L, 1L), V12 = c(3L, 3L, 3L, 3L), V13 = c(3L, 3L, 3L, 3L)), .Names = c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12, V13 ), class = data.frame, row.names = c(NA, -4L)) # count the positional matches by row apply(pru, 1, function(x)sum(x == ht)) [1] 12 12 13 10 Sarah n and m are the col and row of pru (m is the same number in pru and ht) I tried this with loops n-nrow(pru) m-ncol(pru) dacc2-mat.or.vec(n, m) for (g in 1:n){ for (j in 1:m){ if(pru[g,j]-ht[1,j]!=0) dacc2[g,j]=0 else {dacc2[g,j]=1} } } So when I have dacc2 I can filter this: dar2-pru[colSums(dacc2)2 colSums(dacc2)10,] There is some way to avoid loops? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
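A fully vectorised alternative to Sarah's apply() call: sweep() compares every row of pru against ht in one shot, and the match counts then drive the poster's row filter directly (the 3..9 range here reads the original filter as "more than 2 and fewer than 10 matches").

```r
## Count positional matches per row without any explicit loop.
ht  <- c(1, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3)
pru <- as.data.frame(rbind(c(4, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3),
                           c(1, 6, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3),
                           c(1, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3),
                           c(6, 6, 6, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3)))
hits <- rowSums(sweep(as.matrix(pru), 2, ht, FUN = "=="))
hits                                  # 12 12 13 10, as in the apply() version
dar2 <- pru[hits > 2 & hits < 10, ]   # keep rows with 3..9 matches
```

sweep() recycles ht across the columns of each row, so the logical matrix and rowSums() replace both nested loops at once.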
Re: [R] FW: lasso regression
On Jul 12, 2011, at 10:12 AM, Heiman, Thomas J. wrote: Hi, Hopefully I got the formatting down.. I am trying to do a lasso regression using the lars package with the following data (the data files is in .csv format): V1 V2 V3 V4 V5 V6 V7 V8 V9 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 2 116.9 0.14285715 0.14285715 0.2857143 4.428571 3.2857144 117.557144 117.76667 5.0 3 116.22857 0.2857143 0.42857143 0.14285715 6.142857 2.142857 116.84286 116.8 2.0 4 116.41428 0.0 0.14285715 0.2857143 5.714286 3.7142856 117.24286 117.14 4.0 5 115.8 0.5714286 0.0 0.2857143 2.142857 2.5714285 116.21429 116.5 6.0 It is now clear that you failed to get your data in properly. Since stringsAsFactors is set to TRUE by default for all of the read.* functions, all of your columns are now factors. Perhaps you had a blank line at the beginning of your data? The default for read.csv (which is just a wrapper with different parameters for read.table) is to set header =TRUE. You should learn to use str() on your data immediately after data entry steps. -- David. #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv , na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
David Winsemius, MD West Hartford, CT
Re: [R] Reorganize data fram
On Jul 12, 2011, at 8:42 AM, anglor wrote:

> Hi, I have a data frame of about 700 000 rows which looks something like this:
>
>    Date Temperature Category
> 2007102          16        A
> 2007102          17        B
> 2007102          18        C
>
> but need it to be:
>
>    Date TemperatureA TemperatureB TemperatureC
> 2007102           16           17           18

reshape(dat, idvar = "Date", timevar = "Category", direction = "wide")

     Date Temperature.A Temperature.B Temperature.C
1 2007102            16            17            18

> Any suggestions? /Angelica
> --
> View this message in context: http://r.789695.n4.nabble.com/Reorganize-data-fram-tp3662123p3662123.html
> Sent from the R help mailing list archive at Nabble.com.

David Winsemius, MD West Hartford, CT
Re: [R] FW: lasso regression
Hi,

(i) As David suggested, please use `dput` to provide examples of data!

(ii) The nub of your problem is that you are giving lars an object that it is not expecting. It wants a *matrix* for its `x` variable, as you'll see in the help for ?lars. So, as long as this expression:

R> is.numeric(x) && is.matrix(x)

evaluates to FALSE for your x, you won't get it to work.

(iii) Consider using glmnet -- you get the lasso for free when you set alpha = 1, but you can also see if the elastic net is helpful.

-steve

On Tue, Jul 12, 2011 at 10:12 AM, Heiman, Thomas J. thei...@mitre.org wrote:

> Hi, hopefully I got the formatting down. I am trying to do a lasso regression using the lars package with the following data (the data file is in .csv format):
>
>   V1          V2            V3              V4             V5           V6            V7           V8           V9
> 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish
> 2 116.9       0.14285715    0.14285715      0.2857143      4.428571     3.2857144     117.557144   117.76667    5.0
> 3 116.22857   0.2857143     0.42857143      0.14285715     6.142857     2.142857      116.84286    116.8        2.0
> 4 116.41428   0.0           0.14285715      0.2857143      5.714286     3.7142856     117.24286    117.14       4.0
> 5 115.8       0.5714286     0.0             0.2857143      2.142857     2.5714285     116.21429    116.5        6.0
>
> # load Data
> crs <- read.csv("file:///C:/temp/Horse/horseracing.csv",
>                 na.strings = c("", "NA", "?"), encoding = "UTF-8")
> ## define x and y
> x <- crs[, 9]   # predictor variables
> y <- crs[1:8, ] # response variable
> library(lars)
> cv.lars(x, y, K = 10, trace = TRUE, plot.it = TRUE, se = TRUE, type = "lasso")
>
> and I get:
> LASSO sequence
> Error in one %*% x : requires numeric/complex matrix/vector arguments
>
> Any idea on what I am doing wrong? Thank you!! Sincerely, tom
-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
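[Editor's note: a minimal base-R sketch of the matrix requirement Steve describes, using simulated stand-in data. The object names are illustrative, not from the original post; lars and glmnet both want a numeric matrix of predictors and a numeric vector response.]

```r
set.seed(1)
# A data frame read in with read.csv() is NOT a matrix, even if all-numeric
crs <- data.frame(matrix(rnorm(50 * 9), ncol = 9))

x <- crs[, 1:8]                 # still a data frame
is.numeric(x) && is.matrix(x)   # FALSE -> lars/glmnet will choke on this

X <- as.matrix(crs[, 1:8])      # coerce, after confirming with str(crs)
y <- crs[, 9]                   # response: a plain numeric vector
is.numeric(X) && is.matrix(X)   # TRUE

# With glmnet installed, the lasso would then be, e.g.:
#   library(glmnet); fit <- cv.glmnet(X, y, alpha = 1)
```

If some columns came in as factors (the stringsAsFactors problem David raised), as.matrix() will produce a character matrix, so the is.numeric() check catches that case too.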
Re: [R] Reorganize data fram
Hi: Try the cast() function in the reshape package. Using d as the name of your data frame,

library(reshape)
cast(d, Date ~ Category, value = 'Temperature')

     Date  A  B  C
1 2007102 16 17 18

HTH, Dennis

On Tue, Jul 12, 2011 at 5:42 AM, anglor angelica.ekens...@dpes.gu.se wrote:

> Hi, I have a data frame of about 700 000 rows which looks something like this:
>
>    Date Temperature Category
> 2007102          16        A
> 2007102          17        B
> 2007102          18        C
>
> but need it to be:
>
>    Date TemperatureA TemperatureB TemperatureC
> 2007102           16           17           18
>
> Any suggestions? /Angelica
> --
> View this message in context: http://r.789695.n4.nabble.com/Reorganize-data-fram-tp3662123p3662123.html
Re: [R] time zone - any practical solution?
On Tue, Jul 12, 2011 at 8:57 AM, B Laura gm.spam2...@gmail.com wrote:

> Dear Gabor, http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel doesn't describe handling dates with daylight saving time issues.

Two references were given, and it's discussed in the R News article. It was also mentioned again in my first post -- namely, don't use POSIXct; then you don't have time zones, and all these problems go away.

> The R class Date can remove time and timezone; however, when calculating the days difference between two manipulated variables, the same problem appears if handling these without Dates.

You don't have to remove the time zone if you never use POSIXct. The Date class has no time zones in the first place.

> R News 4/1 doesn't provide a solution to this either.

It certainly discusses how to choose the appropriate date/time class. Your problem is that you are using the wrong class for the problem, whereas you seem to be interpreting it as how to fix it up after having chosen the wrong class. By that time the wrong design decision has already been made, and that is what is fundamentally causing the problem. The entire first page of the R News article discusses choosing the right class in the first place.

--
Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] Connecting to Empress DB using RODBC
On Jul 11, 2011, at 9:16 PM, Steve Parker wrote:

> Hi there, I am using the RODBC library to connect to an Empress database. I have installed the ODBC data source with the server DNS number and port, and named the source Trawl. It is odbcDriverConnect that seems to have the problem, and I suspect one of the settings in my Data Source is wrong, or that my syntax to identify the database is wrong. I have not set CodeSet or ODBC Version. Here are the error messages:
>
> 1: In odbcDriverConnect("Trawl") :
>    [RODBC] ERROR: state 08001, code -256, message [Empress Software][ODBC DLL] Unable to connect to data source
> 2: In odbcDriverConnect("Trawl") :
>    [RODBC] ERROR: state 01S00, code 0, message [Microsoft][ODBC Driver Manager] Invalid connection string attribute
> 3: In odbcDriverConnect("Trawl") : ODBC connection failed
>
> Can anyone point me in the right direction? Is there a specific syntax for naming the database other than its name? It does bring up a GUI where I can choose my data source and login, but then it just gives the errors above. Any help is greatly appreciated. Steve

I would repost your query to r-sig-db: https://stat.ethz.ch/pipermail/r-sig-db/ but also include information on your OS (presumably some version of Windows), the version of R and the version of RODBC, being sure that you are running the latest of each (R 2.13.1 and RODBC 1.3-2). Also include the actual function calls you are making along with the error messages, being sure to mask your userID and password where included. We can't tell you if your syntax is wrong if you don't include it. The above errors could be your syntax or perhaps an ODBC configuration error. See vignette("RODBC") for general information on creating a proper Windows DSN for your database. If Empress has any kind of ODBC client, or if you can use something like Excel or MS Query to connect via ODBC, you can test your connection to the database independent of R. That will help assess whether your problem is your basic ODBC configuration or something specific to R/RODBC, such as your syntax or perhaps a driver issue.

HTH, Marc Schwartz
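[Editor's note: a sketch of the syntax question, as an assumption about the cause rather than a confirmed diagnosis. In RODBC, odbcConnect() takes a bare DSN name, while odbcDriverConnect() expects a full "attribute=value" connection string, so passing just "Trawl" to the latter is one plausible source of the "Invalid connection string attribute" complaint. The helper below only assembles the string; the commented calls show where it would be used, and the credentials are placeholders.]

```r
# Assemble an ODBC connection string from its parts
make_conn_string <- function(dsn, uid, pwd) {
  paste0("DSN=", dsn, ";UID=", uid, ";PWD=", pwd)
}

cs <- make_conn_string("Trawl", "myuser", "mypass")
cs  # "DSN=Trawl;UID=myuser;PWD=mypass"

# With RODBC loaded and the DSN configured, either of these would be tried:
#   library(RODBC)
#   ch <- odbcConnect("Trawl", uid = "myuser", pwd = "mypass")  # DSN form
#   ch <- odbcDriverConnect(cs)                                 # string form
#   if (ch != -1) { print(sqlTables(ch)); odbcClose(ch) }       # -1 = failure
```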
Re: [R] time zone - any practical solution?
If you don't need POSIXt types, then as Gabor says, don't use them. However, there are good reasons to use them sometimes, and the most workable solution I have found is to set your default timezone in R to a non-DST timezone before you convert from character to POSIXct. This is dependent on your OS and particular build of R, but on Windows and Linux I have found that

Sys.setenv(TZ = "Etc/GMT-2")

at the beginning of my R session should handle Central European Standard Time. To identify which time zones are supported on your system, read the documentation (there is generally a zoneinfo directory somewhere with filenames matching the time zones).

---
Jeff Newmiller, jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

B Laura gm.spam2...@gmail.com wrote:

> Dear Gabor, http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel doesn't describe handling dates with daylight saving time issues. The R class Date can remove time and timezone; however, when calculating the days difference between two manipulated variables, the same problem appears if handling these without Dates. R News 4/1 doesn't provide a solution to this either. Have read and struggled with this stuff for 3 days. Anyone else who could help on this? Regards, Laura.
>
> 2011/7/12 Gabor Grothendieck ggrothendi...@gmail.com
>
>> On Tue, Jul 12, 2011 at 6:58 AM, B Laura gm.spam2...@gmail.com wrote:
>>> Hello all, could someone help me with the time zones in an understandable, practical way? I got completely stuck with this. Have googled for a while and read the manuals, but without solutions... When data are imported from Excel 2007 into R (2.13), all time variables, depending on date (summer or winter), get (un-asked for!) a time zone addition CEST (for summer dates) or CET (for winter dates).
>>
>> Read http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel which gives many ways of reading Excel into R, and read R News 4/1 which discusses appropriate R classes to use (you would be best to use Date, not POSIXct, in which case you could not have time zone problems in the first place) and internal representations of R vs. Excel.
>>
>> --
>> Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
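[Editor's note: a small base-R illustration of the two approaches in this thread — the Date class (no time zones at all) versus pinning the session time zone to a fixed offset before any POSIXct conversion. The "Etc/GMT-2" zone is Jeff's example; zone-name support varies by OS.]

```r
# Date class: day arithmetic with no time-zone component at all
d1 <- as.Date("2011-03-26")   # day before a European DST change
d2 <- as.Date("2011-03-28")
as.numeric(d2 - d1)           # exactly 2 days; DST cannot interfere

# POSIXct: fix the zone to a non-DST offset before converting from character
Sys.setenv(TZ = "Etc/GMT-2")  # POSIX sign inversion: GMT-2 means UTC+2
t1 <- as.POSIXct("2011-03-26 12:00:00")
t2 <- as.POSIXct("2011-03-28 12:00:00")
difftime(t2, t1, units = "days")  # 2 days, since no DST shift occurs
```

Note the sign inversion in the "Etc/GMT..." names: "Etc/GMT-2" is two hours *east* of UTC, which is easy to get backwards.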
[R] Adding a correlation value (like Rsquared) to a 4 parameter logistic fit model.
Hello, In my lab we use a four-parameter logistic fit model for our ELISA data (absorbance values). We are currently testing the use of different solvents and need a way to add a correlation value (such as an R squared or something similar) so we can compare different solvents when making this standard curve. We currently use the drc package, and this is our script for the 4-parameter fit:

SC <- read.delim(file = "C:/Documents and Settings/rekem/My Documents/SCBook.txt",
                 header = TRUE, check.names = FALSE, as.is = TRUE)
FourP <- drm(Response ~ Expected, data = SC, fct = LL.4())
plot(FourP, main = "LTB4 Standard Curve Zi Phase 7",
     xlab = "Expected (pg/mL)", ylab = "Response (%Bound)")

Thanks for any help. Kevin McEnroy
--
View this message in context: http://r.789695.n4.nabble.com/Adding-a-correlation-value-like-Rsquared-to-a-4-parameter-logistic-fit-model-tp3662480p3662480.html
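[Editor's note: one common answer — a suggestion, not from the thread — is a "pseudo R-squared" computed from the model's residuals, 1 - RSS/TSS. It is not a true R-squared for a nonlinear model, but it gives a comparable goodness-of-fit number across solvents. The sketch uses a plain lm() fit so it runs anywhere; for the drc fit the same function applies with the observed `SC$Response` and `fitted(FourP)`.]

```r
# Generic pseudo-R^2: 1 - (residual sum of squares / total sum of squares)
pseudo_r2 <- function(observed, fitted_vals) {
  rss <- sum((observed - fitted_vals)^2)
  tss <- sum((observed - mean(observed))^2)
  1 - rss / tss
}

# Demonstration with a simple linear fit on synthetic data
set.seed(42)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)
fit <- lm(y ~ x)
pseudo_r2(y, fitted(fit))      # for lm this equals summary(fit)$r.squared
```

For the drm object in the script above, the analogous call would be `pseudo_r2(SC$Response, fitted(FourP))`.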
Re: [R] lm: mark sample used in estimation
Thanks Peter, Ted! Best, Anirban

On Tue, Jul 12, 2011 at 4:54 AM, Ted Harding ted.hard...@wlandres.net wrote:

On 11-Jul-11 07:55:44, Anirban Mukherjee wrote:

> Hi all, I wanted to mark the estimation sample: mark what rows (observations) are deleted by lm due to missingness. For eg, from the original example in help, I have changed one of the values in trt to be NA (missing).
>
> # original example
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
> # change the 8th observation of trt (the 18th observation overall)
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,NA,4.32,4.69)
> group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
> weight <- c(ctl, trt)
> lm.D9 <- lm(weight ~ group)
> summary(lm.D9)
>
> Call:
> lm(formula = weight ~ group)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1.04556 -0.48378  0.05444  0.23622  1.39444
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   5.0320     0.2258  22.281 5.09e-14 ***
> groupTrt     -0.3964     0.3281  -1.208    0.244
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.7142 on 17 degrees of freedom
>   (1 observation deleted due to missingness)
> Multiple R-squared: 0.07907, Adjusted R-squared: 0.0249
> F-statistic: 1.46 on 1 and 17 DF, p-value: 0.2435
>
> I want to generate an indicator variable to mark the observations used in estimation: 1 for a row not deleted, 0 for a row deleted. In this case I want an indicator variable that has seventeen 1s, one 0, and then two 1s. I know I can do ind = !is.na(group) in the above example. But I am ideally looking for a way that allows one to use any formula in lm, and still be able to mark the estimation sample. Function/option I am missing?
>
> The best I could come up with:
>
> lm.D9 <- lm(weight ~ group, model = TRUE)
> ind <- as.numeric(row.names(lm.D9$model))
> esamp <- rep(0, length(group))  # substitute nrow(data.frame used in estimation) for length(group)
> esamp[ind] <- 1
> esamp
> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
>
> Is this safe (recommended)? Appreciate any help. Best, A

Separately from Peter Dalgaard's response, you raise a generic question about how to find out which observations have been used in an LM fit when some cases may have been omitted, e.g. because of missing values (NA). Take the following as an example:

X <- (1:10)
Y <- X + rnorm(10)
LM <- lm(Y ~ X)
X1 <- X
X1[c(4,8)] <- NA  ## so cases 4 & 8 will be omitted
LM1 <- lm(Y ~ X1)

row.names(LM$model)
# [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
row.names(LM1$model)
# [1] "1"  "2"  "3"  "5"  "6"  "7"  "9"  "10"

which( (row.names(LM$model) %in% row.names(LM1$model)) )
# [1] 1 2 3 5 6 7 9 10
### These are the indices of the cases which were kept
which(!(row.names(LM$model) %in% row.names(LM1$model)) )
# [1] 4 8
### These are the indices of the cases which were omitted

You could also use 'names(LM$res)' and 'names(LM1$res)' instead of 'row.names(LM$model)' and 'row.names(LM1$model)' in the above.

Hoping this helps, Ted.
E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 11-Jul-11 Time: 21:54:05 -- XFMail --
[R] What's wrong with my code?
I've written out code for one particular file, and now I want to generate the same kind of graphs and files for the rest of my similar data files. When I plugged in this code, R produced only one plot, for the file eight, and it states my error (see below). I have edited and checked my code so many times but still couldn't figure out what's wrong with it... would you please help me? Thanks!

library(plyr)  # for ddply
my.files <- list.files()
for (i in 1:length(my.files)) {
  temp.dat <- read.csv(my.files[i])
  eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
  eightout <- subset(eight, inout == "Outgoing from panel hh" & o_duration > 0,
                     select = c(inout, enc_callee, o_duration))
  f <- function(eightoutf) nrow(eightoutf)
  eightnocalls <- ddply(eightout, .(enc_callee), f)
  colnames(eightnocalls)[2] <- "nocalls"
  eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
  eightout <- data.frame(eightout, time = c(1:nrow(eightout)))
  plot(eightout$time, eightout$nocalls)
  write.csv(eightout, "eight.csv", row.names = FALSE)
  pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
  plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])
  dev.off()
  write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
}

R says:

need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

--
View this message in context: http://r.789695.n4.nabble.com/What-s-wrong-with-my-code-tp3662579p3662579.html
Re: [R] plot means ?
On 2011-07-12 07:03, Sam Steingold wrote:
> [snip] (the totally unnecessary semi-colons) then why are they accepted? optional syntax elements suck...

They're accepted because they *can* be useful (multiple statements on one line). Is there *any* language that can *not* be abused?

Peter Ehlers
Re: [R] What's wrong with my code?
Hi Susie,

At a guess, there are no non-missing arguments to min or max. But no, we can't help you. You haven't provided a minimal reproducible example, and without knowing anything about your data it is impossible for the list to offer any constructive suggestions. The posting guide offers suggestions for doing that. In particular, dput() and str() are both very useful.

Sarah

On Tue, Jul 12, 2011 at 11:15 AM, Susie susiecrab_l...@hotmail.com wrote:

> I've written out code for one particular file, and now I want to generate the same kind of graphs and files for the rest of my similar data files. When I plugged in this code, R produced only one plot, for the file eight, and it states my error (see below). I have edited and checked my code so many times but still couldn't figure out what's wrong with it... would you please help me? Thanks!
>
> my.files <- list.files()
> for (i in 1:length(my.files)) {
>   temp.dat <- read.csv(my.files[i])
>   eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
>   eightout <- subset(eight, inout == "Outgoing from panel hh" & o_duration > 0,
>                      select = c(inout, enc_callee, o_duration))
>   f <- function(eightoutf) nrow(eightoutf)
>   eightnocalls <- ddply(eightout, .(enc_callee), f)
>   colnames(eightnocalls)[2] <- "nocalls"
>   eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
>   eightout <- data.frame(eightout, time = c(1:nrow(eightout)))
>   plot(eightout$time, eightout$nocalls)
>   write.csv(eightout, "eight.csv", row.names = FALSE)
>   pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
>   plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])
>   dev.off()
>   write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
> }
>
> R says:
> need finite 'xlim' values
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
> 3: In min(x) : no non-missing arguments to min; returning Inf
> 4: In max(x) : no non-missing arguments to max; returning -Inf

-- Sarah Goslee http://www.functionaldiversity.org
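[Editor's note: the warnings Susie reports ("no non-missing arguments to min") are what plot() emits when both coordinate vectors are empty or NULL — here, most likely because some `temp.dat` lacks `time`/`nocalls` columns, so `temp.dat$time` is NULL. A defensive check along these lines (the column names are taken from her script; `safe_plot` is an invented helper) would make the failing file obvious rather than stopping the loop:]

```r
# Before plotting, confirm the columns exist and contain finite values
safe_plot <- function(dat, file_label) {
  ok <- all(c("time", "nocalls") %in% names(dat)) &&
        any(is.finite(dat$time)) && any(is.finite(dat$nocalls))
  if (!ok) {
    warning("skipping ", file_label, ": no plottable time/nocalls data")
    return(invisible(FALSE))
  }
  plot(dat$time, dat$nocalls, main = file_label)
  invisible(TRUE)
}

# Example: a data frame without the expected columns is skipped, not an error
d_bad <- data.frame(x = 1:3)
safe_plot(d_bad, "bad.csv")   # warns and returns FALSE invisibly
```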
Re: [R] FW: lasso regression
On 07/12/2011 09:53 AM, Heiman, Thomas J. wrote:
> ## define x and y
> x <- crs[, 9]   # predictor variables
> y <- crs[1:8, ] # response variable

This cannot be correct. The response variable is a vector, while the predictor variables form a matrix. You have the response variable consisting of only the first 8 observations, then all the columns. Perhaps you mean:

X <- crs[, 1:8]
y <- crs[, 9]

If this is not the case, please include the output of head(crs) and then tell us which variable is your response.

--
Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky
[R] Role of na.rm inside mean()
This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result? In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?

x <- rnorm(100)
x[1] <- NA
mean(x)
[1] NA
mean(x, na.rm = TRUE)
[1] 0.08136736

If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better.

Harold
Re: [R] Role of na.rm inside mean()
In SQL, the default is to ignore NULL (equivalent to NA in R). However, it can be dangerous to fail to verify how much data was actually used in an aggregation, so the logic behind the default na.rm setting may be one of encouraging the user to take responsibility for missing data.

---
Jeff Newmiller, jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

Doran, Harold hdo...@air.org wrote:

> This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result? In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?
>
> x <- rnorm(100)
> x[1] <- NA
> mean(x)
> [1] NA
> mean(x, na.rm = TRUE)
> [1] 0.08136736
>
> If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better. Harold
Re: [R] Role of na.rm inside mean()
On 12/07/2011 12:26 PM, Doran, Harold wrote:
> This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result?

The general idea in R is that NA stands for unknown. If some of the values in a vector are unknown, then the mean of the vector is also unknown. NA is also used in other ways sometimes; then it makes sense to remove it and compute the mean of the other values.

Duncan Murdoch

> In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?
>
> x <- rnorm(100)
> x[1] <- NA
> mean(x)
> [1] NA
> mean(x, na.rm = TRUE)
> [1] 0.08136736
>
> If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better. Harold
Re: [R] Role of na.rm inside mean()
Hi Harold,

Many (most?) of the statistics functions have a similar argument. I suspect it is sort of to warn the user---you have to be explicit about it rather than the program just silently removing or ignoring values that would not work in the function called. I can think of one example where I want a missing value returned. In psychology we often create scores on some construct (say optimism) by averaging individuals' responses to several questions. In certain cases, if a subject does not respond to one question, their overall score should be missing. This is easily accomplished by letting na.rm = FALSE.

Cheers, Josh

On Tue, Jul 12, 2011 at 9:26 AM, Doran, Harold hdo...@air.org wrote:

> This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result? In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?
>
> x <- rnorm(100)
> x[1] <- NA
> mean(x)
> [1] NA
> mean(x, na.rm = TRUE)
> [1] 0.08136736
>
> If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better. Harold

--
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/
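[Editor's note: Josh's psychology example can be shown directly with rowMeans(). With na.rm = FALSE a respondent missing any item gets a missing scale score, while na.rm = TRUE averages only the items that were answered. The item names are invented for illustration.]

```r
# Three respondents x three optimism items; respondent 2 skipped item q2
items <- data.frame(q1 = c(4, 5, 2),
                    q2 = c(3, NA, 2),
                    q3 = c(5, 4, 3))

rowMeans(items)                # default na.rm = FALSE: 4.000000 NA 2.333333
rowMeans(items, na.rm = TRUE)  # 4.000000 4.500000 2.333333
```

Whether the second respondent's score should be 4.5 or NA is a substantive decision, which is exactly why the functions make you state it explicitly.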
[R] For applying formula in rows
Dear all,

I have a problem and it is very difficult for me to get code for it. I am reading a file (attached with this mail) using the code

df <- read.table("summary.txt", fill = TRUE, sep = "", colClasses = "character", header = TRUE)

and dataframe df is like this:

V1        V2 CaseA CaseC CaseG CaseT new
10 135344109     0     0     1     0  12
10 135344110     0     1     0     0  12
10 135344111     0     0     1     0  12
10 135344112     0     0     1     0  12
10 135344113     0     0     1     0  12
10 135344114     1     0     0     0  12
10 135344115     1     0     0     0  12
10 135344116     0     0     0     1  12
10 135344117     0     1     0     0  12
10 135344118     0     0     0     1  12

I want to apply a formula which is (number/total)*new*2, where number is in the columns CaseA, CaseC, CaseG and CaseT, and total is the sum of these 4 columns. I will explain with an example: the output of the first row should be

V1        V2 CaseA CaseC CaseG CaseT new
10 135344109     0     0    24     0  12

because the sum of the 3rd, 4th, 5th and 6th columns is 1 for the first row. For Case A, C and T, applying the above formula gives zero ((0/1)*12*2 = 0), but for Case G it gives (1/1)*12*2 = 24. Can you please help me?

Thanking you, Warm Regards, Vikas Bansal, Msc Bioinformatics, Kings College London

[attached summary.txt:]
V1        V2 CaseA CaseC CaseG CaseT new
10 135344109     0     0     1     0  12
10 135344110     0     1     0     0  12
10 135344111     0     0     1     0  12
10 135344112     0     0     1     0  12
10 135344113     0     0     1     0  12
10 135344114     1     0     0     0  12
10 135344115     1     0     0     0  12
10 135344116     0     0     0     1  12
10 135344117     0     1     0     0  12
10 135344118     0     0     0     1  12
10A*A0 0 0 0 12
10 135344120     1     0     0     0  12
10 135344121     0     0     1     0  12
10 135344122     0     1     0     0  12
10 135344123     0     1     0     0  12
10 135344124     0     1     0     0  12
10 135344125     0     0     0     1  12
10 135344126     0     0     1     0  12
10 135344127     0     0     1     0  12
10 135344128     0     1     0     0  12
10 135344129     0     1     0     0  12
10 135344130     0     0     0     1  12
10 135344185     0     1     0     0  12
10 135344186     1     0     0     0  12
10 135344187     0     0     1     0  12
10 135344188     1     0     0     0  12
10 135344189     0     1     0     0  12
10 135344190     0     0     0     1  12
10 135344191     0     0     1     0  12
10 135344192     0     1     0     0  12
10 135344193     0     1     0     0  12
10 135344194     0     1     0     0  12
10 135344195     0     0     0     1  12
10 135344196     0     0     1     0  12
10 135344197     0     1     0     0  12
Re: [R] For applying formula in rows
Hi Vikas, Here is one way: df - read.table(summary.txt, header = TRUE) str(df) df[, total] - rowSums(df[, 3:6]) df[, 3:6] - apply(df[, 3:6], 2, function(x) x / df[, total] * df[, new] * 2) head(df) V1V2 CaseA CaseC CaseG CaseT new total 1 10 135344109 0 024 0 12 1 2 10 135344110 024 0 0 12 1 3 10 135344111 0 024 0 12 1 4 10 135344112 0 024 0 12 1 5 10 135344113 0 024 0 12 1 6 10 13534411424 0 0 0 12 1 Note that I read the data in differently than you did. This matters. Cheers, Josh 2011/7/12 Bansal, Vikas vikas.ban...@kcl.ac.uk: Dear all, I have a problem and it is very difficult for me to get a code. I am reading a file(attached with this mail) using the code- df=read.table(summary.txt,fill=T,sep=,colClasses = character,header=T) and dataframe df is like this- V1 V2 CaseA CaseC CaseG CaseT new 10 135344109 0 0 1 0 12 10 135344110 0 1 0 0 12 10 135344111 0 0 1 0 12 10 135344112 0 0 1 0 12 10 135344113 0 0 1 0 12 10 135344114 1 0 0 0 12 10 135344115 1 0 0 0 12 10 135344116 0 0 0 1 12 10 135344117 0 1 0 0 12 10 135344118 0 0 0 1 12 I want to apply a formula which is (number/total)*new*2. where number is in column caseA,G,C,T and total is sum of these 4 columns.I will explain with an example.the output of first row should be- V1 V2 CaseA CaseC CaseG CaseT new 10 135344109 0 0 24 0 12 because sum of 3rd,4th,5th and 6th column is 1 for first row.and for case A,C and T if we will apply above formula the answer will be zero (0/1)*12*2 which is equal to 0 but for Case G- (1/1)*12*2 which is equal to 24. Can you please help me. Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/
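Josh's one-liner can be checked without the attached file. The sketch below rebuilds the first six rows of the data frame in memory (values transcribed from the post) and applies the same (number/total)*new*2 transformation:

```r
## Reconstruct the first six rows of the posted data by hand,
## then apply (number/total) * new * 2 to the four Case columns.
df <- data.frame(
  V1    = rep(10, 6),
  V2    = 135344109:135344114,
  CaseA = c(0, 0, 0, 0, 0, 1),
  CaseC = c(0, 1, 0, 0, 0, 0),
  CaseG = c(1, 0, 1, 1, 1, 0),
  CaseT = c(0, 0, 0, 0, 0, 0),
  new   = rep(12, 6)
)
df$total  <- rowSums(df[, 3:6])
df[, 3:6] <- df[, 3:6] / df$total * df$new * 2
df$CaseG[1]  # 24: (1/1) * 12 * 2, as in the worked example
```

Dividing the four-column data frame by the length-6 `total` vector recycles element-wise down each column, which is exactly the row-wise division the formula asks for.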
[R] installation of package 'mapproj' had non-zero exit status
## Hello. I have asked a similar question before, but it was not fixed then.
## I am running the following under Ubuntu:
R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

## When I do this:
install.packages("mapproj", dependencies = T)

## I get this:
Installing package(s) into '/home/brad/R/x86_64-pc-linux-gnu-library/2.13' (as 'lib' is unspecified)
also installing the dependency 'maps'
trying URL 'http://cran.case.edu/src/contrib/maps_2.1-6.tar.gz'
Content type 'application/x-gzip' length 1371854 bytes (1.3 Mb)
opened URL
downloaded 1.3 Mb
trying URL 'http://cran.case.edu/src/contrib/mapproj_1.1-8.3.tar.gz'
Content type 'application/x-gzip' length 23955 bytes (23 Kb)
opened URL
downloaded 23 Kb
* installing *source* package 'maps' ...
** libs
** arch -
gcc -std=gnu99 -O3 -pipe -g Gmake.c -o Gmake
Gmake.c: In function 'get_lh':
Gmake.c:111: warning: cast from pointer to integer of different size
Gmake.c:113: warning: cast from pointer to integer of different size
Gmake.c: In function 'main':
Gmake.c:211: warning: cast from pointer to integer of different size
Gmake.c:214: warning: cast from pointer to integer of different size
Gmake.c:217: warning: cast from pointer to integer of different size
Gmake.c:219: warning: cast from pointer to integer of different size
Gmake.c:221: warning: cast from pointer to integer of different size
Gmake.c:224: warning: cast from pointer to integer of different size
Gmake.c:227: warning: cast from pointer to integer of different size
gcc -std=gnu99 -O3 -pipe -g Lmake.c -o Lmake
Lmake.c: In function 'main':
Lmake.c:223: warning: cast from pointer to integer of different size
Lmake.c:228: warning: cast from pointer to integer of different size
Lmake.c:230: warning: cast from pointer to integer of different size
Lmake.c:232: warning: cast from pointer to integer of different size
Lmake.c:235: warning: cast from pointer to integer of different size
Converting world to world2
f convert.awk world.line world2.line
/bin/bash: f: command not found
make: [world2.line] Error 127 (ignored)
make county.L state.L usa.L nz.L world.L world2.L italy.L france.L
make[1]: Entering directory `/tmp/RtmpssTER5/R.INSTALL21eb6525/maps/src'
./Lmake 0 s b county.line county.linestats ../inst/mapdata/county.L
./Lmake 0 s b state.line state.linestats ../inst/mapdata/state.L
./Lmake 0 s b usa.line usa.linestats ../inst/mapdata/usa.L
./Lmake 0 s b nz.line nz.linestats ../inst/mapdata/nz.L
./Lmake 0 s b world.line world.linestats ../inst/mapdata/world.L
./Lmake 0 s b world2.line world2.linestats ../inst/mapdata/world2.L
Cannot read left and right at line 1
make[1]: *** [world2.L] Error 1
make[1]: Leaving directory `/tmp/RtmpssTER5/R.INSTALL21eb6525/maps/src'
make: *** [ldata] Error 2
ERROR: compilation failed for package 'maps'
* removing '/home/brad/R/x86_64-pc-linux-gnu-library/2.13/maps'
ERROR: dependency 'maps' is not available for package 'mapproj'
* removing '/home/brad/R/x86_64-pc-linux-gnu-library/2.13/mapproj'
The downloaded packages are in '/tmp/RtmpwXL9El/downloaded_packages'
Warning messages:
1: In install.packages("mapproj", dependencies = T) :
  installation of package 'maps' had non-zero exit status
2: In install.packages("mapproj", dependencies = T) :
  installation of package 'mapproj' had non-zero exit status

## Any idea as to why? This also happens when I try to install the 'maps' package.
-- View this message in context: http://r.789695.n4.nabble.com/installation-of-package-mapproj-had-non-zero-exit-status-tp3662940p3662940.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] elimination duplicate elements sampling!
On 7/7/2011 3:23 PM, elephann wrote:

Hi everyone! I have a data frame with 1112 time series, and I am going to randomly sample r series, z times, to compose portfolios of different sizes (r-security portfolios). For r=2 and z=1, that is:

z = 1
A = seq(1:1112)
x1 = sample(A, z, replace = TRUE)
x2 = sample(A, z, replace = TRUE)
M = cbind(x1, x2)  # combination of 2 series

Because a portfolio with x1[i] = x2[i] (i = 1, 2, ..., z) is a 1-security portfolio, not a 2-security one, it should be eliminated and resampled. As r increases, say to r = k, how do I efficiently eliminate all portfolios with x1[i] = x2[i] = ... = xk[i]?

Why not sample the r securities without replacement, and replicate that z times?

z <- 1       # number of replicates
r <- 2       # number in each replicate
A <- 1:1112  # space to sample from
M <- t(replicate(z, sample(A, r)))

Besides, any r-security portfolio with the same combination of securities is the same portfolio (given the same weights, as here), e.g. M(x1[i], x5[i], x7[i], x1000[i]) and M(x5[i], x7[i], x1[i], x1000[i]) or M(x1[i], x7[i], x5[i], x1000[i]) are all the same. How do I efficiently eliminate these possibilities?

Do you mean you don't want any of the replicates to be the same? You can eliminate duplicates:

M <- t(replicate(z, sort(sample(A, r))))
M <- M[!duplicated(M), ]

Or you can create all possible portfolios of size r, and sample z from that without replacement, to do it in one pass:

cmb <- t(combn(A, r))
M <- cmb[sample(nrow(cmb), z), ]

Note this is not practical for r > 2. combn(A, r) is an array of size r by choose(length(A), r) (which is 2 x 617716 in this case). In fact, for r > 3, this won't even work with the 1112 sample space; for r = 3, it is already 3 x 228554920. But for the three-security case, the probability of getting a duplicate portfolio is small.
Better is to sample a few extra so that you still have enough after throwing out duplicates:

M <- t(replicate(1.01*z, sort(sample(A, r))))
M <- M[!duplicated(M), ][1:z, ]

The 1.01 multiplier may not be big enough; there is no multiplier that will guarantee you have z samples when you are done. However, the second line will throw an error if there are not z unique samples, so a shortfall is at least easy to detect.

-- View this message in context: http://r.789695.n4.nabble.com/elimination-duplicate-elements-sampling-tp3652791p3652791.html Sent from the R help mailing list archive at Nabble.com.

-- Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
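The "sample a few extra" idea can be made deterministic with a small top-up loop. This is a sketch along those lines; the helper name is mine, not from the thread, and it assumes r >= 2:

```r
## Keep drawing r-element portfolios until z unique, order-free ones exist.
## Rows are sorted so that permutations of the same securities collide.
unique_portfolios <- function(A, r, z) {
  M <- matrix(integer(0), nrow = 0, ncol = r)
  while (nrow(M) < z) {
    extra <- t(replicate(z - nrow(M), sort(sample(A, r))))
    M <- rbind(M, extra)
    M <- M[!duplicated(M), , drop = FALSE]
  }
  M
}

set.seed(42)
M <- unique_portfolios(1:1112, r = 4, z = 100)
nrow(M)           # exactly 100 portfolios
anyDuplicated(M)  # 0: all rows distinct
```

Because duplicates are rare for a large sample space, the loop almost always finishes in one or two passes.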
[R] What's wrong with my code? (Edited version-added my data)
I've written code for one particular file, and now I want to generate the same kind of graphs and files for the rest of the similar data files. For example, a file 8.csv would look like this:

enc_callee inout o_duration type
A out 342 de
B in 234 de
C out 132 de
E in 111 de
A in 13 cf
H in 15.7 cf
G out 32 de
A out 32 cf
I in 14 de
K in 189 de
J out 34.1 cf
B in 98.7 de
H out 23 de
C out 43 cf
H in 567 cf
I out 12 de
E out 12 de
K out 12 cf
B in 1 cf
A out 29 de
D out 89 cf
J in 302 de
H in 12 cf
A in 153 cf
C out 233 de

My commands to deal with this simple file would be:

eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
eightout <- subset(eight, inout == "out" & o_duration > 0,
                   select = c(inout, enc_callee, o_duration))
f <- function(eightoutf) nrow(eightoutf)
eightnocalls <- ddply(eightout, .(enc_callee), f)
colnames(eightnocalls)[2] <- "nocalls"
eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
eightout = data.frame(eightout, time = c(1:nrow(eightout)))
plot(eightout$time, eightout$nocalls)
write.csv(eightout, "eightM.csv", row.names = FALSE)

And then R will produce eightM.csv like this:

   inout enc_callee o_duration nocalls time
1  out   A               342.0       3    1
3  out   C               132.0       3    2
7  out   G                32.0       1    3
8  out   A                32.0       3    4
11 out   J                34.1       1    5
13 out   H                23.0       1    6
14 out   C                43.0       3    7
16 out   I                12.0       1    8
17 out   E                12.0       1    9
18 out   K                12.0       1   10
20 out   A                29.0       3   11
21 out   D                89.0       1   12
25 out   C               233.0       3   13

I will also get a plot: http://r.789695.n4.nabble.com/file/n3662910/eightM.png

What I want to do now is that I have a few hundred similar files, and I want to generate the same type of plots and files, so I've written the following code. However, R states that there's some error, and I've tried editing it many times without success.
my.files <- list.files()
for (i in 1:length(my.files)) {
  temp.dat <- read.csv(my.files[i])
  eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
  eightout <- subset(eight, inout == "out" & o_duration > 0,
                     select = c(inout, enc_callee, o_duration))
  f <- function(eightoutf) nrow(eightoutf)
  eightnocalls <- ddply(eightout, .(enc_callee), f)
  colnames(eightnocalls)[2] <- "nocalls"
  eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
  eightout = data.frame(eightout, time = c(1:nrow(eightout)))
  plot(eightout$time, eightout$nocalls)
  write.csv(eightout, "eight.csv", row.names = FALSE)
  pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
  plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])
  dev.off()
  write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
}

R says:

need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

I wonder what went wrong with my code. Please help me! Thank you very much!!

-- View this message in context: http://r.789695.n4.nabble.com/What-s-wrong-with-my-code-Edited-version-added-my-data-tp3662910p3662910.html Sent from the R help mailing list archive at Nabble.com.
[R] Explain how it gets back out?
Probability <- function(N, f, m, b, x, t) {
  ## N is the number of lymph nodes
  ## f is the fraction of dendritic cells (in the correct node) that have the antigen
  ## m is the number of time steps
  ## b is the starting position (somewhere in the node or somewhere in the gap
  ##   between nodes; it is a number between 1 and (x+t))
  ## x is the number of time steps it takes to traverse the gap
  ## t is the number of time steps it takes to traverse a node
  A <- 1/N
  B <- 1 - A
  C <- 1 - f
  D <- ((m + b - 1) %% (x + t)) + 1
  if (b <= t) {                                  # starts inside a node
    if (m <= (t - b)) { return(B + A*(C^m)) }    # start and end in first node
    if (D <= t) {                                # we finish in a node
      a <- (B + A*(C^(t - b)))                            # first node
      b <- ((B + A*(C^t))^(floor((m + b)/(x + t)) - 1))   # intermediate nodes (if any)
      c <- (B + A*(C^D))                                  # last node
      d <- (a*b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  ## finish in a gap
  } else {                                       ## starts outside a node
    if (m <= (x + t - b)) { return(1) }          # also end in the gap
    if (D <= t) {                                # end in a node
      b <- ((B + A*(C^t))^(floor(m/(x + t))))
      c <- (B + (A*(C^D)))
      d <- (b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  # outside node
  }
}

I have the above code and I know it works, but I need to explain what is going on, particularly with the recursion. Is it true that when each call finishes, it passes a quantity back to the generation above, until you return to the start of the chain, which then outputs the final result? If so, could someone explain it a bit more clearly? If not, how does the recursion work, and how does it finally output a value?

-- View this message in context: http://r.789695.n4.nabble.com/Explain-how-it-gets-back-out-tp3662928p3662928.html Sent from the R help mailing list archive at Nabble.com.
[R] Subsetting NaN values in localG()
Hi, I'm currently trying to calculate local Getis-Ord Gi* statistics for a 169x315 cell matrix of temperature values. Below is the code I currently have (diffc is the data vector I am removing NaN values from, and I am moving said values to diffD; -999 represents NaN values; id contains ID values for the cells I want to use in the calculation, which I already know to contain 25064 values):

counter = 1
diffD = array(0, 25064)
id = array(0, 25064)
for (i in 1:53235) {
  if (diffc[i] != -999) {
    diffD[counter] = diffc[i]
    id[counter] = i
    counter = counter + 1
  }
}
## Isolates values I want to use in the localG calculation
neigh = cell2nb(169, 315, type = 'queen')
neigh2 = subset.nb(neigh, 1:length(neigh) %in% id)
mylist = nb2listw(neigh2, style = "B")
stats = localG(diffD, mylist)

Unfortunately, when I get to the last line of the code, I receive the following error:

Error in matrix(0, nrow = nrow(x), ncol = ncol(x)) : invalid 'ncol' value (too large or NA)

I can't figure out what it is referring to, as I have verified that there are no NA values, and ncol should only be 1, as diffD and mylist are the same size (25064 data regions). My data works when I don't remove the cells with values of -999; however, it then returns some ridiculous Z-values (as expected). All I can think of is that I'm either using subset.nb() incorrectly or subset.nb() isn't returning a usable nb object for localG(). I'm basically trying to mimic ArcGIS' Hot Spot Analysis to locate cold and hot spots spatially in this code.

Thanks, Dan

-- View this message in context: http://r.789695.n4.nabble.com/Subsetting-NaN-values-in-localG-tp3662781p3662781.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] as.numeric
It works well. Thanks so much. -- View this message in context: http://r.789695.n4.nabble.com/as-numeric-tp3661739p3662671.html Sent from the R help mailing list archive at Nabble.com.
[R] How to smoothen a geodata set in R
Hello, I'm new to this list, so sorry if my question or parts of it have already come up before. For my research in geostatistics, I am working with large sets of data in R (basically large matrices containing discrete x and y coordinates and a value for a certain parameter). These sets are obtained by kriging. The operation I'd like to perform is to smooth the output data set. I want to do it by adding each data point and its 8 surrounding points, dividing this by nine (which gives an average), and then replacing each element in the matrix with the result.

Question 1: is there a way to address the parameter value of a single element (for example, the value for element [x=452, y=682] inside the matrix) and perform an operation on it in R?
Question 2: is there a way to program a loop in R, so that the same operation can be performed on all elements inside the matrix?
Question 3: is it a problem if my data is geodata (made with the geoR library)?

-- View this message in context: http://r.789695.n4.nabble.com/How-to-smoothen-a-geodata-set-in-R-tp3662902p3662902.html Sent from the R help mailing list archive at Nabble.com.
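For questions 1 and 2: a single cell is addressed as m[row, col], and a double loop covers the whole matrix. Below is a minimal sketch of the nine-point average on a plain numeric matrix (random demo data; for a geoR kriging object you would first extract the predicted values into such a matrix, since the smoother itself only needs a matrix):

```r
## Nine-point (3x3) moving-average smoother for a plain matrix.
## Border cells are left unchanged in this sketch.
smooth9 <- function(m) {
  out <- m
  for (i in 2:(nrow(m) - 1)) {
    for (j in 2:(ncol(m) - 1)) {
      out[i, j] <- mean(m[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

set.seed(1)
z  <- matrix(rnorm(25), 5, 5)  # demo data standing in for kriging output
zs <- smooth9(z)
zs[3, 3] - mean(z[2:4, 2:4])   # 0: centre cell is the average of its 3x3 block
```

Reading from the original matrix `m` while writing into the copy `out` matters: it keeps already-smoothed neighbours from leaking into later averages.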
Re: [R] What's wrong with my code? (Edited version-added my data)
Dear Susie,

See inline for some suggestions, but generally, I think you would benefit from breaking this down into smaller pieces. The error you are getting indicates that the problem has to do with the plotting, but that will be trickier to isolate while also dealing with reading in data, looping, etc.

On Tue, Jul 12, 2011 at 10:11 AM, Susie susiecrab_l...@hotmail.com wrote:

I've written code for one particular file, and now I want to generate the same kind of graphs and files for the rest of the similar data files. For example, a file 8.csv would look like this:

enc_callee inout o_duration type
A out 342 de
B in 234 de
C out 132 de
E in 111 de
A in 13 cf
H in 15.7 cf
G out 32 de
A out 32 cf
I in 14 de
K in 189 de
J out 34.1 cf
B in 98.7 de
H out 23 de
C out 43 cf
H in 567 cf
I out 12 de
E out 12 de
K out 12 cf
B in 1 cf
A out 29 de
D out 89 cf
J in 302 de
H in 12 cf
A in 153 cf
C out 233 de

My commands to deal with this simple file would be:

eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
eightout <- subset(eight, inout == "out" & o_duration > 0,
                   select = c(inout, enc_callee, o_duration))
f <- function(eightoutf) nrow(eightoutf)
eightnocalls <- ddply(eightout, .(enc_callee), f)
colnames(eightnocalls)[2] <- "nocalls"
eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
eightout = data.frame(eightout, time = c(1:nrow(eightout)))
plot(eightout$time, eightout$nocalls)
write.csv(eightout, "eightM.csv", row.names = FALSE)

And then R will produce eightM.csv like this:

   inout enc_callee o_duration nocalls time
1  out   A               342.0       3    1
3  out   C               132.0       3    2
7  out   G                32.0       1    3
8  out   A                32.0       3    4
11 out   J                34.1       1    5
13 out   H                23.0       1    6
14 out   C                43.0       3    7
16 out   I                12.0       1    8
17 out   E                12.0       1    9
18 out   K                12.0       1   10
20 out   A                29.0       3   11
21 out   D                89.0       1   12
25 out   C               233.0       3   13

I will also get a plot: http://r.789695.n4.nabble.com/file/n3662910/eightM.png

What I want to do now is that I have a few hundred similar files, and I want to generate the same type of plots and files, so I've written the
following code. However, R states that there's some error, and I've tried editing it many times without success.

my.files <- list.files()
for (i in 1:length(my.files)) {
  temp.dat <- read.csv(my.files[i])

Maybe I'm missing something, but starting here, I do not see anything that changes with each iteration of your loop. It will just keep reading in, editing, and writing out 8.csv over and over. If I'm right, then you should just move this part outside of the loop so it is done once:

  eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
  eightout <- subset(eight, inout == "out" & o_duration > 0,
                     select = c(inout, enc_callee, o_duration))
  f <- function(eightoutf) nrow(eightoutf)
  eightnocalls <- ddply(eightout, .(enc_callee), f)
  colnames(eightnocalls)[2] <- "nocalls"
  eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
  eightout = data.frame(eightout, time = c(1:nrow(eightout)))
  plot(eightout$time, eightout$nocalls)
  write.csv(eightout, "eight.csv", row.names = FALSE)

{end part that does not seem to change}

  pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
  plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])

From the error, my guess is that the problem is right here. Try looking at temp.dat$time and temp.dat$nocalls to see if the data are appropriate for plotting. Are any of the pdfs and files getting produced? If yes, this would strongly suggest that your code is working, but some of your data files are not plottable. Something else you could try would be to add str(temp.dat) right after you read in the data in your loop; this should print out the basic structure of the data and might give you some clues.
HTH,
Josh

  dev.off()
  write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
}

R says:

need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
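Putting the suggestions above together, one way to restructure the loop is to move the per-file work into a function that can be tested on a single data frame. This is a sketch, not the poster's exact code: the empty-subset guard is an addition, and the ddply/match step is replaced by base R's ave() so no extra package is needed.

```r
## Per-file processing as a testable function. Column names follow the
## posted files; the guard against empty subsets is my addition.
process_calls <- function(dat) {
  out <- subset(dat, inout == "out" & o_duration > 0,
                select = c(inout, enc_callee, o_duration))
  if (nrow(out) == 0) return(NULL)  # nothing plottable in this file
  ## count outgoing calls per callee (base-R stand-in for ddply + match)
  out$nocalls <- ave(seq_len(nrow(out)), out$enc_callee, FUN = length)
  out$time <- seq_len(nrow(out))
  out
}

## The loop then only does I/O, skipping files with nothing to plot:
## for (f in list.files(pattern = "\\.csv$")) {
##   res <- process_calls(read.csv(f))
##   if (is.null(res)) next
##   pdf(paste(Sys.Date(), "_", f, ".pdf", sep = ""))
##   plot(res$time, res$nocalls, main = f)
##   dev.off()
##   write.csv(res, paste(Sys.Date(), "_", f, sep = ""), row.names = FALSE)
## }
```

An all-"in" file would make `res$time` and `res$nocalls` empty, which is exactly what produces the "need finite 'xlim' values" error; the NULL guard sidesteps it.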
[R] Deviance of zeroinfl/hurdle models
Dear list, I'm wondering if anyone can help me calculate the deviance of either a zeroinfl or hurdle model from package pscl? Even if someone could point me to the correct formula for calculating the deviance, I could do the rest on my own. I am trying to calculate a pseudo-R-squared measure based on the R^{2}_{DEV} of [1], so I need to be able to calculate the deviance of the full and null models. Does anyone have any suggestions? Alternatively, does anyone have a suggestion for a better measure to report (I'm aware that R^2 measures aren't really appropriate here), preferably something that is easy enough to program or compute using existing packages... Thanks in advance, Carson

[1] Cameron, A.C., Windmeijer, F.A.G., 1996. R^2 measures for count data regression models with applications to health-care utilization. J. Bus. Econom. Statist. 14, 209–220.

-- Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth
http://www.carsonfarmer.com/
Re: [R] Explain how it gets back out?
On 12-Jul-11 17:18:26, mousy0815 wrote:

Probability <- function(N, f, m, b, x, t) {
  ## N is the number of lymph nodes
  ## f is the fraction of dendritic cells (in the correct node) that have the antigen
  ## m is the number of time steps
  ## b is the starting position (somewhere in the node or somewhere in the gap
  ##   between nodes; it is a number between 1 and (x+t))
  ## x is the number of time steps it takes to traverse the gap
  ## t is the number of time steps it takes to traverse a node
  A <- 1/N
  B <- 1 - A
  C <- 1 - f
  D <- ((m + b - 1) %% (x + t)) + 1
  if (b <= t) {                                  # starts inside a node
    if (m <= (t - b)) { return(B + A*(C^m)) }    # start and end in first node
    if (D <= t) {                                # we finish in a node
      a <- (B + A*(C^(t - b)))                            # first node
      b <- ((B + A*(C^t))^(floor((m + b)/(x + t)) - 1))   # intermediate nodes (if any)
      c <- (B + A*(C^D))                                  # last node
      d <- (a*b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  ## finish in a gap
  } else {                                       ## starts outside a node
    if (m <= (x + t - b)) { return(1) }          # also end in the gap
    if (D <= t) {                                # end in a node
      b <- ((B + A*(C^t))^(floor(m/(x + t))))
      c <- (B + (A*(C^D)))
      d <- (b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  # outside node
  }
}

I have the above code and I know it works, but I need to explain what is going on, particularly with the recursion. Is it true that when each call finishes, it passes a quantity back to the generation above, until you return to the start of the chain, which then outputs the final result? If so, could someone explain it a bit more clearly? If not, how does the recursion work, and how does it finally output a value?

This is a generic reply, rather than referring to your specific code above. The most succinct definition of recursion is in Ted's Dictionary:

*Recursion*: If you understand *Recursion*, then stop reading now and do something else. Otherwise, see *Recursion*.

(I have found that this goes down well in lectures.
It presupposes, however, that the reader will eventually catch on, so the definition is not suitable for the infinitely stupid -- which is perhaps a realistic assumption in a lecture context.)

The really important element in the above definition is the initial escape clause (which, by the above assumption, will eventually be reached). A proper recursive definition must include something which will eventually cause it to return a result to the level above. The structure of the process which occurs when a recursive function is called can be illustrated by a function to compute n! (the factorial of a positive integer n):

factorial <- function(n) {
  if (n == 0) return(1)              ## Escape clause
  else return( n * factorial(n - 1) )
}

So what happens when you call 'factorial(3)' is:

n==3, so !(n==0), so return(3*(
  n==2, so !(n==0), so return(2*(
    n==1, so !(n==0), so return(1*(
      n==0, so return(1)
    1*(1) = 1
  2*(1) = 2
3*(2) = 6
return(6)

Another way of looking at it is that each successive call opens another pending multiplication, descending until the escape clause is activated:

factorial(3)
= 3 * factorial(2)
= 3 * (2 * factorial(1))
= 3 * (2 * (1 * factorial(0)))   ## escape clause activated here

and then backing up through the levels completes each pending multiplication and passes the result up:

= 3 * (2 * (1 * 1))
= 3 * (2 * 1)
= 3 * 2
= 6

Hoping this helps!
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jul-11 Time: 20:02:47
-- XFMail --
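The descent-and-return structure can also be made visible at the console. The tracing below is my addition, not from the post: it is Ted's factorial with a depth counter, printing each call on the way down and each return on the way back up.

```r
## Instrumented factorial: indentation shows recursion depth, so the
## descending calls and the ascending returns are both visible.
fact_trace <- function(n, depth = 0) {
  pad <- paste(rep("  ", depth), collapse = "")
  cat(pad, "calling factorial(", n, ")\n", sep = "")
  res <- if (n == 0) 1 else n * fact_trace(n - 1, depth + 1)
  cat(pad, "returning ", res, "\n", sep = "")
  res
}

ans <- fact_trace(3)
ans  # 6, handed back through each pending multiplication
```

Every "returning" line after the innermost call shows a value being passed back to the generation above, which is exactly the behaviour the original question asks about.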
Re: [R] grey colored lines and overwriting labels in ggplot2
Merging two posts (data and questions); see inline below.

On 7/11/2011 7:55 PM, Sigrid wrote:

Thank you, Dennis. This is my regenerated dput code. It should be correct, as I closed R and re-ran it based on the dput output. NB, this is the `test` dataset used later:

test <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
treatment = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L,
6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L,
5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L),
.Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"),
total = c(135L, 118L, 121L, 64L, 53L, 49L, 178L, 123L, 128L, 127L, 62L,
129L, 126L, 99L, 183L, 45L, 57L, 45L, 72L, 30L, 71L, 123L, 89L, 102L,
60L, 44L, 59L, 124L, 145L, 126L, 103L, 67L, 97L, 66L, 76L, 108L, 36L,
48L, 41L, 69L, 47L, 57L, 167L, 136L, 176L, 85L, 36L, 82L, 222L, 149L, 171L,
145L, 122L, 192L, 136L, 164L, 154L, 46L, 57L, 57L, 70L, 55L, 102L,
111L, 152L, 204L, 41L, 46L, 103L, 156L, 148L, 155L, 103L, 124L, 176L,
111L, 142L, 187L, 43L, 52L, 75L, 64L, 91L, 78L, 196L, 314L, 265L, 44L,
39L, 98L, 197L, 273L, 274L, 89L, 91L, 74L, 91L, 112L, 98L, 140L, 90L,
121L, 120L, 161L, 83L, 230L, 266L, 282L, 35L, 53L, 57L, 315L, 332L,
202L, 90L, 79L, 89L, 67L, 116L, 109L, 44L, 68L, 75L, 29L, 52L, 52L,
253L, 203L, 87L, 105L, 234L, 152L, 247L, 243L, 144L, 167L, 165L, 95L,
300L, 128L, 125L, 84L, 183L, 88L, 153L, 185L, 175L, 226L, 216L, 118L,
118L, 94L, 224L, 259L, 176L, 175L, 147L, 197L, 141L, 176L, 187L, 87L,
92L, 148L, 86L, 139L, 122L),
country = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("high", "low"), class = "factor")),
.Names = c("year", "treatment", "total", "country"),
class = "data.frame", row.names = c(NA, -167L))

I hope this is useful for you when giving me a hand with my difficulties.

On 7/9/2011 8:24 PM, Sigrid wrote:

I created this graph in ggplot and added ablines to the different facets by specifying them with subset commands. As you might see, there are still a few issues.

1.) I would like to have the diamonds in a grey scale instead of colors. I accomplished this (see graph 2) until I overwrote the label title for the treatments and the colors came back (graph 1).
I used these two commands:

p = ggplot(data = test, aes(x = YEAR, y = TOTAL, colour = TREATMENT)) +
  geom_point() +
  facet_wrap(~ country) +
  scale_colour_grey() +
  scale_y_continuous("number of votes") +
  scale_x_continuous("Years") +
  scale_x_continuous(breaks = 1:4) +
  scale_colour_hue(breaks = 'A', labels = 'label A') +
  scale_colour_hue(breaks = 'B', labels = 'label B')

How can I keep the grey scale, but avoid changing back to colors when using the scale_colour_hue command?

You should only have one scale_ call for each scale type. Here, you have three scale_colour_ calls: the first selecting a grey scale, the second defining a single break with its label (and thus implicitly subsetting on that single break value), and a third which defines a different break/label/subset. Only the last one has any effect.

http://r.789695.n4.nabble.com/file/n3657119/color_graph.gif

2.) Furthermore, only one of the overwritten labels of the treatments came up, despite putting in two commands (graph 1). What could have happened here?

p +
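Following the one-scale-per-type advice, a single combined colour scale keeps both the grey palette and the custom labels. This is a sketch, not the poster's final code: the stand-in data frame and the label strings are illustrative, and the lowercase column names follow the dput structure rather than the YEAR/TOTAL spelling in the question.

```r
library(ggplot2)

## Minimal stand-in for the 'test' data frame (illustrative values only).
test <- data.frame(year      = rep(1:4, each = 2),
                   total     = c(135, 64, 123, 60, 167, 85, 253, 105),
                   treatment = factor(rep(c("A", "B"), 4)),
                   country   = factor(rep(c("low", "high"), each = 4)))

## ONE colour scale: scale_colour_grey() takes name and labels directly,
## so relabelling no longer reverts the palette to hue colours.
p <- ggplot(test, aes(x = year, y = total, colour = treatment)) +
  geom_point() +
  facet_wrap(~ country) +
  scale_colour_grey(name = "Treatment", labels = c("label A", "label B")) +
  scale_x_continuous("Years", breaks = 1:4) +
  scale_y_continuous("number of votes")
```

The axis-title and breaks arguments can likewise be merged into one scale_x_continuous() call, for the same reason.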
[R] Generating a histogram with R
Hello, I have a sample file:

chr22 100 150 125 21 0.145 +
chr22 200 300 212 13 0.05  +
chr22 345 365 351 12 0.09  +
chr22 500 750 510 15 0.10  +
chr22 500 750 642  9 0.02  +
chr22 800 900 850 10 0.05  +

where I need to generate a histogram from the data in column 6 (i.e. 0.145, 0.05, etc.). To make it easier to read, I would plot the data as 1-0.05=0.95 for all of the data in column 6. What I would like to know is how to generate a histogram with the data from one file. Also, would I be able to generate one histogram from multiple files as well (with the same format)? For example, I have multiple files in the same format as the sample file above, and I would like to make one histogram for all column-six data in all files.

Thank you, a217

-- View this message in context: http://r.789695.n4.nabble.com/Generating-a-histogram-with-R-tp3663350p3663350.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Generating a histogram with R
Hello: R has an extensive Help system. Please learn to use it.

?histogram
?help

Also see the online manual/tutorial "An Introduction to R".

-- Bert

On Tue, Jul 12, 2011 at 12:41 PM, a217 aj...@case.edu wrote:

Hello, I have a sample file:

chr22 100 150 125 21 0.145 +
chr22 200 300 212 13 0.05  +
chr22 345 365 351 12 0.09  +
chr22 500 750 510 15 0.10  +
chr22 500 750 642  9 0.02  +
chr22 800 900 850 10 0.05  +

where I need to generate a histogram from the data in column 6 (i.e. 0.145, 0.05, etc.). To make it easier to read, I would plot the data as 1-0.05=0.95 for all of the data in column 6. What I would like to know is how to generate a histogram with the data from one file. Also, would I be able to generate one histogram from multiple files as well (with the same format)? For example, I have multiple files in the same format as the sample file above, and I would like to make one histogram for all column-six data in all files.

Thank you, a217

-- View this message in context: http://r.789695.n4.nabble.com/Generating-a-histogram-with-R-tp3663350p3663350.html Sent from the R help mailing list archive at Nabble.com.

-- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] Generating a histogram with R
On Jul 12, 2011, at 3:41 PM, a217 wrote:

> Hello, I have a sample file: [data snipped] where I need to generate a histogram from the data in column 6 (i.e. 0.145, 0.05, etc.). To make it easier to read, I would plot the data as 1-0.05=0.95 for all of the data in column 6.

That makes no sense to me, unless you want to pre-multiply all values by 0.95.

> What I would like to know is how to generate a histogram with the data from one file? Also, would I be able to generate one histogram from multiple files as well (with the same format)?

?hist
?histogram  # lattice

There are a ton of worked examples in the Archives. Learn to search. Reasonable search terms once you get to Baron's site with RSiteSearch (after setting the web interface to get r-help postings): "grouped histogram".

> For example, I have multiple files in the same format as the sample file above, and I would like to make one histogram for all column six data in all files.

There are also a ton of worked examples in the archives dealing with accessing multiple files.

David Winsemius, MD
West Hartford, CT
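Following the pointers above, here is a minimal self-contained sketch of both requests (one file, then several pooled). The file paths are temporary files written from the poster's sample rows so the example runs anywhere; with real data, substitute your own paths.

```r
# Write the poster's sample rows to two temporary files so the
# example is self-contained; replace `files` with your own paths.
rows <- c("chr22 100 150 125 21 0.145 +",
          "chr22 200 300 212 13 0.05 +",
          "chr22 345 365 351 12 0.09 +")
files <- replicate(2, tempfile(fileext = ".txt"))
for (f in files) writeLines(rows, f)

# One file: after read.table() on a headerless file, column 6 is V6
d <- read.table(files[1], header = FALSE)
hist(d$V6, main = "Column 6, one file", xlab = "value")

# Multiple files with the same layout, pooled into a single histogram
v6 <- unlist(lapply(files, function(f) read.table(f, header = FALSE)$V6))
hist(v6, main = "Column 6, all files", xlab = "value")
```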
Re: [R] Deviance of zeroinfl/hurdle models
Carson Farmer carson.farmer at gmail.com writes:

> Dear list, I'm wondering if anyone can help me calculate the deviance of either a zeroinfl or hurdle model from package pscl? Even if someone could point me to the correct formula for calculating the deviance, I could do the rest on my own.

What about

library(pscl)
example(hurdle)
-2*logLik(fm_hnb2)

?
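A sketch of the suggestion above, fitting a hurdle model directly rather than via example(): the deviance is taken as -2 times the log-likelihood, which is the usual definition up to an additive constant that cancels when comparing nested models. This assumes the pscl package (CRAN) and its bundled bioChemists data; the formula is illustrative only.

```r
# Deviance of a hurdle model as -2 * log-likelihood (up to a constant).
# Assumes the pscl package is installed; bioChemists ships with pscl.
library(pscl)
data("bioChemists", package = "pscl")

# Illustrative model: article counts on PhD prestige and mentor output
fm  <- hurdle(art ~ phd + ment, data = bioChemists, dist = "negbin")
dev <- -2 * as.numeric(logLik(fm))
dev
```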
[R] when to use `which'?
when do I need to use which()?

> a <- c(1,2,3,4,5,6)
> a
[1] 1 2 3 4 5 6
> a[a==4]
[1] 4
> a[which(a==4)]
[1] 4
> which(a==4)
[1] 4
> a[which(a>2)]
[1] 3 4 5 6
> a[a>2]
[1] 3 4 5 6

seems unnecessary...

--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
http://jihadwatch.org http://palestinefacts.org http://mideasttruth.com http://truepeace.org http://thereligionofpeace.com
Good programmers treat Microsoft products as damage and route around it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to smoothen a geodata set in R
To answer your questions: yes, yes, and probably no. You will have to pick up an introductory manual of R where questions 1 and 2 are discussed. For 1: you index x as in x[452, 682]. For 2: there are ways to write (and avoid) loops in R (e.g. for or while loops). Often avoidance is preferable, because R is not very fast at looping. For 3: I cannot say for sure, but as long as your data is in matrix format, or something that can be coerced to it, no.

Daniel

Tariq wrote:
> Hello, I'm new to this list. Sorry if my question or parts of it already came up before. For my research in geostatistics, I am working with large sets of data in R (basically large matrices containing discrete x and y coordinates and a value for a certain parameter). These sets are obtained by kriging. The operation I'd like to perform is to smooth the output data set. I want to do it by adding each data point and its 8 surrounding points and dividing by nine (giving an average), and then replacing each element in the matrix with the result. Question 1: is there a way to address the parameter value of a single element (for example, the value for element [x=452, y=682] inside the matrix) and perform an operation on it in R? Question 2: is there a way to program R into a loop, so that the same operation can be performed on all elements inside the matrix? Question 3: is it a problem if my data is geodata (made with the geoR library)?
Re: [R] when to use `which'?
Well ...

which(a==4)^2

?? -- Bert

On Tue, Jul 12, 2011 at 1:17 PM, Sam Steingold s...@gnu.org wrote:
[original message snipped]

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] when to use `which'?
On Jul 12, 2011, at 4:17 PM, Sam Steingold wrote:
[original message snipped]

It is unnecessary when `a` is a toy case and has no NAs. And you will find some of the cognoscenti trying to correct you when you do use which().

> a <- c(1, 2, NA, 3, 4, NaN, 5, 6)
> data.frame(lets = letters[1:8], stringsAsFactors = FALSE)[a > 0, ]
[1] a b NA d e NA g h
> data.frame(lets = letters[1:8], stringsAsFactors = FALSE)[which(a > 0), ]
[1] a b d e g h

If you have millions of records and tens of thousands of NAs (say ~1% of the data), imagine what your console looks like when you try to pick out records from one day and get 10,000 where you were expecting 100. A real PITA when you are doing real work.

--
David Winsemius, MD
West Hartford, CT
Re: [R] when to use `which'?
On Tue, Jul 12, 2011 at 1:17 PM, Sam Steingold s...@gnu.org wrote:
> when do I need to use which()?

See ?which. For examples, try: example(which)

> seems unnecessary...

Yes, it can be used as a redundant wrapper, as you have demonstrated in your examples. In those cases it is most certainly unnecessary.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/
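The two situations raised in this thread where which() does earn its keep can be condensed into a few runnable lines (no assumptions beyond base R):

```r
a <- c(1, 2, NA, 3)

# 1. Logical indexing propagates NA slots; which() drops them,
#    because which(x) returns only the indices where x is TRUE.
a[a > 2]          # NA 3
a[which(a > 2)]   # 3

# 2. You need the positions themselves, e.g. to do arithmetic
#    on the indices (cf. which(a==4)^2 earlier in the thread).
which(a > 2)      # 4
```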
Re: [R] simple save question
Here is a worked example. Can you point out to me where in temp rmean is stored? Thanks. Tom

> library(survival)
> library(ISwR)
> dat.s <- Surv(melanom$days, melanom$status == 1)
> fit <- survfit(dat.s ~ 1)
> plot(fit)
> summary(fit)
Call: survfit(formula = dat.s ~ 1)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  185    201       1    0.995 0.00496        0.985        1.000
  204    200       1    0.990 0.00700        0.976        1.000
  210    199       1    0.985 0.00855        0.968        1.000
  232    198       1    0.980 0.00985        0.961        1.000
  279    196       1    0.975 0.01100        0.954        0.997
 [... 50 further rows omitted ...]
 3042     52       1    0.664 0.03994        0.590        0.747
 3338     35       1    0.645 0.04307        0.566        0.735

> print(fit, print.rmean=TRUE)
Call: survfit(formula = dat.s ~ 1)

   records      n.max    n.start     events     *rmean *se(rmean)     median
       205        205        205         57       4125        161         NA
   0.95LCL    0.95UCL
        NA         NA
    * restricted mean with upper limit = 5565

> temp <- summary(fit)
> str(temp)
List of 12
 $ surv    : num [1:57] 0.995 0.99 0.985 0.98 0.975 ...
 $ time    : num [1:57] 185 204 210 232 279 295 386 426 469 529 ...
 $ n.risk  : num [1:57] 201 200 199 198 196 195 193 192 191 189 ...
 $ n.event : num [1:57] 1 1 1 1 1 1 1 1 1 1 ...
 $ conf.int: num 0.95
 $ type    : chr "right"
 $ table   : Named num [1:7] 205 205 205 57 NA NA NA
  ..- attr(*, "names")= chr [1:7] "records" "n.max" "n.start" "events" ...
 $ n.censor: num [1:57] 0 0 0 1 0 0 0 0 0 0 ...
 $ std.err : num [1:57] 0.00496 0.007 0.00855 0.00985 0.011 ...
 $ lower   : num [1:57] 0.985 0.976 0.968 0.961 0.954 ...
 $ upper   : num [1:57] 1 1 1 1 0.997 ...
 $ call    : language survfit(formula = dat.s ~ 1)
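A sketch of where the restricted mean can be found programmatically: it is not stored in the default summary(fit) object at all; you have to ask for it. In recent versions of the survival package, summary() (like print()) accepts an rmean argument, after which the value appears in the $table component. This uses survival's bundled lung data rather than ISwR, and assumes a reasonably recent survival; the exact table names can vary across versions, hence the grep.

```r
# Extract the restricted mean from a survfit object.
# Assumption: a recent survival package where summary.survfit
# has an `rmean` argument ("none", "common", "individual", or a number).
library(survival)
fit <- survfit(Surv(time, status) ~ 1, data = lung)
tab <- summary(fit, rmean = "common")$table
tab[grep("rmean", names(tab))]   # restricted mean and its standard error
```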
[R] Question re complex survey design and cure models
Hello all, I am using AddHealth data to fit a cure, aka split-population, model using nltm. I am not sure how to account for the complex survey design - does anyone have any suggestions? Any help would be greatly appreciated! Sincerely, Sam