Re: [R] kruskal-wallis, stratified
See the thread "stratified Wilcoxon available?" at
http://tolstoy.newcastle.edu.au/R/help/05/08/11143.html

Heinz

At 11:21 13.04.2010, Kay Cichini wrote:
> hello everyone,
> can anybody tell me if there is a kruskal-wallis, or another non-parametric test, that can deal with multiple samples that are stratified?
> thanks,
> kay
> --
> View this message in context: http://n4.nabble.com/kruskal-wallis-stratified-tp1838210p1838210.html
> Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] kruskal-wallis, stratified
Sorry for not being precise enough. Here
http://tolstoy.newcastle.edu.au/R/help/05/08/11177.html
you should find the attachment
http://tolstoy.newcastle.edu.au/R/help/att-11177/KW.strat.2005.R
I used it, and it seems to work. In some cases some elements of weight may become Inf.

Heinz

At 12:29 13.04.2010, Kay Cichini wrote:
> hello heinz,
> i read the thread already. i think it applies only to 2-sample problems.
> greetings,
> kay
> --
> View this message in context: http://n4.nabble.com/kruskal-wallis-stratified-tp1838210p1838261.html
Re: [R] Can we get rid of bar charts with error bars?
Frank,

the example on http://biostat.mc.vanderbilt.edu/DynamitePlots is nice, and I agree with you. Just one minor question: would it be possible to mention, as "An article with nice dot plots", a paper that is freely available?

Heinz

At 14:56 03.12.2009, Frank E Harrell Jr wrote:
> Bar charts with error bars are far inferior to dot charts and other types of displays. One of many problems is demonstrated if you draw a bar chart displaying temperature in F then re-draw it on the degrees C scale. See http://biostat.mc.vanderbilt.edu/DynamitePlots for much more information. The error bars lull us into an assumption that symmetric confidence intervals are OK, among other things.
>
> Frank
> --
> Frank E Harrell Jr
> Professor and Chair, School of Medicine
> Department of Biostatistics, Vanderbilt University
Re: [R] Building static HTML help pages in R 2.10.x on Windows
Instead of an answer, may I add a question (c): can someone state that it is impossible to generate static HTML help pages under Windows?

At 21:40 22.12.2009, Steve Rowley wrote:
> I upgraded to R2.10.1pat and discovered, along with everybody else, that static HTML pages are no longer the default. Fine; my tastes would go the other way, but I'm happy to adapt.
>
> However, I'd still like to build static HTML pages (for stable bookmarking, use when R is not running, etc.). I'm using the Windows installer, so the advice in the R Installation Admin guide (section 2.2, Help options) to use the configure option --enable-prebuilt-html doesn't seem to apply. I'm using install.packages() rather than R CMD INSTALL, so I don't quite understand how the --html arg to R CMD INSTALL can apply either.
>
> So, can anybody point me to an example of either:
> (a) how to build the static HTML help pages of all currently installed packages under Windows, or, failing that,
> (b) how to do this on Windows ab initio, from a clean install?
>
> Thanks!
> --
> Steve Rowley s...@alum.mit.edu http://alum.mit.edu/www/sgr/
Re: [R] Applying function to parts of a matrix based on a factor
If your matrix were a data.frame, it could work like this:

    df <- data.frame(age=1:100, sex=rep(1:2, 50))
    with(df, by(age, sex, mean))

without the lapply, sapply etc. family.

h

At 18:16 13.01.2010, Doran, Harold wrote:
> with(yourdataframe, tapply(age,sex,mean))
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin
> Sent: Wednesday, January 13, 2010 12:11 PM
> To: r-help@r-project.org
> Subject: [R] Applying function to parts of a matrix based on a factor
>
> R 2.9
> Windows XP
>
> I have a matrix, Data, which contains a factor Sex and a continuous variable Age. I want to get mean age by sex. I know I can do this with two statements,
>     mean(Data["Age", Data[, "Sex"] == "Male"])
> and
>     mean(Data["Age", Data[, "Sex"] == "Female"])
> I know this can be done in a single command, but I can't remember how. There is a function that allows another function to work within factors, something like magicfunction(Data, Factor=Sex).
>
> n.b. I know the function I am looking for is not in the lapply, sapply etc. family.
>
> Please put me out of my misery (and senior moment) and remind me what function I should be using.
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
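The one-liners suggested in this thread can be tried side by side on made-up data. This is only a minimal sketch, assuming the data sit in a data frame rather than a matrix; the column names `age` and `sex` and the values are illustrative and are not John's data.

```r
# Illustrative data only: ages 1..100 with alternating sex labels
df <- data.frame(age = 1:100,
                 sex = factor(rep(c("Male", "Female"), 50)))

# Mean age within each level of sex, three equivalent ways
m_tapply <- with(df, tapply(age, sex, mean))             # named 1-d array
m_by     <- with(df, by(age, sex, mean))                 # object of class "by"
m_agg    <- aggregate(age ~ sex, data = df, FUN = mean)  # data frame, one row per level

m_tapply
m_agg
```

tapply() and by() return the same numbers in different containers; aggregate() is convenient when the result should itself be a data frame.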
Re: [R] merging issue.........
Did you consider looking at the help page for merge?

h

At 22:01 13.01.2010, karena wrote:
> hi, I have a question about merging two files.
> For example, I have two files, the first file is like the following:
>
>     id  trait1
>     1   10.2
>     2   11.1
>     3    9.7
>     6   10.2
>     7    8.9
>     10   9.7
>     11  10.2
>
> The second file is like the following:
>
>     id  trait2
>     1    9.8
>     2   10.8
>     4    7.8
>     5    9.8
>     6   10.1
>     12  10.2
>     13  10.1
>
> now I want to merge the two files by the variable id, I only want to keep the ids which show up in the first file. Even if the id does not show up in the second file, it doesn't matter, I can keep the missing values. So my question is: how can I merge the two files and keep only the rows whose id shows up in the first file?
> I know how to do it in SAS, just use the following code:
>
>     merge data1(in=in1) data2(in=in2);
>     by id;
>     if in1;
>
> but I really have no idea about how to do it in R.
>
> thank you in advance,
> karean
> --
> View this message in context: http://n4.nabble.com/merging-issue-tp1013356p1013356.html
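The help page for merge describes the `all.x` argument, which appears to correspond to the SAS `if in1;` idiom in the question. A minimal sketch using the values from the question (the object names `data1`, `data2` and `merged` are chosen here for illustration):

```r
# The two tables from the question
data1 <- data.frame(id     = c(1, 2, 3, 6, 7, 10, 11),
                    trait1 = c(10.2, 11.1, 9.7, 10.2, 8.9, 9.7, 10.2))
data2 <- data.frame(id     = c(1, 2, 4, 5, 6, 12, 13),
                    trait2 = c(9.8, 10.8, 7.8, 9.8, 10.1, 10.2, 10.1))

# Left join: keep every id of data1; ids absent from data2 get NA in trait2
merged <- merge(data1, data2, by = "id", all.x = TRUE)
merged
```

Rows for ids 4, 5, 12 and 13, which occur only in the second table, are dropped; `all = TRUE` would keep them as well.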
Re: [R] BMDP and SAS (was R in clinical trials)
I once suggested to BMDP that they introduce a module statement that would direct the syntax to the specified module (1L, 2L, ...), so that all syntax could reside in one job, but they did not like that idea.

Heinz

At 14:55 19.02.2010, Terry Therneau wrote:
> I used both BMDP and SAS in my earlier years, side by side. At that time the BMDP statistical methods were much more mature and comprehensive: we treated them as the standard when the two packages disagreed. (It was a BMDP manual that clearly explained to me what the hypothesis of Yates' weighted mean test is, something SAS decided to call type III and eternally obfuscate by defining it in terms of a computational algorithm).
>
> The BMDP programs had reasonable facilities for data manipulation --- not as strong as SAS but reasonable. However each analysis program was a separate run, so you had to cut and paste your block of setup code onto the front of each program's instructions. Cut and paste with a keypunch machine is not quite as simple as with a mouse; if you needed a listing, some frequencies, 2-3 regressions, ... it got rather tedious.
>
> Terry Therneau
Re: [R] Output mean/median survival time from survfit
Maybe this thread is of use to you:
"How to access results of survival analysis", Xiaochun Li (06 May 2006)
http://tolstoy.newcastle.edu.au/R/help/06/05/26713.html

Heinz

At 21:28 04.02.2008, Xing Yuan wrote:
> Hi all,
> Does anybody know how to output the mean/median survival time from survfit?
> Thank you very much!!!
> Joe
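In later versions of the survival package (see the survmean thread further down), the numbers printed by print.survfit are also returned in the `table` component of summary.survfit. A minimal sketch under that assumption, using the `aml` data shipped with the package; the object names are illustrative:

```r
library(survival)

# Kaplan-Meier curves for the two arms of the aml data
fit <- survfit(Surv(time, status) ~ x, data = aml)

# summary()$table holds the quantities that print(fit) displays
tab <- summary(fit)$table
med <- tab[, "median"]   # median survival time per stratum
med

# The restricted mean is shown when requested explicitly
print(fit, print.rmean = TRUE)
```

With a single curve (no strata), `table` is a named vector rather than a matrix, so `tab["median"]` would be used instead.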
Re: [R] Problem with cut
At 15:22 22.02.2008, Henrique Dallazuanna wrote:
> Try this:
>     grep(330, levels(cc), value=T)

Could you please explain in a little more detail how this answers the original question?

>> I would have expected 330 to fall into (313,330] category. Can you please advise what I am doing wrong?

Thank you
Heinz

> On 22/02/2008, [EMAIL PROTECTED] wrote:
>> Hi All,
>> I might have misunderstood how cut works, but the following behaviour surprises me.
>>
>>     vv <- seq(150, 346, by= 4)
>>     cc <- cut(vv, 12)
>>     cc[vv == 330]
>>
>> Results
>>     [1] (330,346]
>>
>> I would have expected 330 to fall into (313,330] category. Can you please advise what I am doing wrong?
>>
>> Many Thanks,
>> Jussi Lehto
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
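One possible explanation, not stated in the thread itself: cut() with a single number of intervals computes equally spaced break points and then rounds only the labels (to `dig.lab = 3` significant digits by default). The break printed as "330" would then lie slightly below 330. A minimal sketch to check this; `cc6` and `brk` are names introduced here:

```r
vv <- seq(150, 346, by = 4)

cc <- cut(vv, 12)                 # labels rounded to 3 significant digits
as.character(cc[vv == 330])

# Same intervals, but labels printed with more digits
cc6 <- cut(vv, 12, dig.lab = 6)
as.character(cc6[vv == 330])

# The 12th of the 13 equally spaced break points between min and max
brk <- seq(min(vv), max(vv), length.out = 13)[12]
brk                               # a little below 330, so 330 falls in the last interval
```

If the interval boundaries should sit on round numbers, explicit breaks can be passed, for example `cut(vv, breaks = seq(146, 346, by = 20))`.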
Re: [R] correlation between categorical data
At 07:40 21.06.2009, J Dougherty wrote:
> [...]
> There are other ways of regarding the FET. Since it is precisely what it says - an exact test - you can argue that you should avoid carrying over any conclusions drawn about the small population the test was applied to and employing them in a broader context. In so far as the test is concerned, the sample data and the contingency table it is arrayed in are the entire universe. In that sense, the FET can't be conservative or liberal. It isn't actually a hypothesis test and should not be thought of as one or used in the place of one.
>
> JDougherty

Could you give some reference supporting this, for me, surprising view? I don't see a necessary connection between an exact test and the idea that it does not test a hypothesis.

Thanks,
Heinz
[R] survSplit with data.frame containing a Surv object
Dear All,

for years I have been struggling with Surv objects in data.frames. The following seems to have to do with it. See below the modified example from the help page of survSplit. The original works as expected. If, however, a Surv object is added to the data.frame, each record gets doubled. Is there some solution other than avoiding Surv objects in data.frames?

Thanks,
Heinz

    require(survival)
    ## from the help page
    aml3 <- survSplit(aml, cut=c(5,10,50), end="time", start="start",
                      event="status", episode="i")
    summary(aml)
    summary(aml3)
    coxph(Surv(time,status)~x, data=aml)
    ## the same
    coxph(Surv(start,time,status)~x, data=aml3)

    ## added to show doubling of records
    aml.so <- aml
    aml.so$surv.object <- with(aml, Surv(time, status))
    aml3.so <- survSplit(aml.so, cut=c(5,10,50), end="time", start="start",
                         event="status", episode="i")
    summary(aml3.so)

    sessionInfo('survival')
    R version 2.9.1 Patched (2009-07-07 r48910)
    i386-pc-mingw32

    locale:
    LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252

    attached base packages:
    character(0)

    other attached packages:
    [1] survival_2.35-4

    loaded via a namespace (and not attached):
    [1] base_2.9.1      graphics_2.9.1  grDevices_2.9.1 methods_2.9.1
    [5] splines_2.9.1   stats_2.9.1     utils_2.9.1
Re: [R] survSplit with data.frame containing a Surv object
At 20:18 13.07.2009, Charles C. Berry wrote:
> On Mon, 13 Jul 2009, Heinz Tuechler wrote:
>> Dear All,
>> since years I am struggling with Surv objects in data.frames. The following seems to have to do with it. See below the modified example from the help page of survSplit. The original works, as expected. If, however, a Surv object is added to the data.frame, each record gets doubled. Is there some solution other than avoiding Surv objects in data.frames?
>
> I think you can modify survSplit so that it will properly handle Surv objects.
>
> Change this line:
>
>     newdata <- lapply(data, rep, ntimes + 1)
>
> to this:
>
>     newdata <- lapply(data, function(x) {
>         x <- as.matrix(x)
>         x[rep(1:nrow(x), ntimes + 1), ]
>     })
>
> or something similar that results in Surv objects being rep()'ed rowwise rather than elementwise and returned as objects of the right dimension (rather than as a vector).
>
> Caveat: This works in the example you give, but I've not tested this extensively.
>
> HTH,
> Chuck
>
>> Thanks,
>> Heinz
>> [...]
>
> Charles C. Berry                       (858) 534-2098
> Dept of Family/Preventive Medicine
> E mailto:cbe...@tajo.ucsd.edu          UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/
> La Jolla, San Diego 92093-0901

Thank you Chuck,

it seems to work with my real data as well, but I noted that in the example aml$x, which is a factor, gets converted to character in aml3.so. Maybe, if I find the time, I should look at as.data.frame.matrix and rbind for Surv objects.

Thanks again,
Heinz
Re: [R] Two envelopes problem
Mark,

My experience was similarly frustrating. Maybe formulating the problem a bit differently could help to clarify it. State it like this: Someone chooses an amount of money x. He puts 2x/3 of it in one envelope and x/3 in another. There is no assumption about the distribution of x. If you choose one envelope your expectation is x/2 and changing may lead to a gain or a loss of x/6.

In my view there is no basis for a frequentist conditional expectation, conditional on the amount in the first envelope. Of course, after opening the first envelope and finding a, you know for sure that x can only be 3a or 3a/2, but to me there seems to be no basis to assign probabilities to these two alternatives.

I am aware of the long-lasting discussion, and of course this will not end it.

Heinz

At 14:51 26.08.2008, Mark Leeds wrote:
> Duncan: I think I see what you're saying but the strange thing is that if you use the utility function log(x) rather than x, then the expected values are equal. Somehow, if you are correct and I think you are, then taking the log fixes the distribution of x, which is kind of odd to me. I'm sorry to belabor this non R related discussion and I won't say anything more about it but I worked/talked on this with someone for about a month a few years ago and we gave up so it's interesting for me to see this again.
>
> Mark
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Duncan Murdoch
> Sent: Tuesday, August 26, 2008 8:15 AM
> To: Jim Lemon
> Cc: r-help@r-project.org; Mario
> Subject: Re: [R] Two envelopes problem
>
> On 26/08/2008 7:54 AM, Jim Lemon wrote:
>> Hi again,
>> Oops, I meant the expected value of the swap is:
>>
>>     5*0.5 + 20*0.5 = 12.5
>>
>> Too late, must get to bed.
>
> But that is still wrong. You want a conditional expectation, conditional on the observed value (10 in this case). The answer depends on the distribution of the amount X, where the envelopes contain X and 2X. For example, if you knew that X was at most 5, you would know you had just observed 2X, and switching would be a bad idea.
>
> The paradox arises because people want to put a nonsensical Unif(0, infinity) distribution on X. The Wikipedia article points out that it can also arise in cases where the distribution on X has infinite mean: a mathematically valid but still nonsensical possibility.
>
> Duncan Murdoch
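The unconditional part of the argument can be checked by simulation. The sketch below uses Heinz's formulation (x/3 and 2x/3 in the two envelopes) and, purely as an assumption for illustration, draws the total x from a uniform distribution on (1, 100); any proper distribution with finite mean would do. The two envelopes differ by x/3, so each lies x/6 from the mean x/2.

```r
set.seed(1)
n <- 1e5

x     <- runif(n, min = 1, max = 100)  # total amount; this distribution is an arbitrary assumption
small <- x / 3
large <- 2 * x / 3

pick_large <- runif(n) < 0.5           # which envelope is picked first, at random
first  <- ifelse(pick_large, large, small)
second <- ifelse(pick_large, small, large)

gain <- second - first                 # result of always switching

c(mean_first = mean(first), mean_second = mean(second), half_total = mean(x) / 2)
mean(gain)                             # close to 0: no unconditional advantage in switching

# Mark's remark about log utility: the expected log amounts also agree
mean(log(second) - log(first))
```

The simulation says nothing about the conditional expectation given the observed amount, which is the point Duncan raises; that depends on the distribution chosen for x.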
Re: [R] Mode Vs Class
Congratulations, Bill, on this very clear and useful explanation.

Heinz

At 14:58 08.04.2008, [EMAIL PROTECTED] wrote:
> 'mode' is a mutually exclusive classification of objects according to their basic structure. The 'atomic' modes are numeric, complex, character and logical. Recursive objects have modes such as 'list' or 'function' or a few others. An object has one and only one mode.
>
> 'class' is a property assigned to an object that determines how generic functions operate with it. It is not a mutually exclusive classification. If an object has no specific class assigned to it, such as a simple numeric vector, its class is usually the same as its mode, by convention.
>
> Changing the mode of an object is often called 'coercion'. The mode of an object can change without necessarily changing the class, e.g.
>
>     > x <- 1:16
>     > mode(x)
>     [1] "numeric"
>     > dim(x) <- c(4,4)
>     > mode(x)
>     [1] "numeric"
>     > class(x)
>     [1] "matrix"
>     > is.numeric(x)
>     [1] TRUE
>     > mode(x) <- "character"
>     > mode(x)
>     [1] "character"
>     > class(x)
>     [1] "matrix"
>
> However:
>
>     > x <- factor(x)
>     > class(x)
>     [1] "factor"
>     > mode(x)
>     [1] "numeric"
>
> At this stage, even though x has mode numeric again, its new class, 'factor', inhibits it being used in arithmetic operations.
>
> In practice, mode is not used very much, other than to define a class implicitly when no explicit class has been assigned.
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Shubha Vishwanath Karanth
> Sent: Tuesday, 8 April 2008 10:20 PM
> To: [EMAIL PROTECTED]
> Subject: [R] Mode Vs Class
>
>> Hi R,
>> Just came across the 'mode' of an object. What is the basic difference between ?class and ?mode ... For example:
>>
>>     > d <- data.frame(a = c(1,2), b = c(5,6))
>>     > class(d)
>>     [1] "data.frame"
>>     > mode(d)
>>     [1] "list"
>>
>> But,
>>
>>     > c <- c(2,3,5,6,7)
>>     > class(c)
>>     [1] "numeric"
>>     > mode(c)
>>     [1] "numeric"
>>
>> Could anyone help me out...
>> Thanks, shubha
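The examples in this thread can be collected into a short script. One detail differs in current R: since R 4.0.0, class() of a matrix returns c("matrix", "array") rather than "matrix" alone, so the sketch below uses inherits() for that check. The variable names are illustrative.

```r
x <- 1:16
mode(x)                   # "numeric"
dim(x) <- c(4, 4)         # now a matrix; the mode is unchanged
inherits(x, "matrix")     # TRUE in both old and current R versions

mode(x) <- "character"    # coercion: the mode changes, the matrix structure stays
mode(x)
dim(x)

f <- factor(x)
class(f)                  # "factor"
mode(f)                   # "numeric": integer codes underneath

d <- data.frame(a = c(1, 2), b = c(5, 6))
class(d)                  # "data.frame"
mode(d)                   # "list"
```

typeof() gives a finer-grained answer than mode(), for example "integer" for `1:16` where mode() reports "numeric".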
Re: [R] Trend test for survival data
Dear Markus!

Since I have not seen an answer yet, my suggestion is to use coxph with the group variable, coded numerically, as the only independent variable.

Heinz

At 13:39 21.04.2008, Markus Kreuz wrote:
> Hello,
> is there an R package that provides a log rank trend test for survival data in >= 3 treatment groups? Or are there any comparable trend tests for survival data in R?
>
> Thanks a lot
> Markus
> --
> Dipl. Inf. Markus Kreuz
> Universitaet Leipzig
> Institut fuer medizinische Informatik, Statistik und Epidemiologie (IMISE)
> Haertelstr. 16-18
> D-04107 Leipzig
> Tel. +49 341 97 16 276
> Fax. +49 341 97 16 109
> email: [EMAIL PROTECTED]
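A minimal sketch of this suggestion, assuming an ordered grouping with numeric scores. The `lung` data from the survival package and its `ph.ecog` score (0 to 3) serve here only as a stand-in for "treatment groups"; they are not from the original question. The score test of the one-covariate Cox model has 1 degree of freedom and is closely related to a log-rank test for trend.

```r
library(survival)

# Stand-in data: ph.ecog plays the role of an ordered group variable
d <- subset(lung, !is.na(ph.ecog))

# Group entered as a single numeric covariate, so the effect is a linear trend
fit_trend <- coxph(Surv(time, status) ~ ph.ecog, data = d)
sc <- summary(fit_trend)$sctest   # score (log-rank type) test: statistic, df, p-value
sc

# For comparison: the ordinary log-rank test, which ignores the ordering
survdiff(Surv(time, status) ~ ph.ecog, data = d)
```

The choice of scores (here 0, 1, 2, 3) is part of the hypothesis being tested; other spacings give a different trend test.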
Re: [R] survival curves for time dependent covariates (was consultation)
At 14:50 12.05.2009, Terry Therneau wrote:
>> I´m writing to ask you how can I do Survivals Curves using Time-dependent covariates? Which packages I need to Install?
>
> This is a very difficult problem statistically. That is, there are not many good ideas for what SHOULD be done. Hence, there are no packages. Almost everything you find in an applied paper (e.g. a medical journal) is wrong.
>
> Terry Therneau

Dear Terry,

just in case it is not too much work for you, maybe you could give some references to examples of wrong applications in applied medical papers.

Thanks,
Heinz
[R] Where to find a changelog for the survival package
Dear All,

for some days I have been trying to use version 2.35-4 of the survival package instead of version 2.31, which I had installed until now. Several changes in print.survfit, plot.survfit and seemingly in the structure of ratetables affect some of my syntax files. Is there documentation of these changes somewhere, besides the code itself?

Thanks in advance,
Heinz
Re: [R] Where to find a changelog for the survival package
Thank you Richie.

I had seen this before, but my impression is that it's not up to date. I gave a wrong version number in my previous post: I changed from 2.34-1 to 2.35-4. For example, the plot.survfit function lost its legend parameters, but I don't see this in the changelog.

Thanks again,
Heinz

At 14:53 20.05.2009, richard.cot...@hsl.gov.uk wrote:
>> since some days I try to use the versions 2.35-4 of the survival package instead of versions 2.31, I had installed until now. Several changes in print.survfit, plot.survfit and seemingly in the structure of ratetables affect some of my syntax files. Is there somewhere a documentation of these changes, besides the code itself?
>
> It's in the repository on R-Forge. The latest version is here:
> http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/survival/Changelog.09?rev=11234&root=survival&view=markup
>
> Regards,
> Richie.
>
> Mathematical Sciences Unit
> HSL
Re: [R] Changelog for the survival package
Dear Terry,

first of all, thank you for your immense work.

At the moment I don't have a small reproducible example for the ratetable difficulty I have. I will work on it. Maybe the error message I get is of some information to you:

    Error in match.ratetable(m[, rate], ratetable) :
      Data has a date type variable, but the reference ratetable is not a date for variable year

If I want to call str(survexp.ode), which is my ratetable, I get:

    str(survexp.ode)
    Error in `[.ratetable`(object, seq_len(iv.len)) : Invalid subscript

The same, however, is possible in version 2.34-1. Try:

    str(survexp.us)
    Error in `[.ratetable`(object, seq_len(iv.len)) : Invalid subscript

But with unclass() it works:

    str(unclass(survexp.us))
     num [1:113, 1:2, 1:65] 1.58e-02 1.87e-03 3.01e-04 6.05e-05 1.52e-05 ...
     - attr(*, "dimnames")=List of 3
      ..$ : chr [1:113] "0-1d" "1-7d" "7-28d" "28-365d" ...
      ..$ : chr [1:2] "male" "female"
      ..$ : chr [1:65] "1940" "1941" "1942" "1943" ...
     - attr(*, "dimid")= chr [1:3] "age" "sex" "year"
     - attr(*, "type")= num [1:3] 2 1 4
     - attr(*, "cutpoints")=List of 3
      ..$ : num [1:113] 0 1 7 28 365 ...
      ..$ : NULL
      ..$ : int [1:65] -7305 -6939 -6574 -6209 -5844 -5478 -5113 -4748 -4383 -4017 ...
     - attr(*, "summary")=function (R)

Concerning the legend, I fully agree with you. It's just that I have several syntax files where I made use of the legend parameters, and so I noted the change. For these files I rebuilt your old plot.survfit().

Further, I appreciate your new function survmean(). At the moment it seems to be intended as internal, and it is not documented in the help. Still, I use it to get the old form of the output and to get the output as an object. I think that, with only right-censored data, n.max and n.start are not informative.

To underline: I appreciate your changes; it's only a little difficult to recognize them correctly by trial and error.

Thanks,
Heinz

At 18:57 21.05.2009, Terry Therneau wrote:
>> Several changes in print.survfit, plot.survfit and seemingly in the structure of ratetables affect some of my syntax files. Is there somewhere a documentation of these changes, besides the code itself?
>
> I agree, the Changelog.09 file is not as comprehensive as one would like. Specific comments:
>
> 1. The ratetables were recently changed to accommodate a new option. I thought that I had made them completely backwards compatible with the old -- please let me know specifics if I overlooked something. The routines that make use of the rate tables can now use multiple date types, but they still support the older 'date' class.
>
> 2. My local code and the R code had gotten badly out of sync; I spent a substantial fraction of my evenings re-merging them for over a year. 2/3 of the changes were disjoint improvements in the two trees; these were easy to merge. The hardest were survfit and its print/plot methods and some summary methods, where both of us had worked towards the same goal but in not quite the same way. I had made 3x as many updates to survfit as the R tree, so used my (Mayo) code as the base; almost all the others stayed closer to the R side. Feel free to ask me direct questions about any feature or change. I can't necessarily promise fast resolution, but will try.
>
> 3. I don't understand putting legend or title options into a plot method, since a separate call after the plot is so much more flexible. They got pushed to the bottom of my change list, and then completely forgotten.
>
> 4. In the last few weeks issues with anova.coxph, and predict.coxph/factors/newdata were raised. The fixes were added to Rforge last night, and include 2 new test cases to avoid future mishaps.
>
> Terry T.
Re: [R] survfit, summary, and survmean (was Changelog for survival package)
Dear Terry,

sorry that I did not see this change, and thank you for it. It is very useful.

Heinz

At 15:28 22.05.2009, Terry Therneau wrote:
>> Further I appreciate your new function survmean(). At the moment it seems to be intended as internal, and not documented in the help.
>
> The computations done by print.survfit are now a part of the results returned by summary.survfit. See 'table' in the output list of ?summary.survfit. Both call an internal survmean() function to ensure that any future updates stay in synchrony.
>
> This was a perennial (and justified) complaint with print.survfit. Per the standard, print(x) always returns x, so there was no way to get the results of the print as an S object.
>
> Terry
Re: [R] Different results in calculating SD of 2 numbers
At 11:32 16.01.2008, Jim Lemon wrote:

(Ted Harding) wrote:

On 16-Jan-08 08:45:04, Martin Maechler wrote:

RM == Ron Michael [EMAIL PROTECTED] on Wed, 16 Jan 2008 00:14:56 -0800 (PST) writes:

RM Hi all,
RM Can anyone tell me why I am getting different results in calculating SD of 2 numbers?

(1.25-0.95)/2
RM [1] 0.15
sd(c(1.25, 0.95))
RM [1] 0.2121320 # why it is different from 0.15?

because 1 is different from 2! If 2 was 1, then sqrt(2) == 1 as well, but actually I don't think the universe and we all would exist in that case.

Martin Maechler, ETH

Of course we would!! -- Since FALSE implies X is TRUE for any X.

But FALSE would also imply that X is FALSE, so you are entitled to your view as well, Martin. Then again, as pi might have been equal to 1 prior to the Big Bang, I see no reason why sqrt(2) shouldn't have been equal to 1 as well. After all, in those days we were all one...

Jim

Of course the question is off topic, but I like it. In my understanding mathematics is a theoretical model that may or may not properly describe certain aspects of a reality. I cannot see why a theoretical model should have any influence on our existence, as long as we don't apply it in an unreasonable way. To believe in our existence or to prove it is a totally different case.

Heinz
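Behind the joke ("1 is different from 2") is the denominator: sd() divides the sum of squared deviations by n - 1, which is 1 for two values, while the hand calculation in the question effectively divides by n = 2. A small sketch of the arithmetic, with variable names chosen here for illustration:

```r
x <- c(1.25, 0.95)

# sd() uses the n - 1 denominator (here n - 1 = 1)
s_sample <- sd(x)

# For exactly two values this reduces to |a - b| / sqrt(2)
s_two <- abs(x[1] - x[2]) / sqrt(2)

# Dividing by n instead reproduces the 0.15 from the question
s_pop <- sqrt(mean((x - mean(x))^2))

c(sample = s_sample, two_value_formula = s_two, divide_by_n = s_pop)
```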
[R] How to keep attributes when dropping factor levels?
Dear All,

to drop unused factor levels two ways are outlined in R-help. In both cases a label attribute is lost. The same happens when using car:::recode. Is there a simple way to avoid losing attributes?

Thanks, Heinz

## example
ff <- factor(substring("statistics", 1:10, 1:10), levels=letters)
attributes(ff)$label <- 'test label'
attributes(ff)$label
gg <- ff[, drop=TRUE]
attributes(gg)$label
hh <- factor(ff)
attributes(hh)$label
ii <- car:::recode(ff, "'t'='s'")
attributes(ii)$label

version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status Patched
major 2
minor 8.1
year 2009
month 03
day 13
svn rev 48132
language R
version.string R version 2.8.1 Patched (2009-03-13 r48132)

sessionInfo()
R version 2.8.1 Patched (2009-03-13 r48132)
i386-pc-mingw32
locale: LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] car_1.2-12
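One possible workaround, sketched here under the assumption that only a few known attributes (such as 'label') need to survive, is to copy them back after dropping the levels. The helper name drop_levels_keep is hypothetical and not part of base R or car.

```r
# Hypothetical helper: drop unused levels, then restore selected attributes
drop_levels_keep <- function(f, keep = "label") {
  g <- factor(f)                       # drops unused levels, loses extra attributes
  for (nm in keep) {
    attr(g, nm) <- attr(f, nm, exact = TRUE)  # copy each attribute back (NULL if absent)
  }
  g
}

ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)
attr(ff, "label") <- "test label"

gg <- drop_levels_keep(ff)
attr(gg, "label")
nlevels(gg)
```

The same pattern could be wrapped around car's recode, since the attributes are simply reattached to whatever factor comes back.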
[R] factor, as.factor and levels
Dear All,

to my surprise as.factor does not accept a levels argument. Maybe I did not read the documentation well enough. See the example below. I wanted to use ch1 as factor in the newdata argument of survfit, so I assumed that I could write as.factor(ch1, levels=ch1), since the order should be kept. But as.factor(ch1, levels=ch1) results in the error:

Error in as.factor(ch1, levels = ch1) : unused argument(s) (levels = c("low", "inter", "high"))

factor(ch1, levels=ch1) works as I expected. Is it intended that as.factor does not use the levels argument?

Thanks, Heinz

ch1 <- c('low', 'inter', 'high')
factor(ch1)
factor(ch1, levels=ch1)
as.factor(ch1, levels=ch1)

version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status Patched
major 2
minor 8.1
year 2009
month 03
day 13
svn rev 48132
language R
version.string R version 2.8.1 Patched (2009-03-13 r48132)

sessionInfo()
R version 2.8.1 Patched (2009-03-13 r48132)
i386-pc-mingw32
locale: LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] survival_2.34-1 car_1.2-12 gmodels_2.14.1 gdata_2.4.2
[5] Hmisc_3.5-2
loaded via a namespace (and not attached):
[1] cluster_1.11.12 grid_2.8.1 gtools_2.5.0-1 lattice_0.17-20
[5] MASS_7.2-46
Re: [R] factor, as.factor and levels
Thank you, Jim. I see, the fact that in the documentation you find only as.factor(x) means that it does not accept more arguments. Does as.factor have speed advantages over factor, or is there a different cause for its existence?

Heinz

At 13:50 08.04.2009, jim holtman wrote:

as.factor does not accept levels as an argument. Use the first form that you have:

factor(ch1, levels=ch1)

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] factor, as.factor and levels
Jim - you are right, I should have looked before. So there is a difference that should also affect the dropping of unused levels.

Thanks, Heinz

At 15:31 08.04.2009, jim holtman wrote:

It is just a simple version of 'factor'. The only speed advantage it might have is that it checks to see if it is a factor first. Here is the definition:

as.factor
function (x)
if (is.factor(x)) x else factor(x)
<environment: namespace:base>

You can always list out what the function does to get a better understanding of how it works.
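To make the difference discussed in this thread concrete, here is a small sketch comparing the default level order with the order given explicitly. The wrapper as_factor_in_order is a hypothetical name introduced only for illustration; it mimics the as.factor() definition quoted above but keeps levels in order of first appearance.

```r
ch1 <- c("low", "inter", "high")

# Default: levels are the sorted unique values
lev_default <- levels(factor(ch1))

# Explicit levels: the given order is kept
lev_given <- levels(factor(ch1, levels = ch1))

# Hypothetical wrapper: returns factors unchanged, otherwise keeps
# the levels in order of first appearance
as_factor_in_order <- function(x) {
  if (is.factor(x)) x else factor(x, levels = unique(x))
}

lev_wrapped <- levels(as_factor_in_order(ch1))
```

Because the wrapper returns an existing factor unchanged, unused levels are kept in that case, just as with as.factor().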
Re: [R] (no subject)
At 11:10 16.04.2009, giuseppef...@libero.it wrote:

Dear all, I have a database x,y,value imported in R with read.table:

dati <- read.table("dati.dat")

value is categorical data (land use) and I want to plot the same land use in the same colour. Is it possible with R? Thanks a lot

Maybe, something like:

set.seed(726)
x <- runif(10)
y <- runif(10)
value <- sample(c('agric', 'urban', 'traffic'), 10, replace=TRUE)
plot(x, y, col=as.numeric(as.factor(value)), pch=as.numeric(as.factor(value)))

Heinz
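A variation on the suggestion above, sketched with an explicit named palette so that each land-use class gets a chosen colour instead of the numeric default. The colour choices and the names pal and cols are assumptions made for illustration.

```r
set.seed(726)
x <- runif(10)
y <- runif(10)
value <- sample(c("agric", "urban", "traffic"), 10, replace = TRUE)

# Named palette (colours chosen arbitrarily here): one colour per class
pal <- c(agric = "darkgreen", traffic = "red", urban = "grey40")

# Look up the colour of each point by its class name
cols <- pal[value]

plot(x, y, col = cols, pch = 19)
legend("topright", legend = names(pal), col = pal, pch = 19)
```

Indexing the palette by name keeps the colour assignment stable even if a class happens to be absent from a particular data set.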
[R] Umlaut read from csv-file
Dear All!

Reading character strings containing an umlaut from a csv-file I find a (to me) surprising behaviour in R 2.8.0, that I did not notice in R 2.7.2. A comparison by == results in FALSE, while grep does find the agreement. See the example below. The crucial line is x=="div 1-2 Veränderungen", with the result [1] FALSE in R 2.8.0 but [1] TRUE in R 2.7.2.

Thank you in advance for your help

Heinz Tüchler

# in R 2.8.0 patched
x0 <- "div 1-2 Veränderungen" # define a character string
write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
rm(x0)
x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv-file
x
x=="div 1-2 Veränderungen"
[1] FALSE
grep("div 1-2 Veränderungen", x)
[1] 1
grep("div 1-2 Veränderungen", x, value=TRUE)
[1] "div 1-2 Veränderungen"
unlink('chr.csv') # delete file

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 8.0
year = 2008
month = 11
day = 04
svn rev = 46830
language = R
version.string = R version 2.8.0 Patched (2008-11-04 r46830)
Windows XP (build 2600) Service Pack 2
Locale: LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
Search Path: .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base

# in R 2.7.2 patched
x0 <- "div 1-2 Veränderungen" # define a character string
write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
rm(x0)
x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv-file
x
x=="div 1-2 Veränderungen"
[1] TRUE
grep("div 1-2 Veränderungen", x)
[1] 1
grep("div 1-2 Veränderungen", x, value=TRUE)
[1] "div 1-2 Veränderungen"
unlink('chr.csv') # delete file

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 7.2
year = 2008
month = 09
day = 02
svn rev = 46486
language = R
version.string = R version 2.7.2 Patched (2008-09-02 r46486)
Windows XP (build 2600) Service Pack 2
Locale: LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
Search Path: .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
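When two strings that print identically compare as unequal, inspecting their declared encodings is a reasonable first diagnostic. The sketch below builds the same text once marked as Latin-1 and once converted to UTF-8; the byte escape \xe4 stands for the a-umlaut so the example does not depend on the encoding of the script file. In current R versions the comparison should come out TRUE because differently marked strings are translated before comparing; the FALSE reported above was specific to the R 2.8.0 build in question.

```r
# The same text, first as Latin-1 bytes with an explicit declaration
x_latin1 <- "div 1-2 Ver\xe4nderungen"
Encoding(x_latin1) <- "latin1"

# ... and converted to UTF-8
x_utf8 <- enc2utf8(x_latin1)

# The declared encodings differ
enc <- Encoding(c(x_latin1, x_utf8))
enc

# The comparison translates both sides before comparing
same <- x_latin1 == x_utf8
same
```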
Re: [R] Umlaut read from csv-file
Dear Prof. Ripley!

Thank you very much for your attention. In the given example Encoding(), or the encoding parameter of read.csv, solves the problem. I hope your patch will also solve the problem when I read an SPSS file by spss.get(), since this function has no encoding parameter and my real problem originated there.

many thanks
Heinz Tüchler

At 23:51 06.11.2008, you wrote:

Look at Encoding() on your two strings. The results are different, and this seems to be the root of the problem. Adding encoding="latin1" to the read.csv call is a workaround.

It looks like there is a problem in the use of the CHARSXP cache: if I save the session then x0 == x becomes true when I reload it, even though the encodings remain different. I've found the immediate cause and will change this in R-patched shortly.

-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] Encoding() and strsplit()
Dear All,

Encoding() goes beyond my understanding. See the example. I would expect from reading the help for Encoding() that strsplit preserves the encoding for each resulting element, but for simple letters it gets lost. Also it seems that an Encoding() cannot be declared for simple letters. They remain in any case "unknown". In paste() "latin1" seems to dominate "unknown".

What kind of characteristic of an object is the encoding? It does not show up as an attribute, and str() does not give me any hint either. Where can I find some explanation regarding encoding?

Thanks
Heinz

### Encoding() and strsplit
u <- 'abcäöü'
Encoding(u)
[1] "latin1"
Encoding(u) <- 'latin1' # to be sure about encoding
us <- strsplit(u, '')[[1]] # split in single strings
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
Encoding(us) <- rep('latin1', length(us))
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
pus <- paste(us[1], us[5], sep='')
Encoding(pus)
[1] "latin1"

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 8.0
year = 2008
month = 11
day = 04
svn rev = 46830
language = R
version.string = R version 2.8.0 Patched (2008-11-04 r46830)
Windows XP (build 2600) Service Pack 2
Locale: LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
Search Path: .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
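The behaviour in the example can be reproduced without strsplit(): the encoding mark is stored per string element, and pure ASCII strings are never marked, whatever is declared. A minimal sketch, using the byte escape \xe4 for the a-umlaut so that it does not depend on the script's own encoding:

```r
# Two ASCII letters and one Latin-1 byte (a-umlaut)
us <- c("a", "b", "\xe4")

# Try to declare all three elements as Latin-1
Encoding(us) <- "latin1"

# Only the non-ASCII element carries the mark
enc <- Encoding(us)
enc
```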
Re: [R] Encoding() and strsplit()
At 09:15 07.11.2008, Prof Brian Ripley wrote:

See the 'R Internals' manual.

Thank you, now I understand a little more. My real problem, however, is a data frame produced by spss.get(). Is there a simple way to mark all character strings in that data.frame (except ASCII ones), including levels of factors, as latin1?

Heinz Tüchler

ASCII characters are not marked as Latin-1 nor UTF-8.
Re: [R] Umlaut read from csv-file
At 13:34 07.11.2008, Peter Dalgaard wrote:

Heinz Tuechler wrote:

Dear Prof. Ripley! Thank you very much for your attention. In the given example Encoding(), or the encoding parameter of read.csv, solves the problem. I hope your patch will also solve the problem when I read an SPSS file by spss.get(), since this function has no encoding parameter and my real problem originated there.

read.spss() (package foreign) does have a reencode argument, though; and this is called by spss.get(), so it looks like an easy hack to add it there.

Thank you, that means I have to change spss.get to make it accept the reencode argument and pass it to read.spss. At the moment I prefer to step back to R 2.7.2 and to wait for a more general solution, because to me there still seem to be strange effects of encoding. In the following example the encoding gets lost by dumping and rereading, even if I use the encoding parameter of source(). But maybe I don't understand what this parameter should do.

Heinz Tüchler

us <- c("a", "b", "c", "ä", "ö", "ü")
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
dump('us', 'us_dump.txt')
rm(us)
source('us_dump.txt', encoding='latin1')
us
[1] "a" "b" "c" "ä" "ö" "ü"
Encoding(us)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
unlink('us_dump.txt')

--
Peter Dalgaard, Øster Farimagsgade 5, Entr.B
Dept. of Biostatistics, PO Box 2099, 1014 Cph. K
University of Copenhagen, Denmark, Ph: (+45) 35327918
([EMAIL PROTECTED]), FAX: (+45) 35327907
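Instead of patching spss.get(), one could call read.spss() from package foreign directly and pass its reencode argument. The wrapper below is only a sketch: the name read_spss_latin1 is hypothetical, and it assumes the file really is encoded in Latin-1, which the thread suggests but cannot guarantee for every file.

```r
# Hypothetical wrapper, not part of Hmisc or foreign:
# read an SPSS file as a data frame, treating its strings as Latin-1
read_spss_latin1 <- function(file, ...) {
  foreign::read.spss(file,
                     to.data.frame = TRUE,
                     reencode = "latin1",  # encoding assumed for the file
                     ...)
}
```

A call would then look like read_spss_latin1("mydata.sav"), where the file name is a placeholder. Note that variable labels handled by spss.get() are not reproduced by this sketch.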
Re: [R] Mismatch in logical result?
Maybe this?
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

At 11:23 07.11.2008, Shubha Vishwanath Karanth wrote:

Hi R,

I have certain checks which give FALSE although they are actually true. Why does this happen? Note that the equations that I am checking below are not even cases of recurring decimals...

1.4^2 == 1.96
[1] FALSE
1.2^3==1.728
[1] FALSE

Thanks in advance,
Shubha

Shubha Karanth | Amba Research
Ph +91 80 3980 8031 | Mob +91 94 4886 4510
Bangalore * Colombo * London * New York * San José * Singapore * www.ambaresearch.com
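The FAQ entry linked above recommends comparing with a tolerance rather than with ==, because numbers such as 1.4 or 1.96 have no exact binary representation. A short sketch of that advice; the explicit tolerance of 1e-9 is an arbitrary choice made here for illustration:

```r
# Exact comparison: reported as FALSE in the question
exact <- 1.4^2 == 1.96

# Comparison up to a small numerical tolerance
near_1 <- isTRUE(all.equal(1.4^2, 1.96))

# The same idea with an explicit, hand-chosen tolerance
near_2 <- abs(1.2^3 - 1.728) < 1e-9

c(exact = exact, all_equal = near_1, explicit_tol = near_2)
```

all.equal() returns a description of the difference rather than FALSE when the values differ, which is why it is wrapped in isTRUE().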
Re: [R] Umlaut read from csv-file
At 16:52 07.11.2008, Prof Brian Ripley wrote:

On Fri, 7 Nov 2008, Peter Dalgaard wrote:

read.spss() (package foreign) does have a reencode argument, though; and this is called by spss.get(), so it looks like an easy hack to add it there.

Yes, older software like spss.get needs to get updated for the internationalization age. Modifying it to have a ... argument passed to read.spss would be a good idea (and future-proofing).

In cases like this it is likely that the SPSS file does contain its encoding (although sometimes it does not and occasionally it is wrong), so it is helpful to make use of the info if it is there. However, the default is read.spss(reencode=NA) because the problems of assuming that the info is correct when it is not are worse.

The reason why I tried the example below was to solve the encoding by dumping and then re-sourcing a data.frame with the encoding parameter set to latin1. As you can see, source(x, encoding='latin1') does not have the effect I expected. Unfortunately I do not have any idea what I misunderstood regarding the meaning of encoding='latin1'.

Heinz Tüchler

us <- c("a", "b", "c", "ä", "ö", "ü")
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
dump('us', 'us_dump.txt')
rm(us)
source('us_dump.txt', encoding='latin1')
us
[1] "a" "b" "c" "ä" "ö" "ü"
Encoding(us)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
unlink('us_dump.txt')
Re: [R] Umlaut read from csv-file
At 08:01 08.11.2008, Prof Brian Ripley wrote:

We have no idea what you understood (you didn't tell us), but the help says

encoding: character vector. The encoding(s) to be assumed when 'file' is a character string: see 'file'. A possible value is '"unknown"': see the 'Details'.
...
This paragraph applies if 'file' is a filename (rather than a connection). If 'encoding = "unknown"', an attempt is made to guess the encoding. The result of 'localeToCharset()' is used as a guide. If 'encoding' has two or more elements, they are tried in turn until the file/URL can be read without error in the trial encoding.

So source(encoding="latin1") says the file is encoded in Latin-1 and should be re-encoded if necessary (e.g. in a UTF-8 locale). Setting the Encoding of parsed character strings is not mentioned.

You could have written out a data frame with write.csv() and re-read it with read.csv(encoding = "latin1"): that was the workaround you were given earlier (not to use source).

Thank you for this explanation. I felt that I did not understand the help page of source() and I hoped encoding='latin1' would have the same effect as in read.csv(), but rethinking it, I see that it would conflict with the primary functionality of source().

Earlier I tried writing the data.frame with write.csv and re-reading it. This works, but additional information like labels() I have to transfer in a second step. The best way I could imagine would be some function which marks every character string in the whole structure of a data.frame, including all attributes, as latin1.
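A function of the kind wished for above might look roughly like the following. This is only a sketch under stated assumptions: the name mark_df_latin1 is hypothetical, it assumes the strings really are Latin-1 bytes, and it covers only character columns, factor levels and a 'label' attribute rather than every possible attribute.

```r
# Hypothetical helper: declare strings in a data frame as Latin-1.
# Handles character columns, factor levels and a "label" attribute only.
mark_df_latin1 <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      Encoding(col) <- "latin1"          # ASCII elements stay "unknown"
    } else if (is.factor(col)) {
      lev <- levels(col)
      Encoding(lev) <- "latin1"
      attr(col, "levels") <- lev         # same strings, only the mark changes
    }
    lab <- attr(col, "label")
    if (is.character(lab)) {
      Encoding(lab) <- "latin1"
      attr(col, "label") <- lab
    }
    df[[nm]] <- col
  }
  df
}

# Small example with unmarked Latin-1 bytes (written as \x escapes)
df <- data.frame(txt = c("abc", "\xe4\xf6\xfc"), stringsAsFactors = FALSE)
df$grp <- structure(c(1L, 2L), levels = c("x", "\xf6"), class = "factor")
attr(df$txt, "label") <- "Ver\xe4nderung"

out <- mark_df_latin1(df)
Encoding(out$txt)
Encoding(levels(out$grp))
```

Only the declaration changes here; no bytes are converted. If the strings were in fact in another encoding, marking them as Latin-1 would silently produce wrong characters.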
[R] attr.all.equal() and all.equal(attributes(), attributes())
Dear All!

If I try to compare the attributes of two objects, I find a surprising behaviour of attr.all.equal(). With identical attributes I receive the answer NULL. If the attributes differ, the answer is as expected and differences are shown. all.equal(attributes(), attributes()) instead returns TRUE if the attributes are equal. See example:

v <- 1:5
attr(v, 'testattribute') <- 'testattribute v'
v_c <- v
attributes(v)
$testattribute
[1] "testattribute v"
attributes(v_c)
$testattribute
[1] "testattribute v"
all.equal(v, v_c)
[1] TRUE
attr.all.equal(v, v_c)
NULL # this is what I did not expect
all.equal(attributes(v), attributes(v_c))
[1] TRUE
attr(v_c, 'testattribute') <- 'testattribute v_c'
attr.all.equal(v, v_c)
[1] "Attributes: < Component 1: 1 string mismatch >"
all.equal(attributes(v), attributes(v_c))
[1] "Component 1: 1 string mismatch"

Thanks for your attention

Heinz Tüchler

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 8.0
year = 2008
month = 11
day = 04
svn rev = 46830
language = R
version.string = R version 2.8.0 Patched (2008-11-04 r46830)
Windows XP (build 2600) Service Pack 2
Locale: LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
Search Path: .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
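Since attr.all.equal() signals equality with NULL and differences with a character vector, a logical answer can be obtained by testing for NULL. The helper name same_attrs below is hypothetical and introduced only for this sketch.

```r
v <- 1:5
attr(v, "testattribute") <- "testattribute v"
v_c <- v

# Hypothetical helper: TRUE when attr.all.equal() reports no differences
same_attrs <- function(a, b) is.null(attr.all.equal(a, b))

before <- same_attrs(v, v_c)   # attributes identical

attr(v_c, "testattribute") <- "testattribute v_c"
after <- same_attrs(v, v_c)    # attributes now differ

c(before = before, after = after)
```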
Re: [R] Compare objects
At 13:26 09.11.2008, Leon Yee wrote:

Hi, friends

Are there any functions for comparing objects? For example, I have two list objects, and I want to know whether they are the same. Since the components of a list are not necessarily atomic, this kind of comparison should be recursive. Does this kind of function exist?

Thank you for your help!
Leon

See maybe:
all.equal()
identical()

Heinz
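The two functions named in the reply differ in strictness, which a small nested list can show. This is only a sketch with made-up data: identical() demands exact equality, while all.equal() compares numbers up to a tolerance and returns a description of the differences instead of FALSE.

```r
# Two nested lists that differ only by floating-point rounding.
a <- list(x = 1:3, y = list(z = c(0.1 + 0.2, 1), w = "text"))
b <- list(x = 1:3, y = list(z = c(0.3, 1),       w = "text"))

same_exact <- identical(a, b)          # exact, recursive comparison
same_tol   <- isTRUE(all.equal(a, b))  # recursive, up to numerical tolerance

# Introduce a real difference deep inside the structure.
b$y$w <- "other"
difference <- all.equal(a, b)          # character vector describing the mismatch

same_exact
same_tol
difference
```

Because all.equal() does not return FALSE on a mismatch, wrapping it in isTRUE() is the usual way to use it inside if().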
Re: [R] Umlaut read from csv-file
At 06:25 09.11.2008, Prof Brian Ripley wrote:

On Sat, 8 Nov 2008, Heinz Tuechler wrote:

At 08:01 08.11.2008, Prof Brian Ripley wrote:

We have no idea what you understood (you didn't tell us), but the help says

    encoding: character vector. The encoding(s) to be assumed when 'file' is a character string: see 'file'. A possible value is '"unknown"': see the 'Details'.
    ...
    This paragraph applies if 'file' is a filename (rather than a connection). If 'encoding = "unknown"', an attempt is made to guess the encoding. The result of 'localeToCharset()' is used as a guide. If 'encoding' has two or more elements, they are tried in turn until the file/URL can be read without error in the trial encoding.

So source(encoding="latin1") says the file is encoded in Latin-1 and should be re-encoded if necessary (e.g. in UTF-8 locale). Setting the Encoding of parsed character strings is not mentioned.

You could have written out a data frame with write.csv() and re-read it with read.csv(encoding = "latin1"): that was the workaround you were given earlier (not to use source).

Thank you for this explanation. I felt that I did not understand the help page of source(), and I hoped encoding='latin1' would have the same effect as in read.csv(), but rethinking it, I see that it would conflict with the primary functionality of source().
Earlier I tried writing the data.frame with write.csv and re-reading it. This works, but additional information like labels() I have to transfer in a second step.
The best way I could imagine would be some function which marks every character string in the whole structure of a data.frame, including all attributes, as latin1.

I think it is possible that

    con <- file("foo")
    source(con, encoding="latin1")
    close(foo)

will also do what you want, although that's an undocumented side effect.

You are right. It does work in my real data problem. Thank you.
(minor remark: I think close(foo) should be close(con))

But all of this should be unnecessary in R-patched (although it is possible that there are other quirks with unmarked strings lurking in the shadows, there are no other obvious changes from 2.7.2).

On Sat, 8 Nov 2008, Heinz Tuechler wrote:

At 16:52 07.11.2008, Prof Brian Ripley wrote:

On Fri, 7 Nov 2008, Peter Dalgaard wrote:

Heinz Tuechler wrote:

Dear Prof. Ripley!

Thank you very much for your attention. In the given example Encoding(), or the encoding parameter of read.csv, solves the problem. I hope your patch will also solve the problem when I read an SPSS file with spss.get(), since this function has no encoding parameter and my real problem originated there.

read.spss() (package foreign) does have a reencode argument, though; and this is called by spss.get(), so it looks like an easy hack to add it there.

Yes, older software like spss.get needs to get updated for the internationalization age. Modifying it to have a ... argument passed to read.spss would be a good idea (and future-proofing).

In cases like this it is likely that the SPSS file does contain its encoding (although sometimes it does not and occasionally it is wrong), so it is helpful to make use of the info if it is there. However, the default is read.spss(reencode=NA) because the problems of assuming that the info is correct when it is not are worse.

The reason why I tried the example below was to solve the encoding problem by dumping and then re-sourcing a data.frame with the encoding parameter set to latin1. As you can see, source(x, encoding='latin1') does not have the effect I expected. Unfortunately I do not have any idea what I understood wrong regarding the meaning of encoding='latin1'.

Heinz Tüchler

    > us <- c("a", "b", "c", "ä", "ö", "ü")
    > Encoding(us)
    [1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
    > dump('us', 'us_dump.txt')
    > rm(us)
    > source('us_dump.txt', encoding='latin1')
    > us
    [1] "a" "b" "c" "ä" "ö" "ü"
    > Encoding(us)
    [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
    > unlink('us_dump.txt')

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
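The function wished for in this thread, one that marks the strings of a data.frame as latin1, could look roughly like the sketch below. The name mark_latin1 is made up here, and the sketch covers only character columns, factor levels and column names, not arbitrary further attributes. It relies on Encoding<-, which leaves pure ASCII strings as "unknown".

```r
# Hypothetical helper (not from the thread): declare non-ASCII strings in
# character columns, factor levels and column names to be latin1.
mark_latin1 <- function(df) {
  mark <- function(s) {
    Encoding(s) <- "latin1"   # only a declaration; the bytes are unchanged
    s
  }
  for (j in seq_along(df)) {
    if (is.character(df[[j]])) df[[j]] <- mark(df[[j]])
    if (is.factor(df[[j]]))    levels(df[[j]]) <- mark(levels(df[[j]]))
  }
  names(df) <- mark(names(df))
  df
}

# "\xe4" is the latin1 byte for a-umlaut; its encoding starts out unmarked.
d <- data.frame(txt = c("a", "\xe4"), stringsAsFactors = FALSE)
enc_before <- Encoding(d$txt)
d <- mark_latin1(d)
enc_after <- Encoding(d$txt)

enc_before
enc_after
```

Marking is only correct if the bytes really are latin1; applied to UTF-8 data it would mislabel the strings.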
Re: [R] attr.all.equal() and all.equal(attributes(), attributes())
At 14:24 09.11.2008, Peter Dalgaard wrote:

Heinz Tuechler wrote:

Dear All!

If I try to compare the attributes of two objects, I find a surprising behaviour of attr.all.equal(). With identical attributes I receive the answer NULL. If the attributes differ, the answer is as expected and the differences are shown. all.equal(attributes(), attributes()) instead returns TRUE if the attributes are equal.

That _is_ as documented, although if you want to quibble, it should probably also have been mentioned in the Value section.

Sorry, I admit that I read only the Value section.
Heinz

I don't know if there's a rationale for it. The actual code goes

    msg <- NULL
    if (...) msg <- c(msg, "some text")
    if (...
    msg

so it returns NULL if none of the ifs are taken, but it could easily be changed to return

    if (is.null(msg)) TRUE else msg

(easily depending, of course, on how much code actually depends on the current behaviour...)

--
   O__  Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
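Until such a change is made, the NULL result can be handled on the calling side. The sketch below wraps attr.all.equal() in a hypothetical function attr_equal (a name invented for this example) that applies the rule suggested in the reply: TRUE when there is nothing to report, otherwise the messages.

```r
v <- 1:5
attr(v, "testattribute") <- "testattribute v"
v_c <- v

# attr.all.equal() returns NULL when the attributes match,
# so the direct test is is.null(), not isTRUE().
res_same <- attr.all.equal(v, v_c)

# Hypothetical wrapper: NULL becomes TRUE, messages pass through.
attr_equal <- function(target, current, ...) {
  msg <- attr.all.equal(target, current, ...)
  if (is.null(msg)) TRUE else msg
}

equal_before <- attr_equal(v, v_c)
attr(v_c, "testattribute") <- "testattribute v_c"
equal_after <- attr_equal(v, v_c)

equal_before
equal_after
```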
Re: [R] survival::survfit,plot.survfit
At 15:28 26.02.2009, Terry Therneau wrote:

plot(survfit(fit)) should plot the survival-function for x=0 or equivalently beta'=0. This curve is independent of any covariates.

This is not correct. It plots the curve for a hypothetical subject with x = mean of each covariate.

Does this mean the curve corresponds to the one you would get based on the baseline hazard?
Heinz

This is NOT the average survival of the data set. Imagine a cohort made up of 60 year old men and their 10 year old grandsons: the expected survival of this cohort does not look like that for a 35 year old male.

Terry T
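The grandfather and grandson remark can be checked with plain arithmetic, without the survival package. The sketch below assumes a made-up exponential model in which the yearly hazard grows with age; the hazard function and all numbers are illustrative assumptions, not taken from the thread.

```r
# Hypothetical hazard model (an assumption for illustration only):
# constant yearly hazard that grows exponentially with age at entry.
hazard <- function(age) 0.001 * exp(0.08 * age)
surv   <- function(age, years) exp(-hazard(age) * years)

ages  <- c(60, 10)   # grandfathers and grandsons, in equal numbers
years <- 20          # follow-up time

cohort_survival   <- mean(surv(ages, years))   # average survival of the cohort
mean_age_survival <- surv(mean(ages), years)   # survival of one 35 year old

c(cohort = cohort_survival, mean_age = mean_age_survival)
```

Under these assumptions the two numbers differ clearly, because survival is a nonlinear function of the covariate: the curve at the mean covariate is not the mean of the curves.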
[R] How to write a Surv object to a csv-file?
Dear All,

trying to write a data.frame containing Surv objects to a csv-file, I get

    Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :
      length of 'dimnames' [2] not equal to array extent

See the example below. Maybe I overlooked something, but I expected that data.frames containing Surv objects can also be written to csv files. Is there a better way to write to csv files?

Thanks,
Heinz Tüchler

    ### write Surv-object in csv-file
    library(survival)
    ## create example data
    soa <- Surv(1:5, c(0, 0, 1, 0, 1))
    df.soa <- data.frame(soa)
    write.csv(df.soa, 'df.soa.csv')     ## works as I expected
    read.csv('df.soa.csv')              ## works as I expected
    df.soa2 <- data.frame(soa, soa2=soa)
    write.csv(df.soa2, 'df.soa2.csv')   ## works as I expected
    read.csv('df.soa2.csv')             ## works as I expected
    char1 <- letters[1:5]
    df.soac <- data.frame(soa, char1)
    write.csv(df.soac, 'df.soac.csv')   ## generates the following error message:
    Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :
      length of 'dimnames' [2] not equal to array extent
    df.csoa <- data.frame(char1, soa)
    write.csv(df.csoa, 'df.soac.csv')   ## generates the following error message:
    Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :
      length of 'dimnames' [2] not equal to array extent

    platform       i386-pc-mingw32
    arch           i386
    os             mingw32
    system         i386, mingw32
    status         Patched
    major          2
    minor          8.0
    year           2008
    month          11
    day            10
    svn rev        46884
    language       R
    version.string R version 2.8.0 Patched (2008-11-10 r46884)

    > sessionInfo()
    R version 2.8.0 Patched (2008-11-10 r46884)
    i386-pc-mingw32

    locale:
    LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

    attached base packages:
    [1] splines stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
    [1] survival_2.34-1
Re: [R] How to write a Surv object to a csv-file?
Dear David!

Thank you for your response. I like csv files, because they let me easily compare different versions of similar data.frames. Similar in this case means that I may add a column or change some transformation command for one column. With dput that is rather difficult, and when I tried the compare package, I had no success comparing data.frames containing Surv objects.

Thanks again
Heinz

At 22:31 19.12.2008, David Winsemius wrote:

On Dec 19, 2008, at 2:04 PM, Heinz Tuechler wrote:

Dear All,

trying to write a data.frame containing Surv objects to a csv-file, I get

    Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :
      length of 'dimnames' [2] not equal to array extent

See the example below. Maybe I overlooked something, but I expected that data.frames containing Surv objects can also be written to csv files. Is there a better way to write to csv files?

Yes, if the goal is creating an ASCII structure that can be recovered by an R interpreter:

    ?dput
    ?dget
    dput(df.soac, "test")
    copy.df.soac <- dget("test")
    all.equal(df.soac, copy.df.soac)

Doesn't give you a result that you would want to read with Excel, but that does not appear to be your goal. You can examine it with a text editor.

--
David Winsemius
Re: [R] How to write a Surv object to a csv-file?
Dear Charles,

yes, your solution does what I need. Maybe it also offers a way to use the compare package with Surv objects.

Thank you,
Heinz

At 23:30 19.12.2008, Charles C. Berry wrote:

On Fri, 19 Dec 2008, Heinz Tuechler wrote:

Dear David!

Thank you for your response. I like csv files, because they let me easily compare different versions of similar data.frames. Similar in this case means that I may add a column or change some transformation command for one column. With dput that is rather difficult, and when I tried the compare package, I had no success comparing data.frames containing Surv objects.

Heinz,

Is this good enough?

    > mat <- as.data.frame( lapply( df.soac, unclass ) )
    > write.csv(mat, 'mat.csv')
    > read.csv('mat.csv')
      X soa.time soa.status char1
    1 1        1          0     1
    2 2        2          0     2
    3 3        3          1     3
    4 4        4          0     4
    5 5        5          1     5

The bug seems to be in as.matrix.data.frame.

HTH,
Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
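In the output above, char1 comes back as 1 to 5 because unclass() also strips the factor class from that column. A variation that flattens only the Surv column might look like the sketch below. It assumes the survival package is installed and a right-censored Surv object, whose underlying matrix has the columns "time" and "status"; the column names soa.time and soa.status simply mirror the output above.

```r
library(survival)  # assumption: the survival package is installed

soa   <- Surv(1:5, c(0, 0, 1, 0, 1))
char1 <- letters[1:5]

# Split the Surv column into plain numeric columns before writing.
# unclass() on a right-censored Surv object gives a two-column matrix
# with columns "time" and "status".
soa_mat <- unclass(soa)
flat <- data.frame(soa.time   = soa_mat[, "time"],
                   soa.status = soa_mat[, "status"],
                   char1      = char1,
                   stringsAsFactors = FALSE)

csv_file <- tempfile(fileext = ".csv")
write.csv(flat, csv_file, row.names = FALSE)
back <- read.csv(csv_file, stringsAsFactors = FALSE)
unlink(csv_file)

# Rebuild the Surv object from the two numeric columns.
soa_back <- Surv(back$soa.time, back$soa.status)
back
```

Other Surv types (interval or counting-process data) have different columns, so the column names would need adjusting there.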
Re: [R] How many attributes are there of a variable?
Peng,

based on a suggestion Frank made years ago (18.7.2006), I use one attribute that contains all further attributes I want to assign to variables. It is necessary to create your own class and subsetting method, so that this attribute does not get lost. Together with some functions, I use it for variable labels, value labels, missing-value definitions etc.
It seems that without protection by your own class and the corresponding subsetting method, you can never be sure whether an attribute survives subsetting.

Heinz

At 23:21 06.09.2009, Frank E Harrell Jr wrote:

Peng,

You can create all the attributes you want, with one headache: R does not keep attributes across subsetting operations, so you need to write classes and [.something methods when attributes need to be kept or adjusted upon subsetting rows.

The Hmisc package uses attributes such as label, units, imputed. You might look at the code to see how it did that. For example, label(x) will use attr(x, 'label') to fetch the 'label' attribute. There are attribute-setting functions there too.

Frank

Peng Yu wrote:

Hi,

According to the example below this email, attr(x, "names") is the same as names(x). I am wondering how many attributes there are of a given variable. How can I find out what they are? Can I always use some_attribute(x) instead of attr(x, "some_attribute")?

Regards,
Peng

    > x=c(1,2,3)
    > attr(x,"names")=c("a","b","c")
    > x
    a b c
    1 2 3
    > y=c(1,2,3)
    > names(y)=c("a","b","c")
    > y
    a b c
    1 2 3

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
Department of Biostatistics   Vanderbilt University
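The class plus subsetting-method approach described in both replies can be sketched in a few lines. The class name labelled_var and its constructor are invented for this example; they are not the Hmisc implementation, which should be consulted for a complete treatment.

```r
# Hypothetical class "labelled_var" (name made up for this sketch) whose
# "[" method re-attaches the 'label' attribute after subsetting.
labelled_var <- function(x, label) {
  structure(x, label = label, class = "labelled_var")
}

"[.labelled_var" <- function(x, ...) {
  out <- NextMethod("[")                   # default subsetting drops 'label'
  attr(out, "label") <- attr(x, "label")   # put the attribute back
  class(out) <- class(x)
  out
}

# Without a class, the attribute is lost on subsetting.
plain <- structure(1:5, label = "age in years")
label_plain <- attr(plain[2:3], "label")

# With the class and its method, the attribute survives.
lv <- labelled_var(1:5, "age in years")
label_kept <- attr(lv[2:3], "label")

label_plain
label_kept
```

To the question of how to find out which attributes an object has: attributes(x) lists them all.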
Re: [R] What determines the unit of POSIXct differences?
Jim,

Thank you very much for this explanation and the hint to use difftime.

Heinz

At 15:09 11.09.2009, jim holtman wrote:

'-' calls 'difftime' which, if you don't specify the units, makes the following assumptions in the code:

    > difftime
    function (time1, time2, tz = "", units = c("auto", "secs", "mins",
        "hours", "days", "weeks"))
    {
        time1 <- as.POSIXct(time1, tz = tz)
        time2 <- as.POSIXct(time2, tz = tz)
        z <- unclass(time1) - unclass(time2)
        units <- match.arg(units)
        if (units == "auto") {
            if (all(is.na(z)))
                units <- "secs"
            else {
                zz <- min(abs(z), na.rm = TRUE)
                if (is.na(zz) || zz < 60)
                    units <- "secs"
                else if (zz < 3600)
                    units <- "mins"
                else if (zz < 86400)
                    units <- "hours"
                else units <- "days"
            }
        }
        switch(units,
            secs = structure(z, units = "secs", class = "difftime"),
            mins = structure(z/60, units = "mins", class = "difftime"),
            hours = structure(z/3600, units = "hours", class = "difftime"),
            days = structure(z/86400, units = "days", class = "difftime"),
            weeks = structure(z/(7 * 86400), units = "weeks", class = "difftime"))
    }

You can use difftime explicitly so you can control the units.

    > c(as.POSIXct('2009-09-01'), as.POSIXct('2009-10-11')) - as.POSIXct('2009-08-31')
    Time differences in days
    [1]  1 41
    > difftime(c(as.POSIXct('2009-09-01'), as.POSIXct('2009-10-11')), as.POSIXct('2009-08-31'), units='sec')
    Time differences in secs
    [1]   86400 3542400

On Fri, Sep 11, 2009 at 7:50 AM, Heinz Tuechler <tuech...@gmx.at> wrote:

Dear All,

what determines whether a difference between POSIXct objects gets expressed in days or seconds? In the following example, it's sometimes seconds, sometimes days.

    > as.POSIXct('2009-09-01') - as.POSIXct(NA)
    Time difference of NA secs
    > c(as.POSIXct('2009-09-01'), as.POSIXct(NA)) - c(as.POSIXct('2009-09-01'), as.POSIXct('2009-08-31'))
    Time differences in secs
    [1]  0 NA
    > c(as.POSIXct('2009-09-01'), as.POSIXct(NA)) - as.POSIXct('2009-08-31')
    Time differences in days
    [1]  1 NA

Thanks,
Heinz

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
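A compact way to sidestep the automatic choice is to fix the units at the call, or to convert afterwards with units<-. The sketch below uses the dates from the thread; the time zone "UTC" is an assumption added here so that the result does not depend on the local time zone.

```r
# tz = "UTC" is an assumption of this sketch, to avoid daylight-saving effects.
t1 <- as.POSIXct(c("2009-09-01", NA), tz = "UTC")
t0 <- as.POSIXct("2009-08-31", tz = "UTC")

auto <- t1 - t0                            # units chosen from the smallest difference
auto_units <- units(auto)

days <- difftime(t1, t0, units = "days")   # units fixed by the caller
secs <- difftime(t1, t0, units = "secs")

# An existing difftime can also be converted afterwards.
units(auto) <- "hours"

auto_units
as.numeric(days)
as.numeric(secs)
as.numeric(auto)
```

Calling as.numeric() on a difftime whose units were set explicitly avoids surprises when the numbers are used in later arithmetic.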
[R] How to assess object names within a function in lapply or l_ply?
Dear All,

to produce output for several columns of a data frame, I tried to use lapply and also l_ply. In both cases, I would like to print a header line that contains the name of the respective column in the data frame.
For example, I would like the following

    lapply(data.frame(a=1:3, b=2:4), function(x) print(deparse(substitute(x))))

to produce:

    [1] "a"
    [1] "b"

and not what it actually does:

    [1] "X[[1L]]"
    [1] "X[[2L]]"
    $a
    [1] "X[[1L]]"

    $b
    [1] "X[[2L]]"

or with l_ply (plyr package)

    l_ply(data.frame(a=1:3, b=2:4), function(x) print(deparse(substitute(x))))

to produce:

    [1] "a"
    [1] "b"

and not what it actually does:

    [1] ".data[[i]]"
    [1] ".data[[i]]"

Is this possible?

Thanks,
Heinz
Re: [R] How to assess object names within a function in lapply or l_ply?
Thank you, Henrique,

my example was simplified. In a more complex function I want to use the objects, not just their names. In your solution, I have to adapt the function itself, depending on the name of the data.frame, which I would like to avoid.

Thanks,
Heinz

At 13:36 28.09.2009, Henrique Dallazuanna wrote:

You can use names instead:

    DF <- data.frame(a=1:3, b=2:4)
    lapply(names(DF), function(x){
        print(x)
        DF[x]
    })

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] How to assess object names within a function in lapply or l_ply?
Henrique,

based on your solution I found out how to avoid naming the object explicitly.

    lapply(data.frame(a=1:3, b=2:4), function(x)
        names(eval(as.list(sys.call(-1))[[2]]))[as.numeric(gsub("[^0-9]", "", deparse(substitute(x))))]
    )

Thanks,
Heinz

At 13:57 28.09.2009, Henrique Dallazuanna wrote:

Heinz,

Try this:

    lapply(DF, function(x) names(DF)[as.numeric(gsub("[^0-9]", "", deparse(substitute(x))))])

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] How to assess object names within a function in lapply or l_ply?
Hadley,

many thanks for your answer and for the enormous work you put into plyr, a really powerful package.
For now, I will solve my problem with a variable label attribute that I usually attach to columns in data frames.
I asked the list because I thought I was overlooking something trivial, since lapply itself apparently knows the object names, as it labels the output with them. It just does not supply them to the function it calls. Maybe deparse(substitute(x)) with the right environment would do it, but I did not find it.

Thanks,
Heinz

At 16:27 28.09.2009, hadley wickham wrote:

or with l_ply (plyr package)

    l_ply(data.frame(a=1:3, b=2:4), function(x) print(deparse(substitute(x))))

The best way to do this is to supply both the object you want to iterate over, and its names. Unfortunately it's slightly difficult to create a data structure of the correct form to do this with m_ply.

    df <- data.frame(a=1:3, b=2:4)
    input <- list(x = df, name = names(df))
    inputdf <- structure(input, class = "data.frame", row.names = seq_along(input[[1]]))
    m_ply(inputdf, function(x, name) {
        cat(name, "-\n")
        print(x)
    })

I'll think about how to improve this for a future version.

Hadley

--
http://had.co.nz/
Re: [R] How to assess object names within a function in lapply or l_ply?
At 18:17 28.09.2009, hadley wickham wrote:

many thanks for your answer and for the enormous work you put into plyr, a really powerful package.
For now, I will solve my problem with a variable label attribute that I usually attach to columns in data frames.
I asked the list because I thought I was overlooking something trivial, since lapply itself apparently knows the object names, as it labels the output with them. It just does not supply them to the function it calls.

lapply knows the names - the calling function doesn't - it takes the output and then fixes up the names after it's run.

Hadley

A theoretical question, as you are not responsible for lapply: do you think problems would arise if lapply named each list element by its name when it calls the function in its body, instead of naming it X[[1L]], ... ?

Heinz

--
http://had.co.nz/
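In base R, the advice to supply both the object and its names can be followed with mapply(), which walks over several vectors in parallel. The helper describe_column below is a made-up example; any function taking a column and its name would do.

```r
df <- data.frame(a = 1:3, b = 2:4)

# Hypothetical helper for this sketch: receives the column and its name.
describe_column <- function(x, name) {
  paste0(name, ": mean = ", mean(x))
}

# Iterate over the columns and their names in parallel.
headers <- mapply(describe_column, df, names(df))
headers
```

This avoids deparse(substitute(x)) altogether, so it does not depend on how lapply or l_ply construct their internal calls.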
Re: [R] Different result of multiple regression in R and SPSS
At 19.07.2011 18:50 -0700, Spencer Graves wrote:
On 7/19/2011 4:04 PM, Bert Gunter wrote:
On Tue, Jul 19, 2011 at 3:45 PM, David Winsemius <dwinsem...@comcast.net> wrote:
On Jul 19, 2011, at 6:29 PM, J. wrote:

Thanks for the answer. However, I am still curious about which result I should use? The result from R or the one from SPSS?

It is becoming apparent that you do not know how to use the results from either system. The progress of science would be safer if you get some advice from a person that knows what they are doing.

I nominate this for an R fortune.
-- Bert

None of us ever know what we're doing at some level. We often think we do, and sometimes we get results more in spite of what we've done than because of it. That of course increases our confidence and encourages us to repeat mistakes in contexts where we might not be so lucky.

Spencer

Wise!

Heinz

Why are the results from the two programs different?

Different parametrizations. If I had to guess I would bet that the gender coefficient in R is exactly twice that of the one from SPSS. They are probably both correct in the context of their respective codings.

--
David Winsemius, MD
West Hartford, CT

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com
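David's guess about a factor of two is what one would see when the same two-level factor is coded once as 0/1 and once as +1/-1. The sketch below uses simulated data; whether SPSS actually used the second coding in the original poster's analysis is not known from the thread and is only an assumption here.

```r
# Simulated data (made up for illustration): outcome y and a two-level factor.
set.seed(1)
gender <- factor(rep(c("f", "m"), each = 10))
y <- 5 + 2 * (gender == "m") + rnorm(20)

# Treatment (dummy, 0/1) coding: R's default for unordered factors.
fit_treat <- lm(y ~ gender, contrasts = list(gender = "contr.treatment"))

# Sum-to-zero (+1/-1) coding.
fit_sum <- lm(y ~ gender, contrasts = list(gender = "contr.sum"))

b_treat <- unname(coef(fit_treat)[2])  # difference of group means, m - f
b_sum   <- unname(coef(fit_sum)[2])    # half the difference, (f - m) / 2

# Same fitted values; the coefficients differ by a factor of -2.
all.equal(b_treat, -2 * b_sum)
all.equal(fitted(fit_treat), fitted(fit_sum))
```

Both fits describe the same model; only the meaning of the reported coefficient changes with the coding.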
Re: [R] value.labels
At 11.08.2011 21:50 +0300, Zeki Çatav wrote:
On Thu, 2011-08-11 at 19:27 +0200, Uwe Ligges wrote:
On 11.08.2011 19:22, David Winsemius wrote:
On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
On 11.08.2011 16:10, zcatav wrote:

Hello R people, I have a data.frame. Status variable has 3 values. 0-alive, 1-dead and 2-missed.

...

As I understood the question, just how to rename the levels was the original question.

Uwe

I don't want to rename levels or convert from numeric to string. I want to attach a label to each level value, as in SPSS. Level 0 labeled with alive, level 1 labeled with dead and level 2 labeled with missed.

This is not possible with a factor, because factor levels can only be positive integers.

Heinz

--
Zeki Çatav
Re: [R] value.labels
At 12.08.2011 09:11 +1200, Rolf Turner wrote:
On 12/08/11 09:59, Heinz Tuechler wrote:
At 11.08.2011 21:50 +0300, Zeki Çatav wrote:
On Thu, 2011-08-11 at 19:27 +0200, Uwe Ligges wrote:
On 11.08.2011 19:22, David Winsemius wrote:
On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
On 11.08.2011 16:10, zcatav wrote:

Hello R people, I have a data.frame. Status variable has 3 values. 0-alive, 1-dead and 2-missed.

...

As I understood the question, just how to rename the levels was the original question.

Uwe

I don't want to rename levels or convert from numeric to string. I want to attach a label to each level value, as in SPSS. Level 0 labeled with alive, level 1 labeled with dead and level 2 labeled with missed.

This is not possible with a factor, because factor levels can only be positive integers.

That is just plain (ridiculously) wrong. RTFM.

cheers,
Rolf Turner

So, how would you construct a factor with levels 0, 1, 2 and labels alive, dead, and missed, as the original post asked for?

Heinz
Re: [R] value.labels
At 12.08.2011 11:05 +1200, Rolf Turner wrote:
On 12/08/11 11:34, Heinz Tuechler wrote:
At 12.08.2011 09:11 +1200, Rolf Turner wrote:
On 12/08/11 09:59, Heinz Tuechler wrote:
At 11.08.2011 21:50 +0300, Zeki Çatav wrote:
On Thu, 2011-08-11 at 19:27 +0200, Uwe Ligges wrote:
On 11.08.2011 19:22, David Winsemius wrote:
On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
On 11.08.2011 16:10, zcatav wrote:

Hello R people, I have a data.frame. Status variable has 3 values. 0-alive, 1-dead and 2-missed.

...

As I understood the question, just how to rename the levels was the original question.

Uwe

I don't want to rename levels or convert from numeric to string. I want to attach a label to each level value, as in SPSS. Level 0 labeled with alive, level 1 labeled with dead and level 2 labeled with missed.

This is not possible with a factor, because factor levels can only be positive integers.

That is just plain (ridiculously) wrong. RTFM.

cheers,
Rolf Turner

So, how would you construct a factor with levels 0, 1, 2 and labels alive, dead, and missed, as the original post asked for?

Heinz

As I said, RTFM. But for completeness:

    x <- sample(0:2,100,TRUE)
    y <- factor(x,labels=c("alive","dead","missed"))

Duhhh.

cheers,
Rolf Turner

Maybe you would like to look at the structure.

    > str(y)
     Factor w/ 3 levels "alive","dead",..: 3 1 2 2 3 3 2 1 1 1 ...

or

    > dput(y)
    structure(c(3L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 3L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 3L, 2L, 1L, 3L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 3L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 2L, 1L, 1L, 1L, 2L, 3L, 3L, 2L, 1L, 2L, 3L), .Label = c("alive", "dead", "missed"), class = "factor")

Anything else but positive integers?

Heinz
Re: [R] value.labels
At 12.08.2011 01:53 +0100, Heinz Tuechler wrote:
At 12.08.2011 11:05 +1200, Rolf Turner wrote:
On 12/08/11 11:34, Heinz Tuechler wrote:
At 12.08.2011 09:11 +1200, Rolf Turner wrote:
On 12/08/11 09:59, Heinz Tuechler wrote:
At 11.08.2011 21:50 +0300, Zeki Çatav wrote:
On Thu, 2011-08-11 at 19:27 +0200, Uwe Ligges wrote:
On 11.08.2011 19:22, David Winsemius wrote:
On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
On 11.08.2011 16:10, zcatav wrote:

Hello R people, I have a data.frame. Status variable has 3 values. 0-alive, 1-dead and 2-missed.

...

As I understood the question, just how to rename the levels was the original question.

Uwe

I don't want to rename levels or convert from numeric to string. I want to attach a label to each level value, as in SPSS. Level 0 labeled with alive, level 1 labeled with dead and level 2 labeled with missed.

This is not possible with a factor, because factor levels can only be positive integers.

That is just plain (ridiculously) wrong. RTFM.

cheers,
Rolf Turner

So, how would you construct a factor with levels 0, 1, 2 and labels alive, dead, and missed, as the original post asked for?

Heinz

As I said, RTFM. But for completeness:

    x <- sample(0:2,100,TRUE)
    y <- factor(x,labels=c("alive","dead","missed"))

Duhhh.

cheers,
Rolf Turner

Maybe you would like to look at the structure.

    > str(y)
     Factor w/ 3 levels "alive","dead",..: 3 1 2 2 3 3 2 1 1 1 ...

or

    > dput(y)
    structure(c(3L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 3L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 3L, 2L, 1L, 3L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 3L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 2L, 1L, 1L, 1L, 2L, 3L, 3L, 2L, 1L, 2L, 3L), .Label = c("alive", "dead", "missed"), class = "factor")

Anything else but positive integers?

Heinz

To be fair, you can construct a factor containing zeros.

    > b <- c(0L,0L,1L,1L,1L,2L,2L,2L,2L)
    > b
    [1] 0 0 1 1 1 2 2 2 2
    > str(b)
     int [1:9] 0 0 1 1 1 2 2 2 2
    > table(b)
    b
    0 1 2
    2 3 4
    > levels(b) <- letters[1:3]
    > str(b)
     atomic [1:9] 0 0 1 1 1 2 2 2 2
     - attr(*, "levels")= chr [1:3] "a" "b" "c"
    > class(b) <- 'factor'
    > str(b)
     Factor w/ 3 levels "a","b","c": 0 0 1 1 1 2 2 2 2

But, if you print it, you get a warning.

    > b
    [1] a a a b b b b a a
    Levels: a b c
    Warning message:
    In xx[] <- as.character(x) : number of items to replace is not a multiple of replacement length

And table() gives a wrong result.

    > table(b)
    b
    a b c
    3 4 0

If you take a numeric, not explicitly integer vector, you are less lucky.

    > c <- c(0,0,1,1,1,2,2,2,2)
    > str(c)
     num [1:9] 0 0 1 1 1 2 2 2 2
    > levels(c) <- letters[1:3]
    > str(c)
     atomic [1:9] 0 0 1 1 1 2 2 2 2
     - attr(*, "levels")= chr [1:3] "a" "b" "c"

Assigning class factor is rejected with an error.

    > class(c) <- 'factor'
    Error in class(c) <- "factor" : adding class "factor" to an invalid object

Heinz
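The disagreement in this thread comes down to the difference between a factor's labels and its internal integer codes. One way to get close to SPSS-style value labels is to keep a small lookup table next to the factor, so that the original codes 0, 1, 2 can be recovered. This is only a sketch; the vector `status` and the names `codes` and `labels` are made up for illustration.

```r
# Status codes as stored in the data (made-up values).
status <- c(0, 2, 1, 1, 0, 2)

# Lookup table from original code to label (names chosen for illustration).
codes  <- c(0, 1, 2)
labels <- c("alive", "dead", "missed")

# The factor shows the labels; internally it stores the positions 1, 2, 3.
status_f <- factor(status, levels = codes, labels = labels)

# The original codes are not stored in the factor itself, but they can be
# recovered through the lookup table.
recovered <- codes[as.integer(status_f)]
```

The factor alone carries only the positions and the labels, which is the point Heinz makes above; the mapping back to 0, 1, 2 has to be kept separately.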
[R] retain class after merge
Dear All,

is there a simple way to retain the class attribute of a column when merging two data.frames? When merging the example data.frames from help(merge), I am unable to keep the class attribute as set before merging (see below). Two columns are assigned new classes before the merge (myclass1, myclass2), but after the merge the resulting column has class character.

best regards,
Heinz

    ## use character columns of names to get sensible sort order
    authors <- data.frame(
        surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
        nationality = c("US", "Australia", "US", "UK", "Australia"),
        deceased = c("yes", rep("no", 4)))
    books <- data.frame(
        name = I(c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", "McNeil", "R Core")),
        title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
                  "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
                  "Interactive Data Analysis", "An Introduction to R"),
        other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"))

    class(authors$surname) <- 'myclass1'
    class(books$name) <- 'myclass2'

    (m1 <- merge(authors, books, by.x = "surname", by.y = "name"))

    > class(m1$surname)
    [1] "character"

    > sessionInfo()
    R version 2.13.1 Patched (2011-08-08 r56671)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
    [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Switzerland.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
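One simple workaround for the question above is to remember the class of the key column and re-apply it after the merge. The sketch below does that with a hypothetical helper, `merge_keep_class`, which is not part of base R; the small data frames are made up. Presumably the class is lost because the column is subset during the merge and there is no subsetting method for the new class, but that explanation is an assumption, not something stated in the post.

```r
# Small made-up data frames with a classed key column in each.
authors <- data.frame(surname = c("Tukey", "Ripley", "McNeil"),
                      nationality = c("US", "UK", "Australia"),
                      stringsAsFactors = FALSE)
books <- data.frame(name = c("Tukey", "Ripley", "Ripley"),
                    title = c("Exploratory Data Analysis",
                              "Spatial Statistics",
                              "Stochastic Simulation"),
                    stringsAsFactors = FALSE)
class(authors$surname) <- "myclass1"
class(books$name) <- "myclass2"

# Hypothetical helper: merge, then re-apply the class of the key column of x.
merge_keep_class <- function(x, y, by.x, by.y, ...) {
  key_class <- class(x[[by.x]])
  res <- merge(x, y, by.x = by.x, by.y = by.y, ...)
  class(res[[by.x]]) <- key_class
  res
}

m1 <- merge_keep_class(authors, books, by.x = "surname", by.y = "name")
class(m1$surname)
```

This only restores the class attribute; any other attributes attached to the column would need the same treatment.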
Re: [R] Feature request: rating/review system for R packages
It's unclear to me why the rating/review system should relate to entire packages. Would it not be more informative if single specific functions were rated and reviewed? I would like to see if + is rated better than -, or if more difficulties are reported for * than for /. I could then consider preferring sums over differences in the future.

best,
Heinz

At 20.03.2011 19:03 +, Ben Bolker wrote:
Dieter Menne <dieter.menne at menne-biomed.de> writes:

After pondering all the pros and cons regarding the usefulness of a rating/review system for R packages, don't you think it would make sense to implement such a thing?

Or to look what is there, and how little it is filled: http://crantastic.org/

Dieter

If I were feeling a little more ambitious, I would write a contributed popularity contest package (cf. http://lwn.net/Articles/75753/, http://popcon.debian.org/) that did the following:

* recorded information on a user's configuration and installed packages and reported it *somewhere* (web server, etc.; R has plenty of communications facilities built in)

for more intrusive but complete information:

* gave users an option to install a `hook' that would report at some interval (regular? random?) which packages were actually loaded (on Unix-alike machines one might be able to use the 'atime' feature to guess when a package was *last* loaded even if it wasn't currently in use)
* gave users an option to contribute further information (country, research field, etc.)
* might pop up a window showing installed packages and offering users the option to comment or to give ratings to particularly good or bad packages, which would be sent to wherever ...

This would be completely optional, but *if* word got around it could collect a useful (albeit completely statistically unsound) set of information.

*If* I were writing this I would (a) be very clear in the package description etc etc what information would be collected and stored, where, and how it would be used; (b) carefully think about the tradeoffs between annoying users and collecting more information; (c) consult with the fine folks running CRANtastic to see if they wanted to somehow integrate it into their infrastructure.

The big advantage of this approach is that you don't need to convince anyone from R-core to do anything, you just need to convince users to install your package.

Ben Bolker
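The first bullet in Ben Bolker's proposal, recording a user's configuration and installed packages, needs only base R facilities. The following is a rough sketch of the collection step; the function name `collect_usage_info` is hypothetical, and nothing is transmitted anywhere.

```r
# Hypothetical helper: gather the information such a package might report.
# Nothing is sent anywhere; the result is just a list.
collect_usage_info <- function() {
  pkgs <- utils::installed.packages()
  list(
    r_version = R.version.string,
    platform  = R.version$platform,
    installed = unname(pkgs[, "Package"]),
    loaded    = loadedNamespaces()
  )
}

info <- collect_usage_info()
str(info, vec.len = 3)
```

Reporting, consent and the loading hook described in the other bullets are deliberately left out of this sketch.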
Re: [R] Why do we have to turn factors into characters for various functions?
At 12.12.2010 00:48 +0200, Tal Galili wrote:

Hello dear R-help mailing list,

My question is *not* about how factors are implemented in R (which is, if I understand correctly, that factors keep numbers and assign levels to them). My question *is* about why so many functions that work on factors don't treat them as characters by default?

Here are two simple examples.

Example one, turning the characters inside a factor into numeric:

    x <- factor(4:6)
    as.numeric(x) # output: 1 2 3
    as.numeric(as.character(x)) # output: 4 5 6 # isn't this what we wanted?

Example two, using strsplit on a factor:

    x <- factor(paste(letters[4:6], 4:6, sep="A"))
    strsplit(x, "A") # will result in an error: # Error in strsplit(x, "A") : non-character argument
    strsplit(as.character(x), "A") # will work and split

So what is the reason this is the case? Is it that implementing a switch of factors to characters as the default in some of the basic functions will cause old code to break? Is it a better design in some other way? I am curious to know the reason for this.

In my view the answer can be found implicitly in the language definition: "Factors are currently implemented using an integer array to specify the actual levels and a second array of names that are mapped to the integers. Rather unfortunately users often make use of the implementation in order to make some calculations easier." It is the unfortunate use of factors that seems generally accepted, even if the language definition continues: "This, however, is an implementation issue and is not guaranteed to hold in all implementations of R."

Personally, like some others, I avoid factors, except in cases where they represent a statistical concept. Certainly I would agree with you that, if only reading the R Language Definition and not the documentation of the function factor, one would rather expect functions like as.numeric or strsplit to operate on the levels of a factor and not on the underlying, implementation specific, integer array.

Heinz

Thank you for your reading,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
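Tal's first example can be spelled out a little further. The sketch below contrasts the internal codes with two common ways of getting back the numbers that the levels represent; the variable names are chosen for illustration.

```r
x <- factor(c(4, 6, 5, 6))

# The internal integer codes: positions in levels(x), not the values.
codes <- as.integer(x)

# Two common ways to get the values represented by the levels.
via_character <- as.numeric(as.character(x))
via_levels    <- as.numeric(levels(x))[x]
```

Both conversions go through the levels explicitly, so they do not depend on what the integer codes happen to be.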
Re: [R] Why do we have to turn factors into characters for various functions?
Hello Petr,

don't want to convince you. If you like the following:

    > x <- factor(1:4, labels=c("one", "two", "three", "four"))
    > y <- factor(3:5, labels=c("three", "four", "five"))
    > data.frame(character=c(as.character(x), as.character(y)), numeric=c(x, y))
      character numeric
    1       one       1
    2       two       2
    3     three       3
    4      four       4
    5     three       1
    6      four       2
    7      five       3

For me the behaviour of character vectors is easier to follow and less error prone.

    > cx <- c("one", "two", "three", "four")
    > cy <- c("three", "four", "five")
    > c(cx, cy)
    [1] "one"   "two"   "three" "four"  "three" "four"  "five"

Anyway it is maybe more about personal habits than about bad factor features

I agree with you regarding personal habits. It's not the features of factors. For me it's the rather inconsistent use in functions like c() or print(). If you print a factor, you see its levels, but if you combine it using c(), you combine the famous implementation specific underlying integer vector.

best regards,
Heinz

At 13.12.2010 08:50 +0100, Petr PIKAL wrote:

Hi

r-help-boun...@r-project.org wrote on 12.12.2010 21:00:37:
At 12.12.2010 00:48 +0200, Tal Galili wrote:

Hello dear R-help mailing list,

My question is *not* about how factors are implemented in R (which is, if I understand correctly, that factors keep numbers and assign levels to them). My question *is* about why so many functions that work on factors don't treat them as characters by default?

Here are two simple examples.

Example one, turning the characters inside a factor into numeric:

    x <- factor(4:6)
    as.numeric(x) # output: 1 2 3
    as.numeric(as.character(x)) # output: 4 5 6 # isn't this what we wanted?

Example two, using strsplit on a factor:

    x <- factor(paste(letters[4:6], 4:6, sep="A"))
    strsplit(x, "A") # will result in an error: # Error in strsplit(x, "A") : non-character argument
    strsplit(as.character(x), "A") # will work and split

So what is the reason this is the case? Is it that implementing a switch of factors to characters as the default in some of the basic functions will cause old code to break? Is it a better design in some other way? I am curious to know the reason for this.

In my view the answer can be found implicitly in the language definition: "Factors are currently implemented using an integer array to specify the actual levels and a second array of names that are mapped to the integers. Rather unfortunately users often make use of the implementation in order to make some calculations easier." It is the unfortunate use of factors that seems generally accepted, even if the language definition continues: "This, however, is an implementation issue and is not guaranteed to hold in all implementations of R."

Personally, like some others, I avoid factors, except in cases where they represent a statistical concept.

On the contrary, I find factors quite useful. Consider the possibility to change its levels:

    > set.seed(111)
    > x <- factor(sample(1:4, 20, replace=T), labels=c("one", "two", "three", "four"))
    > x
     [1] three three two three two two one three two one three three
    [13] one one one two one four two three
    Levels: one two three four
    > levels(x)[3:4] <- "more"
    > x
     [1] more more two more two two one more two one more more one one one
    [16] two one more two more
    Levels: one two more

I believe that if x is character, it can also be done, but the factor way seems to me more convenient. I also use point distinction in plots by pch=as.numeric(some.factor) quite often.

Anyway it is maybe more about personal habits than about bad factor features

Regards
Petr

Certainly I would agree with you that, if only reading the R Language Definition and not the documentation of the function factor, one would rather expect functions like as.numeric or strsplit to operate on the levels of a factor and not on the underlying, implementation specific, integer array.

Heinz

Thank you for your reading,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
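The pitfall Heinz shows with c(x, y) can be sidestepped by combining the labels and rebuilding the factor over the union of the levels. A small sketch using his two factors; the names `combined_chr` and `combined_fac` are made up. Newer R versions changed how c() treats factors, so the result of c(x, y) itself may depend on the version; going through character avoids relying on that.

```r
x <- factor(1:4, labels = c("one", "two", "three", "four"))
y <- factor(3:5, labels = c("three", "four", "five"))

# Combine via the labels, then rebuild a factor over the union of levels.
combined_chr <- c(as.character(x), as.character(y))
combined_fac <- factor(combined_chr, levels = union(levels(x), levels(y)))
```

With this approach "three" and "four" get the same code whether they came from x or from y.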
[R] survexp - unable to reproduce example
Dear All,

when I try to reproduce an example of survexp, taken from the help page of survdiff, I receive the error message

    Error in floor(temp) : Non-numeric argument to mathematical function

It seems to come from match.ratetable. I think it has to do with character variables in a ratetable. I would be interested to know if it works for others. With an older version of survival, it worked well.

best regards,
Heinz

    > library(survival)
    Loading required package: splines
    > ## Example from help page of survdiff
    > ## Expected survival for heart transplant patients based on
    > ## US mortality tables
    > expect <- survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"), jasa, cohort=FALSE,
    +     ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    > sessionInfo('survival')
    R version 2.12.1 Patched (2010-12-18 r53869)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
    [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Switzerland.1252

    attached base packages:
    character(0)

    other attached packages:
    [1] survival_2.36-2

    loaded via a namespace (and not attached):
    [1] base_2.12.1      graphics_2.12.1  grDevices_2.12.1 methods_2.12.1
    [5] splines_2.12.1   stats_2.12.1     tools_2.12.1     utils_2.12.1

    > traceback()
    2: match.ratetable(rdata, ratetable)
    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = accept.dt, race = "white"), jasa, cohort = FALSE, ratetable = survexp.usr)
[R] survexp - example produces error
Dear All,

reposting, because I did not find a solution; maybe someone could check the example below. It's taken from the help page of survdiff. Executing it gives the error

    Error in floor(temp) : Non-numeric argument to mathematical function

best regards,
Heinz

    > library(survival)
    > ## Example from help page of survdiff
    > ## Expected survival for heart transplant patients based on
    > ## US mortality tables
    > expect <- survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"), jasa, cohort=FALSE,
    +     ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    > sessionInfo('survival')
    R version 2.12.1 Patched (2010-12-18 r53869)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
    [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Switzerland.1252

    attached base packages:
    character(0)

    other attached packages:
    [1] survival_2.36-2

    loaded via a namespace (and not attached):
    [1] base_2.12.1      graphics_2.12.1  grDevices_2.12.1 methods_2.12.1
    [5] splines_2.12.1   stats_2.12.1     tools_2.12.1     utils_2.12.1

    > traceback()
    2: match.ratetable(rdata, ratetable)
    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = accept.dt, race = "white"), jasa, cohort = FALSE, ratetable = survexp.usr)
Re: [R] survexp - example produces error
Thank you, Peter

after setting options(error=recover), see the output below, once for frame number 2, which I suspect to be the problem, once for frame number 1.

Heinz

    > expect <-
    +   survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"),
    +   jasa, cohort=FALSE,
    +   ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    Enter a frame number, or 0 to exit

    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = ac
    2: match.ratetable(rdata, ratetable)

    Selection: 2
    Called from: top level
    Browse[1]> temp
      [1] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [10] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [19] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [28] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [37] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [46] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [55] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [64] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [73] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [82] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [91] "white" "white" "white" "white" "white" "white" "white" "white" "white"
    [100] "white" "white" "white" "white"
    Browse[1]> Q

There is also 'temp' in frame number 1.

    > expect <-
    +   survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"),
    +   jasa, cohort=FALSE,
    +   ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    Enter a frame number, or 0 to exit

    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = ac
    2: match.ratetable(rdata, ratetable)

    Selection: 1
    Called from: top level
    Browse[1]> temp
     [1] 495 15 38 172 674 39 84 57 1527 80 13860
    [16] 307 35 42 36 27 1031 50 732 218 1799 1400 262 71 34 851
    [31] 76 1586 1571 11 99 654 52 1407 13211 44 9958 1141
    [46] 979 284 101 187 60 941 148 342 915 67 68 841 583 77 31
    [61] 669 29 619 595 89 16 544 20 514 95 481 444 427 79 333
    [76] 396 109 369 206 185 339 264 164 179 130 108 30 10
    Browse[1]> Q

At 31.12.2010 13:46 +0100, peter dalgaard wrote:
On Dec 31, 2010, at 10:21 , Heinz Tuechler wrote:

Dear All,

reposting, because I did not find a solution; maybe someone could check the example below. It's taken from the help page of survdiff. Executing it gives the error

    Error in floor(temp) : Non-numeric argument to mathematical function

Hmm, it's not happening to me (Mac OSX) either with 2.12.1 or the current R-patched (r53892). Could be a platform issue (sounds unlikely), a local user issue, or a locale one. Could you set options(error=recover) and find out what is the value of temp when the error occurs?

best regards,
Heinz

    > library(survival)
    > ## Example from help page of survdiff
    > ## Expected survival for heart transplant patients based on
    > ## US mortality tables
    > expect <- survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"), jasa, cohort=FALSE,
    +     ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    > sessionInfo('survival')
    R version 2.12.1 Patched (2010-12-18 r53869)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
    [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Switzerland.1252

    attached base packages:
    character(0)

    other attached packages:
    [1] survival_2.36-2

    loaded via a namespace (and not attached):
    [1] base_2.12.1      graphics_2.12.1  grDevices_2.12.1 methods_2.12.1
    [5] splines_2.12.1   stats_2.12.1     tools_2.12.1     utils_2.12.1

    > traceback()
    2: match.ratetable(rdata, ratetable)
    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = accept.dt, race = "white"), jasa, cohort = FALSE, ratetable = survexp.usr)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
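The interactive session with options(error=recover) shown above has a non-interactive counterpart: a calling handler can record the call stack at the moment of the error. The sketch below uses two made-up functions, `outer` and `inner`, as stand-ins for the nested calls in the traceback; it does not call the survival package.

```r
# Hypothetical stand-ins for the nested calls in the traceback.
inner <- function(temp) floor(temp)
outer <- function(race) inner(race)

stack <- NULL
result <- tryCatch(
  withCallingHandlers(
    outer("white"),
    # Runs while the failing frames are still on the stack.
    error = function(e) stack <<- sys.calls()
  ),
  # After the stack has been recorded, keep only the error message.
  error = function(e) conditionMessage(e)
)

# Function names of the recorded calls; expected to include outer and inner.
called <- vapply(as.list(stack), function(cl) deparse(cl[[1]])[1], character(1))
```

Here `result` holds the error message and `called` the names of the calls that were active when floor() failed on a character argument.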
Re: [R] survexp - example produces error
Dear Peter, Dear All,

a further attempt led me to an answer. If I set options(stringsAsFactors=TRUE), which I usually have set to FALSE, no error occurs. I am, however, not happy with this solution.

Heinz

Thank you, Peter

after setting options(error=recover), see the output below, once for frame number 2, which I suspect to be the problem, once for frame number 1.

Heinz

    > expect <-
    +   survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"),
    +   jasa, cohort=FALSE,
    +   ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    Enter a frame number, or 0 to exit

    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = ac
    2: match.ratetable(rdata, ratetable)

    Selection: 2
    Called from: top level
    Browse[1]> temp
      [1] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [10] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [19] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [28] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [37] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [46] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [55] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [64] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [73] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [82] "white" "white" "white" "white" "white" "white" "white" "white" "white"
     [91] "white" "white" "white" "white" "white" "white" "white" "white" "white"
    [100] "white" "white" "white" "white"
    Browse[1]> Q

There is also 'temp' in frame number 1.

    > expect <-
    +   survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"),
    +   jasa, cohort=FALSE,
    +   ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    Enter a frame number, or 0 to exit

    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = ac
    2: match.ratetable(rdata, ratetable)

    Selection: 1
    Called from: top level
    Browse[1]> temp
     [1] 495 15 38 172 674 39 84 57 1527 80 13860
    [16] 307 35 42 36 27 1031 50 732 218 1799 1400 262 71 34 851
    [31] 76 1586 1571 11 99 654 52 1407 13211 44 9958 1141
    [46] 979 284 101 187 60 941 148 342 915 67 68 841 583 77 31
    [61] 669 29 619 595 89 16 544 20 514 95 481 444 427 79 333
    [76] 396 109 369 206 185 339 264 164 179 130 108 30 10
    Browse[1]> Q

At 31.12.2010 13:46 +0100, peter dalgaard wrote:
On Dec 31, 2010, at 10:21 , Heinz Tuechler wrote:

Dear All,

reposting, because I did not find a solution; maybe someone could check the example below. It's taken from the help page of survdiff. Executing it gives the error

    Error in floor(temp) : Non-numeric argument to mathematical function

Hmm, it's not happening to me (Mac OSX) either with 2.12.1 or the current R-patched (r53892). Could be a platform issue (sounds unlikely), a local user issue, or a locale one. Could you set options(error=recover) and find out what is the value of temp when the error occurs?

best regards,
Heinz

    > library(survival)
    > ## Example from help page of survdiff
    > ## Expected survival for heart transplant patients based on
    > ## US mortality tables
    > expect <- survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
    +     sex=1,year=accept.dt,race="white"), jasa, cohort=FALSE,
    +     ratetable=survexp.usr)
    Error in floor(temp) : Non-numeric argument to mathematical function

    > sessionInfo('survival')
    R version 2.12.1 Patched (2010-12-18 r53869)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
    [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Switzerland.1252

    attached base packages:
    character(0)

    other attached packages:
    [1] survival_2.36-2

    loaded via a namespace (and not attached):
    [1] base_2.12.1      graphics_2.12.1  grDevices_2.12.1 methods_2.12.1
    [5] splines_2.12.1   stats_2.12.1     tools_2.12.1     utils_2.12.1

    > traceback()
    2: match.ratetable(rdata, ratetable)
    1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = accept.dt, race = "white"), jasa, cohort = FALSE, ratetable = survexp.usr)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
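The observation that the error depends on the stringsAsFactors option can be illustrated without the survival package. The sketch below shows how the same character data ends up as a character or a factor column depending on the setting, and that floor() fails on the character version. That this is the exact mechanism inside survexp is an assumption, not something verified here.

```r
race <- rep("white", 3)

# The column type depends on stringsAsFactors; passing it explicitly makes
# the result independent of the session-wide option.
d_chr <- data.frame(race = race, stringsAsFactors = FALSE)
d_fac <- data.frame(race = race, stringsAsFactors = TRUE)

class(d_chr$race)   # character
class(d_fac$race)   # factor

# floor() on the character column fails, which resembles the error reported
# above; whether survexp hits exactly this path is an assumption.
floor_fails <- inherits(try(floor(d_chr$race), silent = TRUE), "try-error")
```

Code that passes stringsAsFactors explicitly to data.frame() does not change behaviour when a user alters the global option, which is the kind of dependence Heinz is unhappy about.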
Re: [R] survexp - example produces error
Follow up:

The critical line seems to be in survexp, around line 97:

    rdata <- data.frame(eval(rcall, m))

If it is changed to

    old.stringsAsFactors <- options()$stringsAsFactors
    options(stringsAsFactors=TRUE)
    rdata <- data.frame(eval(rcall, m)) ### <- seems to be critical
    options(stringsAsFactors=old.stringsAsFactors)

it seems to work.

Heinz

At 31.12.2010 15:53 +0100, Heinz Tuechler wrote:

> Dear Peter, Dear All,
>
> a further attempt led me to an answer. If I set options(stringsAsFactors=TRUE), which I usually have set to FALSE, no error occurs. I am, however, not happy with this solution.
>
> Heinz
>
>> Thank you, Peter.
>>
>> After setting options(error=recover), see the output below, once for frame number 2, which I suspect to be the problem, and once for frame number 1.
>>
>> Heinz
>>
>>     > expect <-
>>     +   survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
>>     +           sex=1, year=accept.dt, race="white"),
>>     +           jasa, cohort=FALSE,
>>     +           ratetable=survexp.usr)
>>     Error in floor(temp) : Non-numeric argument to mathematical function
>>
>>     Enter a frame number, or 0 to exit
>>
>>     1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = ac
>>     2: match.ratetable(rdata, ratetable)
>>
>>     Selection: 2
>>     Called from: top level
>>     Browse[1]> temp
>>       [1] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [10] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [19] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [28] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [37] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [46] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [55] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [64] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [73] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [82] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>      [91] "white" "white" "white" "white" "white" "white" "white" "white" "white"
>>     [100] "white" "white" "white" "white"
>>     Browse[1]> Q
>>
>> There is also 'temp' in frame number 1.
>>
>>     Selection: 1
>>     Called from: top level
>>     Browse[1]> temp
>>      [1] 495 15 38 172 674 39 84 57 1527 80 13860
>>     [16] 307 35 42 36 27 1031 50 732 218 1799 1400 262 71 34 851
>>     [31] 76 1586 1571 11 99 654 52 1407 13211 44 9958 1141
>>     [46] 979 284 101 187 60 941 148 342 915 67 68 841 583 77 31
>>     [61] 669 29 619 595 89 16 544 20 514 95 481 444 427 79 333
>>     [76] 396 109 369 206 185 339 264 164 179 130 108 30 10
>>     Browse[1]> Q
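The behaviour reported in this thread can be illustrated without the survival package. The following is only a sketch: `race`, `as_char`, `as_fact` and `with_option` are hypothetical names introduced here, and the snippet does not reproduce the internals of survexp() or match.ratetable(). It shows that data.frame() keeps a string column as character when stringsAsFactors is FALSE, that floor() rejects character input, and one way to change an option temporarily so that it is restored even if an error occurs.

```r
# Hypothetical stand-in for a 'race' column built from the literal "white".
race <- rep("white", 3)

# The same input yields different column types depending on stringsAsFactors.
as_char <- data.frame(race = race, stringsAsFactors = FALSE)
as_fact <- data.frame(race = race, stringsAsFactors = TRUE)
class(as_char$race)   # character column
class(as_fact$race)   # factor column

# floor() on a character vector fails; capture the message instead of stopping.
msg <- tryCatch(floor(as_char$race), error = function(e) conditionMessage(e))

# A factor carries integer codes that numeric code can work with.
codes <- as.integer(as_fact$race)

# Temporarily set options and restore the old values on exit,
# even when the wrapped function throws an error.
with_option <- function(fun, ...) {
  old <- options(...)
  on.exit(options(old), add = TRUE)
  fun()
}
digits_inside <- with_option(function() getOption("digits"), digits = 3)
```

Passing stringsAsFactors explicitly to data.frame() avoids depending on a session-wide option, which is presumably why the example behaved differently on two machines.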
Re: [R] CSV value not being read as it appears
At 14.01.2011 07:09 -0800, Peter Ehlers wrote:

> On 2011-01-14 02:09, bgr...@dyson.brisnet.org.au wrote:
>
> Brian,
>
> Thanks. My response to David follows. I should add that this problem has never occurred previously as far as I know (I have now checked the previous report I was sent):
>
> This problem occurs to me frequently. Like Philipp and David, I too always check imported categorical variables. The worst cases are trailing spaces (in quoted text).

These are still the best worst cases. My favourite worst cases are entries like 5-10 or similar that are transformed into dates, e.g. 05Oct2011. My problem is, however, that I don't know any other universally known format to exchange data with a medical colleague or with a social scientist.

Heinz

> It is hardly R's fault that Excel users routinely commit crimes against data.
>
> Peter Ehlers
>
> Hello David,
>
> Thanks for your e-mail. The data was a report derived from a statewide database, saved in EXCEL format, so the usual vagaries of human data entry were not the issue: the data was an automated report, which is run every three months. I would not have even noticed this problem if I hadn't been double checking the numbers of people by district. Visual inspection didn't reveal this problem: no white space was obvious and the spelling was identical. Tabulation via R wouldn't have detected this. I was obtaining the EXCEL totals via filter, which I then compared with R output. I'm hoping I can skip this step in future with Jim's suggestion.
>
> regards
> Bob
>
> On Fri, 14 Jan 2011, David Scott wrote:
>
> As a further note, this is a reminder that whenever you get data via a spreadsheet the first thing to do is examine it and clean up any problems. A basic requirement is to tabulate any categorical variable. Spreadsheets allow any sort of data to be entered, with no controls.
>
> My experience is that those who enter data into spreadsheets enter all sorts of variations of what a human would wish to treat as the same (Open, Open , open, etc.), even when told not to. Another common problem is that they enter characters such as non-breaking space or zero-width characters: we added support for known encodings of NBSP to strip.white about five years ago.
>
> David Scott
>
> On 14/01/2011 4:03 p.m., Jim Holtman wrote:
>
> try strip.white=TRUE to strip out white space
>
> Sent from my iPad
>
> On Jan 13, 2011, at 21:44, bgr...@dyson.brisnet.org.au wrote:
>
> I have a frustrating issue which I am hoping someone may have a suggestion about. I am running XP and R 2.12.0 and saved an EXCEL file that I was sent as a csv file. The initial code I ran follows.
>
>     dec <- read.csv("g://FMH/FO30122010.csv", header=T)
>     dec.open <- subset(dec, Status == "Open")
>     table(dec.open$AMHS)
>
> I was checking the output and noticed a difference between my manual count and R output. Two subjects' rows were not being detected by the subset command. For the AMHS where there was a discrepancy I then ran:
>
>     wm <- subset(dec, AMHS == "WM")
>
> The problem appears to be that there is a space before the 'Open' value for two individuals, as per the example below.
>
>     10/02/2010  Open
>     22/08/2007 Open
>
> Checking in EXCEL there does not appear to be a space and the format is the same (e.g. 'general'). I resolved the problem by copying over the values for the two individuals where I identified a problem. Given this problem was not detected by visual scanning, I would appreciate advice on how this problem can be detected in future without my having to manually check raw data against R output.
>
> Any assistance is appreciated,
>
> Bob
>
> David Scott
> Department of Statistics, The University of Auckland
> Director of Consulting, Department of Statistics
>
> Brian D. Ripley
> Professor of Applied Statistics, University of Oxford
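The checks suggested in this thread (tabulate categorical variables, use strip.white=TRUE) can be sketched on a small inline CSV. The data below are made up for illustration and only mimic the two rows quoted in the original question; `csv_text`, `raw`, `clean` and `suspicious` are hypothetical names.

```r
# Made-up CSV with one value carrying an invisible leading space.
csv_text <- "Date,Status\n10/02/2010, Open\n22/08/2007,Open\n"

# Default import keeps the leading space in unquoted character fields.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE removes leading/trailing white space from unquoted fields.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

# Tabulating shows two categories where only one is expected.
table(raw$Status)
table(clean$Status)

# Flag values that change when white space is trimmed.
suspicious <- raw$Status != trimws(raw$Status)
raw[suspicious, ]

# The subset from the question silently misses the padded row.
n_matched_raw   <- nrow(subset(raw, Status == "Open"))
n_matched_clean <- nrow(subset(clean, Status == "Open"))
```

Comparing `table()` output before and after trimming is a cheap way to catch this kind of problem without checking raw data by eye.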
[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
Dear All,

is it known that source works much faster in R 2.15.2 than in R 3.0.2?

In the example below I observe, e.g. for a data.frame with 10^7 rows, the following timings:

    R version 2.15.2 Patched (2012-11-29 r61184)
    length: 1e+07
       user  system elapsed
      62.04    0.22   62.26

    R version 3.0.2 Patched (2013-10-27 r64116)
    length: 1e+07
       user  system elapsed
     388.63  176.42  566.41

Is there a way to speed R version 3.0.2 up to the performance of R version 2.15.2?

best regards,

Heinz Tüchler

example:

    sessionInfo()
    sample.vec <-
      c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
        'named', 'file', 'or', 'URL', 'or', 'connection')
    dmp.size <- c(10^(1:7))
    set.seed(37)
    for(i in dmp.size) {
      df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
      dump('df0', file='testdump')
      cat('length:', i, '\n')
      print(system.time(source('testdump', keep.source = FALSE,
                               encoding='')))
    }

output for R version 2.15.2 Patched (2012-11-29 r61184):

    > sessionInfo()
    R version 2.15.2 Patched (2012-11-29 r61184)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
    [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Switzerland.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    length    user  system elapsed
    10        0     0      0
    100       0     0      0
    1000      0     0      0
    10000     0.02  0.00   0.01
    1e+05     0.21  0.00   0.20
    1e+06     4.47  0.04   4.51
    1e+07     62.04 0.22   62.26

output for R version 3.0.2 Patched (2013-10-27 r64116):

    > sessionInfo()
    R version 3.0.2 Patched (2013-10-27 r64116)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
    [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Switzerland.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    length    user   system elapsed
    10        0      0      0
    100       0      0      0
    1000      0      0      0
    10000     0.01   0.00   0.01
    1e+05     0.36   0.06   0.42
    1e+06     6.02   1.86   7.88
    1e+07     388.63 176.42 566.41

--
Heinz Tüchler
Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
Everything was run on the identical machine in independent sessions. I did not restart Windows. I also tried 32-bit R 3.0.2 and it seemed slightly faster than 64-bit.

Using Process Explorer v15.23 (http://technet.microsoft.com/de-de/sysinternals/bb896653), my impression was that R 3.0.2 manages memory differently from R 2.15.2. While in R 2.15.2 the physical memory used grows steadily when sourcing a big file, in R 3.0.2 it cycles between growing and shrinking.

best,

Heinz

On 30.10.2013 13:28, Carl Witthoft wrote:

> Did you run the identical code on the identical machine, and did you verify there were no other tasks running which might have limited the RAM available to R? And equally important, did you run these tests in the reverse order (in case R was storing large objects from the first run, thus chewing up RAM)?
Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
Many thanks for confirming my impression.

I use dump for storing large data.frames with a number of attributes for each variable. save/load is much faster, but I am unsure whether such files will be readable by R versions years later.

What format/functions would you suggest for data storage/transfer between different (future) R versions?

best regards,

Heinz

On 30.10.2013 20:11, William Dunlap wrote:

> I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source()) when it is parsing long vectors of numeric data. dump/source has never been an efficient way of transferring data between different R sessions, but it is much worse now for long vectors. In 2.15.2 doubling the size of the vector (of lengths in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1. In 3.0.2 that factor is more like 4.4.
>
>            n  elapsed-2.15.2  elapsed-3.0.2
>         2048           0.003          0.018
>         4096           0.006          0.065
>         8192           0.013          0.254
>        16384           0.025          1.067
>        32768           0.050          4.114
>        65536           0.100         16.236
>       131072           0.219         66.013
>       262144           0.808        291.883
>       524288           2.022       1285.265
>      1048576           4.918             NA
>      2097152           9.857             NA
>      4194304          22.916             NA
>      8388608          49.671             NA
>     16777216         101.042             NA
>     33554432         512.719             NA
>
> I tried this with 64-bit R on a Linux box. The NA's represent sizes that did not finish while I was at a 1 1/2 hour dentist's appointment. The timing function was:
>
>     test <- function(n = 2^(11:25))
>     {
>         tf <- tempfile()
>         on.exit(unlink(tf))
>         t(sapply(n, function(n){
>             dput(log(seq_len(n)), file=tf)
>             print(c(n=n, system.time(parse(file=tf))[1:3]))
>         }))
>     }
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
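The growth factors quoted in this thread (about 2.1 versus about 4.4 per doubling) can be turned into a scaling exponent by regressing log2(time) on log2(n). This is only a sketch using the elapsed times posted in the thread; `scaling_exponent` is a hypothetical helper, and the interpretation (roughly linear versus roughly quadratic parse time) is an inference from these numbers, not a statement about the parser's implementation.

```r
# Elapsed parse times (seconds) as posted in the thread; NA = did not finish.
n      <- 2^(11:25)
t_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
            2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
t_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
            291.883, 1285.265, rep(NA, 6))

# Slope of log2(time) against log2(n): an exponent of 1 means time doubles
# when n doubles, an exponent of 2 means time quadruples.
scaling_exponent <- function(n, elapsed) {
  ok <- !is.na(elapsed)
  fit <- lm(log2(elapsed[ok]) ~ log2(n[ok]))
  unname(coef(fit)[2])
}

exp_2152 <- scaling_exponent(n, t_2152)
exp_302  <- scaling_exponent(n, t_302)

# Per-doubling growth factors implied by the fitted exponents.
c(R_2.15.2 = 2^exp_2152, R_3.0.2 = 2^exp_302)
```

An exponent near 2 for R 3.0.2 would explain why the gap between the two versions widens so quickly as the dumped object grows.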
Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
On 31.10.2013 09:12, Prof Brian Ripley wrote:

> On 30/10/2013 21:15, William Dunlap wrote:
>
> I have to defer to others for policy declarations like how long the current format used by load and save should be readable.
>
> You could also ask how long R will last.
>
> R can still read (but not write) save() formats used in the 1990's. We would expect R to be able to read saves since R 1.0.0 for as long as R exists. And as R is Open Source, you would be able to compile it and dump the objects you want for as long as suitable compilers and OSes exist.
>
> And of course R is not the only application which will read the format.
>
> There is no guarantee that source() will be able to parse dumps from earlier versions of R, and that has not always been true.
>
> People commenting on parse() speed should note the NEWS for R-devel:
>
>     • The parser has been modified to use less memory.

Thank you for the hint. It appears to me that source() in R-devel performs at about the same speed as in R 2.15.2.

> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
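For the storage question raised in this thread, a binary round trip with saveRDS()/readRDS() is one alternative to dump()/source(). The sketch below uses a tiny made-up data.frame with a hypothetical "label" attribute; it assumes that the expectation about long-term readability quoted above for save() files carries over to saveRDS(), which uses the same serialization mechanism.

```r
# Small made-up data.frame with a per-variable attribute.
df0 <- data.frame(x = c("source", "causes", "R"), stringsAsFactors = FALSE)
attr(df0$x, "label") <- "hypothetical variable label"

# Write to a temporary file and read it back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(df0, rds_file)
df1 <- readRDS(rds_file)

# The object, including the column attribute, survives the round trip.
same_object <- identical(df0, df1)
label_back  <- attr(df1$x, "label")

unlink(rds_file)
```

Unlike dump(), this never goes through the parser, so the parse-time behaviour discussed above does not apply.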
Re: [R] Comparing Cox model with Competing Risk model
Dear Terry,

as soon as the vignette is ready, I would be very happy to know about it. Will you send a note to r-help, or will it be announced in some other way?

best regards,

Heinz

On 08.03.2013 15:12, Terry Therneau wrote:

> -- begin included message --
> I have a competing risk data where a patient may die from either AIDS or Cancer. I want to compare the cox model for each of the event of interest with a competing risk model. In the competing risk model the cumulative incidence function is used directly.
> -- end inclusion --
>
> If you do want to pursue the Fine-Gray model I would suggest using software that already exists. Find the Task Views tab on CRAN, follow it to survival, and then look at the competing risks section. There is a lot to offer. I would trust it more than rolling your own function.
>
> As an aside, modeling the subdistribution function is ONE way of dealing with competing risks, but not everyone thinks that it is the best way to proceed. The model corresponds to a biology that I find unlikely, though it makes for nice math. Since the alternative is discussed in a vignette that I haven't-yet-quite-written we won't pursue that any further, however. :-)
>
> Terry Therneau
Re: [R] correlation between categorical data
Comment inline.

David Winsemius wrote on 24.01.2015 21:08:

> On Jan 23, 2015, at 5:54 PM, JohnDee wrote:
>
>> Heinz Tuechler wrote
>>> At 07:40 21.06.2009, J Dougherty wrote:
>>>> [...]
>>>> There are other ways of regarding the FET. Since it is precisely what it says - an exact test - you can argue that you should avoid carrying over any conclusions drawn about the small population the test was applied to and employing them in a broader context. In so far as the test is concerned, the sample data and the contingency table it is arrayed in are the entire universe. In that sense, the FET can't be conservative or liberal. It isn't actually a hypothesis test and should not be thought of as one or used in the place of one.
>>>>
>>>> JDougherty
>>>
>>> Could you give some reference, supporting this, for me, surprising view? I don't see a necessary connection between an exact test and the idea that it does not test a hypothesis.
>>>
>>> Thanks,
>>> Heinz
>>
>> Fisher's Exact Test is a nonparametric test. It tests the distribution in the contingency table against the total possible arrangements and gives you the precise likelihood of that many items being arranged in that manner.
>
> That's not the way I understand the construction of the result. The statistic gives rather the ratio of the number of permutations as extreme or more extreme (as measured by the odds ratio) while holding the marginals constant which is then divided by the total number of possible permutations of the data.
>
>> No more and no less. You could argue about the greater population from which your sample is drawn, but FET makes no assumptions at all about any greater sample universe.
>
> It is conditional on the margins, so that is the description of the universe.
>
>> Also, since the population being used in FET is strictly limited to the members of the contingency table, the results are a subset of a finite group of possible results that are relevant to that specific arrangement of data. You are not estimating parameters of a parent population or making any assumptions about the parent distribution. You can designate a p value such as 0.05 as a level of significance, but there is no error term in the FET result. Fisher stated that the test DOES assume a null hypothesis of independence to a hypergeometric distribution of the cell members. But that creates other issues if you are attempting to use the results in conjunction with assumptions about a broader sample universe than that in the test. For instance you have to carry the assumption of a hypergeometric distribution over in to the land of reality your sample is drawn from and you then have to justify that.

In this respect I agree. A real world situation with a universe of fixed margins seems unusual to me.

> And this is off-topic on Rhelp.

Sorry for asking an off-topic question more than five years ago. A nice surprise to get an answer.

Thanks,
Heinz

>> --
>> View this message in context: http://r.789695.n4.nabble.com/correlation-between-categorical-data-tp888975p4702235.html
>> Sent from the R help mailing list archive at Nabble.com.
>
> David Winsemius
> Alameda, CA, USA
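The point about conditioning on the margins can be made concrete with a small made-up 2x2 table. The sketch below computes a two-sided p-value directly from the hypergeometric distribution with both margins fixed and compares it with fisher.test(). It assumes the convention, used by R's two-sided fisher.test() for 2x2 tables, of summing the probabilities of all tables that are no more probable than the observed one; the table itself and the names `tab`, `p_manual` and `p_fisher` are hypothetical.

```r
# Made-up 2x2 contingency table.
tab <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(row = c("a", "b"), col = c("x", "y")))

# Margins that the test conditions on.
m <- sum(tab[, 1])   # first column total
n <- sum(tab[, 2])   # second column total
k <- sum(tab[1, ])   # first row total
x_obs <- tab[1, 1]   # observed count in the top-left cell

# With fixed margins the top-left cell can only take these values,
# and its null distribution is hypergeometric.
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)
p_obs   <- dhyper(x_obs, m, n, k)

# Two-sided p-value: total probability of tables no more likely than
# the observed one (with a small tolerance for floating point ties).
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_fisher <- fisher.test(tab)$p.value
```

The `support` vector is the whole "universe" of the test: every table with the same row and column totals.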
Re: [R] Probable Error in fmsb package
Anindya Sankar Dey wrote/hat geschrieben on/am 30.12.2015 07:35: Hi All, The fmsb package has a function called Variance Inflation Factor and it states the definition of the function as follows:- "To evaluate multicolinearity of multiple regression model, calculating the variance inflation factor (VIF) from the result of lm(). If VIF is more than 10, multicolinearity is strongly suggested. " The function computes VIF of a model as 1/(1-R^2) where R^2 is the coefficient of determination. Now nowhere in literature I have come across this definition of VIF, as VIF is always computed at individual variable level. Though the structure is almost the same, R^2 in theoretical VIF is the partial correlation coefficient. I only came aware when lots of freshers from non statistics background I interviewed for analytics position answered that the only definition of VIF they know is 1/(1 - Coeff. of Determination), and there is a R package which calculates VIF like that. After researched I found that such a function indeed exist in fmsb package. Please help me understand has an alternate definition of Variance Inflation Factor has ever emerged in theory? Does it really make sense to have VIF at a model level, as it does not help in solving the problem of multicollinearity during model building. And if I am right, what steps I should do about it. Dear Anindya, to me it seems clear from the example on the help page that VIF() is not intended to be applied to the model of interest, but to separate models for each covariable. The model of interest in the example is # the target multiple regression model res <- lm(Ozone ~ Wind+Temp+Solar.R, data=airquality) The VIF is calculated on submodels for each covariate. # checking multicolinearity for independent variables. VIF(lm(Wind ~ Temp+Solar.R, data=airquality)) VIF(lm(Temp ~ Wind+Solar.R, data=airquality)) VIF(lm(Solar.R ~ Wind+Temp, data=airquality)) Does that agree with your usual definition of a variance inflation factor? 
best regards, Heinz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
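The per-covariate reading of 1/(1 - R^2) discussed in this thread can be sketched in base R without fmsb. This is only an illustration under assumptions: the names `aq`, `covars` and `vif_by_hand` are made up here, and unlike the help-page example every auxiliary regression is restricted to the rows that are complete for the full model.

```r
# Hand-computed VIF per covariate: regress each covariate on the others
# and apply 1 / (1 - R^2). Base R only; object names are hypothetical.
aq <- na.omit(airquality[, c("Ozone", "Wind", "Temp", "Solar.R")])
covars <- c("Wind", "Temp", "Solar.R")

vif_by_hand <- sapply(covars, function(v) {
  others <- setdiff(covars, v)
  aux <- lm(reformulate(others, response = v), data = aq)  # auxiliary model
  1 / (1 - summary(aux)$r.squared)
})

vif_by_hand
```

As a cross-check, the same values should appear on the diagonal of the inverse correlation matrix of the covariates, `diag(solve(cor(aq[, covars])))`.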
Re: [R] p values from GLM
Bert Gunter wrote on 01.04.2016 23:46:
> ... of course, whether one **should** get them is questionable...
>
> http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503#/ref-link-1

This paper repeats the commonplace statement that a small p-value does not
necessarily indicate an important finding. Agreed, but maybe I overlooked
examples of important findings with large p-values. If there are some, I
would be happy to get to know some of them. Otherwise a small p-value is no
guarantee of importance, but a prerequisite.

best regards,
Heinz

> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, Apr 1, 2016 at 3:26 PM, Duncan Murdoch wrote:
>> On 01/04/2016 6:14 PM, John Sorkin wrote:
>>> How can I get the p values from a glm ? I want to get the p values so
>>> I can add them to a custom report
>>>
>>> fitwean<- glm(data[,"JWean"]~data[,"Group"],data=data,family=binomial(link ="logit"))
>>> summary(fitwean) # This lists the coefficients, SEs, z and p values, but I can't isolate the pvalues.
>>> names(summary(fitwean)) # I see the coefficients, but not the p values
>>> names(fitmens) # p values are not found here.
>>
>> Doesn't summary(fitwean) give a matrix? Then it's
>> colnames(summary(fitwean)$coefficients) you want, not names(fitwean).
>>
>> Duncan Murdoch
>>
>> P.S. If you had given a reproducible example, I'd try it myself.
>>
>>> Thank you!
>>> John
>>>
>>> John David Sorkin M.D., Ph.D.
>>> Professor of Medicine
>>> Chief, Biostatistics and Informatics
>>> University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>>> Baltimore VA Medical Center
>>> 10 North Greene Street
>>> GRECC (BT/18/GR)
>>> Baltimore, MD 21201-1524
>>> (Phone) 410-605-7119
>>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
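Following Duncan Murdoch's hint that the coefficient table of the summary is a matrix, the p-values can be pulled out by column name. The sketch below uses the built-in `mtcars` data as a stand-in because the original data were not posted; `fit`, `coef_table` and `p_values` are names chosen for this illustration.

```r
# Stand-in logistic regression on built-in data (not the poster's data).
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# coef(summary(fit)) is the same matrix as summary(fit)$coefficients.
coef_table <- coef(summary(fit))
colnames(coef_table)   # shows which column holds the p-values

# Extract the p-value column as a named numeric vector.
p_values <- coef_table[, "Pr(>|z|)"]
p_values
```

For families with an estimated dispersion (e.g. gaussian) the column is named `"Pr(>|t|)"` instead, so checking `colnames()` first is the safer habit.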
Re: [R] post_processor in rmarkdown not working
Are you sure that you want to read in the output_file in

  text <- readLines(output_file, warn = FALSE)

?

best regards,
Heinz

Thierry Onkelinx wrote/hat geschrieben on/am 06.09.2017 11:41:
> Dear all,
>
> I'm trying to write a post_processor() for a custom rmarkdown format. The
> goal of the post_processor() is to modify the latex file before it is
> compiled. For some reason the post_processor() is not run. The
> post_processor() does work when I run it manually on the tex file.
>
> Any suggestions on what I'm doing wrong?
>
> Below is the relevant snippet of the code. The full code is available at
> https://github.com/inbo/INBOmd/blob/post_processor/R/rsos_article.R
> https://github.com/inbo/INBOmd/blob/post_processor/inst/rmarkdown/templates/rsos_article/skeleton/skeleton.Rmd
> is a MWE that fails to compile because the post_processor() is not run.
>
> Best regards,
>
> Thierry
>
> post_processor <- function(
>   metadata, input_file, output_file, clean, verbose
> ) {
>   text <- readLines(output_file, warn = FALSE)
>
>   # set correct text in fmtext environment
>   end_first_page <- grep("EndFirstPage", text) #nolint
>   if (length(end_first_page) == 1) {
>     maketitle <- grep("maketitle", text) #nolint
>     text <- c(
>       text[1:(maketitle - 1)],
>       "\\begin{fmtext}",
>       text[(maketitle + 1):(end_first_page - 1)],
>       "\\end{fmtext}",
>       "\\maketitle",
>       text[(end_first_page + 1):length(text)]
>     )
>     writeLines(enc2utf8(text), output_file, useBytes = TRUE)
>   }
>   output_file
> }
>
> output_format(
>   knitr = knitr_options(
>     opts_knit = list(
>       width = 60,
>       concordance = TRUE
>     ),
>     opts_chunk = opts_chunk,
>     knit_hooks = knit_hooks
>   ),
>   pandoc = pandoc_options(
>     to = "latex",
>     latex_engine = "xelatex",
>     args = args,
>     keep_tex = keep_tex
>   ),
>   post_processor = post_processor,
>   clean_supporting = !keep_tex
> )
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to
> say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data. ~ John Tukey

--
Heinz Tüchler
+436605653878
Re: [R] boot.stepAIC fails with computed formula
It seems that if you build the formula as a character string, and postpone
the "as.formula" into the lm call, it works.

instead of

  frm1 <- as.formula(paste(trg,"~1"))

use

  frm1a <- paste(trg,"~1")

and then

  strt <- lm(as.formula(frm1a),dat)

regards,
Heinz

Stephen O'hagan wrote/hat geschrieben on/am 23.08.2017 12:07:
> Until I get a fix that works, a work-around would be to rename the 'y1'
> column, used a fixed formula, and rename it back afterwards.
>
> Thanks for your help.
> SGO.
>
> -Original Message-
> From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> Sent: 22 August 2017 20:38
> To: Stephen O'hagan
> Cc: r-help@r-project.org
> Subject: Re: [R] boot.stepAIC fails with computed formula
>
> OK, here's the problem. Continuing with your example:
>
> strt1 <- lm(y1 ~1, dat)
> strt2 <- lm(frm1,dat)
>
> strt1
>
> Call:
> lm(formula = y1 ~ 1, data = dat)
>
> Coefficients:
> (Intercept)
>       41.73
>
> strt2
>
> Call:
> lm(formula = frm1, data = dat)
>
> Coefficients:
> (Intercept)
>       41.73
>
> Note that the formula objects of the lm object are different: strt2 does
> not evaluate the formula. So presumably boot.step.AIC does no evaluation
> and therefore gets confused with the errors you saw. So you need to get
> the evaluated formula into the lm object. This can be done, e.g. via:
>
> strt2 <- eval(substitute(lm(form,data = dat), list(form = frm1)))
>
> ## yielding
>
> strt2
>
> Call:
> lm(formula = y1 ~ 1, data = dat)
>
> Coefficients:
> (Intercept)
>       41.73
>
> So this looks like it should fix the problem, but alas no, the
> boot.stepAIC call still fails with the same error message. Here's why:
>
> identical(strt$call, strt2$call)
> [1] FALSE
>
> So one might rightfully ask, what the heck is going on here?! Further
> digging:
>
> str(strt$call)
>  language lm(formula = y1 ~ 1, data = dat)
> str(strt2$call)
>  language lm(formula = y1 ~ 1, data = dat)
>
> These certainly look identical! -- but of course they're not:
>
> names(strt$call)
> [1] ""        "formula" "data"
> names(strt2$call)
> [1] ""        "formula" "data"
>
> So the difference must lie in the formula component, right? ...
>
> strt$call$formula
> y1 ~ 1
> strt2$call$formula
> y1 ~ 1
>
> So, thus far, huhh? But..
>
> class(strt2$call$formula)
> [1] "formula"
> class(strt$call$formula)
> [1] "call"
>
> So I think therein lies the critical difference that is screwing things
> up. NOTE: If I am wrong about this someone **PLEASE** correct me.
>
> I see no clear workaround for this other than to explicitly avoid passing
> a formula in the lm() call with y~1 or y ~ . I think the real fix is to
> make the boot.stepAIC function smarter in how it handles its formula
> argument, and that is above my paygrade (and degree of interest). You
> should probably email the maintainer, who may not monitor this list. But
> give it a day or so to give someone else a chance to correct me if I'm
> wrong.
>
> HTH.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan wrote:
>> I'm trying to use boot.stepAIC for feature selection; I need to be able
>> to specify the name of the dependent variable programmatically, but this
>> appear to fail:
>>
>> In R-Studio with MS R Open 3.4:
>>
>> library(bootStepAIC)
>>
>> #Fake data
>> n<-200
>> x1 <- runif(n, -3, 3)
>> x2 <- runif(n, -3, 3)
>> x3 <- runif(n, -3, 3)
>> x4 <- runif(n, -3, 3)
>> x5 <- runif(n, -3, 3)
>> x6 <- runif(n, -3, 3)
>> x7 <- runif(n, -3, 3)
>> x8 <- runif(n, -3, 3)
>> y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)
>>
>> dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
>> #the real data won't have these names...
>>
>> cn <- names(dat)
>> trg <- "y1"
>> xvars <- cn[cn!=trg]
>>
>> frm1<-as.formula(paste(trg,"~1"))
>> frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))
>>
>> strt=lm(y1~1,dat) # boot.stepAIC Works fine
>> #strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##
>> #strt=lm(frm1,dat) ## boot.stepAIC FAILS ##
>>
>> limit<-5
>>
>> stp=stepAIC(strt,direction='forward',steps=limit,
>>             scope=list(lower=frm1,upper=frm2))
>>
>> bst <- boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
>>                     scope=list(lower=frm1,upper=frm2))
>>
>> b1 <- bst$Covariates
>> ball <- data.frame(b1)
>> names(ball)=unlist(trg)
>>
>> Any ideas?
>>
>> Cheers,
>> SOH
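What lm() stores in its call for the variants discussed in this thread can be inspected with base R alone. The sketch below does not run boot.stepAIC and makes no claim about its internals; the tiny data frame and the names `fit_literal`, `fit_symbol` and `fit_inline` are invented for the illustration.

```r
# Tiny stand-in data; the dependent variable name is held in a string.
dat <- data.frame(y1 = c(1, 2, 4, 3), x1 = c(0, 1, 2, 3))
trg <- "y1"
frm_chr <- paste(trg, "~ 1")
frm1 <- as.formula(frm_chr)

fit_literal <- lm(y1 ~ 1, data = dat)                 # formula typed literally
fit_symbol  <- lm(frm1, data = dat)                   # formula passed as a variable
fit_inline  <- lm(as.formula(frm_chr), data = dat)    # conversion inside the call

# The stored call keeps the unevaluated argument, not the formula itself.
class(fit_literal$call$formula)  # the expression y1 ~ 1
class(fit_symbol$call$formula)   # only the symbol frm1
class(fit_inline$call$formula)   # the expression as.formula(frm_chr)

# The fitted models themselves agree; only the recorded call differs.
deparse(formula(fit_symbol))
all.equal(coef(fit_literal), coef(fit_inline))
```

Any function that later re-evaluates the stored call has to be able to find `frm1` or `frm_chr` again, which is one plausible reason the variants behave differently downstream.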
Re: [R] SE for all fixed factor effect in GLMM
maybe qvcalc
https://cran.r-project.org/web/packages/qvcalc/index.html
is useful for you.

Marc Girondot via R-help wrote/hat geschrieben on/am 30.12.2018 05:31:
> Dear members,
>
> Let do a example of simple GLMM with x and G as fixed factors and R as
> random factor: (note that question is the same with GLM or even LM):
>
> x <- rnorm(100)
> y <- rnorm(100)
> G <- as.factor(sample(c("A", "B", "C", "D"), 100, replace = TRUE))
> R <- as.factor(rep(1:25, 4))
>
> library(lme4)
>
> m <- lmer(y ~ x + G + (1 | R))
> summary(m)$coefficients
>
> I get the fixed effect fit and their SE
>
> summary(m)$coefficients
>                Estimate Std. Error    t value
> (Intercept)  0.07264454  0.1952380  0.3720820
> x           -0.02519892  0.1238621 -0.2034433
> GB           0.10969225  0.3118371  0.3517614
> GC          -0.09771555  0.2705523 -0.3611706
> GD          -0.12944760  0.2740012 -0.4724344
>
> The estimate for GA is not shown as it is fixed to 0. Normal, it is the
> reference level.
>
> But is there a way to get SE for GA, or is it a nonsense question because
> GA is fixed to 0 ?
>
> I propose here a solution but I don't know if it is correct. It is based
> on reordering levels and averaging se for all reordering:
>
> G <- relevel(G, "A")
> m <- lmer(y ~ x + G + (1 | R))
> sA <- summary(m)$coefficients
>
> G <- relevel(G, "B")
> m <- lmer(y ~ x + G + (1 | R))
> sB <- summary(m)$coefficients
>
> G <- relevel(G, "C")
> m <- lmer(y ~ x + G + (1 | R))
> sC <- summary(m)$coefficients
>
> G <- relevel(G, "D")
> m <- lmer(y ~ x + G + (1 | R))
> sD <- summary(m)$coefficients
>
> # mean() averages only its first argument, so the values are wrapped in c()
> seA <- mean(c(sB["GA", "Std. Error"], sC["GA", "Std. Error"], sD["GA", "Std. Error"]))
> seB <- mean(c(sA["GB", "Std. Error"], sC["GB", "Std. Error"], sD["GB", "Std. Error"]))
> seC <- mean(c(sA["GC", "Std. Error"], sB["GC", "Std. Error"], sD["GC", "Std. Error"]))
> seD <- mean(c(sA["GD", "Std. Error"], sB["GD", "Std. Error"], sC["GD", "Std. Error"]))
>
> seA; seB; seC; seD
>
> Thanks,
>
> Marc
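Why the reported standard errors change with the reference level can be seen from the coefficient covariance matrix: each SE belongs to a contrast against the reference, not to a level on its own. The sketch below is an illustration under assumptions, not the qvcalc method: it uses lm() instead of lmer() so that it runs with base R only, and the names `d`, `m_A`, `m_B`, `se_CB` and `se_direct` are hypothetical.

```r
set.seed(1)
d <- data.frame(
  y = rnorm(100),
  x = rnorm(100),
  G = factor(sample(c("A", "B", "C", "D"), 100, replace = TRUE))
)

# Fit with A as the reference level and take the coefficient covariance.
m_A <- lm(y ~ x + G, data = d)
V <- vcov(m_A)

# SE of the contrast C - B, computed from the A-referenced fit:
# Var(C - B) = Var(C) + Var(B) - 2 Cov(B, C)
se_CB <- sqrt(V["GC", "GC"] + V["GB", "GB"] - 2 * V["GB", "GC"])

# The same contrast read off directly after making B the reference level.
d$G_B <- relevel(d$G, "B")
m_B <- lm(y ~ x + G_B, data = d)
se_direct <- coef(summary(m_B))["G_BC", "Std. Error"]

c(from_vcov = se_CB, after_relevel = se_direct)
```

Both numbers should agree, which suggests that releveling adds no information beyond what `vcov()` of a single fit already contains; the same algebra is assumed, not verified here, to carry over to the fixed effects of a mixed model.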
Re: [R] Value Labels: SPSS Dataset to R
Maybe it helps searching at https://rseek.org/ for "SPSS to R transition
value labels".

In particular
https://cran.r-project.org/web/packages/expss/vignettes/labels-support.html
seems useful, as well as
https://www.r-bloggers.com/migrating-from-spss-to-r-rstats/

best regards,
Heinz

Jim Lemon wrote on 07.02.2020 22:58:
> Hi Yawo,
> From your recent post, you say you have coerced the variables to factors.
> If so, perhaps:
>
> as.character(x)
>
> is what you want. If not, creating a new variable like this:
>
> Scratch$new_race<-factor(as.character(Scratch$race),levels=c("WHITE","BLACK"))
>
> may do it. Note the "levels" argument to get the numeric values in the
> same order as the original.
>
> Jim
>
> On Sat, Feb 8, 2020 at 7:32 AM Yawo Kokuvi wrote:
>> Thanks for all your assistance
>>
>> Attached please is the Rdata scratch I have been using
>>
>> -
>>
>> head(Scratch, n=13)
>> # A tibble: 13 x 6
>>       ID marital            sex        race      paeduc speduc
>>  1     1 3 [DIVORCED]       1 [MALE]   1 [WHITE]     NA     NA
>>  2     2 1 [MARRIED]        1 [MALE]   1 [WHITE]     NA     NA
>>  3     3 3 [DIVORCED]       1 [MALE]   1 [WHITE]      4     NA
>>  4     4 4 [SEPARATED]      1 [MALE]   1 [WHITE]     16     NA
>>  5     5 3 [DIVORCED]       1 [MALE]   1 [WHITE]     18     NA
>>  6     6 1 [MARRIED]        2 [FEMALE] 1 [WHITE]     14     20
>>  7     7 1 [MARRIED]        2 [FEMALE] 2 [BLACK]     NA     12
>>  8     8 1 [MARRIED]        2 [FEMALE] 1 [WHITE]     NA     12
>>  9     9 3 [DIVORCED]       2 [FEMALE] 1 [WHITE]     11     NA
>> 10    10 1 [MARRIED]        2 [FEMALE] 1 [WHITE]     16     12
>> 11    11 5 [NEVER MARRIED]  2 [FEMALE] 2 [BLACK]     NA     NA
>> 12    12 3 [DIVORCED]       2 [FEMALE] 2 [BLACK]     NA     NA
>> 13    13 3 [DIVORCED]       2 [FEMALE] 2 [BLACK]     16     NA
>>
>> -
>>
>> and below is my script/command file.
*#1: Load library and import SPSS dataset* library(haven) Scratch <- read_sav("~/Desktop/Scratch.sav") *#2: save the dataset with a name* save(ScratchImport, file="Scratch.Rdata") *#3: install & load necessary packages for descriptive statistics* install.packages ("freqdist") library (freqdist) install.packages ("sjlabelled") library (sjlabelled) install.packages ("labelled") library (labelled) install.packages ("surveytoolbox") library (surveytoolbox) *#4: Check the value labels of gender and marital status* Scratch$sex %>% attr('labels') Scratch$marital %>% attr('labels') *#5: Frequency Distribution and BarChart for Categorical/Ordinal Level Variables such as Gender - SEX* freqdist(Scratch$sex) barplot(table(Scratch$marital)) - As you can see from above, I use the package to import the data from SPSS. Apparently, the haven function keeps the value labels, as the attribute options in section #4 of my script shows. The problem is that when I run frequency distribution for any of the categorical variables like sex or marital status, only the numbers (1, 2,) are displayed in the output. The labels (male, female) for example are not. Is there any way to force these to be shown in the output? Is there a global property that I have to set so that these value labels are reliably displayed with every output? I read I can declare them as factors using the , but once I do so, how do I invoke them in my commands so that the value labels show... Sorry about all the noobs questions, but Ihopefully, I am able to get this working. Thanks in advance. Thanks - cY On Fri, Feb 7, 2020 at 1:14 PM wrote: I've never used it, but there is a labels function in haven... On 7 Feb 2020 17:05, Bert Gunter wrote: What does your data look like after importing? -- see ?head and ?str to tell us. Show us the code that failed to provide "labels." See the posting guide below for how to post questions that are likely to elicit helpful responses. 
I know nothing about the haven package, but see ?factor or go through an R tutorial or two to learn about factors, which may be part of the issue here. R *generally* obtains whatever "label" info it needs from the object being tabled -- see ?tabulate, ?table etc. -- if that's what you're doing. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Fri, Feb 7, 2020 at 8:28 AM Yawo Kokuvi wrote: Hello, I am just transitioning from SPSS to R. I used the haven library to import some of my spss data files to R. However, when I run procedures such as frequencies or crosstabs, value labels for categorical variables such as gender (1=male, 2=female) are not shown. The same applies to many other output. I am confused. 1. Is there a global setting that I can use to force all categorical variables to display labels? 2. Or, are these
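The step the thread circles around, turning numeric codes with a "labels" attribute into a factor so that tables show the labels, can be sketched in base R. The vector below only mimics the attribute that the poster's section #4 inspects; it is not a real haven object, and `sex`, `labs` and `sex_f` are names made up for this illustration. The haven package also offers `as_factor()` for its own labelled vectors.

```r
# Numeric codes carrying a "labels" attribute, mimicking imported SPSS data.
sex <- structure(c(1, 1, 2, 2, 2), labels = c(MALE = 1, FEMALE = 2))

# Read the code-to-label mapping from the attribute.
labs <- attr(sex, "labels")

# Build a factor: codes become levels, label names become the level labels.
sex_f <- factor(as.vector(sex), levels = unname(labs), labels = names(labs))

# Frequency table now shows the labels instead of 1 and 2.
table(sex_f)
```

Once a variable is a factor, functions such as `table()` and `barplot(table(...))` pick up the labels without any global setting.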
Re: [R] My dream ...
Abby Spurdle wrote/hat geschrieben on/am 12.05.2020 10:38:
>> In my opinion the advantage of computers is not Artificial Intelligence,
>> but rather Artificial Patience (most AI that I have seen is really doing
>> a bunch of what I would consider to be boring, really fast so people
>> don't have to). Leave the Intelligence to the people.
>
> Hmmm...
> https://en.wikipedia.org/wiki/Artificial_intelligence_in_video_games
>
> Also, I found the following while searching for battle chess:
> https://youtu.be/hBNG7444lOw
> (Warning: Contains aggressive chess tactics).
>
> Also, correct me if I'm wrong, but doesn't Emacs have historical
> connections to AI research...?

Maybe a matter of definition, but admittedly I have to use a lot of my
intelligence for doing boring work.
Re: [R] Some seemingly odd behavior of survfit in library survival
Jeff Newmiller wrote/hat geschrieben on/am 17.03.2020 05:39:
> The coxph function appears to rely on finding the name of the data
> argument in the environment in which the formula was created. The lm
> function does not have this problem.
>
> Oh, and df is the name of the F distribution density function, which
> explains why the error complained about a "closure".
>
> new<-as.data.frame(list(t=1:4,d=rep(1,4),s=c(1,0,0,1)))
> library(survival)
> mine<-function(ff,df){
>   fit<-coxph(as.formula(ff),data=df)
>   survfit(fit,df)
> }
> mine("Surv(t,d)~s",new)

Therefore a workaround could be to use the character string for the formula
as argument and apply as.formula() within the function.

  mine2 <-function(fstr,df){
    fit<-coxph(as.formula(fstr),data=df)
    out<-survfit(fit,df)
    out
  }
  mine2("Surv(t,d)~s",new)

Heinz

> On March 16, 2020 7:23:26 PM PDT, John Kolassa wrote:
>> I ran across an issue that looks like variable scoping in survfit is not
>> acting as I would expect. Here's a minimal example:
>>
>> new<-as.data.frame(list(t=1:4,d=rep(1,4),s=c(1,0,0,1)))
>> library(survival)
>> mine<-function(ff,df){
>>   fit<-coxph(ff,data=df)
>>   out<-survfit(fit,df)
>> }
>> mine(as.formula("Surv(t,d)~s"),new)
>>
>> I would expect this to fit the proportional hazards regression model
>> using formula Surv(t,d)~s, using data set new, and then calculate a
>> separate fitted survival curve for each member of the data set. Instead
>> I get an error
>>
>> Error in eval(predvars, data, env) :
>>   invalid 'envir' argument of type 'closure'
>>
>> The code runs without error if I modify it by copying the data set new
>> to the local variable within the function mine before running:
>>
>> new<-as.data.frame(list(t=1:4,d=rep(1,4),s=c(1,0,0,1)))
>> library(survival)
>> mine<-function(ff,df){
>>   fit<-coxph(ff,data=df)
>>   out<-survfit(fit,df)
>> }
>> df<-new
>> mine(as.formula("Surv(t,d)~s"),new)
>>
>> which leads me to believe that there's some variable scoping error. Can
>> anyone point out what I'm doing wrong?
>>
>> Thanks, John
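Jeff Newmiller's remark about `df` can be reproduced without the survival package: when no data frame called `df` is visible, the name resolves to the F density function in stats, and handing a function to eval() as the evaluation environment produces the "closure" complaint. This is only a sketch of where such a message can come from, not a trace of what survfit does internally; `msg` and `new` are names used for the illustration.

```r
# In a fresh session, df is a function (the F density), not a data frame.
class(df)

# Evaluating a variable name "inside" a function fails with a message
# that mentions the 'envir' argument and the type 'closure'.
msg <- tryCatch(
  eval(quote(t), df),
  error = function(e) conditionMessage(e)
)
msg

# With an actual data frame the same lookup finds the column.
new <- data.frame(t = 1:4, d = rep(1, 4), s = c(1, 0, 0, 1))
eval(quote(t), new)
```

This also explains why assigning `df <- new` at top level made the original example run: the name then resolves to a data frame instead of the function.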
Re: [R] How to convert European short dates to ISO format?
maybe

isoDates <- as.Date(oriDates, format = "%d/%m/%y")

Heinz

Luigi Marongiu wrote/hat geschrieben on/am 10.06.2020 10:20:
> Hello,
> I have been trying to convert European short dates formatted as dd/mm/yy
> into the ISO 8601 but the function as.Date interprets them as American
> ones (mm/dd/yy), thus I get:
>
> ```
> oriDates = c("23/01/20", "24/01/20", "25/01/20", "26/01/20", "27/01/20",
>              "28/01/20", "29/01/20", "30/01/20", "31/01/20", "01/02/20",
>              "02/02/20", "03/02/20", "04/02/20", "05/02/20", "06/02/20",
>              "07/02/20")
> isoDates = as.Date(oriDates, format = "%m/%d/%y")
> isoDates
>  [1] NA           NA           NA           NA           NA           NA           NA
>  [8] NA           NA           "2020-01-02" "2020-02-02" "2020-03-02" "2020-04-02" "2020-05-02"
> [15] "2020-06-02" "2020-07-02"
> ```
>
> How can I convert properly?
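The suggested format string can be checked on a few of the posted values. The sketch below uses a shortened copy of the vector; the format codes are the standard ones documented for strptime (`%d` day, `%m` month, `%y` two-digit year).

```r
# A few of the dd/mm/yy strings from the question.
oriDates <- c("23/01/20", "31/01/20", "01/02/20", "07/02/20")

# Day first, then month, then two-digit year.
isoDates <- as.Date(oriDates, format = "%d/%m/%y")

# Date objects print in ISO 8601 form; format() makes that explicit.
format(isoDates, "%Y-%m-%d")

# No value fails to parse with the day-first format.
anyNA(isoDates)
```

The NAs in the original attempt arise because values such as 23 cannot be a month under `"%m/%d/%y"`, while days 1 to 12 parse silently into the wrong date.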
Re: [R] svglite with multiple files
Dear Bert,

of course I have read the posting guide, almost two decades ago, but
usually I focus on reading carefully a posting before replying.

To be more precise, I did *not* ask about the svglite-package, as it works
as described. My question was: "Is there a simple solution to make
svglite() work like svg() to produce several files?" To make it more
explicit for you, I could add "using standard packages distributed with R".

So maybe you or others have some ideas, how to solve that question in a
convenient way. As mentioned I know complicated solutions, as e.g. call
svglite() before dev.off() after every plot.

best,
Heinz

Bert Gunter wrote/hat geschrieben on/am 09.12.2020 17:35:
> Sigh...
> Per the posting guide (which you have read, right?):
>
> "For questions about functions in standard packages distributed with R
> (see the FAQ Add-on packages in R
> <http://cran.r-project.org/doc/FAQ/R-FAQ.html#Add-on-packages-in-R>), ask
> questions on R-help. If the question relates to a *contributed package*,
> e.g., one downloaded from CRAN, try contacting the package maintainer
> first. You can also use find("functionname") and
> packageDescription("packagename") to find this information. *Only* send
> such questions to R-help or R-devel if you get no reply or need further
> assistance. This applies to both requests for help and to bug reports."
>
> This certainly sounds like a question for the svglite maintainer,
> ?maintainer, who might know about "tricks" that one could use. Though you
> might get lucky here -- it's just that you should not expect to. If you
> tried to contact the maintainer but received no response, do include that
> info in your post.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Wed, Dec 9, 2020 at 3:33 AM Heinz Tuechler wrote:
>> Dear All,
>>
>> while svg() (package grDevices) can produce several files, svglite()
>> (package svglite) is limited to one file/page only (as documented in the
>> respective help page).
>> Is there a simple solution to make svglite() work like svg() to produce
>> several files?
>> Of course one could call svglite() before dev.off() after every plot.
>>
>> best regards,
>> Heinz
>>
>> ## example
>> svg("Rplot%03d.svg")
>> plot(1)
>> plot(2)
>> plot(3)
>> dev.off()
>> ## three files Rplot001.svg, Rplot002.svg, Rplot003.svg are produced
>>
>> library(svglite)
>> svglite("Rplot-lite.svg")
>> plot(1)
>> plot(2)
>> ## as documented: Error in plot.new() : svglite only supports one page
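The per-plot workaround mentioned in this thread (open a device, plot, close it) can at least be hidden in a small helper. The function `plot_each_to_file()` below is hypothetical, not part of svglite or grDevices, and it does not make svglite multi-page; it simply opens one single-page device per plot. To keep the sketch runnable without svglite installed, the demonstration uses `grDevices::pdf`; passing `svglite::svglite` as `device` is the assumed real use.

```r
# Hypothetical helper: one single-page device per plotting function.
# `device` is any function whose first argument is the output file name.
plot_each_to_file <- function(plot_funs, device, pattern = "Rplot%03d.svg", ...) {
  files <- sprintf(pattern, seq_along(plot_funs))
  for (i in seq_along(plot_funs)) {
    device(files[i], ...)
    # Close the device even if the plotting code fails.
    tryCatch(plot_funs[[i]](), finally = grDevices::dev.off())
  }
  invisible(files)
}

# Demonstration with the always-available pdf() device in a temp directory.
out <- plot_each_to_file(
  list(function() plot(1), function() plot(2), function() plot(3)),
  device  = grDevices::pdf,
  pattern = file.path(tempdir(), "Rplot%03d.pdf")
)
basename(out)

# Assumed use with svglite (not run here):
# plot_each_to_file(list(function() plot(1), function() plot(2)),
#                   device = svglite::svglite, pattern = "Rplot-lite%03d.svg")
```

Wrapping each plot in a function delays its evaluation until the matching device is open, which is the design choice that lets the numbering stay in one place.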
[R] svglite with multiple files
Dear All,

while svg() (package grDevices) can produce several files, svglite()
(package svglite) is limited to one file/page only (as documented in the
respective help page).
Is there a simple solution to make svglite() work like svg() to produce
several files?
Of course one could call svglite() before dev.off() after every plot.

best regards,
Heinz

## example
svg("Rplot%03d.svg")
plot(1)
plot(2)
plot(3)
dev.off()
## three files Rplot001.svg, Rplot002.svg, Rplot003.svg are produced

library(svglite)
svglite("Rplot-lite.svg")
plot(1)
plot(2)
## as documented: Error in plot.new() : svglite only supports one page
Re: [R] Inappropriate color name
inline -

David Wright wrote on 19.11.2020 12:39:
> Appropriation of Indian Red as 'Chestnut' (or other alternative) will be
> viewed by some as 'making appropriate' the label for a colour, and no
> doubt by other groups as cultural theft by excising reference to its
> origin. Seems the best option is to recognise the actual etymology
> carries no semblance of offense whatsoever, and leave well alone.

One may remember that people who might feel offended by "Indian Red"
(Native Americans) make up less than 0.5 percent of all "Indians". It is
hardly the fault of the people of India that Native Americans were called
Indians by an Italian navigator who thought he had landed in India.
Re: [R] Defining partial list of variables
What about the Cs()-function in Hmisc? library(Hmisc) Cs(a,b,c) [1] "a" "b" "c" Steven Yen wrote/hat geschrieben on/am 05.01.2021 13:29: Thanks Eric. Yes, "unlist" makes a difference. Below, I am doing not regression but summary to keep the example simple. > set.seed(123) > data<-matrix(runif(1:25),nrow=5) > colnames(data)<-c("x1","x2","x3","x4","x5"); data x1x2x3 x4x5 [1,] 0.2875775 0.0455565 0.9568333 0.89982497 0.8895393 [2,] 0.7883051 0.5281055 0.4533342 0.24608773 0.6928034 [3,] 0.4089769 0.8924190 0.6775706 0.04205953 0.6405068 [4,] 0.8830174 0.5514350 0.5726334 0.32792072 0.9942698 [5,] 0.9404673 0.4566147 0.1029247 0.95450365 0.6557058 > j<-strsplit(gsub("[\n ]","","x1,x3,x5"),",") > j<-unlist(j); j [1] "x1" "x3" "x5" > summary(data[,j]) x1 x3 x5 Min. :0.2876 Min. :0.1029 Min. :0.6405 1st Qu.:0.4090 1st Qu.:0.4533 1st Qu.:0.6557 Median :0.7883 Median :0.5726 Median :0.6928 Mean :0.6617 Mean :0.5527 Mean :0.7746 3rd Qu.:0.8830 3rd Qu.:0.6776 3rd Qu.:0.8895 Max. :0.9405 Max. :0.9568 Max. :0.9943 On 2021/1/5 下午 07:08, Eric Berger wrote: wrap it in unlist xx <- unlist(strsplit( )) On Tue, Jan 5, 2021 at 12:59 PM Steven Yen mailto:st...@ntu.edu.tw>> wrote: Thanks Eric. Perhaps I should know when to stop. The approach produces a slightly different variable list (note the [[1]]). Consequently, I was not able to use xx in defining my regression formula. 
> x<-colnames(subset(mydata,select=c( +hhsize,urban,male, +age3045,age4659,age60, # age1529 +highsc,tert, # primary +gov,nongov,# unemp +married))); x [1] "hhsize" "urban" "male""age3045" "age4659" "age60" "highsc" "tert" [9] "gov" "nongov" "married" > xx<-strsplit(gsub("[\n ]","", +"hhsize,urban,male, + age3045,age4659,age60, + highsc,tert, + gov,nongov, + married" + ),","); xx [[1]] [1] "hhsize" "urban" "male""age3045" "age4659" "age60" "highsc" "tert" [9] "gov" "nongov" "married" > eq1<-my.formula(y="cig",x=x); eq1 cig ~ hhsize + urban + male + age3045 + age4659 + age60 + highsc + tert + gov + nongov + married > eq2<-my.formula(y="cig",x=xx); eq2 cig ~ c("hhsize", "urban", "male", "age3045", "age4659", "age60", "highsc", "tert", "gov", "nongov", "married") On 2021/1/5 下午 06:01, Eric Berger wrote: If your column names have no spaces the following should work x<-strsplit(gsub("[\n ]","", "hhsize,urban,male, + gov,nongov,married"),","); x On Tue, Jan 5, 2021 at 11:47 AM Steven Yen mailto:st...@ntu.edu.tw>> wrote: Here we go! BUT, it works great for a continuous line. With line break(s), I got the nuisance "\n" inserted. > x<-strsplit("hhsize,urban,male,gov,nongov,married",","); x [[1]] [1] "hhsize" "urban" "male""gov" "nongov" "married" > x<-strsplit("hhsize,urban,male, + gov,nongov,married",","); x [[1]] [1] "hhsize""urban" "male" "\ngov" [5] "nongov""married" On 2021/1/5 下午 05:34, Eric Berger wrote: zx<-strsplit("age,exercise,income,white,black,hispanic,base,somcol,grad,employed,unable,homeowner,married,divorced,widowed",",") On Tue, Jan 5, 2021 at 11:01 AM Steven Yen mailto:st...@ntu.edu.tw>> wrote: Thank you, Jeff. IMO, we are all here to make R work better to suit our various needs. All I am asking is an easier way to define variable list zx, differently from the way z0 , x0, and treat are defined. 
> zx<-colnames(subset(mydata,select=c( + age,exercise,income,white,black,hispanic,base,somcol,grad,employed, + unable,homeowner,married,divorced,widowed))) > z0<-c("fruit","highblood") > x0<-c("vgood","poor") > treat<-"depression" > eq1 <-my.formula(y="depression",x=zx,z0) > eq2 <-my.formula(y="bmi", x=zx,x0) > eq2t<-my.formula(y="bmi", x=zx,treat) > eqs<-list(eq1,eq2); eqs [[1]] depression ~ age + exercise + income + white + black + hispanic + base + somcol + grad + employed + unable + homeowner + married + divorced + widowed + fruit + highblood [[2]] bmi ~ age + exercise + income + white + black + hispanic + base + somcol + grad + employed + unable + homeowner + married + divorced + widowed + vgood + poor > eqt<-list(eq1,eq2t); eqt [[1]] depression ~ age + exercise + income + white
Re: [R] Defining partial list of variables
see below

Steven Yen wrote/hat geschrieben on/am 05.01.2021 08:14:
> I constantly define variable lists from a data frame (e.g., to define a
> regression equation). Line 3 below does just that. Placing each variable
> name in quotation marks is too much work especially for a long list so I
> do that with line 4. Is there an easier way to accomplish this: to define
> a list of variable names containing "a","c","e"? Thank you!
>
> data<-as.data.frame(matrix(1:30,nrow=6))
> colnames(data)<-c("a","b","c","d","e"); data
>   a  b  c  d  e
> 1 1  7 13 19 25
> 2 2  8 14 20 26
> 3 3  9 15 21 27
> 4 4 10 16 22 28
> 5 5 11 17 23 29
> 6 6 12 18 24 30
> x1<-c("a","c","e"); x1 # line 3
> [1] "a" "c" "e"
> x2<-colnames(subset(data,select=c(a,c,e))); x2 # line 4
> [1] "a" "c" "e"

What about:

  x3 <- names(data)[c(1,3,5)]
  x3
  [1] "a" "c" "e"

If I have to compile longer vectors of variable names I do it as follows:
First I use:

  dput(names(data))

resulting in a vector of names.

  c("a", "b", "c", "d", "e")

Then I edit the output by hand, e.g.

  x4 <- c("a", "b", "c", "d", "e")
  x4 <- c("a", "c", "e")

This is especially useful with long names, where I could easily make typing
errors.

regards,
Heinz
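Another way to avoid quoting every name is to type the names once in a single string and let R split and check it. The helper `vars_from_string()` below is hypothetical and written only for this illustration; it relies on base functions (`strsplit`, `trimws`, `reformulate`) and checks the result against `names(data)` so that typing errors surface early.

```r
# Hypothetical helper: split a comma-separated string into variable names
# and stop if any of them is not a column of `data`.
vars_from_string <- function(s, data) {
  v <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])  # also drops line breaks
  v <- v[nzchar(v)]
  unknown <- setdiff(v, names(data))
  if (length(unknown) > 0) {
    stop("not in data: ", paste(unknown, collapse = ", "))
  }
  v
}

data <- as.data.frame(matrix(1:30, nrow = 6))
colnames(data) <- c("a", "b", "c", "d", "e")

# The string may span several lines without leaving "\n" in the names.
x5 <- vars_from_string("a, c,
                        e", data)
x5

# The resulting character vector can feed a regression formula directly.
f <- reformulate(x5[-1], response = x5[1])
f
```

Because the result is a plain character vector rather than a list, it can be used for indexing (`data[, x5]`) as well as for building formulas.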
Re: [R] Translation of the charter
Greg,

Greg Minshall wrote on 02.11.2021 08:57:

Heinz,

x <- c("a","b","c")
lettersnum <- 1:length(letters[])
names(lettersnum) <- letters[]
lettersnum[x]
> lettersnum[x]
a b c
1 2 3

i'm not sure if the following is obviously better, but one might do

b <- match(a, a)
names(b) <- a
b
a b c
1 2 3

cheers, Greg

You are right - match seems obviously better, but why not do

x <- c("a","b","c")
match(x, letters[])

best, Heinz
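A short sketch of the `match()` idea, for reference. The handling of upper-case input and of characters that are not letters is an addition not discussed in the thread; the names `pos`, `named`, `mixed` and `mixed_ci` are made up for the example.

```r
x <- c("a", "b", "c")

# Position of each element in the built-in lower-case alphabet
pos <- match(x, letters)

# Keep the letters as names, similar to the lookup-vector result
named <- setNames(pos, x)

# match() is case-sensitive and returns NA for anything not found
mixed <- match(c("B", "z", "?"), letters)

# Lower-casing first treats "B" like "b"
mixed_ci <- match(tolower(c("B", "z", "?")), letters)
```

Note that `letters` and `letters[]` give the same vector, so the empty brackets can be dropped.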
Re: [R] Translation of the charter
Alice wrote on 31.10.2021 07:33:

Dear members,

How to translate the charter to the underline inter? I tried this:

x <- c("a","b","c")
as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion

It didn't work. Sorry for my newbie questions.

B.R. Alice

Is this what you are looking for:

x <- c("a","b","c")
lettersnum <- 1:length(letters[])
names(lettersnum) <- letters[]
lettersnum[x]
> lettersnum[x]
a b c
1 2 3

best, Heinz
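To illustrate why the first attempt fails and how the lookup vector works, here is a minimal sketch. It assumes the question is about the position of each letter in the alphabet; in case Unicode code points were meant instead, `utf8ToInt()` is shown as well. That second reading is only a guess and was not raised in the thread.

```r
x <- c("a", "b", "c")

# as.numeric() tries to parse each string as a number,
# so plain letters become NA (with a coercion warning)
parsed <- suppressWarnings(as.numeric(x))

# Named lookup vector as in the reply, written with seq_along() and setNames()
lettersnum <- setNames(seq_along(letters), letters)
alphabet_pos <- lettersnum[x]

# Alternative reading of the question: code points of the characters.
# utf8ToInt() expects a single string, hence the paste()
code_points <- utf8ToInt(paste(x, collapse = ""))
```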
Re: [R] return value of {....}
09.01.2023 18:05:58 akshay kulkarni:

We are living in the 21st century world, and the R-core team might, I suppose, have a definite reason ...

Maybe compatibility reasons with S and R versions from the 20th century? But maybe you would have expected some reason even then.

best regards, Heinz
Re: [R] Simple Stacking of Two Columns
Jeff Newmiller wrote on 03.04.2023 18:26:

unname(unlist(NamesWide))

Why not:

NamesWide <- data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))
NamesLong <- data.frame(Names=with(NamesWide, c(Name1, Name2)))

On April 3, 2023 8:08:59 AM PDT, "Sparks, John" wrote:

Hi R-Helpers,

Sorry to bother you, but I have a simple task that I can't figure out how to do. For example, I have some names in two columns

NamesWide<-data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))

and I simply want to get a single column

NamesLong<-data.frame(Names=c("Tom","Dick","Larry","Curly"))
NamesLong
  Names
1   Tom
2  Dick
3 Larry
4 Curly

Stack produces an error

NamesLong<-stack(NamesWide$Name1,NamesWide$Names2)
Error in if (drop) { : argument is of length zero

So does bind_rows

NamesLong<-dplyr::bind_rows(NamesWide$Name1,NamesWide$Name2)
Error in `dplyr::bind_rows()`:
! Argument 1 must be a data frame or a named atomic vector.
Run `rlang::last_error()` to see where the error occurred.

I tried making separate dataframes to get around the error in bind_rows but it puts the data in two different columns

Name1<-data.frame(c("Tom","Dick"))
Name2<-data.frame(c("Larry","Curly"))
NamesLong<-dplyr::bind_rows(Name1,Name2)
NamesLong
  c..TomDick.. c..LarryCurly..
1          Tom
2         Dick
3                        Larry
4                        Curly

gather makes no change to the data

NamesLong<-gather(NamesWide,Name1,Name2)
NamesLong
  Name1 Name2
1   Tom Larry
2  Dick Curly

Please help me solve what should be a very simple problem.

Thanks, John Sparks
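The two suggestions from the replies can be compared side by side in a short sketch. It assumes R 4.0 or later, where `data.frame()` keeps strings as character vectors by default. The third variant, calling `stack()` on the whole data frame rather than on single columns, is an addition not proposed in the thread; the names `long_a`, `long_b` and `long_c` are made up for the example.

```r
NamesWide <- data.frame(Name1 = c("Tom", "Dick"),
                        Name2 = c("Larry", "Curly"))

# Variant 1 (Jeff's suggestion): flatten all columns, drop the generated names
long_a <- data.frame(Names = unname(unlist(NamesWide)))

# Variant 2 (Heinz's suggestion): concatenate the columns explicitly
long_b <- data.frame(Names = with(NamesWide, c(Name1, Name2)))

# Variant 3 (not from the thread): stack() takes the data frame itself and
# returns the values plus an indicator column 'ind' with the source column
long_c <- stack(NamesWide)

print(long_a)
print(long_c)
```

Variant 3 may be handy when it matters which column each name came from; otherwise the first two give the plain single column asked for.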