Re: [R] plot rownames
On Tue, 27 May 2008, T.D.Rudolph wrote:

 In the following example:
  x <- rnorm(1:100)
  y <- seq(from=-2.5, to=3.5, by=0.5)
  z <- as.matrix(table(cut(x, c(-Inf, y, +Inf))))
  ## I wish to transform the values in z
  j <- log(z)
  ## Yet retain the row names
  row.names(j) <- row.names(z)

Hmm. The rownames were retained, and row.names() is for data frames, rownames() for matrices.

 Now, how can I go about creating a scatterplot with row.names(j) on the x-axis and j(m:nrow(j)) on the y-axis? Keep in mind the transformation I am conducting is more complicated and therefore cannot be plotted directly using log(z), which places me in this unique position.

You will need to explain what exactly you want plotted. A scatterplot is of two continuous variables and you only have one (and 'j(m:nrow(j))' is invalid R). But here is one possibility to get you started:

 m <- nrow(j)
 plot(j, xaxt="n", xlab="")
 axis(1, 1:m, rownames(j), las=2)

 Tyler
 --
 View this message in context: http://www.nabble.com/plot-rownames-tp17504882p17504882.html
 Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] Linear Programming.
Hi Marcus! On Wednesday 28 May 2008 05:56, Marcus Vinicius wrote:

 Dear all, may anyone explain to me how I run a linear programming or Data Envelopment Analysis (DEA) model in R?

Package DEA (on CRAN): http://cran.r-project.org/web/packages/DEA/index.html
Package FEAR (NOT on CRAN): http://www.clemson.edu/economics/faculty/wilson/Software/FEAR/fear.html

Best wishes,
Arne

 Thanks a lot. Best Regards. Marcus Vinicius

--
Arne Henningsen
http://www.arne-henningsen.name
[R] How to make R run faster
Hi everyone, I run R loops on Windows XP and Vista. Both machines are Intel Core 2 Duo 2.2 GHz with 2 GB RAM, and XP is significantly faster than Vista. Does anyone know how to speed up R loops in Vista? Thank you in advance. Chunhao Tu
Re: [R] Tukey HSD (or other post hoc tests) following repeated measures ANOVA
Ullrich Ecker ullrich.ecker at uwa.edu.au writes:

 I am fairly new to R, and I am aware that others have had this problem before, but I have failed to solve the problem from previous replies I found in the archives. Now, I want to calculate a post-hoc test following up a within-subjects ANOVA.

Probably the best is package multcomp. By default it gives confidence intervals, which is certainly much better than p-values. I haven't tried whether you can get p-values.

Dieter

 library(multcomp)
 amod <- aov(breaks ~ wool + tension, data = warpbreaks)
 wht <- glht(amod, linfct = mcp(tension = "Tukey"))
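On the open question of p-values: a short sketch extending Dieter's multcomp example (same built-in warpbreaks data), showing that summary() on the glht object does report adjusted p-values, and confint() the simultaneous confidence intervals:

```r
## Extends the example from the reply above: multcomp's glht object
## provides both adjusted p-values and simultaneous confidence intervals.
library(multcomp)
amod <- aov(breaks ~ wool + tension, data = warpbreaks)
wht  <- glht(amod, linfct = mcp(tension = "Tukey"))
summary(wht)   # adjusted p-values for the pairwise comparisons
confint(wht)   # simultaneous confidence intervals
```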
[R] R News, volume 8, issue 1 is now available
Dear R users, The May 2008 issue of `R News' is now available on CRAN under the Documentation/Newsletter link. John (on behalf of the R News Editorial Board)

John Fox, Professor
Department of Sociology
McMaster University
Hamilton ON Canada L8S 4M4
web: socserv.mcmaster.ca/jfox

___
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-announce
[R] can I do this with R?
Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab, where I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method, so I thought about testing R. My question is: can I do logistic regression with stepwise forward selection in R? Thanks /M
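The answer is yes: fit the model with glm() and do the selection with step(). A minimal sketch, using made-up variable names (y, x1..x3 in a data frame dat) since the poster gives none:

```r
## Hedged sketch of forward stepwise logistic regression.
## 'dat', 'y', 'x1'..'x3' are hypothetical names, not from the post.
null <- glm(y ~ 1, family = binomial, data = dat)          # intercept-only start
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
fwd  <- step(null, scope = formula(full), direction = "forward")
summary(fwd)   # the AIC-selected model
```

step() uses AIC by default; MASS::stepAIC offers the same interface with a few extras.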
Re: [R] hash or other quick lookup function?
Hi Duncan, Duncan Murdoch wrote:

 Esmail Bonakdarian wrote:
  Hello all, I have a matrix of bit values. I compute certain values based on the bits in each row. There may be *duplicate* entries in the matrix, i.e. several rows may be identical. These rows change over time, and identical entries may not be next to each other. Computing these values is time consuming, so I would like a way to store a value once it's computed. That way I would be able to look up the value based on a given bitstring (row in the matrix) and avoid having to re-compute the same value. So, my thought is a hash function based on a bit string - does R have such a thing? Or is there an alternative approach that would do the same in R? The lookup should be quick, otherwise the gains from avoiding recomputing identical values would be lost to some extent.

 Environments can be hashed. To use this, you'd convert the bit string into a character string, and use that as the name of a variable in an environment. For example:

  cache <- new.env(hash=TRUE, parent=emptyenv())
  ...
  bits <- "0101"
  if (exists(bits, envir=cache)) {
    return(get(bits, envir=cache))
  } else {
    value <- long.computation()
    assign(bits, value)

Whoops, that assignment should have been in the cache:

    assign(bits, value, envir=cache)
    return(value)
  }

Thank you for posting this, I'll have to give it a try. If I can get this to work it should save some time spent on recomputing values. (Sorry for the delayed reply, I was traveling for the whole last day - suffering from some reverse jet lag at the moment :-) Can anyone else think of other alternatives? Always happy to learn something new. Thanks! Esmail

ps: do you think the conversion to a character string before comparison/search will cause a noticeable performance hit? I still lack an intuitive feel for R the programming language. I'd have more of an idea with regular high-level programming languages.
Duncan Murdoch
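Putting the fragments from the thread together, a self-contained sketch of the environment-as-hash-table idea; lookup() and the placeholder compute function are illustrative names, not from the thread:

```r
## Memoization via a hashed environment, as suggested in the thread.
## 'lookup' is a hypothetical wrapper; the expensive row-wise calculation
## is passed in as 'compute'.
cache <- new.env(hash = TRUE, parent = emptyenv())

lookup <- function(bits, compute) {
  key <- paste(bits, collapse = "")          # bit vector -> character key
  if (exists(key, envir = cache, inherits = FALSE)) {
    get(key, envir = cache)                  # cache hit: no recomputation
  } else {
    value <- compute(bits)                   # cache miss: compute and store
    assign(key, value, envir = cache)
    value
  }
}

## usage: first call computes, later identical rows are looked up
lookup(c(0, 1, 0, 1), function(b) sum(b))
```

The string conversion is cheap relative to any non-trivial per-row computation, so the performance hit Esmail worries about should be negligible.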
Re: [R] Unexpected behaviour in reading genomic coordinate files of R-2.7.0
From the NEWS file for R-patched:

 o A field containing just a sign is no longer regarded as numeric (it was on all platforms in 2.7.0, but not on most in earlier versions of R).

So the default behaviour has already been changed. The right way to overcome this was (as you discovered) to say what type you want the variables to be rather than let R guess for you. But you can also update your R (as the posting guide suggests).

On Wed, 28 May 2008, Margherita wrote:

 Great R people, I have noticed a strange behaviour in read.delim() and friends in R 2.7.0. I will describe the problem and also the solution I already found, just to be sure it is expected behaviour, and also to tell people who may experience the same difficulty a way to overcome it. And also to see whether it is proper behaviour or maybe a correction is needed. Here is the problem: I have some genomic coordinate files (bed files, a standard format, for example) containing a column (Strand) in which there is either a + or a -. In R-2.6.2 patched (and every past version I have used) I never had problems reading them in, for example:

  a <- read.table("coords.bed", skip=1)
  disp(a)
  class data.frame
  dimensions are 38650 6
  first rows:
      V1        V2        V3   V4 V5 V6
  1 chr1 100088396 100088446 seq1  0  +
  2 chr1 100088764 100088814 seq2  0  -

 If I do exactly the same command on the same file in R-2.7.0 the result I obtain is:

  a <- read.table("coords.bed", skip=1)
  disp(a)
  class data.frame
  dimensions are 38650 6
  first rows:
      V1        V2        V3   V4 V5 V6
  1 chr1 100088396 100088446 seq1  0  0
  2 chr1 100088764 100088814 seq2  0  0

 and I completely lose the strand information: they are all zeros! I have also tried to put quotes around + and - in the file before reading it, to set stringsAsFactors=FALSE in the read.table() call, and to set the encoding to a few different alternatives, but the result was always the same: they are all transformed into 0.
 Then I tried scan() and I saw it was reading the + character properly:

  scan("coords.bed", skip=1, nlines=1, what="character")
  Read 6 items
  [1] chr1 100088396 100088446 seq1 0
  [6] +

 ...my conclusion is that a lone + or - is not taken as a character in the data-frame creation step; it is taken as numeric but, being without digits, is converted to 0. Is it correct that this behaviour happens even if they are surrounded by quotes? Anyway, my temporary solution (which works without the need to change the files) is:

  a <- read.table("coords.bed", skip=1,
                  colClasses=c("character", "numeric", "numeric",
                               "character", "numeric", "character"))
  a[1:2,]
      V1        V2        V3   V4 V5 V6
  1 chr1 100088396 100088446 seq1  0  +
  2 chr1 100088764 100088814 seq2  0  -

 Another way to avoid losing the strand information was to manually substitute an R for - and an F for + in the file before reading it into R. But that is much more cumbersome, since the use of + and - is, for example, a standard in bed files accepted and generated by the Genome Browser and other genome sites. Please let me know what you think. Ps. I saw this first in the Fedora version (rpm automatically updated), but it is reproduced also in the Windows version. Thank you all for your work and for making R the wonderful tool it is! Cheers, Margherita

 --
 Margherita Mutarelli, PhD
 Seconda Universita' di Napoli
 Dipartimento di Patologia Generale
 via L. De Crecchio, 7
 80138 Napoli - Italy
 Tel/Fax +39.081.5665802

--
Brian D. Ripley
Re: [R] request: which integer in each column is in majority
Muhammad Azam wrote:

 Respected R helpers/users, I am a new R user. I have a problem: I want to know which integer in each column of the following matrix is in the majority, e.g. in the first column 1 is in the majority; similarly in the third column 4 is in the majority. What is a suitable way to get the desired integer for each column? I am looking for some kind reply. Thanks. Example:

  x = matrix(c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,1,2,3,3), ncol=4)
  x
       [,1] [,2] [,3] [,4]
  [1,]    1    2    3    4
  [2,]    1    2    4    1
  [3,]    1    3    4    2
  [4,]    2    3    4    3
  [5,]    2    3    4    3

Probably bad but working example:

 apply(x, 2, function(x) as.numeric(names(which.max(table(x)))[1]))

Uwe Ligges

 best regards, Muhammad Azam, Ph.D. Student, Department of Medical Statistics, Informatics and Health Economics, University of Innsbruck, Austria
Re: [R] rmeta package: metaplot or forestplot of meta-analysis under DSL (random) model
I don't know if this helps, but recently I tried to use rmeta to make a forest plot and gave up because the data I had were not in the right format, so I simulated a forest plot using gplots. I did it all sideways and then rotated the PostScript. See http://www.sas.upenn.edu/~baron/journal/jdm71128.pdf Figure 4 on p. 300 for the result, and http://www.sas.upenn.edu/~baron/journal/71128/figs.R for the code (Figure 4). The last line in the comment shows how to rotate the PostScript. Jon

On 05/27/08 20:59, Shi, Jiajun [BSD] - KNP wrote:

 Dear all, I could not draw a forest plot for meta-analysis under random models using the rmeta package. rmeta has a default function for the MH (fixed-effect) model. Has the rmeta package been updated with such a function? Or has someone revised it and kept the code private?

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)
[R] Unexpected behaviour in reading genomic coordinate files of R-2.7.0
Great R people, I have noticed a strange behaviour in read.delim() and friends in R 2.7.0. I will describe the problem and also the solution I already found, just to be sure it is expected behaviour, and also to tell people who may experience the same difficulty a way to overcome it. And also to see whether it is proper behaviour or maybe a correction is needed. Here is the problem: I have some genomic coordinate files (bed files, a standard format, for example) containing a column (Strand) in which there is either a + or a -. In R-2.6.2 patched (and every past version I have used) I never had problems reading them in, for example:

 a <- read.table("coords.bed", skip=1)
 disp(a)
 class data.frame
 dimensions are 38650 6
 first rows:
     V1        V2        V3   V4 V5 V6
 1 chr1 100088396 100088446 seq1  0  +
 2 chr1 100088764 100088814 seq2  0  -

If I do exactly the same command on the same file in R-2.7.0 the result I obtain is:

 a <- read.table("coords.bed", skip=1)
 disp(a)
 class data.frame
 dimensions are 38650 6
 first rows:
     V1        V2        V3   V4 V5 V6
 1 chr1 100088396 100088446 seq1  0  0
 2 chr1 100088764 100088814 seq2  0  0

and I completely lose the strand information: they are all zeros! I have also tried to put quotes around + and - in the file before reading it, to set stringsAsFactors=FALSE in the read.table() call, and to set the encoding to a few different alternatives, but the result was always the same: they are all transformed into 0. Then I tried scan() and I saw it was reading the + character properly:

 scan("coords.bed", skip=1, nlines=1, what="character")
 Read 6 items
 [1] chr1 100088396 100088446 seq1 0
 [6] +

...my conclusion is that a lone + or - is not taken as a character in the data-frame creation step; it is taken as numeric but, being without digits, is converted to 0. Is it correct that this behaviour happens even if they are surrounded by quotes? Anyway, my temporary solution (which works without the need to change the files) is:

 a <- read.table("coords.bed", skip=1,
                 colClasses=c("character", "numeric", "numeric",
                              "character", "numeric", "character"))
 a[1:2,]
     V1        V2        V3   V4 V5 V6
 1 chr1 100088396 100088446 seq1  0  +
 2 chr1 100088764 100088814 seq2  0  -

Another way to avoid losing the strand information was to manually substitute an R for - and an F for + in the file before reading it into R. But that is much more cumbersome, since the use of + and - is, for example, a standard in bed files accepted and generated by the Genome Browser and other genome sites. Please let me know what you think. Ps. I saw this first in the Fedora version (rpm automatically updated), but it is reproduced also in the Windows version. Thank you all for your work and for making R the wonderful tool it is! Cheers, Margherita

--
Margherita Mutarelli, PhD
Seconda Universita' di Napoli
Dipartimento di Patologia Generale
via L. De Crecchio, 7
80138 Napoli - Italy
Tel/Fax +39.081.5665802
[R] Tukey HSD (or other post hoc tests) following repeated measures ANOVA
Hi everyone, I am fairly new to R, and I am aware that others have had this problem before, but I have failed to solve the problem from previous replies I found in the archives. As this is such a standard procedure in psychological science, there must be an elegant solution to this...I think. I would much appreciate a solution that even I could understand... ;-) Now, I want to calculate a post-hoc test following up a within-subjects ANOVA. The dv is reaction time (RT), there is a 3-level Condition factor (Cond; within-subjects), a number of subjects (Subj), and the dataframe is called WMU3C. The model is

 RT.aov <- aov(RT ~ Cond + Error(Subj/Cond), WMU3C)

I understand that TukeyHSD only works with an aov object, but RT.aov is an aovlist object:

 class(RT.aov)
 [1] "aovlist" "listof"

I've tried to work around it using the maiz example in the MMC documentation of the HH package (a solution previously recommended), but I couldn't get it to work. My best shot was to calculate another aov avoiding the error term (I don't see how this could be a feasible approach, but that's how I understood the MMC example) and a contrast vector (contrasting conditions 2 and 3). I have to admit that I don't quite understand what I'm doing here (not that you couldn't tell):

 RT2.aov <- aov(terms(RT ~ Subj*Cond, WMU3C))
 Cond.lmat <- c(0,1,-1)
 Tukey <- glht.mmc(RT2.aov, focus = "Cond", focus.lmat = Cond.lmat)

yielding

 Error in mvt(lower = carg$lower, upper = carg$upper, df = df, corr = carg$corr, :
   NA/NaN/Inf in foreign function call (arg 6)
 In addition: Warning message:
 In cov2cor(covm) : diagonal has non-finite entries

Thank you very much for your help!
Ullrich

Dr Ullrich Ecker
Postdoctoral Research Associate
Cognitive Science Laboratories
School of Psychology (Mailbag M304)
Room 211 Sanders Building
University of Western Australia
35 Stirling Hwy
Crawley WA 6009
Australia
Office: +61 8 6488 3266
Mobile: +61 4 5822 0072
Fax: +61 8 6488 1006
E-mail: [EMAIL PROTECTED]
Web: www.cogsciwa.com
[R] confidence interval for the logit - predict.glm
Hello all, I've come across an online posting http://www.biostat.wustl.edu/archives/html/s-news/2001-10/msg00119.html that described how to get confidence intervals for predicted values from predict.glm. These instructions were meant for S-Plus. It generally seems to work with R too, but I am encountering some problems. I explain my procedure in the following and would be most grateful if anyone could reply. I have run glm models to predict fish species from a number of environmental predictors, using a stepwise approach (I am aware that there is some controversy about this). I have a data set called testfishes.txt; predy.txt contains the predictors for the test fishes only, and predallx.txt contains all predictors for all planning units. I used the following commands (stepmodfi is the output of the stepwise approach):

 predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)
 upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit
 lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit
 upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
 lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))
 predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE)
 fishoccur[,i] <- predictionresponse

type="link" should be the equivalent of the S-Plus type=1 indicated in the online posting. So I first tried to get predictions on the logit scale, and there is already the point where I am a bit bewildered: predictionlogit results in only ONE value for every planning unit contained in predallx.txt, when I would actually expect planning-units-times-fishes predicted values - at least this is what I get from predictionresponse. Predictionresponse generates as many predicted values as there are planning units times fishes (1680 planning units x 205 fish species, which are predicted over all these planning units), which I put into a constructed matrix named fishoccur. In fact this is what I did in the very beginning, generating predicted values with the command predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE). Then I wanted to construct confidence intervals around these values, so I first generated the logit predicted values, but that already is the point I am unsure about. The problem in fact is that quite a few of my upper confidence-interval values come out as NAs; the lower interval seems fine. Yet I am not even sure whether my approach is correct, given this single value produced by the prediction on the logit scale. I hope my description was clear enough and would greatly appreciate suggestions on how to tackle this, whether the approach itself is fine (which I believe it is), and how to get proper lower and upper confidence values. Many thanks in advance! Christine
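For reference, the link-scale approach the poster describes can be written compactly with plogis() as the inverse-logit, which avoids the exp()/(1+exp()) overflow that can produce NA in the upper bound; a hedged sketch with hypothetical model and data names (fit, traindata, newdata), since each species' model would be handled in turn:

```r
## Hedged sketch of an approximate CI for predict.glm on the response scale.
## 'fit', 'traindata', 'newdata' are illustrative names, not from the post.
fit <- glm(y ~ x1 + x2, family = binomial, data = traindata)
pr  <- predict(fit, newdata = newdata, type = "link", se.fit = TRUE)

## +/- 2 SE on the link scale, then back-transform with the inverse logit;
## plogis() is numerically stable where exp(u)/(1+exp(u)) can overflow to NaN.
upper <- plogis(pr$fit + 2 * pr$se.fit)
lower <- plogis(pr$fit - 2 * pr$se.fit)
```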
[R] Computing P-Value
Hi, Is there an R function or package that computes p-value? -GV
Re: [R] how to bind lists recursively
[EMAIL PROTECTED] wrote:

 This is somewhat subtle. Rolf's solution (here corrected to...)

  a <- list()
  for(i in 0:10000) a[[i+1]] <- i

 is the best of the loop solutions (or at least the best I know of).

But Bill does know a better way -- it just slipped his mind.

 system.time({a <- list(); for(i in 0:10000) a[[i+1]] <- i})
   user  system elapsed
   1.14    0.00    1.16
 system.time({a <- vector("list", 10001); for(i in 0:10000) a[[i+1]] <- i})
   user  system elapsed
   0.11    0.00    0.12

The lesson is to create the object at its final length rather than growing it.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

 The apparently similar

  a <- list(0)
  for(i in 1:10000) a <- c(a, list(i))

 will take a lot longer, although the result is the same. For example:

  system.time({
    a <- list()
    for(i in 0:10000) a[[i+1]] <- i
  })
    user  system elapsed
    0.59    0.00    0.59
  system.time({
    a <- list(0)
    for(i in 1:10000) a <- c(a, list(i))
  })
    user  system elapsed
    6.87    0.00    6.89

 That's a factor of about 11 times as long. The best of the lot is

  a <- as.list(0:10000)

 of course, but this has problems with generalisation (which everyone suspects is going to be needed here...).

 Bill Venables.

 -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Rolf Turner Sent: Wednesday, 28 May 2008 1:02 PM To: Daniel Yang Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively

 On 28/05/2008, at 2:43 PM, Daniel Yang wrote:

  Dear all, I want to create a list that contains 0, 1, 2, 3, ..., 10000 as its elements. I used the following code, which apparently doesn't work very well.

   a <- 0
   for(i in 1:10000) { a <- list(a, i) }

  The result is not what I wanted. So how can I build the list up so that the last element is the newly added one while the previous elements all remain the same?

 a <- list()
 for(i in 1:10000) a[[i]] <- i

 (The word ``bind'' is inappropriate.)
cheers, Rolf Turner
[R] request: which integer in each column is in majority
Respected R helpers/users, I am a new R user. I have a problem: I want to know which integer in each column of the following matrix is in the majority, e.g. in the first column 1 is in the majority; similarly in the third column 4 is in the majority. What is a suitable way to get the desired integer for each column? I am looking for some kind reply. Thanks. Example:

 x = matrix(c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,1,2,3,3), ncol=4)
 x
      [,1] [,2] [,3] [,4]
 [1,]    1    2    3    4
 [2,]    1    2    4    1
 [3,]    1    3    4    2
 [4,]    2    3    4    3
 [5,]    2    3    4    3

best regards
Muhammad Azam
Ph.D. Student
Department of Medical Statistics, Informatics and Health Economics
University of Innsbruck, Austria
Re: [R] request: which integer in each column is in majority
Muhammad Azam: I am one of the new R user. I have a problem regarding to know which of the integer in each column of the following matrix is in majority. I want to know that integer e.g. in the first column 1 is in majority. x=matrix(c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,1,2,3,3),ncol=4) x [,1] [,2] [,3] [,4] [1,] 1 2 3 4 [2,] 1 2 4 1 [3,] 1 3 4 2 [4,] 2 3 4 3 [5,] 2 3 4 3 As long as the matrix only contains integers, the following should work: apply(x, 2, function(z) which.max(tabulate(z)) ) Output: 1 3 4 3 -- Karl Ove Hufthammer
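A short note contrasting the two answers in this thread: the tabulate() version is fast but assumes the columns hold small positive integers (which.max(tabulate(z)) returns the value itself only because the values index their own bins), while the table() version from the earlier reply works for arbitrary values:

```r
## Two per-column "majority" solutions from this thread, side by side.
x <- matrix(c(1,1,1,2,2, 2,2,3,3,3, 3,4,4,4,4, 4,1,2,3,3), ncol = 4)

## tabulate(): positive integers only, bin index == value
apply(x, 2, function(z) which.max(tabulate(z)))

## table(): works for any values (factors, negatives, non-integers)
apply(x, 2, function(z) as.numeric(names(which.max(table(z)))))
```

Both give 1 3 4 3 for this matrix; on ties, both silently return the first mode encountered.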
Re: [R] Pros and Cons of R
I feel the discussion about ease of installation on Linux (/*NIX type systems) isn't really relevant to the Pros and Cons of R. The problems encountered by people are often a consequence of their lack of knowledge/understanding of the operating system, and not a deficiency of R itself. Just my tuppence, but I consider myself to be competent in using Linux so this view may be somewhat biased, Neil
[R] Sample size for 2-sample proportion tests
Hallo! I found a question exactly like mine, but I did not find an answer. Therefore I post it again - hopefully there will be an answer this time! Thanks in advance! Karl

From: Berta Ibanez, Date: Tue, 27 Feb 2007 18:58:48 +0100

 Hi R-users, I want to calculate the sample size needed to carry out a 2-sample proportion test. I have the hypothesized treatment probability of success (0.80), the hypothesized control probability of success (0.05), and also the proportion of the sample devoted to the treated group (5%) (fraction=rho=0.05, n2/n1=19). Using the Hmisc library, it seems that I can use bsamsize (option 1) or samplesize.bin (option 2, alpha=0.05, or option 3, alpha=0.05/2; I am not sure after reading the help page), and I can use Stata (option 4).

  library(Hmisc)
  # OPTION 1:
  bsamsize(p1=0.8, p2=0.05, fraction=0.05, alpha=.05, power=.9)
  # n1 = 2.09, n2 = 39.7, TOTAL = 42
  # OPTION 2:
  samplesize.bin(alpha=0.05, beta=0.9, pit=0.8, pic=0.05, rho=0.05)
  # n = 58, TOTAL = 58
  # OPTION 3:
  samplesize.bin(alpha=0.025, beta=0.9, pit=0.8, pic=0.05, rho=0.05)
  # n = 72, TOTAL = 72
  # OPTION 4 (Stata):
  sampsi 0.8 0.05, p(0.90) a(0.05) r(19)
  # n1 = 4, n2 = 76, TOTAL = 80

 Can the method used produce these differences (42 vs 72 vs 80)? Can somebody give me hints about the possible reasons (asymptotic vs exact distribution, continuity correction, my own error)? Which method would be recommended? Thanks a lot in advance, Berta
[R] extracting information from lmer objects
Hi, I wish to extract a subset of the information given by summary(lmer.object) as a dataframe. In particular, I wish to extract just a table listing the Estimate, Std Error, and t-values rounded to 3 decimal places. I have learned how to extract the coefficients with round(fixef(lmer.object), 3) and the standard errors with round(sqrt(diag(vcov(a.lmer))), 3), but I do not know how to extract the t-values; the extractor methods do not seem to help in this regard. I inspected the structure of the summary with str(summary(lmer.object)), but unfortunately I am new to R and didn't find this very enlightening. Here is an example of the dataframe I would like to produce:

 Factor     Estimate  Std. Err       t
 FixedFac1     0.091     0.140   0.651
 FixedFac2     0.054     0.012   4.461
 FixedFac3    -0.078     0.021  -3.664

Cheers, Barry.
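Depending on the lme4 version installed, the whole fixed-effects table can usually be pulled in one step rather than assembled from fixef() and vcov(); a hedged sketch assuming the fitted model is named lmer.object as in the post:

```r
## Hedged sketch: in recent lme4 versions, coef(summary(...)) returns the
## fixed-effects matrix with columns Estimate, Std. Error and t value.
tab <- as.data.frame(round(coef(summary(lmer.object)), 3))
names(tab) <- c("Estimate", "Std. Err", "t")   # match the desired layout
tab
```

In older lme4 the same matrix sat in the summary object's @coefs slot, so summary(lmer.object)@coefs is the fallback if coef(summary(...)) is not available.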
[R] Evidence Theory in R
Hello, well, I searched the list archive, CRAN and the references, but found nothing. Thus: does anybody around here know anything about Dempster-Shafer Theory, Evidence Theory or Hints in R? Has anybody stumbled upon a package that I overlooked, or implemented something in this area? I really would like not to implement a hint model a second time. My apologies if I missed something obvious, but I would be glad if someone could give me a hint. Regards, Lars

--
Lars Fischer, tel: +49 (0)6151 16-2889
Technische Universität Darmstadt
Fachbereich Informatik / FG Sicherheit in der Informationstechnik
PGP FPR: A197 CBE1 91FC 0CE3 A71D 77F2 1094 CB6E CEE3 7111
[R] How to remove NAs and lme function
I am working on a project to find a model for the concentration of dissolved oxygen in the River Clyde. I've fitted a linear mixed model as lme(DOW ~ Temperature + Salinity + Year + factor(Station)*factor(Depth), random = ~1|id), where id is an identifier of the day over 20 years, defined as Day*1 + Month*100 + (1900 - Year). Anyway, there are some NAs for the concentration of dissolved oxygen in the water, so I know you add in na.action = na.omit, and that omits the NAs so there are 9008 observations in the model; but it doesn't do it for the whole data set, where there are 10965 observations including those with NAs. I would like to plot the residuals from the model against Salinity, Temperature and Year, but when I try, it seems to want to take the observations of these variables from the full data set and the residuals from the model, which of course doesn't work. I have tried using data1 <- data[data$DOW != NA,] on the whole data set but it doesn't work. How can I remove the NAs from a data set? -- View this message in context: http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17510564.html Sent from the R help mailing list archive at Nabble.com.
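The comparison data$DOW != NA cannot work: any comparison with NA yields NA rather than TRUE/FALSE, so the subscript selects nothing useful. is.na() is the tool here. A minimal sketch, using the data and DOW names from the post:

```r
## keep only rows where the response is observed
data1 <- data[!is.na(data$DOW), ]

## or drop rows with an NA in any of the model variables
keep  <- complete.cases(data[, c("DOW", "Temperature", "Salinity", "Year")])
data2 <- data[keep, ]
```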
Re: [R] package functions documentation
Ardia David [EMAIL PROTECTED] writes: Great, thanks a lot! It works properly now. By the way, how can I get rid of the warning message: * checking line endings in C/C++/Fortran sources/headers ... WARNING Found the following sources/headers with CR or CRLF line endings: Should I open and save the C files in another format? I am using Notepad on a Windows XP machine. Change formats, yes. With Notepad, I don't think you can. There are hints at http://en.wikipedia.org/wiki/Newline#Conversion_utilities Martin Martin Morgan wrote: Ardia David [EMAIL PROTECTED] writes: Dear all, I am currently creating a package and writing the documentation files for some functions (the main functions). When using R CMD check myPackage, I get a warning message indicating that some of my functions are not documented. How can I get rid of this problem? Of course, I don't want to document every function in the package... Thanks for your help. Adding a NAMESPACE file to your package allows you to separate functions that are visible to the end user (and hence requiring documentation) from those used purely internally to the package (and perhaps documented less formally). See the Writing R Extensions manual for details. This helps to provide the user with access to the relevant documentation, without too much clutter. It also helps you to structure your code to have a 'public' interface that should be well thought out and relatively unchanging, and more flexibly defined and used functions that are 'private' to your package. Martin -- David Ardia H-building, room 11-26 Econometric Institute Erasmus University PO Box 1738 NL 3000 DR Rotterdam The Netherlands Phone: +31 (0)10 408 2269
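A minimal sketch of what such a NAMESPACE file might look like (the function names are hypothetical):

```
# NAMESPACE (sketch)
# export only the public, documented interface;
# everything not exported stays internal to the package
export(fitModel, plotResults)
```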
-- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
Re: [R] How to remove NAs and lme function
Jen, try na.action = na.exclude Andrew On Wed, May 28, 2008 9:26 pm, Jen_mp3 wrote: I am working on a project to find a model for the concentration of dissolved oxygen in the River Clyde. I've fitted a linear mixed model as lme(DOW ~ Temperature + Salinity + Year + factor(Station)*factor(Depth), random = ~1|id), where id is an identifier of the day over 20 years, defined as Day*1 + Month*100 + (1900 - Year). Anyway, there are some NAs for the concentration of dissolved oxygen in the water, so I know you add in na.action = na.omit, and that omits the NAs so there are 9008 observations in the model; but it doesn't do it for the whole data set, where there are 10965 observations including those with NAs. I would like to plot the residuals from the model against Salinity, Temperature and Year, but when I try, it seems to want to take the observations of these variables from the full data set and the residuals from the model, which of course doesn't work. I have tried using data1 <- data[data$DOW != NA,] on the whole data set but it doesn't work. How can I remove the NAs from a data set? -- View this message in context: http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17510564.html Sent from the R help mailing list archive at Nabble.com. Andrew Robinson Senior Lecturer in Statistics Tel: +61-3-8344-6410 Department of Mathematics and Statistics Fax: +61-3-8344 4599 University of Melbourne, VIC 3010 Australia Email: [EMAIL PROTECTED] Website: http://www.ms.unimelb.edu.au
Re: [R] how to bind lists recursively
Dear Brian and Bill, Here's an interesting contrasting example (taken from this month's Help Desk column in R News, which Bill has already seen), first verifying the relative timings for Brian's example:

system.time({
  a <- vector("list", 10001)
  for(i in 0:10000) a[[i+1]] <- i
})
   user  system elapsed
   0.03    0.00    0.03
system.time({
  a <- list()
  for(i in 0:10000) a[[i+1]] <- i
})
   user  system elapsed
   0.47    0.02    0.49
system.time({
  matrices <- vector(mode="list", length=1000)
  for (i in 1:1000)
    matrices[[i]] <- matrix(rnorm(10000), 100, 100)
})
   user  system elapsed
   2.59    0.00    2.61
system.time({
  matrices <- list()
  for (i in 1:1000)
    matrices[[i]] <- matrix(rnorm(10000), 100, 100)
})
   user  system elapsed
   2.61    0.00    2.60

Brian will, I'm sure, have the proper explanation, but I suppose that the NULL elements in the initialized list in the first example provide sufficient space for the integers that eventually get stored there, while that's not the case for the matrices in the second example. I believe that the second example is more typical of what one would actually put in a list. Regards, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley Sent: May-28-08 2:04 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively But pre-allocation still helps: a <- vector("list", 10001); for(i in 0:10000) a[[i+1]] <- i gives user system elapsed 0.042 0.001 0.057 on my system, against user system elapsed 0.743 0.000 1.907 for Bill's 'best' solution. On Wed, 28 May 2008, [EMAIL PROTECTED] wrote: This is somewhat subtle. Rolf's solution (here corrected to...) a <- list(); for(i in 0:10000) a[[i+1]] <- i is the best of the loop solutions (or at least the best I know of). The apparently similar a <- list(0); for(i in 1:10000) a <- c(a, list(i)) will take a lot longer, although the result is the same.
For example:

system.time({
  a <- list()
  for(i in 0:10000) a[[i+1]] <- i
})
   user  system elapsed
   0.59    0.00    0.59
system.time({
  a <- list(0)
  for(i in 1:10000) a <- c(a, list(i))
})
   user  system elapsed
   6.87    0.00    6.89

That's a factor of about 11 times as long. The best of the lot is a <- as.list(0:10000), of course, but this has problems with generalisation (which everyone suspects is going to be needed here...). Bill Venables. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rolf Turner Sent: Wednesday, 28 May 2008 1:02 PM To: Daniel Yang Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively On 28/05/2008, at 2:43 PM, Daniel Yang wrote: Dear all, I want to create a list that contains 0, 1, 2, 3, ..., 10000 as its elements. I used the following code, which apparently doesn't work very well. a <- 0; for(i in 1:10000) { a <- list(a, i) } The result is not what I wanted. So how can I bind the lists recursively so that the last element is the newly added one while the previous elements all remain the same? a <- list(); for(i in 1:10000) a[[i]] <- i (The word ``bind'' is inappropriate.) cheers, Rolf Turner -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Rotated text on a regression line
Christoph, I see two problems: (1) use plot(x,y,pch=16,xlim=c(0,10),asp=1), as the default has the x/y scales different. (2) It looks to me that the expression srt=180/pi*atan(slope) should be srt=180*atan(slope)/pi. Regards, Tom Dr. Christoph Scherber wrote: Dear all, I stumbled over a problem recently when trying to use srt with text() on a windows device. What I intended to do was to plot a simple regression line, and to rotate a piece of text such that the text has the same angle as the regression line. However, the text is always plotted at a slightly wrong angle:

x <- 1:10                            # create arbitrary x and y values
y <- x*2 - rnorm(1:10)
plot(x, y, pch=16, xlim=c(0,10))     # create the graph
abline(lm(y~x))
# calculate the y coordinate of the text:
yval <- predict(lm(y~x), list(x=rep(2, length(x))))[1]
# calculate the slope:
slope <- as.numeric(lm(y~x)[[1]][2])
text(2, yval, "Regression", srt=180/pi*atan(slope), adj=0)

What am I doing wrong here? Many thanks in advance for any help! Best wishes Christoph (using R 2.6.1 on Windows XP) -- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: [EMAIL PROTECTED] VOICE: 937-383-0528 FAX: 937-383-0033
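Putting the pieces together, a sketch of a corrected version of Christoph's example. The visible fix comes from asp = 1, which makes one x-unit and one y-unit the same physical length, so that the data-space angle from atan(slope) matches the angle drawn on the device:

```r
x <- 1:10
y <- x * 2 - rnorm(10)
plot(x, y, pch = 16, xlim = c(0, 10), asp = 1)  # asp = 1: equal unit lengths
fit <- lm(y ~ x)
abline(fit)
slope <- coef(fit)[2]
yval  <- predict(fit, data.frame(x = 2))
## convert the slope angle from radians to degrees for srt
text(2, yval, "Regression", srt = 180 / pi * atan(slope), adj = 0)
```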
Re: [R] How to make R running faster
Dear Chunhao Tu, There is, coincidentally, a discussion of loops and related issues in the Help Desk column in the current issue of R News (see the newsletter link on CRAN). Regards, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: May-28-08 4:27 AM To: r-help@r-project.org Subject: [R] How to make R running faster Hi everyone, I run R loops on Windows XP and Vista. Both machines are Intel Core 2 Duo 2.2 GHz with 2 GB RAM, and XP is significantly faster than Vista. Does anyone know how to speed up R loops in Vista? Thank you in advance. Chunhao Tu
Re: [R] need help for building R in Ubuntu 8.04
On Wed, May 28, 2008 at 02:29:10PM +0200, Martin Maechler wrote: EH == Erin Hodgess [EMAIL PROTECTED] on Sun, 25 May 2008 13:27:04 -0500 writes: EH Try: ./configure --with-x=no well.. no! really don't. Seconded. At best this qualified for the 'then do not do it' school of advice to the 'it hurts when I do this'. But it truly missed the underlying issue. See below. If you want to enjoy a Linux system and building from the source, and then maybe learn how that is happening, learning about shell scripts and 'make' and ... then rather do get the extra ubuntu packages needed. Or if you 'just' want to run it, install Ubuntu and learn to take advantage of the work of others. The advice (below) to get the 'xorg-dev' is definitely good advice. I have it on the list of packages I'd always want to install in addition to the basic ubuntu/debian list. But you most probably will find that you need a few more tools / libraries / headers for your ubuntu system such that you can build R with all the bells and whistles possible. There's the Debian (and hence Ubuntu) package 'r-base-dev' which contains 'r-base' (i.e. a *binary* version of R; the one Dirk Eddelbuettel mentioned), but also most of the compilers/libraries/... that you'd want to build R from the sources. Just to be a bit more precise: i) 'apt-get install r-base' will get you r-base-core and all the recommended packages --- use this if you want to _run_ R ii) 'apt-get install r-base-dev' will get all the common header files, as well as r-base-core use this if you _also want to build / install R packages_ incl from CRAN iii) 'apt-get build-dep r-base' will get you _build dependencies_ for R and is probably what Martin wanted here. Last time I did get 'r-base-dev' on a virgin ubuntu system, I vaguely remember that it did not contain *really* all the tools I'd wanted, but almost all. 
Bug reports are always welcome and a more constructive form of moving things forward than an off-hand comment here :-) Note that I tend not to get the ones filed against Ubuntu so file against Debian please. e.g., you may also want the two packages tcl8.4-dev tk8.4-dev Just curious: what did you need them for ? In case you wanted to build R, see iii) above as a possibly more focussed way to get there. Dirk -- Three out of two people have difficulties with fractions. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rbinom not using probability of success right
I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but I am realizing that the number of successes expected is not accurate. Any advice out there? This is the example:

N <- 500
status <- rbinom(N, 1, prob = 0.15)
count <- sum(status)

15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good. Is there another way beyond using sample and rep together? A Smile costs Nothing But Rewards Everything Happiness is not perfected until it is shared -Jane Porter
Re: [R] can I do this with R?
A Google search for logistic regression with stepwise forward in R returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab, because I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward selection in R? Thanks /M
Re: [R] function to compute consensus DNA sequence by plurality?
Kim, Is this what you want?

tmp <- readLines(textConnection("TGCATACACCGACAACATCCTCGACGACTACACCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG"))

library(seqinr)
mat <- matrix(sapply(tmp, s2c), nrow = length(tmp), byrow = TRUE)
bma <- function(nucl){
  nucl <- tolower(nucl)
  iupac <- sapply(amb(), amb)
  iupac$u <- NULL  # no RNA
  names(iupac)[unlist(lapply(iupac, setequal, nucl))]
}
res <- apply(mat, 2, bma)
toupper(c2s(res))
[1] "HGCMTACACCRACRAYRTCCTSGAYGACTWCWSCTACTACG"

Best, Jean -- Jean R. Lobry ([EMAIL PROTECTED]) Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I, 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE tel: +33 472 43 27 56 fax: +33 472 43 13 88 http://pbil.univ-lyon1.fr/members/lobry/
Re: [R] how to bind lists recursively
On Wed, 28 May 2008, John Fox wrote: Dear Brian and Bill, Here's an interesting contrasting example (taken from this month's Help Desk column in R News, which Bill has already seen), first verifying the relative timings for Brian's example: system.time({ a <- vector("list", 10001); for(i in 0:10000) a[[i+1]] <- i }) user system elapsed 0.03 0.00 0.03 system.time({ a <- list(); for(i in 0:10000) a[[i+1]] <- i }) user system elapsed 0.47 0.02 0.49 system.time({ matrices <- vector(mode="list", length=1000); for (i in 1:1000) matrices[[i]] <- matrix(rnorm(10000), 100, 100) }) user system elapsed 2.59 0.00 2.61 system.time({ matrices <- list(); for (i in 1:1000) matrices[[i]] <- matrix(rnorm(10000), 100, 100) }) user system elapsed 2.61 0.00 2.60 Brian will, I'm sure, have the proper explanation, but I suppose that the NULL elements in the initialized list in the first example provide sufficient space for the integers that eventually get stored there, while that's not the case for the matrices in the second example. I believe that the second example is more typical of what one would actually put in a list. Your explanation is incorrect, as a list is just a header plus a set of SEXP pointers, and is never used to store data directly. The main point is that with only 1000 elements, the difference in re-allocating the list on your system is only going to be about (0.49 - 0.04)/10 ~ 0.04s, too small to time accurately (especially under Windows). So finding 0.02 is not really contrasting. The point that pre-allocation may only help when list allocation is taking an appreciable part of the time is a sound one. Note that because of GC tuning such things are very variable. Running either version repeatedly on my Linux box gives timings from about 3.8 down to 2.5 seconds -- and after settling down to 2.5-2.7s, the 13th run took 3.2s.
Regards, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley Sent: May-28-08 2:04 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively But pre-allocation still helps a - vector(list, 10001) for(i in 0:1) a[[i+1]] - i gives user system elapsed 0.042 0.001 0.057 on my system, against user system elapsed 0.743 0.000 1.907 for Bill's 'best' solution. On Wed, 28 May 2008, [EMAIL PROTECTED] wrote: This is somewhat subtle. Rolf's solution (here corrected to...) a - list() for(i in 0:1) a[[i+1]] - i is the best of the loop solutions (or at least the best I know of). The apparently similar a - list(0) for(i in 1:1) a - c(a, list(i)) will take a lot longer, although the result is the same. For example: system.time({ a - list() for(i in 0:1) a[[i+1]] - i }) user system elapsed 0.590.000.59 system.time({ a - list(0) for(i in 1:1) a - c(a, list(i)) }) user system elapsed 6.870.006.89 That's a factor of about 11 times as long. The best of the lot is a - as.list(0:1) of course, but this has problems with generalisation, (which everyone suspects is going to be needed here...). Bill Venables. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rolf Turner Sent: Wednesday, 28 May 2008 1:02 PM To: Daniel Yang Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively On 28/05/2008, at 2:43 PM, Daniel Yang wrote: Dear all, I want to create a list that contains 0,1,2,3, ..., 1 as its elements. I used the following code, which apparently doesn't work very well. a - 0 for(i in 1:1) { a - list(a, i) } The result is not what I wanted. So how to create the bind lists recursively so that the last element would be the newly added one while the previous elements all remain the same? 
a - list() for(i in 1:1) a[[i]] - i (The word ``bind'' is inappropriate.) cheers, Rolf Turner -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861
Re: [R] rbinom not using probability of success right
You asked for each of 500 to be included with probability 0.15, not for 15% of 500. If you want the latter, use sample, e.g. sample(c(rep(1,75), rep(0,425))) And to see if your 77 is reasonable for binomial sampling: binom.test(77, 500, 0.15) Exact binomial test data: 77 and 500 number of successes = 77, number of trials = 500, p-value = 0.8022 alternative hypothesis: true probability of success is not equal to 0.15 95 percent confidence interval: 0.1234860 0.1886725 sample estimates: probability of success 0.154 so it certainly is. On Wed, 28 May 2008, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom but realizing that the number of successes expected is not accurate. Any advice out there. This is the example: N-500 status-rbinom(N, 1, prob = 0.15) count-sum(status) 15 percent of 500 should be 75 but what I obtain from the count variable is 77 that gives the probability of success to be 0.154. Not very good. Is there another way beyond using sample and rep together? [...] -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rbinom not using probability of success right
On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom but realizing that the number of successes expected is not accurate. Any advice out there. This is the example: N <- 500; status <- rbinom(N, 1, prob = 0.15); count <- sum(status) 15 percent of 500 should be 75 but what I obtain from the count variable is 77 that gives the probability of success to be 0.154. Not very good. The difference (77 - 75 = 2) is well within the likely sampling variation when 500 values are sampled independently with P(1)=0.15: the standard deviation of the resulting number of 1s is sqrt(500*0.15*0.85) = 7.98, so the difference of 2 is only 1/4 of a standard deviation, hence very likely to be equalled or exceeded. Your chance of getting exactly 75 by this method is quite small: dbinom(75,500,0.15) [1] 0.04990852 and your chance of being 2 or more off your target is 1 - sum(dbinom((74:76),500,0.15)) [1] 0.8510483 Is there another way beyond using sample and rep together? It looks as though you are seeking to obtain exactly 75 1s, randomly situated, the rest being 0s, so in effect you do need to do something on the lines of sample and rep. Hence, something like: status <- rep(0,500); status[sample((1:500),75,replace=FALSE)] <- 1 Hoping this helps, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 28-May-08 Time: 14:19:24 -- XFMail --
Re: [R] rbinom not using probability of success right
Do it again. What did you get this time? Then do it another time. Do you see what is happening? Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philip Twumasi-Ankrah Sent: Wednesday, May 28, 2008 8:53 AM To: r-help@r-project.org Subject: [R] rbinom not using probability of success right I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but realizing that the number of successes expected is not accurate. Any advice out there? This is the example: N <- 500; status <- rbinom(N, 1, prob = 0.15); count <- sum(status) 15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good. Is there another way beyond using sample and rep together? A Smile costs Nothing But Rewards Everything Happiness is not perfected until it is shared -Jane Porter
Re: [R] How to make R running faster
I remember reading that colSums and colMeans were better when you need column sums and means. On Wed, May 28, 2008 at 8:26 AM, Esmail Bonakdarian [EMAIL PROTECTED] wrote: Neil Shephard wrote: Loops are not massively efficient within R. Look into using the apply() family of functions (eapply()/lapply()/mapply()/rapply()/tapply()). Didn't someone post not too long ago that apply is internally implemented as a for-loop? Or am I not remembering this correctly? The rationale for apply was that it is more concise, but not necessarily more efficient. Esmail -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED]
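For the specific cases of column (or row) sums and means, the dedicated base functions colSums/colMeans/rowSums/rowMeans do the whole computation in one internal pass, avoiding the per-column R-level function calls that apply() makes. A small sketch:

```r
m  <- matrix(rnorm(1e6), nrow = 1000)
r1 <- apply(m, 2, mean)  # calls mean() once per column at the R level
r2 <- colMeans(m)        # single internal pass over the matrix
all.equal(r1, r2)        # same result, typically much faster for large m
```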
Re: [R] rbinom : Does randomness preclude precision?
Ted's reply is a bit comforting, and as indicated in my post I am resorting to using sample; but as an academic issue, does randomness preclude precision? Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting Binomial(1, 0.154). [EMAIL PROTECTED] wrote: On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom but realizing that the number of successes expected is not accurate. Any advice out there. This is the example: N <- 500; status <- rbinom(N, 1, prob = 0.15); count <- sum(status) 15 percent of 500 should be 75 but what I obtain from the count variable is 77 that gives the probability of success to be 0.154. Not very good. The difference (77 - 75 = 2) is well within the likely sampling variation when 500 values are sampled independently with P(1)=0.15: the standard deviation of the resulting number of 1s is sqrt(500*0.15*0.85) = 7.98, so the difference of 2 is only 1/4 of a standard deviation, hence very likely to be equalled or exceeded. Your chance of getting exactly 75 by this method is quite small: dbinom(75,500,0.15) [1] 0.04990852 and your chance of being 2 or more off your target is 1 - sum(dbinom((74:76),500,0.15)) [1] 0.8510483 Is there another way beyond using sample and rep together? It looks as though you are seeking to obtain exactly 75 1s, randomly situated, the rest being 0s, so in effect you do need to do something on the lines of sample and rep. Hence, something like: status <- rep(0,500); status[sample((1:500),75,replace=FALSE)] <- 1 Hoping this helps, Ted.
E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 28-May-08 Time: 14:19:24 -- XFMail -- A Smile costs Nothing But Rewards Everything Happiness is not perfected until it is shared -Jane Porter [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make R running faster
Neil Shephard wrote: Loops are not massively efficient within R. Look into using the apply() family of functions (eapply()/lapply()/mapply/rapply()/tapply()). Didn't someone post not too long ago that apply is internally represented as a for-loop? Or am I not remembering this correctly? The reason for apply was that it was more concise but not necessarily more efficient. Esmail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Computing P-Value
Dear Ben, Given a set of words ('foo', 'bar', 'bar', 'bar', quux . foo) this can be in 10.000 items. I would like to compute the significance of the word occurrence with P-Value. Is there a simple way to do it? - GV On Wed, May 28, 2008 at 11:46 PM, Ben Bolker [EMAIL PROTECTED] wrote: Edward Wijaya gundalav at gmail.com writes: Hi, Is there an R function or package that computes p-value? -GV Many, but this is far too vague a question for us to answer usefully. Perhaps if you tell us specifically what you want to do we can help. (Please make sure to read the posting guide, while you're at it ...) Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gundala Viswanath __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] (no subject)
Philip Twumasi-Ankrah nana_kwadwo_derkyi at yahoo.com writes: Ted's reply is a bit comforting and, as indicated in my post, I am resorting to using sample; but as an academic issue, does randomness preclude precision? Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting a Binomial(1, 0.154). It's impossible to have a sampler that is (1) Binomial(1, p) on each draw and (2) independent for each draw and (3) has exactly Np successes in N draws. For example, suppose you had drawn 99 Binomial(1, p=0.15) samples and had got(ten) 14 successes so far ... your last draw would be constrained to be 1 if you wanted property #3 to hold. So I guess the answer to your question is yes (except in the limit of infinitely large samples). Ben Bolker
Re: [R] rbinom : Does randomness preclude precision?
What do you mean by ... *eventual* nature of the distribution? If you simulated 100 samples, would you expect to see 1.5 successes? Or 1? Or 2? How many, in your thinking, is eventual? Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philip Twumasi-Ankrah Sent: Wednesday, May 28, 2008 9:52 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] rbinom : Does randomness preclude precision? Ted's reply is a bit comforting and, as indicated in my post, I am resorting to using sample; but as an academic issue, does randomness preclude precision? Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting a Binomial(1, 0.154). [EMAIL PROTECTED] wrote: On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but I am realizing that the number of successes expected is not accurate. Any advice out there? This is the example:

N <- 500
status <- rbinom(N, 1, prob = 0.15)
count <- sum(status)

15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good. The difference (77 - 75 = 2) is well within the likely sampling variation when 500 values are sampled independently with P(1) = 0.15: the standard deviation of the resulting number of 1s is sqrt(500*0.15*0.85) = 7.98, so the difference of 2 is only 1/4 of a standard deviation, hence very likely to be equalled or exceeded. Your chance of getting exactly 75 by this method is quite small:

dbinom(75, 500, 0.15)
[1] 0.04990852

and your chance of being 2 or more off your target is

1 - sum(dbinom(74:76, 500, 0.15))
[1] 0.8510483

Is there another way beyond using sample and rep together? It looks as though you are seeking to obtain exactly 75 1s, randomly situated, the rest being 0s, so in effect you do need to do something along the lines of sample and rep. Hence, something like

status <- rep(0, 500)
status[sample(1:500, 75, replace = FALSE)] <- 1

Hoping this helps, Ted. E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 28-May-08 Time: 14:19:24 -- XFMail -- A Smile costs Nothing But Rewards Everything. Happiness is not perfected until it is shared. -Jane Porter
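To see Ted's point numerically, one can draw the rbinom() sample many times and compare the spread of the counts with the theoretical standard deviation, and contrast that with the sample()-based construction, which hits 75 every time (a sketch; the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed, for reproducibility
counts <- replicate(1000, sum(rbinom(500, 1, prob = 0.15)))
mean(counts)   # close to 75
sd(counts)     # close to sqrt(500 * 0.15 * 0.85), about 7.98

# The sample()-based construction fixes the count exactly:
status <- rep(0, 500)
status[sample(1:500, 75, replace = FALSE)] <- 1
sum(status)    # exactly 75, every time
```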
[R] Grouped weighted.mean
Dear all -- I want to compute weighted.mean() for grouped rows. A data frame extract is just below. For each Key, I want the mean of IAC weighted by Wt.

DP0[1:20,]
                                     Key   IAC        Wt
2   C3-PD030020050.PD030020050.3.12.3.0 0.765 0.859
3   C3-PD030020050.PD030020050.3.12.3.0 0.764 0.8449651
4   C3-PD030020050.PD030020050.3.12.3.0 0.760 0.8024975
5   C3-PD030020050.PD030020050.3.12.3.0 0.753 0.7326575
6   C3-PD030020050.PD030020050.3.12.3.0 0.743 0.6381150
7   C3-PD030020050.PD030020050.3.12.3.0 0.730 0.5187296
8   C3-PD030020050.PD030020050.3.12.3.0 0.714 0.378
65  C3-PD0.PD0.3.12.3.0                 0.770 0.859
66  C3-PD0.PD0.3.12.3.0                 0.770 0.8449651
67  C3-PD0.PD0.3.12.3.0                 0.771 0.8024975
68  C3-PD0.PD0.3.12.3.0                 0.772 0.7326575
69  C3-PD0.PD0.3.12.3.0                 0.774 0.6381150
70  C3-PD0.PD0.3.12.3.0                 0.777 0.5187296
71  C3-PD0.PD0.3.12.3.0                 0.780 0.378
128 C3-PD2.PD2.3.12.3.0                 0.685 0.859
129 C3-PD2.PD2.3.12.3.0                 0.685 0.8449651
130 C3-PD2.PD2.3.12.3.0                 0.684 0.8024975
131 C3-PD2.PD2.3.12.3.0                 0.682 0.7326575
132 C3-PD2.PD2.3.12.3.0                 0.679 0.6381150
133 C3-PD2.PD2.3.12.3.0                 0.675 0.5187296

mean -- works OK:

MN = by( DP0$IAC, DP0$Key, mean)

weighted.mean -- no go due to length of Wt:

WMN = by( DP0$IAC, DP0$Key, weighted.mean, DP0$Wt)
Error in FUN(data[x, ], ...) : 'x' and 'w' must have the same length

So I tried tapply() -- same result:

WMN = tapply( DP0, DP0$Key, function(x) weighted.mean( x$IAC, x$Wt))
Error in tapply(DP0, DP0$Key, function(x) weighted.mean(x$IAC, x$Wt)) : arguments must have same length

How does one pass grouped arguments to processing functions such as weighted.mean()? TIA, Chip Barnaby - Chip Barnaby [EMAIL PROTECTED] Vice President of Research, Wrightsoft Corp., 781-862-8719 x118 voice, 781-861-2058 fax, 131 Hartwell Ave, Lexington, MA 02421, www.wrightsoft.com
[R] individual analysis
Dear R-community, I am looking to statistically analyse the results of a new medical diagnostic tool. The aim is to know whether a given subject differs from the mean or distribution of a population of controls on one continuous variable (a neuro-imaging result which can be quantified). Does anyone know which test is appropriate for such an individual analysis? Which package is needed to perform the analysis? Thanks a lot for your help, Julien
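One commonly cited approach for comparing a single case against a small control sample is the modified t-test of Crawford and Howell (1998), which treats the case as a sample of one. It needs no extra package; here is a minimal sketch in base R (the data values are invented):

```r
# Modified t-test for one case vs. a control sample
# (Crawford & Howell 1998): t = (x - mean) / (sd * sqrt((n+1)/n)),
# with n - 1 degrees of freedom.  Example data are invented.
case_vs_controls <- function(case, controls) {
  n <- length(controls)
  t_stat <- (case - mean(controls)) /
            (sd(controls) * sqrt((n + 1) / n))
  p <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
  c(t = t_stat, df = n - 1, p = p)
}

controls <- c(98, 102, 101, 99, 100, 103, 97, 100)  # invented control values
case_vs_controls(110, controls)
```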
Re: [R] How to remove NAs and lme function
Thanks, that worked! Andrew Robinson-6 wrote: Jen, try na.action = na.exclude Andrew On Wed, May 28, 2008 9:26 pm, Jen_mp3 wrote: I am working on a project to find a model for the concentration of dissolved oxygen in the River Clyde. I've fitted a linear mixed model as lme(DOW ~ Temperature + Salinity + Year + factor(Station)*factor(Depth), random = ~1|id), where id is an identifier of the day over 20 years, defined as Day*1 + Month*100 + (1900 - Year). Anyway, there are some NAs for the concentration of dissolved oxygen in the water, so I know you add in na.action = na.omit, which omits the NAs, leaving 9008 observations in the model; but it doesn't do it for the whole data set, where there are 10965 observations including those with NAs. I would like to plot the residuals from the model against Salinity, Temperature and Year, but when I try, it seems to want to take the observations of these variables from the full data set and the residuals from the model, which of course doesn't work. I have tried using data1 <- data[data$DOW != NA,] on the whole data set but it doesn't work. How can I remove the NAs from a data set? Andrew Robinson Senior Lecturer in Statistics Tel: +61-3-8344-6410 Department of Mathematics and Statistics Fax: +61-3-8344-4599 University of Melbourne, VIC 3010 Australia Email: [EMAIL PROTECTED] Website: http://www.ms.unimelb.edu.au
[R] Fixing the coefficient of a regressor in formula
Dear R users, I want to estimate a Cox PH model with time-dependent covariates, so I am using a counting-process format with the following formula: Surv(data$start, data$stop, data$event.time) ~ cluster(data$id) + G1 + G2 + G3 + G4 + G5 + G6. The Gs represent B-spline basis functions, so they sum to 1, and I can't estimate the model as-is without the last coefficient coming out NA, which makes sense given the perfect collinearity. Without getting into lengthy details about my code, let me just say that to avoid the collinearity problem I do not want to omit G1 from the regression. Instead, I want to fix the regression coefficient of one of the regressors, G1, to 1. I have read the R manual section on formulae but I have not found how to fix a regression coefficient. Conceptually it seems to me that it should be simple, and I am sure that someone has explained it somewhere, but I did not find the proper keywords! So, does someone know how to fix the coefficient of a regressor in the formula for a Cox model, so that the coefficient is not estimated but still taken into account? Thank you in advance, MP
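The usual device for fixing a coefficient at a known value is offset(): a term inside offset() enters the linear predictor with coefficient 1 and is not estimated, and the same mechanism is accepted in coxph() formulas. Here is a minimal sketch with lm() and made-up variable names, just to show the effect:

```r
# Fix the coefficient of x2 at 1 by moving it into offset(): x2 then
# enters the linear predictor as-is and no coefficient is estimated
# for it.  (Toy data; for the Cox model the same idea would read
# coxph(Surv(start, stop, event) ~ offset(G1) + G2 + ...).)
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 2 * x1 + 1 * x2 + rnorm(100)

fit <- lm(y ~ x1 + offset(x2))
coef(fit)   # intercept and x1 only; no coefficient for x2
```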
Re: [R] rbinom : Does randomness preclude precision?
I think I see the rub: You would like the distribution of a sample to be identical to the distribution from which it was sampled. But if sampling is random, that can happen only in the long run, not on every sample. That is why samples from a normal density are *not* themselves normal - they're t. When the sample size is large enough, the differences between a random sample's density and its parent density become vanishingly small. Thus the differences you observe in repeated random samples from the binomial: repeated sampling produces slightly different numbers of successes. How could it be otherwise? Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com From: Philip Twumasi-Ankrah [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 28, 2008 10:36 AM To: [EMAIL PROTECTED] Subject: RE: [R] rbinom : Does randomness preclude precision? Charles, when you simulate data from a distribution, what you are in effect doing is generating a sequence of values that corresponds to that distribution. So you can generate 1000 values from a normal distribution and expect that when you check the distribution of your sample (with a qqnorm or Q-Q plot), it should be a close fit to the theoretical distribution with the assigned parameter values. It would be difficult to explain why simulated data may differ from the distribution they were generated from; I don't think you can blame it on randomness. I hope you understand what I am trying to determine. Charles Annis, P.E. [EMAIL PROTECTED] wrote: What do you mean by ... *eventual* nature of the distribution? If you simulated 100 samples, would you expect to see 1.5 successes? Or 1? Or 2? How many, in your thinking, is eventual? Charles Annis, P.E.
[EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com [remainder of the quoted exchange, identical to the messages above, snipped]
Re: [R] can I do this with R?
Hi Maria, But why do you want to use forwards or backwards methods? These are all 'backward' methods of modeling. Try using AIC or BIC; BIC is much better than AIC. And you do not have to believe me or anyone else on this. Just make a small data set with a few variables with a known relationship amongst them. With this simulated data set, use all your modeling methods - backwards, forwards, AIC, BIC, etc. - and see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are its 'creator'! smita --- Charilaos Skiadas [EMAIL PROTECTED] wrote: A Google search for logistic regression with stepwise forward in r returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab, because I wanted to do a logistic regression. However, Matlab does not do logistic regression with a stepwise-forward method, so I thought about trying R. My question is: can I do logistic regression with stepwise forward selection in R? Thanks /M
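For the record, forward selection by AIC is available in base R via step() with direction = "forward"; a small sketch with simulated logistic-regression data (all variable names invented), which also illustrates smita's point about knowing the truth in a simulated dataset:

```r
# Simulate a logistic model where x1 and x2 matter and x3 is noise,
# then run forward selection from the null model by AIC.
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is pure noise
p  <- plogis(1.5 * x1 - 1 * x2)
y  <- rbinom(n, 1, p)
dat <- data.frame(y, x1, x2, x3)

null_fit <- glm(y ~ 1, family = binomial, data = dat)
fwd <- step(null_fit, scope = ~ x1 + x2 + x3,
            direction = "forward", trace = 0)
formula(fwd)
```

For BIC-based selection, pass k = log(n) to step() instead of the default k = 2.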
[R] odds ratio's and function
Hallo, I tried writing a function to extract all the odds ratios from an ftable (p.adjust still needs to be built in). So I tried the following:

ORCalcul <- function(m) {
  or <- matrix(1:(length(m[,1])-1)*(length(m[1,])-1)*5, length(m[,1])-1, length(m[1,])-1)
  for(i in 1:length(m[,1])-1) {
    for(j in 1:length(m[1,])-1) {
      or[i,j,1] <- attr(m, "row.vars")[[1]][i]
      or[i,j,2] <- attr(m, "col.vars")[[1]][j]
      or[i,j,3] <- attr(m, "row.vars")[[1]][i+1]
      or[i,j,4] <- attr(m, "col.vars")[[1]][j+1]
      or[i,j,5] <- (m[i,j]*m[i+1,j+1])/(m[i+1,j]*m[i,j+1])
    }
  }
  ORCalcul <- c(or)
}

## which doesn't work, because of the way R works with matrices;
## I am used to other syntax, using the indices of a matrix.
## So, first question: I don't think there is a way in R to work with
## matrix m = new matrix(10,5,45)  # m being a three-dimensional matrix, e.g.?
# So I tried, and tried; finally:

ORCalcul <- function(m) {
  t
  for(i in 1:length(m[,1])-1) {
    for(j in 1:length(m[1,])-1) {
      row1 <- attr(m, "row.vars")[[1]][i]
      col1 <- attr(m, "col.vars")[[1]][j]
      row2 <- attr(m, "row.vars")[[1]][i+1]
      col2 <- attr(m, "col.vars")[[1]][j+1]
      or <- (m[i,j]*m[i+1,j+1])/(m[i+1,j]*m[i,j+1])
      ORCalculij <- c(row1, col1, row2, col2, or)
      append(t, ORCalculij)
    }
  }
  ORCalcul <- t
  ORCalcul
}

# which unfortunately only gives the following as output, so it doesn't work either:
     [,1]
[1,]   NA

# The only variant I wrote that gave some output (but, logically, only the last OR) is:

ORCalcul <- function(m) {
  for(i in 1:length(m[,1])-1) {
    for(j in 1:length(m[1,])-1) {
      row1 <- attr(m, "row.vars")[[1]][i]
      col1 <- attr(m, "col.vars")[[1]][j]
      row2 <- attr(m, "row.vars")[[1]][i+1]
      col2 <- attr(m, "col.vars")[[1]][j+1]
      or <- (m[i,j]*m[i+1,j+1])/(m[i+1,j]*m[i,j+1])
      ORCalcul <- c(row1, col1, row2, col2, or)
      # putting an append here just gives me the code back...
    }
  }
  ORCalcul
}

# What should I do? Suggestions?
(For the algorithmic logic I need two more nested loops, but that is not the problem.) mvg, Wim
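For what it's worth, one way to get all the adjacent-cell (local) odds ratios out of a two-way table without append()-ing inside loops is to compute them with vectorised indexing and assemble a data frame in one go. A sketch for a plain matrix/table with dimnames (extending it to read ftable's row.vars/col.vars attributes is mechanical):

```r
# All (i, j) vs (i+1, j+1) local odds ratios of a two-way table,
# returned as one data frame instead of growing a result in a loop.
local_odds_ratios <- function(m) {
  nr <- nrow(m); nc <- ncol(m)
  i <- rep(seq_len(nr - 1), times = nc - 1)
  j <- rep(seq_len(nc - 1), each  = nr - 1)
  or <- (m[cbind(i, j)] * m[cbind(i + 1, j + 1)]) /
        (m[cbind(i + 1, j)] * m[cbind(i, j + 1)])
  data.frame(row1 = rownames(m)[i],     col1 = colnames(m)[j],
             row2 = rownames(m)[i + 1], col2 = colnames(m)[j + 1],
             or = or)
}

m <- matrix(c(10, 20, 30, 40), 2, 2,
            dimnames = list(c("a", "b"), c("x", "y")))
local_odds_ratios(m)   # single OR: (10 * 40) / (20 * 30)
```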
Re: [R] Grouped weighted.mean
Hello - Chip Barnaby wrote: Dear all -- I want to compute weighted.mean() for grouped rows. [data extract snipped; see the original post above] weighted.mean -- no go due to length of Wt: WMN = by( DP0$IAC, DP0$Key, weighted.mean, DP0$Wt) Error in FUN(data[x, ], ...) : 'x' and 'w' must have the same length

Just a little confusion with how 'by' works, I think. Try:

test <- data.frame(key = rep(c("A", "B"), each = 100),
                   iac = rnorm(200, 10),
                   wt  = runif(200))
test
by(test, test$key, function(x) weighted.mean(x$iac, x$wt))

So I tried tapply() -- same result: WMN = tapply( DP0, DP0$Key, function(x) weighted.mean( x$IAC, x$Wt)) Error in tapply(DP0, DP0$Key, function(x) weighted.mean(x$IAC, x$Wt)) : arguments must have same length. How does one pass grouped arguments to processing functions such as weighted.mean()? TIA, Chip Barnaby - Chip Barnaby [EMAIL PROTECTED] Vice President of Research, Wrightsoft Corp., 781-862-8719 x118 voice, 781-861-2058 fax, 131 Hartwell Ave, Lexington, MA 02421, www.wrightsoft.com
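If a plain named vector is more convenient than the object that by() returns, the same per-group computation can be phrased with split() and sapply() (toy data as in the reply above):

```r
# Weighted mean of iac by key, as a named numeric vector:
# split() partitions the data frame's rows by key, and sapply()
# applies weighted.mean() to each piece and simplifies the result.
test <- data.frame(key = rep(c("A", "B"), each = 100),
                   iac = rnorm(200, 10),
                   wt  = runif(200))

wmn <- sapply(split(test, test$key),
              function(x) weighted.mean(x$iac, x$wt))
wmn
```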
[R] Searchreplace string?
Hi there, I would like to know if it is possible to modify a text file with an R function. In fact, I would like to know if a search-and-replace function exists. My problem is to create config files from a base file, in which I have to modify the values of parameters. My base file:

#...
Param1 = V1_1
#...
Param2 = V2_1
Param3 = V3_1
#...

What I would like for each created file i:

#...
Param1 = V1_i
#...
Param2 = V2_i
Param3 = V3_i
#...

For the moment my solution is to read each line of the config file, modify the line if needed, and recopy it into the new file. But my file is about 500 lines and I would like to create a lot of different versions of the config file from my base file, so with this method it takes a long time, and I'm sure I can improve it. (For example, for 85 different files I have to wait 10 min...) My dream would be a function like this one:

Function <- function(Base_file, Param_array, Value_array) {
  file.copy(Base_file, Config_file)
  ...
  for (i in length(Param_array)) {
    SearchReplace(Config_file, "Param_array[i] = something", "Param_array[i] = Value_array[i]")
  }
  ...
}

I hope I have been clear enough. If not, don't hesitate to ask me for more details! Thank you for your help. Have a good day!
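A vectorised readLines()/sub()/writeLines() round-trip avoids the line-by-line copy entirely. Here is a minimal sketch of the dreamed-of function (names as in the question; it assumes simple "ParamX = value" lines, which is a regex assumption, not a guarantee):

```r
# Read a base file once, rewrite each "ParamX = value" line, write out.
make_config <- function(base_file, config_file, params, values) {
  lines <- readLines(base_file)
  for (i in seq_along(params)) {
    # Replace whatever follows "ParamX =" with the new value.
    pattern <- paste0("^(\\s*", params[i], "\\s*=).*$")
    lines <- sub(pattern, paste0("\\1 ", values[i]), lines)
  }
  writeLines(lines, config_file)
}

# Example round-trip with temporary files:
base <- tempfile(); out <- tempfile()
writeLines(c("# comment", "Param1 = V1_1", "Param2 = V2_1"), base)
make_config(base, out, c("Param1", "Param2"), c("V1_7", "V2_7"))
readLines(out)
```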
Re: [R] Computing P-Value
Gundala Viswanath wrote: | Dear Ben, | Given a set of words | ('foo', 'bar', 'bar', 'bar', 'quux', ..., 'foo') - this can run to 10,000 items. | I would like to compute the significance of the word occurrences as p-values. | Is there a simple way to do it? | - GV

Closer, but still not enough information. What is your null hypothesis? Equidistribution? If so, ...

dat <- sample(c("foo", "bar", "quux", "pridznyskie"), replace = TRUE, size = 10000)
tab <- table(dat)
chisq.test(tab)

from ?chisq.test: If 'x' is a matrix with one row or column, or if 'x' is a vector and 'y' is not given, then a _goodness-of-fit test_ is performed ('x' is treated as a one-dimensional contingency table). The entries of 'x' must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in 'p', or are all equal if 'p' is not given.

Note that this won't test the significance of *individual* deviations from equiprobability, just the overall pattern. If you wanted to test individual words you could use binom.test -- but if you tested more than one word, or tested words on the basis of those that appeared to have extreme frequencies, you'd start running into multiple comparisons / post hoc testing issues. Do you know something about the methods that people usually use in this area? Ben Bolker
Re: [R] rbinom : Does randomness preclude precision?
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philip Twumasi-Ankrah Sent: Wednesday, May 28, 2008 6:52 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] rbinom : Does randomness preclude precision? Ted's reply is a bit comforting and, as indicated in my post, I am resorting to using sample; but as an academic issue, does randomness preclude precision? I would say yes, at least in the manner you seem to be thinking of it. Think about it for a minute. Say you use the following code:

status <- sample(rep(c(0, 1), c(425, 75)))

You will get your 75 ones randomly distributed throughout the vector status. What would you expect the value to be for

sum(status[1:100])

I suggest that you will see the same kind of variation in the sum as you would see for

sum(rbinom(100, 1, p = 0.15))

So, you can randomly arrange a certain percentage of ones in a sequence, but you should not expect to see that exact percentage in any subset of the sequence. One other example: if you flip a fair coin 100 times, do you expect to get exactly 50 heads? Hope this is helpful, Dan. Daniel Nordlund, Bothell, WA, USA. Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting a Binomial(1, 0.154). [EMAIL PROTECTED] wrote: On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but I am realizing that the number of successes expected is not accurate. Any advice out there? This is the example:

N <- 500
status <- rbinom(N, 1, prob = 0.15)
count <- sum(status)

15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good.
[remainder of Ted Harding's reply, identical to his message quoted earlier in the thread, snipped]
[R] Help on Calculating day differences
Hello R Freaks, I calculate the difference in days between two events with the following little R expression:

T1a <- strptime(T1, "%m/%d/%y %H:%M:%S")
T2a <- strptime(T2, "%m/%d/%y %H:%M:%S")
T1b <- as.Date(T1a)
T2b <- as.Date(T2a)
days <- T2b - T1b
time <- T2a - T1a

In the project I would like to count only working days. Is there a possibility to count working days, excluding weekends? Is it maybe also possible to skip the national holiday dates in addition? Thanks a lot for your help, Thorsten. Mit freundlichen Grüßen / Best Regards / С наилучшими пожеланиями / üdvözlettel, Dr. Th. Mühge
[R] superposing barplots having different scales
Hello. I know how to make a bar plot in which a numeric y variable is plotted against some grouping variable X (say, groups A, B, C) when this grouping variable is subdivided into two subgroups; the bars would be: (group A, subgroup 1) beside (group A, subgroup 2), then (group B, subgroup 1) beside (group B, subgroup 2), and so on. This is done using the beside=TRUE argument of the barplot() function. However, I want to make a slightly different type of barplot in which the numerical values of the two subgroups are measured on very different scales. I want to use a different scale to define the numerical y values of each subgroup. This graph would have one scale on the left-hand y-axis and a second scale on the right-hand y-axis. I cannot simply superimpose two bar plots, because I have to make sure that the subgroup bars are beside each other. Bill Shipley, North American Editor, Annals of Botany, Département de biologie, Université de Sherbrooke, Sherbrooke (Québec) J1K 2R1 Canada, (819) 821-8000, poste 62079, (819) 821-8049 FAX, http://pages.usherbrooke.ca/jshipley/recherche/
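One workable trick, since barplot() itself has no secondary-axis support, is to rescale the second subgroup onto the range of the first, draw the interleaved bars once with beside=TRUE, and then label the right-hand axis in the original units. A sketch with invented data (in practice you would also widen the right margin with par(mar=...) so the axis label fits):

```r
# Two subgroups on very different scales: rescale subgroup 2 onto
# subgroup 1's range, plot paired bars, then draw a right-hand axis
# in subgroup 2's original units.
y1 <- c(A = 12, B = 18, C = 9)          # subgroup 1, left axis
y2 <- c(A = 0.31, B = 0.12, C = 0.44)   # subgroup 2, right axis
scale <- max(y1) / max(y2)

h <- rbind(y1, y2 * scale)              # 2 x 3 matrix -> paired bars
barplot(h, beside = TRUE, col = c("grey30", "grey70"),
        ylab = "subgroup 1 scale")
axis(4, at = pretty(c(0, max(y2))) * scale,
     labels = pretty(c(0, max(y2))))
mtext("subgroup 2 scale", side = 4, line = 2)
```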
Re: [R] How to make R running faster
At 10:25 AM 5/28/2008, Esmail Bonakdarian wrote: Erin Hodgess wrote: I remember reading the colSum and colMean were better, when you need sums and means Well .. I'm waiting for the experts to jump in and give us the straight story on this :-) All of the algorithms are represented internally by sequential program logic using C or Fortran, for example. So the issue isn't the algorithm itself. Instead, it's where the algorithm is implemented. However, R is an interpreter, not a compiler. This means that it reads each line of R code one character at a time to develop an understanding of what is desired done, and to check for errors in syntax and data classes. Interpreters are very slow compared to compiled code, where the lines have been pre-interpreted and already converted to machine code with error checking resolved. For example a simple for loop iteration might take only 0.1 microsecond in a compiled program, but 20-100 microseconds in an interpreted program. This overhead of parsing each line can be bounded by function calls inside each line. If the compiled function executes on a large number of cases in one call, then the 50 microsecond overhead per call is diluted out. R is a parallel processing language. If you use vectors and arrays and the built-in (i.e., compiled) function calls, you get maximum use of the compiled programs and minimum use of the interpreted program. This is why functions such as colMeans() or apply() are faster than writing direct loops in R. You can speed things up by 200-1000x for large arrays. Interpreted languages are very convenient to use, as they do instant error checking and are very interactive. No overhead of learning and using compilers and linkers. But they are very slow on complex calculations. This is why the array processing is stuffed into compiled functions. The best of both worlds then. Interpreted languages are Java, R, MatLab, Gauss and others. Compiled languages are C and Fortran. 
Some languages, like variants of BASIC, can be interpreted, line-compiled or compiled, depending upon the implementation. Some compiled languages (such as Fortran) allow parallel processing via multiprocessing on multiple CPUs, which speeds things up even more. Compiled languages also typically optimize code for the target machine, which can speed things up by a factor of 2 or so. So the general rule for R is: if you are annoyed at processing time, alter your program to maximize calculations within compiled functions (i.e., vectorize your program to process an entire array at one time) and minimize the number of lines of R. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
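The vectorization advice above can be illustrated with a quick timing comparison (a sketch; absolute timings are machine-dependent, the ratio is the point):

```r
## Compare an interpreted double loop with the compiled colMeans().
m <- matrix(rnorm(1e6), nrow = 1000)

colmeans_loop <- function(m) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    s <- 0
    for (i in seq_len(nrow(m))) s <- s + m[i, j]
    out[j] <- s / nrow(m)
  }
  out
}

t.loop     <- system.time(r1 <- colmeans_loop(m))
t.compiled <- system.time(r2 <- colMeans(m))
all.equal(r1, r2)   # same result, very different run time
```

On a typical machine the compiled colMeans() call is orders of magnitude faster than the explicit loop, while all.equal() confirms both compute the same column means.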
Re: [R] Help on Calculating day differences
See ?is.holiday in chron. You need to supply a .Holidays vector and then do this:

library(chron)
d1 <- Sys.Date()
d2 <- Sys.Date() + 100
sq <- seq(d1, d2, by = "day")
sum(!is.holiday(sq) & !is.weekend(sq))  # endpoints included

The fCalendar package also has functionality in this area. On Wed, May 28, 2008 at 9:30 AM, Thorsten Muehge [EMAIL PROTECTED] wrote: Hello R Freaks, I calculate the difference in days between two events with the following little R expression:

T1a <- strptime(T1, "%m/%d/%y %H:%M:%S")
T2a <- strptime(T2, "%m/%d/%y %H:%M:%S")
T1b <- as.Date(T1a)
T2b <- as.Date(T2a)
days <- T2b - T1b
time <- T2a - T1a

In the project I would like to count only working days. Is there a possibility to count only working days, without weekends? Is it maybe also possible to skip, in addition, the national holiday dates? Thanks a lot for your help Thorsten Mit freundlichen Grüßen / Best Regards / С наилучшими пожеланиями / üdvözlettel Dr. Th. Mühge,
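For concreteness, here is a minimal sketch of the suggestion above, assuming chron's is.holiday() consults a user-supplied .Holidays vector; the holiday dates below are illustrative, not a real calendar:

```r
## Count weekdays that are not holidays (sketch; .Holidays is made up).
library(chron)
.Holidays <- dates(c("01/01/08", "07/04/08", "12/25/08"))

d1 <- as.Date("2008-06-01")
d2 <- as.Date("2008-06-30")
sq <- seq(d1, d2, by = "day")
sum(!is.weekend(sq) & !is.holiday(sq))  # working days in the range
```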
Re: [R] Evidence Theory in R
Lars Fischer lars at sec.informatik.tu-darmstadt.de writes: Hello, well, I searched the list archive, CRAN and the references, but found nothing. Thus: does anybody around here know anything about Dempster-Shafer Theory, Evidence Theory or Hints in R? Has anybody stumbled across a package that I overlooked, or implemented something in this area? I would really like not to implement a hint model a second time. Dempster-Shafer Theory was (almost) hyped 15 or more years ago, when it was one of several reasoning systems. It has been heavily discussed and criticized, and was considered by many to be statistically unsound. This might be the reason why it has not been realized in an R package. As a scheme for reasoning under uncertainty, Dempster-Shafer Theory has been replaced by Bayesian networks and their learning capabilities in many modern knowledge-based systems. Fortunately, there are several packages in R that implement Bayesian statistics and networks, including learning facilities. You will find references in the 'Bayesian' and 'Machine Learning' task views. Regards, Hans Werner My apologies if I missed something obvious, but I would be glad if someone could give me a hint. Regards, Lars
Re: [R] lm() output with quantitative predictors not the same as SAS
alicia.senauer at yale.edu writes: I am trying to use R lm() with quantitative and qualitative predictors, but am getting different results than those that I get in SAS. In the R ANOVA table documentation I see that Type-II tests correspond to the tests produced by SAS for analysis-of-variance models, where all of the predictors are factors, but not more generally (i.e., when there are quantitative predictors). Is this the problem? Is there a way around this so that the output matches the results that I am getting in SAS? Is there a better/more appropriate way to handle quantitative predictors? First of all, it looks like you're using Anova(), in the car package -- this is an important piece of information. Second, you're presumably (you didn't tell us exactly what commands you ran -- this is also important) using the default type=II in Anova(), vs. SAS type III sums of squares (as it says in your SAS output), so I wouldn't expect these to agree. Third, it's highly questionable whether type-III SS are appropriate here, in the presence of an interaction (Treat*Plant) -- you have violated marginality restrictions -- see http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf for example. While I understand the desire to match the output from SAS, a better set of questions is: why do these two disagree? What are they doing differently? Which one comes closer to answering the questions I am really interested in? Ben Bolker
Re: [R] superposing barplots having different scales
Bill Shipley bill.shipley at usherbrooke.ca writes: Hello. I know how to make a bar plot in which a numeric y variable is plotted against some grouping variable X (say, groups A, B, C) when this grouping variable is subdivided into each of two subgroups; so the bars would be: (group A subgroup 1) beside (group A subgroup 2), then (group B subgroup 1) beside (group B subgroup 2), and so on. This is done using the beside=TRUE argument in the barplot() function. However, I want to make a slightly different type of barplot in which the numerical values of the subgroups (subgroups 1 and 2) are measured in very different scales. I want to use different scales to define the numerical y values for each subgroup. This graph would have one scale on the left-hand y-axis and a second scale on the right-hand y-axis. I cannot simply superimpose two bar plots because I have to make sure that the subgroup bars are beside each other. Bill Shipley Bill, I think that all the arguments for why NOT to create scatterplots with two different y-axes apply here (see recent R-help threads and the wiki) -- consider a scatterplot, or some other way of presenting the data, instead? Nevertheless -- the easiest way to do this (I think) is to scale subgroup 2, then plot the right-hand axis with labels appropriate for the unscaled data, as follows:

dat <- matrix(c(1, 2, 3, 1000, 2000, 3000), ncol = 2,
              dimnames = list(c("A", "B", "C"), c("meas1", "meas2")))
## scale groups to have the same max. value
scale <- max(dat[, "meas2"] / dat[, "meas1"])
## apply scale to second subgroup
dat2 <- dat
dat2[, "meas2"] <- dat2[, "meas2"] / scale
barplot(t(dat2), beside = TRUE)
ticks <- pretty(dat[, "meas2"])  ## on original data scale
axis(side = 4, at = ticks / scale, labels = ticks)

I haven't bothered with any prettiness like setting aside margins for the right-hand axis, etc etc
Re: [R] plot rownames
I did have the problem of not having two continuous variables and this approach circumvents this, allowing me in fact to plot the rownames. Prof Brian Ripley wrote: On Tue, 27 May 2008, T.D.Rudolph wrote: In the following example:

x <- rnorm(1:100)
y <- seq(from = -2.5, to = 3.5, by = 0.5)
z <- as.matrix(table(cut(x, c(-Inf, y, +Inf))))
## I wish to transform the values in z
j <- log(z)
## Yet retain the row names
row.names(j) <- row.names(z)

Hmm. The rownames were retained, and row.names() is for data frames, rownames() for matrices. Now, how can I go about creating a scatterplot with row.names(j) on the x-axis and j(m:nrow(j)) on the y-axis? Keep in mind the transformation I am conducting is more complicated and therefore cannot be plotted directly using log(z), which places me in this unique position. You will need to explain what exactly you want plotted. A scatterplot is of two continuous variables and you only have one (and 'j(m:nrow(j))' is invalid R). But here is one possibility to get you started:

m <- nrow(j)
plot(j, xaxt = "n", xlab = "")
axis(1, 1:m, rownames(j), las = 2)

Tyler -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] manipulating multiply imputed data sets
Hi folks, I have five imputed data sets and would like to apply the same recoding routines to each. I could do this sort of thing pretty easily in Stata using MIM, but I've decided to go cold turkey on other stats packages as an incentive for learning more about R. Most of the recoding is for nominal variables, like race, religion, urbanicity, and the like. So, for example, to recode race for my first dataset, miset1, I would do the following:

miset1$white <- recode(miset1$RACE, '1=1; else=0; ')
miset1$black <- recode(miset1$RACE, '2=1; else=0; ')
miset1$asian <- recode(miset1$RACE, '3=1; else=0; ')
miset1$hispanic <- recode(miset1$RACE, '4=1; else=0; ')
miset1$raceother <- recode(miset1$RACE, '5=1; else=0; ')

I've tried a number of variations, e.g., on the following, using recode (from the car package) with imputationList (from the mitools package), though without success:

files.allmisets <- list.files(getwd(), pattern = "miset*.csv$", full = TRUE)
allmis <- imputationList(lapply(files.allmisets, read.csv))
allmis <- update(allmis, white <- recode(RACE, '1=1; else=0; '))

I've also tried some basic loops. I guess I'm also a bit confused as to when R references the original object and when it creates a new one. I suppose I could do this in Python and then use PyR, but I'd really like to learn a bit more about R syntax. Any help on this specific problem or general advice on manipulating data in multiply imputed datasets in R would be much appreciated. -- Donald Braman http://www.law.gwu.edu/Faculty/profile.aspx?id=10123 http://research.yale.edu/culturalcognition http://ssrn.com/author=286206
[R] Suitable package for carrying out sigma and beta convergence in panel data
Dear all, I wish to carry out sigma- and beta-convergence analysis in respect of panel data [wherein the current value of one of the variables needs to be regressed upon suitably transformed lagged values of another variable(s)]. I am quite new to the R language and am not very much aware of the availability of suitable package(s)/code in the language. Can anyone help me by letting me know of an appropriate package/procedural steps to undertake the analysis? Kindly let me know as well the format of the input data file for such a package. Regards, Amarjit Singh Sethi
[R] R reference Books
Hi, I am still fairly new to R but picking it up quickly. I have some problems manipulating data in tables. I was wondering if anyone knew of any good resources, such as an R manual. Some of the intro pdfs I viewed do not show how...much appreciated.
Re: [R] manipulating multiply imputed data sets
Dear Donald, I can't guarantee that there aren't other problems, but your call to update() is in error; you need

allmis <- update(allmis, white = recode(RACE, '1=1; else=0; '))

not

allmis <- update(allmis, white <- recode(RACE, '1=1; else=0; '))

[The last ; in the recode specification is unnecessary, but should do no harm; as well, for a simple recode like this you might prefer ifelse() to recode().] Here's an example:

- snip -

library(mitools)
library(car)
data(smi)
smi <- update(smi, sex = recode(sex, "0='M'; 1='F'"))
with(smi, table(sex, drkfre))

[[1]]
     drkfre
sex   Non drinker  not in last wk  <3 days last wk  >=3 days last wk
  F   207          194             134              35
  M   282          201             105              12

[[2]]
     drkfre
sex   Non drinker  not in last wk  <3 days last wk  >=3 days last wk
  F   200          200             132              38
  M   282          195             109              14

. . .
attr(,"call")
with.imputationList(smi, table(sex, drkfre))

--- snip ---

I hope this helps, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Donald Braman Sent: May-28-08 2:21 PM To: r-help@r-project.org Subject: [R] manipulating multiply imputed data sets Hi folks, I have five imputed data sets and would like to apply the same recoding routines to each. I could do this sort of thing pretty easily in Stata using MIM, but I've decided to go cold turkey on other stats packages as an incentive for learning more about R. Most of the recoding is for nominal variables, like race, religion, urbanicity, and the like. So, for example, to recode race for my first dataset, miset1, I would do the following:

miset1$white <- recode(miset1$RACE, '1=1; else=0; ')
miset1$black <- recode(miset1$RACE, '2=1; else=0; ')
miset1$asian <- recode(miset1$RACE, '3=1; else=0; ')
miset1$hispanic <- recode(miset1$RACE, '4=1; else=0; ')
miset1$raceother <- recode(miset1$RACE, '5=1; else=0; ')

I've tried a number of variations, e.g., on the following, using recode (from the car package) with imputationList (from the mitools package), though without success:

files.allmisets <- list.files(getwd(), pattern = "miset*.csv$", full = TRUE)
allmis <- imputationList(lapply(files.allmisets, read.csv))
allmis <- update(allmis, white <- recode(RACE, '1=1; else=0; '))

I've also tried some basic loops. I guess I'm also a bit confused as to when R references the original object and when it creates a new one. I suppose I could do this in Python and then use PyR, but I'd really like to learn a bit more about R syntax. Any help on this specific problem or general advice on manipulating data in multiply imputed datasets in R would be much appreciated. -- Donald Braman http://www.law.gwu.edu/Faculty/profile.aspx?id=10123 http://research.yale.edu/culturalcognition http://ssrn.com/author=286206
[R] Gantt chart like graphics
Dear R Community, I have a dataframe like this:

dat        product1  product2  ...  productn
01.1.2008  1         1              1
02.1.2008  1         1              2
.
15.2.2008  2         2              NA
.
04.4.2008  2         2              1
05.4.2008  NA        2              NA

(date in ascending order; 1:n products with status 1, 2 or NA) and want to produce a graphic like this, where 1 or 2 are colored rectangles and NAs are white:

product1  11..22_
product2  11..222
.
.
productn  12.._1_
          time_axis (=dat)

- which looks like a Gantt chart with multiple time slots. I have tried it with image(), which looks good, but I don't know how to produce a y-axis with labels product1, etc. and an x-axis with date labels. Further, I want to draw a scatterplot below (using layout()) with the same x-axis. Maybe someone has a better idea which is easier to label? Regards Klemens
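One possible answer is to suppress image()'s default axes and add labelled ones with axis(). A sketch with made-up data (the product names, dates and colors here are illustrative):

```r
## Status matrix: rows = days, columns = products; NA cells stay white.
status <- matrix(sample(c(1, 2, NA), 300, replace = TRUE),
                 nrow = 100, ncol = 3,
                 dimnames = list(NULL, paste("product", 1:3, sep = "")))
dat <- seq(as.Date("2008-01-01"), by = "day", length.out = 100)

image(x = 1:nrow(status), y = 1:ncol(status), z = status,
      col = c("skyblue", "tomato"),   # status 1 and 2
      axes = FALSE, xlab = "", ylab = "")
## y-axis: product names; x-axis: a few pretty date labels
axis(2, at = 1:ncol(status), labels = colnames(status), las = 1)
ix <- pretty(seq_along(dat))
ix <- ix[ix >= 1 & ix <= length(dat)]
axis(1, at = ix, labels = format(dat[ix], "%d.%m.%Y"))
box()
```

Because the x positions are plain day indices, a scatterplot drawn below via layout() can reuse the same at/labels pair for a shared x-axis.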
[R] multistate survival analysis w/ time varying covariates
Hi, I've seen in the NestCohort package that one can do a hazard model with a binary outcome using covariates. I am interested in multistate hazard models with time-varying covariates, but can't seem to find this already implemented in an R package. Is this included somewhere but called something else? I feel like I've looked all over. Thanks! Michaela Gulemetova-Swan
Re: [R] R reference Books
There is a very good book called Data Manipulation with R by Phil Spector that just became available. It is brief and concise. I would recommend that book for learning about manipulating data in tables. For anyone interested in data exploration and graphics I would also recommend Deepayan Sarkar's book Lattice: Multivariate Data Visualization with R. Amazon.com is selling them as a pair for $88.80 (US) (which I guess is about 3.72 euros these days). On Wed, May 28, 2008 at 3:25 PM, Neil Gupta [EMAIL PROTECTED] wrote: Hi, I am still fairly new to R but picking it up quickly. I have some problems manipulating data in tables. I was wondering if anyone knew of any good resources, such as an R manual. Some of the intro pdfs I viewed do not show how...much appreciated.
[R] library(Matrix) and image() colors?
Hi, I'm trying to produce a plot of an image of a Matrix, but I don't get other colors than the default grey scale: image(Matrix(topo.matrix.2),col.regions=topo.colors(100),colorkey=FALSE) this still is plotted in grey. Is there any mistake in my syntax? Thanks and regards, Javier
[R] Can plot() be used for multiple plots?
Greetings helpRs -- I would like to use plot() to plot two cumulative distribution curves so that a user of the plot can compare the distributions of the two variables. The following code draws two distributions separately, but I cannot find the instruction necessary to add a second cumulative distribution to the first one. Any suggestion would be very welcome.

x1 <- sort(rnorm(1000, 50, 10))
x2 <- sort(rnorm(1000, 40, 8))
plot(x1, 1:length(x1)/length(x1), type = "l")
plot(x2, 1:length(x2)/length(x2), type = "l")
grid(col = "black")

Ben Fairbank
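One way to do this (a sketch): draw the first curve with plot(), then add the second with lines(), which draws into the existing coordinate system; widening xlim keeps both curves fully visible.

```r
## Overlay two empirical cumulative distribution curves.
x1 <- sort(rnorm(1000, 50, 10))
x2 <- sort(rnorm(1000, 40, 8))
plot(x1, 1:length(x1)/length(x1), type = "l",
     xlim = range(x1, x2), xlab = "x", ylab = "F(x)")
lines(x2, 1:length(x2)/length(x2), lty = 2)  # second curve, dashed
grid(col = "black")
```

An alternative is stats::ecdf, e.g. plot(ecdf(x1)) followed by lines(ecdf(x2)), which handles the sorting and step-function details for you.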
Re: [R] library(Matrix) and image() colors?
On 5/28/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, I'm trying to produce a plot of an image of a Matrix, but I don't get other colors than the default grey scale: image(Matrix(topo.matrix.2),col.regions=topo.colors(100),colorkey=FALSE) this still is plotted in grey. Is there any mistake in my syntax? Not sure what the problem is, but although the first call does not work, the second one does:

image(Matrix(volcano), col.regions = topo.colors(100), colorkey = TRUE)
image(as(Matrix(volcano), "sparseMatrix"), col.regions = topo.colors(100), colorkey = TRUE)

-Deepayan
Re: [R] can I do this with R?
step or stepAIC functions do the job. You can opt to use BIC by changing the multiplication of the penalty. I think AIC and BIC are not only limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criteria for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence, it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC. On the other hand, I wouldn't speak of the over-penalization of BIC. Instead, I think AIC usually under-penalizes larger models, in terms of the positive probability of incorporating irrelevant variables in linear models. X Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forwards or backwards methods? These all are 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And, you do not have to believe me or any one else on this. How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix problems of stepwise variable selection. Frank Just make a small data set with a few variables with known relationships amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC etc. and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are the 'creator' of it! smita --- Charilaos Skiadas [EMAIL PROTECTED] wrote: A google search for logistic regression with stepwise forward in r returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However Matlab does not do logistic regression with a stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward in R? Thanks /M
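For Maria's original question, the step/stepAIC route mentioned above looks roughly like this (a sketch with simulated data; the variable names are made up):

```r
## Forward stepwise logistic regression with stepAIC() from MASS.
library(MASS)
set.seed(1)
d <- data.frame(y  = rbinom(200, 1, 0.5),
                x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))

null <- glm(y ~ 1, family = binomial, data = d)           # start empty
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

## direction = "forward": add terms from the null model upward
fwd <- stepAIC(null, scope = list(lower = ~ 1, upper = formula(full)),
               direction = "forward")

## BIC instead of AIC: change the penalty multiplier to k = log(n)
fwd.bic <- stepAIC(null, scope = list(lower = ~ 1, upper = formula(full)),
                   direction = "forward", k = log(nrow(d)))
```

The k argument is how "changing the multiplication of the penalty" is done in practice: k = 2 gives AIC (the default), k = log(n) gives BIC.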
[R] calling C function from R
Hi, I am reading the source code of rpart. I have problems understanding the following code and would appreciate any help. In rpart.s, there is a line:

rpfit <- .C(C_s_to_rp,
            n = as.integer(nobs),
            nvarx = as.integer(nvar),
            ncat = as.integer(cats * !isord),
            method = as.integer(method.int),
            as.double(unlist(controls)),
            parms = as.double(unlist(init$parms)),
            as.integer(xval),
            as.integer(xgroups),
            as.double(t(init$y)),
            as.double(X),
            as.integer(!is.finite(X)), # R lets Infs through
            error = character(1),
            wt = as.double(wt),
            as.integer(init$numy),
            as.double(cost),
            NAOK = TRUE)

Is this line calling the rpart.c function? If so, what was sent to rpfit? I am confused because rpart.c only returns an integer, not an object. The following is the rpart.c function:

int rpart(int n, int nvarx, Sint *ncat, int method, int maxpri,
          double *parms, double *ymat, FLOAT *xmat, Sint *missmat,
          struct cptable *cptable, struct node **tree, char **error,
          int *which, int xvals, Sint *x_grp, double *wt, double *opt,
          int ny, double *cost)

-- Thanks!
Re: [R] can I do this with R?
Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the multiplication of the penalty. I think AIC and BIC are not only limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criteria for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence, it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC. Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not designed to be used in this fashion originally. Optimizing AIC or BIC will not produce well-calibrated models as does penalizing a large model. On the other hand, I wouldn't speak of the over-penalization of BIC. Instead, I think AIC usually under-penalizes larger models, in terms of the positive probability of incorporating irrelevant variables in linear models. If you put some constraints on the process (e.g., if using AIC to find the optimum penalty in penalized maximum likelihood estimation), AIC works very well and BIC results in far too much shrinkage (underfitting). If using a dangerous process such as stepwise variable selection, the more conservative BIC may be better in some sense, worse in others. The main problem with stepwise variable selection is the use of significance levels for entry below 1.0 and especially below 0.1. Frank X Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forwards or backwards methods? These all are 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And, you do not have to believe me or any one else on this. How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix problems of stepwise variable selection. Frank Just make a small data set with a few variables with known relationships amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC etc. and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are the 'creator' of it! smita --- Charilaos Skiadas [EMAIL PROTECTED] wrote: A google search for logistic regression with stepwise forward in r returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However Matlab does not do logistic regression with a stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward in R? Thanks /M -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] can I do this with R?
Frank E Harrell Jr 写道: Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the mulplication of penalty. I think AIC and BIC are not only limited to compare two pre-defined models, they can be used as model search criteria. You could enumerate the information criteria for all possible models if the size of full model is relatively small. But this is not generally scaled to practical high-dimensional applications. Hence, it is often only possible to find a 'best' model of a local optimum, e.g. measured by AIC/BIC. Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not designed to be used in this fashion originally. Optimizing AIC or BIC will not produce well-calibrated models as does penalizing a large model. Sure, I agree with this point. AIC is used to correct the bias from the estimations which minimize the KL distance of true model, provided the assumed model family contains the true model. BIC is designed for approximating the model marginal likelihood. Those are all post-selection estimating methods. For simutaneous variable selection and estimation, there are better penalizations like L1 penalty, which is much better than AIC/BIC in terms of consistency. On the other way around, I wouldn't like to say the over-penalization of BIC. Instead, I think AIC is usually underpenalizing larger models in terms of the positive probability of incoperating irrevalent variables in linear models. If you put some constraints on the process (e.g., if using AIC to find the optimum penalty in penalized maximum likelihood estimation), AIC works very well and BIC results if far too much shrinkage (underfitting). If using a dangerous process such as stepwise variable selection, the more conservative BIC may be better in some sense, worse in others. 
The main problem with stepwise variable selection is the use of significance levels for entry below 1.0, and especially below 0.1. What's the point of using AIC on penalized MLE? I think generally you can view the penalty as a prior regularization and use an optimization algorithm to find the MAP estimate. Frank X

Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forward or backward methods? These are all 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And you do not have to believe me or anyone else on this.

How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix the problems of stepwise variable selection. Frank

Just make a small data set with a few variables with a known relationship amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC, etc., and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are its 'creator'! smita

--- Charilaos Skiadas [EMAIL PROTECTED] wrote: A Google search for logistic regression with stepwise forward in R returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College

On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward selection in R?
Thanks /M __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
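For readers wanting to try the penalty-multiplier idea mentioned above: base R's step() exposes it through the k argument (k = 2 is the AIC default; k = log(n) gives a BIC-like penalty). A minimal sketch on a built-in dataset (not from the thread):

```r
# step() selects by AIC when k = 2 (default); k = log(n) gives a BIC-like penalty.
fit  <- lm(mpg ~ ., data = mtcars)                   # full model
aicf <- step(fit, trace = 0)                         # backward selection by AIC
bicf <- step(fit, trace = 0, k = log(nrow(mtcars)))  # heavier, BIC-like penalty
```

The heavier penalty tends to retain fewer terms; as the discussion above notes, neither variant escapes the bias problems of stepwise selection.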
Re: [R] How to make R running faster
[EMAIL PROTECTED] wrote: Hi everyone, I run R loops on Windows XP and Vista. Both are Intel Core 2 Duo 2.2 GHz with 2 GB RAM, and XP is significantly faster than Vista. Does anyone know how to speed up R loops in Vista? Thank you in advance. Chunhao Tu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

This is a matter of the operating system ... Mr Gates or Mr Ballmer may be able to help. No matter what you do in R, the code will probably be faster in XP ...

-- Dr Richard Rowe Zoology Tropical Ecology School of Marine Tropical Biology James Cook University Townsville 4811 AUSTRALIA ph +61 7 47 81 4851 fax +61 7 47 25 1570 JCU has CRICOS Provider Code 00117J __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
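Whatever the operating system, the usual first remedy for slow R loops is to stop growing objects inside the loop; a general-advice sketch (not from the thread):

```r
# General loop-speed advice: preallocate (or vectorize) instead of growing.
n <- 1e4
# slow pattern: res_grow is copied and extended on every iteration
res_grow <- numeric(0)
for (i in 1:n) res_grow <- c(res_grow, i^2)
# faster: allocate the result once, then fill it in place
res_pre <- numeric(n)
for (i in 1:n) res_pre[i] <- i^2
# fastest here: no loop at all
res_vec <- (1:n)^2
```

All three produce the same result; the difference is purely in running time and memory churn.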
Re: [R] can I do this with R?
On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote: Frank E Harrell Jr wrote: Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the multiplier of the penalty. I think AIC and BIC are not limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criterion for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC.

Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not originally designed to be used in this fashion. Optimizing AIC or BIC will not produce well-calibrated models the way penalizing a large model does.

Sure, I agree with this point. AIC is used to correct the bias of estimates that minimize the KL distance to the true model, provided the assumed model family contains the true model. BIC is designed to approximate the model's marginal likelihood. Those are all post-selection estimation methods. For simultaneous variable selection and estimation there are better penalties, such as the L1 penalty, which is much better than AIC/BIC in terms of consistency.

Xiaohui, Tibshirani (1996) suggests that the quality of the L1 penalty depends on the structure of the dataset. As I recall, subset selection was preferred for finding a small number of large effects, lasso (L1) for finding a small to moderate number of moderate-sized effects, and ridge (L2) for many small effects. Can you provide any references to more up-to-date simulations that you would recommend?
Cheers, Andrew -- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gantt chart like graphics
Try gantt.chart in the plotrix package.

On Wed, May 28, 2008 at 1:54 PM, kljosc [EMAIL PROTECTED] wrote: Dear R Community, I have a dataframe like this:

dat        product1  product2  ...  productn
01.1.2008  1         1              1
02.1.2008  1         1              2
.
15.2.2008  2         2              NA
.
04.4.2008  2         2              1
05.4.2008  NA        2              NA

(date in ascending order; 1:n products with status 1, 2 or NA) and want to produce a graphic like this, where 1 or 2 are colored rectangles and NAs are white:

product1 11..22_
product2 11..222
.
productn 12.._1_
time_axis (=dat)

- which looks like a Gantt chart with multiple time slots. I have tried it with image(), which looks good, but I don't know how to produce a y-axis with labels product1, etc. and an x-axis with date labels. Further, I want to draw a scatterplot below (using layout()) with the same x-axis. Maybe someone has a better idea which is easier to label? Regards Klemens -- View this message in context: http://www.nabble.com/Gantt-chart-like-graphics-tp17518407p17518407.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
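A minimal base-graphics sketch of the image() approach the poster describes, with axis() supplying the product and date labels; the data here are made up:

```r
# Made-up status matrix: rows = dates, columns = products, values 1/2/NA.
set.seed(42)
dates  <- seq(as.Date("2008-01-01"), by = "day", length.out = 10)
status <- matrix(sample(c(1, 2, NA), 30, replace = TRUE), nrow = 10,
                 dimnames = list(format(dates, "%d.%m.%Y"),
                                 paste0("product", 1:3)))
# image() puts rows along x, so dates run along the x-axis, products up the
# y-axis; NA cells are simply left unpainted (white), as the poster wants.
image(x = 1:nrow(status), y = 1:ncol(status), z = status,
      col = c("skyblue", "tomato"), axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:nrow(status), labels = rownames(status), las = 2)
axis(2, at = 1:ncol(status), labels = colnames(status), las = 1)
box()
```

The same axis(1, ...) call, reused in a second panel under layout(), keeps the scatterplot's x-axis aligned with the chart's.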
Re: [R] Writing intermediate analysis to disk
Hi Stephen, Have you looked at 'save' and 'load'? As far as I understand, to really release the memory you need to close R, so you may try to write a script (shell script on Unix, batch file on Windows) which invokes Rcmd to load the data, do one iteration, and save the result, so that R quits between subsequent calls to Rcmd. I have never used this, but I believe it shouldn't be too difficult.

--- stephen sefick [EMAIL PROTECTED] wrote: Is there a way to write an analysis to disk and then reconstruct the whole thing back into an object? wavCWT() # wmtsa package. I am running out of memory on my computer and I was wondering if there was a way to iterate through this process (as it is an iterative process anyway - it just stores the whole thing in memory). Or is there a way to set the scale that I want to look at, so that wavCWT can use something other than the default? In the documentation it says that the timestep or anything larger can be used. My time step is 1/15, but I cannot use anything larger, like 96 (which is one day of fifteen-minute readings). thanks Stephen -- Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
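A small sketch of the save()/load() round trip suggested above; the object name x and the tempfile path are illustrative only:

```r
# save() writes an R object to disk; load() restores it under its old name.
x <- cumsum(rnorm(1000))           # stand-in for a large intermediate result
f <- tempfile(fileext = ".RData")
save(x, file = f)                  # write the object to disk
saved <- x                         # keep a copy here only to verify below
rm(x); gc()                        # drop it from the workspace, reclaim memory
load(f)                            # 'x' is back, identical to what was saved
```

In the batch-script scheme described above, the save() would end one Rcmd invocation and the load() would begin the next, so each R process starts with a clean heap.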
Re: [R] confidence interval for the logit - predict.glm
Dear Brian and list members,

Thanks very much for your response Brian! I applied the adjusted calculation that you advised me to use [1/(1+exp(-upperlogit))] and as a result I don't get any more NA values in my upper confidence interval values. Yet, some outcomes are very awkward: for very small values such as 1.98E-10 the lower value very reasonably turns out to be 0, yet the upper value is simply set to 1, which kind of makes the confidence interval redundant and doesn't give any proper idea of how good a predicted value this 1.98E-10 and other similar ones are.

In the following I try to retrace exactly what I am doing. I suppose the principle of reproducible code will be difficult to accomplish, since I can't add my input files here, but I try my best and hope it kind of fits the criteria. My input txt files are:

testfishes.txt (column names are the fish species, row names are the planning units where fish surveys were conducted, so the matrix is binary presence/absence data, 0s and 1s)
predy.txt (column names are the predictor variables, numerical, one is categorical; row names are again the planning units where fish surveys were conducted)
predallx.txt (column names are the predictor variables, row names now are ALL the planning units, also those where no fish survey was conducted)

# I do a glm analysis for my fish models, loading the files first:
predallx <- read.table("predallx.txt", header=TRUE, row.names=1, sep="\t")
predallx$exposure <- factor(predallx$exposure)
predallx$exposure <- ordered(predallx$exposure, levels=c("Sheltered","Medium","Medexposed","Exposed"))
attach(predallx)
predy <- read.table("predy.txt", header=TRUE, row.names=1, sep="\t")
predy$exposure <- factor(predy$exposure)
attach(predy)
testfishes <- read.table("testfishes.txt", header=TRUE, row.names=1, sep="\t")
attach(testfishes)
# Creating a table for my glm output; models are compared on the basis of AIC values:
cat('fish',' ','model',' ','AIC',' ','df.residual',' ','deviance','\n', file="AICfish.txt", append=FALSE)
# Creating a matrix for the predicted values (fishoccur) and a matrix each for the upper and lower ci:
fishoccur <- matrix(0, nrow(predallx), ncol(testfishes))
colnames(fishoccur) <- colnames(testfishes)
rownames(fishoccur) <- rownames(predallx)
upperresponse <- matrix(0, nrow(predallx), ncol(testfishes))
colnames(upperresponse) <- colnames(testfishes)
rownames(upperresponse) <- rownames(predallx)
lowerresponse <- matrix(0, nrow(predallx), ncol(testfishes))
colnames(lowerresponse) <- colnames(testfishes)
rownames(lowerresponse) <- rownames(predallx)
# Now I run the analysis in a loop + stepwise approach and enter the results in the tables created above; i stands for every single fish that in turn should run through the loop:
for (i in 1:5) {
predy$eachfish <- testfishes[,i]
modelfish <- glm(eachfish ~ depth500m + exposure + nearest.est + island + mangroves + seagrass, family=binomial, data=predy)
stepmodfi <- step(modelfish, trace=0)
cat(i, make.names(c(stepmodfi$call), unique=TRUE), AIC(stepmodfi), df.residual(stepmodfi), deviance(stepmodfi), '\n', file="AICfish.txt", append=TRUE)
# Eventually I want to get predicted values for all those planning units where no fish surveys were conducted.
# First I get my predicted values on the response scale; this yields a predicted value for every single fish and every single planning unit, and these results are fed into the previously constructed matrix fishoccur:
predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE)
fishoccur[,i] <- predictionresponse
# Now I want to get confidence intervals for these predicted values. I figured out that I would need to get these on the logit scale first and only thereafter transform them to the response scale. So, first I again get predicted values, but on the logit scale. This time, though, I get just ONE result for every planning unit, when actually I also expected a result for every fish and every planning unit. So it seems there is just one value for each planning unit, but I thought it would be necessary to get one value for each fish and planning unit combination(?)
predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)
# Then, nevertheless, I calculated the upper and lower ci values, putting them into the previously created matrices upperresponse and lowerresponse; the i stands for the number of the fish used in this run, so that I get one value for every combination of fish and planning unit. Yet, I wonder whether this can be done if I only get one predicted value on the logit scale:
upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))
# Or I use your adjusted formula:
upperresponse[,i] <- 1/(1+exp(-upperlogit))
lowerresponse[,i] <- 1/(1+exp(-lowerlogit))
# Close the loop:
}
# Finally extracting the tables:
write.table(fishoccur, file="fishoccur.txt", row.names=TRUE, sep=",")
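The post uses upperlogit and lowerlogit without showing how they were computed. A common construction (an assumption here, not shown in the thread) is fit ± 1.96 × SE on the link scale, back-transformed with plogis(), which is exactly the 1/(1+exp(-x)) formula discussed above:

```r
# Sketch on simulated data: a normal-approximation CI on the link (logit)
# scale, then back-transformed to probabilities. The 1.96 multiplier and the
# variable names upperlogit/lowerlogit mirror the post but are assumptions.
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, 1, plogis(0.5 + d$x))
fit <- glm(y ~ x, family = binomial, data = d)

pr <- predict(fit, newdata = d, type = "link", se.fit = TRUE)
upperlogit <- pr$fit + 1.96 * pr$se.fit
lowerlogit <- pr$fit - 1.96 * pr$se.fit

# plogis(z) is 1/(1+exp(-z)); the interval limits now live in [0, 1]
upperresponse <- plogis(upperlogit)
lowerresponse <- plogis(lowerlogit)
```

Building the interval on the link scale and transforming afterwards keeps both limits inside [0, 1] by construction, which is why Brian's advice avoids the NA problem.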
[R] Separator argument in read.table
Hi, Suppose I have the following tabular data:

1729_at | TRADD | TNFRSF1A-associated via death domain | protein-coding
1773_at | FNTB | farnesyltransferase, CAAX box, beta | protein-coding
177_at | PLD1 | phospholipase D1, phosphatidylcholine-specific | protein-coding

What is the right separator to use for the read.table function? I tried this:

dat <- read.table("geo2geneinfo_bymodel.txt", sep = "|")
print(dat)

It doesn't seem to work: it flattens the table above into just two columns, meaning the result only contains $V1 and $V2. sep = " | " also won't work. Please advise. -- Gundala Viswanath __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
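For what it's worth, one approach that usually works for data like this (a sketch, not a reply from the thread): sep must be the single character "|", and strip.white = TRUE removes the spaces padding each field:

```r
# "|" is the separator; the surrounding spaces are padding, so strip them.
txt <- c(
  "1729_at | TRADD | TNFRSF1A-associated via death domain | protein-coding",
  "177_at | PLD1 | phospholipase D1, phosphatidylcholine-specific | protein-coding"
)
dat <- read.table(textConnection(txt), sep = "|", strip.white = TRUE,
                  stringsAsFactors = FALSE)
```

Note that the commas inside the description field are harmless here, precisely because "|" rather than "," is the separator.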
Re: [R] can I do this with R?
Using any 'significance level' is, I think, the main problem with the stepwise variable selection method. Even in 'normal' circumstances the interpretation of the p-value is topsy-turvy; you can only imagine what happens to that interpretation in this process of variable selection... you no longer know what the significance level means, if it means anything at all. smita

--- Frank E Harrell Jr [EMAIL PROTECTED] wrote: Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the multiplier of the penalty. I think AIC and BIC are not limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criterion for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC.

Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not originally designed to be used in this fashion. Optimizing AIC or BIC will not produce well-calibrated models the way penalizing a large model does.

On the other hand, I wouldn't say BIC over-penalizes; rather, I think AIC usually under-penalizes larger models, in the sense of a positive probability of incorporating irrelevant variables in linear models. If you put some constraints on the process (e.g., if using AIC to find the optimum penalty in penalized maximum likelihood estimation), AIC works very well and BIC results in far too much shrinkage (underfitting). If using a dangerous process such as stepwise variable selection, the more conservative BIC may be better in some senses, worse in others.
The main problem with stepwise variable selection is the use of significance levels for entry below 1.0, and especially below 0.1. Frank X

Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forward or backward methods? These are all 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And you do not have to believe me or anyone else on this.

How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix the problems of stepwise variable selection. Frank

Just make a small data set with a few variables with a known relationship amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC, etc., and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are its 'creator'! smita

--- Charilaos Skiadas [EMAIL PROTECTED] wrote: A Google search for logistic regression with stepwise forward in R returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College

On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward selection in R? Thanks /M __ -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Levels error after printing
Hi all, After running this code:

__BEGIN__
dat <- read.table("gene_prob.txt", sep = "\t")
n <- length(dat$V1)
print(n)
print(dat$V1)
__END__

With this input in gene_prob.txt:

__INPUT__
HFE 0.00107517988586552
NF1 0.000744355305599206
PML 0.000661649160532628
TCF3 0.000661649160532628
NF2 0.000578943015466049
GNAS 0.000578943015466049
GGA2 0.000578943015466049
.

I get this printout:

..
[8541] LOC552889 GPR15 SLC2A11 GRIP2 SGEF
[8546] PIK3IP1 RPS27 AQP7
8548 Levels: 3.8-1 A2M A4GALT A4GNT AAAS AAK1 AAMP AANAT AARSD1 AASS ... hCG_1730474

What's the meaning of the last line? Is it an error? How can I fix it? -- Gundala Viswanath __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
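That last line is not an error: read.table converted the character column into a factor, and printing a factor always ends with a "Levels:" line listing the distinct values. A minimal illustration with made-up data (stringsAsFactors is set explicitly here because its default changed in R 4.0):

```r
# A factor prints its values followed by a "Levels:" line; that is normal.
dat <- read.table(textConnection("HFE\t0.001\nNF1\t0.0007"),
                  sep = "\t", stringsAsFactors = TRUE)
is.factor(dat$V1)            # the gene column was read as a factor
genes <- as.character(dat$V1)  # convert if plain character strings are wanted
```

So "8548 Levels: ..." simply says the column holds 8548 distinct gene names; use as.character() (or read with stringsAsFactors = FALSE) if factor behaviour is unwanted.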
[R] Extracting variables from random effects formulas
I would like to be able to extract the names of the variables in a formula that specifies random effects. For example, given: random = ~ 1+year | st_name I'd like to be able to get year and st_name out of random. Neither terms(random) nor random[2] seems to work. Is there a way to get variable names out? Thanks in advance! Rebecca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
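One base-R possibility (not mentioned in the post, so treat it as a suggestion): all.vars() walks the formula's parse tree and returns the variable names, including those on either side of the | grouping operator:

```r
# all.vars() extracts every variable name from a formula, grouping term included.
random <- ~ 1 + year | st_name
all.vars(random)
```

This returns the names "year" and "st_name"; constants such as the 1 are ignored.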