Re: [R] variance of discrete uniform distribution
On Mon, Mar 8, 2010 at 3:44 PM, casperyc caspe...@hotmail.co.uk wrote:

> Hi Rolf Turner,
>
> God, it directed to the wrong page. I first found the formula on the wiki, then tried to verify the answer in R. Now, given that 143/12 ((n^2-1)/12) is the correct answer for a discrete uniform random variable, I am still not sure what R is calculating there. Why does it give me 13?
>
> Thanks!

Of RT's two points, you addressed (b) continuous vs. discrete, but you have yet to address (a) population estimate based on a sample.

Hint: var(1:12) estimates the population variance from a sample, whereas you are interested in the population variance itself. The two are calculated with formulas that differ *only in the denominator*: n - 1 for the sample estimate, n for the population.

Michael

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
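The hint can be checked numerically. What follows is a minimal sketch, assuming the values 1:12 are the entire population (n = 12); the variable names are chosen here for illustration only.

```r
n <- 12
x <- 1:n

# var() uses the sample formula: sum of squared deviations / (n - 1)
sample_var <- var(x)

# The population variance divides the same sum by n instead
pop_var <- sum((x - mean(x))^2) / n

# Closed form for a discrete uniform variable on 1..n
formula_var <- (n^2 - 1) / 12

# Rescaling var() by (n - 1)/n recovers the population value
rescaled_var <- var(x) * (n - 1) / n

print(c(sample = sample_var, population = pop_var,
        formula = formula_var, rescaled = rescaled_var))
```

The sum of squared deviations is 143 in both cases; dividing by 11 gives the 13 that R reports, and dividing by 12 gives 143/12.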
Re: [R] limit to p-value using t.test()
On Sat, Feb 6, 2010 at 8:53 AM, Pete Shepard peter.shep...@gmail.com wrote:

> I am using t-test to check if the difference between two populations is significant. I have a large N=20,000, 10,000 in each population. I compare a few different populations with each other and though I get different t-scores, I get the same p-value of 10^-16 which seems like the limit for this function. Is this true and if so, is there a workaround to get a more sensitive/accurate p-value?

Three comments:

First, with a given value of t and the df for your test, you can get p-values smaller than 2.2e-16 by plugging that information into pt().

  > pt(500, df=10, lower.tail=FALSE)
  [1] 1.259769e-23
  > pt(1500, df=10, lower.tail=FALSE)
  [1] 2.133778e-28

Second, if these are *populations* then a t-test is inappropriate. Just compute the means, and if they do not equal one another, then the population means are different. All the statistical tests that I can think of try to make and place bounds on inferences about the population based upon samples drawn from those populations. If you have the populations, this makes no sense. It seems like you need to decide what kinds of differences are meaningful, and then check to see if the population differences meet those criteria.

Third, why do you want a more accurate p-value? The only reason I can think of is using Rosenthal & Rubin's method to compute effect sizes from a p-value, but again, if you have the populations, you can compute effect sizes directly.

Good luck!

Michael
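To make the first comment concrete, here is a sketch on simulated data. The samples x and y and their parameters are invented for illustration. It relies on the object returned by t.test() storing the t statistic in $statistic, the degrees of freedom in $parameter and the unrounded p-value in $p.value; the "< 2.2e-16" is only how small values are displayed when the result is printed.

```r
set.seed(1)
x <- rnorm(10000, mean = 0)    # made-up sample 1
y <- rnorm(10000, mean = 0.2)  # made-up sample 2

res <- t.test(x, y)            # Welch two-sample t-test (the default)
t_stat <- unname(res$statistic)
df <- unname(res$parameter)

# Two-sided p-value recomputed from t and df with pt()
p_two_sided <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Same quantity on the log scale, useful when p would underflow to 0
log_p <- log(2) + pt(abs(t_stat), df = df, lower.tail = FALSE, log.p = TRUE)

print(p_two_sided)
print(res$p.value)
print(log_p)
```

Working on the log scale is a design choice for extreme t values: the log p-value stays finite even when the p-value itself is too small to represent as a double.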
Re: [R] R on amazon's EC2 cloud?
On Sun, Dec 27, 2009 at 3:33 PM, Carlos J. Gil Bellosta c...@datanalytics.com wrote:

> I tried Amazon EC2 with R recently and wrote an entry about it to a blog I collaborate with:
>
> http://analisisydecision.es/probando-r-sobre-el-ec2-de-amazon/
>
> (Unfortunately, it is in Spanish...)

Google's translation is pretty readable and useful...

http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fanalisisydecision.es%2Fprobando-r-sobre-el-ec2-de-amazon%2F&sl=es&tl=en

Michael
Re: [R] [ how can sample from f(x)~x^(a-1)
On Fri, Dec 25, 2009 at 8:07 AM, khaz...@ceremade.dauphine.fr wrote:

> Hello all,
>
> how can I sample from f(x)~x^(a-1)*ind(0,min(b,-log(u))) in R, where a and b are positive constants and 0<u<1?

If the idea is that X is a random variable, then you need to decide what kind of random variable it is. For example, if you wanted to assume that it was a random uniform variable roughly on the interval of -infinity to infinity, you can do something like:

  runif(1, -1e100, 1e+100)^(a-1)*ind(0,min(b,-log(u)))

(I'm not sure whether my limits are a good stand-in for -infinity to +infinity.)

There are a number of functions that all start with r that sample from various distributions: rnorm(), runif(), rexp(), ... Once you decide what X is, it is pretty straightforward.

I was going to look at your function, but I don't know what ind() is, so I was stuck.

Michael
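One possible reading, which is an assumption because the original post does not define it, is that ind(0, m) is the indicator function of the interval (0, m) with m = min(b, -log(u)). Under that reading f is a density proportional to x^(a-1) on (0, m), its CDF is F(x) = (x/m)^a, and inverse-transform sampling gives X = m * V^(1/a) with V uniform on (0, 1). A minimal sketch follows; the function name r_power and the parameter values are hypothetical.

```r
# Hypothetical helper: draw n values with density proportional to x^(a - 1)
# on (0, min(b, -log(u))), assuming ind() is an indicator function.
r_power <- function(n, a, b, u) {
  stopifnot(a > 0, b > 0, u > 0, u < 1)
  m <- min(b, -log(u))      # upper end of the support
  m * runif(n)^(1 / a)      # invert F(x) = (x / m)^a
}

set.seed(42)
a <- 2.5
b <- 1
u <- 0.3
draws <- r_power(1e5, a = a, b = b, u = u)

# The theoretical mean of this density is m * a / (a + 1)
m <- min(b, -log(u))
print(range(draws))
print(c(empirical = mean(draws), theoretical = m * a / (a + 1)))
```

If ind() means something else, only the support bound m changes; the inversion step stays the same as long as the density is a power of x on a bounded interval starting at 0.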
Re: [R] regex question
Here's what I came up with:

  > gsub("(\\w)[^ ]+[\\b ]", "\\1", astr)
  [1] "Timtowtdit"

You might be interested in Regular Expressions Cookbook from O'Reilly (publisher, not author) or http://www.regular-expressions.info/

I usually bumble along knowing there are better ways to do whatever I am doing.

Michael
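The pattern above needs a space after each word, which is presumably why the last word survives whole in the output. A variant that makes the trailing whitespace optional is sketched below. The input string is a guess, since the original astr is not shown in this message.

```r
# Hypothetical input; the astr from the original question is not shown here
astr <- "There is more than one way to do it"

# Keep the first word character, drop the rest of the word and any
# whitespace that follows it (including none, at the end of the string)
acronym <- gsub("(\\w)\\w*\\s*", "\\1", astr)
print(acronym)   # "Timtowtdi"
```

Note that \\w does not match an apostrophe, so a word such as "There's" would contribute two letters with this variant.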
[R] Hmisc summarize() with "" level in by variable
I was using summarize() in a data set in which one of the levels of the by variable was "" (the empty string). The summary statistic was consistently off by one level and the "" level was not in the output data frame. I tried to report it as a bug, but I could not log into the Hmisc bug reporting website to do so. I searched for this in the email archives. If it's there, I failed to find it.

Should I try to pursue this as a bug, or am I using summarize incorrectly?

Here is my example along with the output:

  > tst1 <- data.frame(a=factor(c("", "A", "B", "C")),
  +                    x=1:4)
  > tst1
    a x
  1   1
  2 A 2
  3 B 3
  4 C 4
  > with(tst1, summarize(x, by=llist(a), FUN=mean))
    a x
  1 A 1
  2 B 2
  3 C 3
  > with(tst1, aggregate(x, by=list(a), FUN=mean))
    Group.1 x
  1         1
  2       A 2
  3       B 3
  4       C 4
  > sessionInfo()
  R version 2.9.0 (2009-04-17)
  i486-pc-linux-gnu

  locale:
  LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] Hmisc_3.6-0

  loaded via a namespace (and not attached):
  [1] cluster_1.11.13 grid_2.9.0 lattice_0.17-22

Michael
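One thing that might be worth trying while the cause is unclear is to give the empty level a visible label before grouping. This is only a guess as far as summarize() is concerned and has not been checked against Hmisc 3.6-0. The label "(blank)" is arbitrary, and the sketch uses base aggregate() so that it runs without Hmisc.

```r
tst1 <- data.frame(a = factor(c("", "A", "B", "C")), x = 1:4)

# Rename the empty-string level; "(blank)" is an arbitrary placeholder
levels(tst1$a)[levels(tst1$a) == ""] <- "(blank)"

# Base-R reference result: one row per level, with the mean of x
res <- aggregate(x ~ a, data = tst1, FUN = mean)
print(res)
```

If summarize() then returns four rows that line up with this reference, the empty level name is the likely trigger.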
[R] latex(Hmisc): cgroup + rownames shifts column names
I have submitted this as a bug (http://biostat.mc.vanderbilt.edu/trac/Hmisc/ticket/29), but I am wondering if anyone else has seen it or perhaps developed a workaround. I could certainly fix the LaTeX by hand, but I am using this inside Sweave, so that is a bit cumbersome. The exact same code used to work fine, but something changed underneath it. Even so, perhaps I am using the latex() command incorrectly.

When I do a latex table with rownames but without cgroup, everything is fine, but when I use cgroup (as in the second case), the rownames column gets the first column's name and subsequent column names are shifted. This worked fine for me a year ago, so something has changed in my setup or in Hmisc. I have another machine with Hmisc_3.4-3 installed, and its cgroup()-ed table was correct as well. I included the output from sessionInfo() at the bottom.

  > test.df <- data.frame(row.names=letters[1:4], col1=1:4, col2=4:1, col3=4:7)
  > latex(test.df, file="", table.env=FALSE, center="none")
  % latex.default(test.df, file = "", table.env = FALSE, center = "none")
  %
  \begin{tabular}{lrrr}
  \hline\hline
  %% Comment added by ME -- this next line is the one that is fine without cgroup
  \multicolumn{1}{l}{test.df}&\multicolumn{1}{c}{col1}&\multicolumn{1}{c}{col2}&\multicolumn{1}{c}{col3}\tabularnewline
  \hline
  a&$1$&$4$&$4$\tabularnewline
  b&$2$&$3$&$5$\tabularnewline
  c&$3$&$2$&$6$\tabularnewline
  d&$4$&$1$&$7$\tabularnewline
  \hline
  \end{tabular}

  > latex(test.df, file="", n.cgroup=c(1,2), cgroup=c("", "95\\% Conf. Limits"), table.env=FALSE, center="none")
  % latex.default(test.df, file = "", n.cgroup = c(1, 2), cgroup = c("", "95\\% Conf. Limits"), table.env = FALSE, center = "none")
  %
  \begin{tabular}{lrcrr}
  \hline\hline
  \multicolumn{1}{l}{\bfseries test.df}&\multicolumn{1}{c}{\bfseries }&\multicolumn{1}{c}{\bfseries }&\multicolumn{2}{c}{\bfseries 95\% Conf. Limits}\tabularnewline
  \cline{4-5}
  %% Comment added by ME -- this next line is the one that is broken with cgroup
  \multicolumn{1}{l}{col1}&\multicolumn{1}{c}{}&\multicolumn{1}{c}{col2}&\multicolumn{1}{c}{col3}&\multicolumn{1}{c}{col1}\tabularnewline
  \hline
  a&$1$&&$4$&$4$\tabularnewline
  b&$2$&&$3$&$5$\tabularnewline
  c&$3$&&$2$&$6$\tabularnewline
  d&$4$&&$1$&$7$\tabularnewline
  \hline
  \end{tabular}

I am attaching a .pdf of the two tables. (I hope it goes through.)

  > sessionInfo()
  R version 2.8.1 (2008-12-22)
  i486-pc-linux-gnu

  locale:
  LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] Hmisc_3.5-2

  loaded via a namespace (and not attached):
  [1] cluster_1.11.13 grid_2.9.0 lattice_0.17-22 tcltk_2.9.0
  [5] tools_2.9.0

I hope this is enough information to make the problem clear and that I haven't missed something obvious.

Finally, I will subscribe to the help list (I usually just search the archives), but I would appreciate replies sent directly to me as well!

Thanks!

Michael