Re: [R] variance of discrete uniform distribution

2010-03-08 Thread Michael Erickson
On Mon, Mar 8, 2010 at 3:44 PM, casperyc caspe...@hotmail.co.uk wrote:

 Hi Rolf Turner ,

 God, it directed me to the wrong page.

 I first found the formula on the wiki, then tried to verify the answer in R.
 Now, given that 143/12 ((n^2-1)/12) is the correct answer for a discrete
 uniform random variable,
 I am still not sure what R is calculating there.
 Why does it give me 13?

Of RT's two points, you addressed (b) continuous vs. discrete, but you
have yet to address (a) population estimate based on a sample.  Hint:
var(1:12) tries to estimate the population variance based on a sample.
You are interested in the population variance.  The two are calculated
with formulas that differ *only in the denominator*.
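
To make the hint concrete, here is a small sketch comparing the two
denominators for 1:12.  The variable names are only illustrative; var()
divides the sum of squared deviations by n - 1, while the population
variance divides by n.

```r
x <- 1:12
n <- length(x)

# Sum of squared deviations from the mean (143 for 1:12)
ss <- sum((x - mean(x))^2)

sample_var  <- var(x)          # divides by n - 1, giving 143/11 = 13
pop_var     <- ss / n          # divides by n, giving 143/12
formula_var <- (n^2 - 1) / 12  # closed form for the discrete uniform on 1..n

print(c(sample = sample_var, population = pop_var, formula = formula_var))
```

Multiplying var(x) by (n - 1)/n converts the sample estimate into the
population figure.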

Michael



 Thanks!
 --
 View this message in context: 
 http://n4.nabble.com/variance-of-discrete-uniform-distribution-tp1585328p1585355.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] limit to p-value using t.test()

2010-02-06 Thread Michael Erickson
On Sat, Feb 6, 2010 at 8:53 AM, Pete Shepard peter.shep...@gmail.com wrote:
 I am using t-test to check if the difference between two populations is
 significant. I have a large N=20,000, 10,000 in each population. I compare a
 few different populations with each other and though I get different t-scores,
 I get the same  p-value of 10^-16 which seems like the limit for this
 function. Is this true and is so, is there a workaround to get a more
 sensitive/accurate p-value?

Three comments --

First, with a given value of t and the df for your test, you can get
p-values smaller than 2.2e-16 by plugging that information into pt().

> pt(500, df=10, lower.tail=FALSE)
[1] 1.259769e-23
> pt(1500, df=10, lower.tail=FALSE)
[1] 2.133778e-28
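
As a sketch of how the pt() calls above connect to t.test(): the printed
"p-value < 2.2e-16" is only a display limit, and the object returned by
t.test() stores the unrounded value.  The data below are simulated purely
for illustration (the sample sizes mirror the question; the means are an
assumption).

```r
set.seed(42)
x <- rnorm(10000, mean = 0.3)  # simulated stand-ins for the two groups
y <- rnorm(10000, mean = 0.0)

tt <- t.test(x, y)

# The stored p-value is not truncated at 2.2e-16
p_stored <- tt$p.value

# Recompute it by hand from the t statistic and df (two-sided)
t_stat   <- unname(tt$statistic)
df       <- unname(tt$parameter)
p_manual <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# For very extreme t values, work on the log scale to avoid underflow
log_p <- log(2) + pt(abs(t_stat), df = df, lower.tail = FALSE, log.p = TRUE)

print(c(stored = p_stored, manual = p_manual, log_p = log_p))
```

The log-scale version is the safer choice when the p-value itself would
underflow to zero in double precision.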

Second, if these are *populations* then a t-test is inappropriate.
Just compute the means, and if they do not equal one another, then the
population means are different.  All the statistical tests that I can
think of try to make and place bounds on inferences about the
population based upon samples drawn from those populations.  If you
have the populations, this makes no sense.  It seems like you need to
decide what kinds of differences are meaningful, and then check to see
if the population differences meet those criteria.

Third, why do you want a more accurate p-value?  The only reason I can
think of is using Rosenthal & Rubin's method to compute effect sizes
from a p-value, but again, if you have the populations, you can
compute effect sizes directly.
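
A minimal sketch of computing an effect size directly, assuming a
standardized mean difference (Cohen's d) is the measure wanted; that
choice, the simulated data, and the helper pop_sd() are all assumptions
for illustration.

```r
set.seed(1)
pop1 <- rnorm(10000, mean = 0.3, sd = 1)  # simulated stand-in populations
pop2 <- rnorm(10000, mean = 0.0, sd = 1)

# Population standard deviation (denominator n, not n - 1);
# pop_sd() is a hypothetical helper, not a base R function
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# Equal-sized groups, so the pooled variance is the simple average
pooled_sd <- sqrt((pop_sd(pop1)^2 + pop_sd(pop2)^2) / 2)

d <- (mean(pop1) - mean(pop2)) / pooled_sd
print(d)
```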

Good luck!

Michael



Re: [R] R on amazon's EC2 cloud?

2009-12-29 Thread Michael Erickson
On Sun, Dec 27, 2009 at 3:33 PM, Carlos J. Gil Bellosta 
c...@datanalytics.com wrote:

 I tried Amazon EC2 with R recently and wrote an entry about it to a blog I
 collaborate with:

 http://analisisydecision.es/probando-r-sobre-el-ec2-de-amazon/

 (Unfortunately, it is in Spanish...)


Google's translation is pretty readable and useful...

http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fanalisisydecision.es%2Fprobando-r-sobre-el-ec2-de-amazon%2F&sl=es&tl=en

Michael



Re: [R] [ how can sample from f(x)~x^(a-1)

2009-12-26 Thread Michael Erickson
On Fri, Dec 25, 2009 at 8:07 AM, khaz...@ceremade.dauphine.fr wrote:

 Hello all
 how can I sample from f(x)~x^(a-1)*ind(0,min(b,-log(u))) in R?
 where a and b are positive constants and 0 < u < 1


If the idea is that X is a random variable, then you need to decide what
kind of random variable it is.  For example, if you wanted to assume that it
was a random uniform variable roughly on the interval of -infinity to
infinity, you can do something like:

runif(1, -1e100, 1e+100)^(a-1)*ind(0,min(b,-log(u)))

(I'm not sure whether my limits are a good stand-in for -infinity to
+infinity.)

There are a number of functions that all start with r that sample from
various distributions -- rnorm(), runif(), rexp(), ...  Once you decide what
X is, it is pretty straightforward.

I was going to look at your function, but I don't know what ind() is, so I
was stuck.
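
One possible reading, offered only as an assumption: if ind(0, c) is the
indicator of the interval (0, c) with c = min(b, -log(u)), then the
density is proportional to x^(a-1) on (0, c).  Its CDF is (x/c)^a, so
inverting it gives X = c * U^(1/a) for U uniform on (0, 1).  A sketch
with made-up values for a, b, and u:

```r
set.seed(123)

a <- 2     # illustrative values only; a and b must be positive
b <- 1.5
u <- 0.3   # must satisfy 0 < u < 1

# Upper end of the support under the assumed meaning of ind()
upper <- min(b, -log(u))

# Inverse-CDF sampling: F(x) = (x / upper)^a  =>  x = upper * U^(1/a)
n <- 100000
x <- upper * runif(n)^(1 / a)

# The theoretical mean is upper * a / (a + 1); compare with the sample mean
print(c(sample_mean = mean(x), theory = upper * a / (a + 1)))
```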

Michael



Re: [R] regex question

2009-08-04 Thread Michael Erickson
Here's what I came up with:

> gsub("(\\w)[^ ]+[\\b ]", "\\1", astr)
[1] "Timtowtdit"
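
The result above keeps the last word whole because that pattern needs a
trailing space after each word.  A possible variant, sketched here on an
assumed input string (the original astr is not shown in this message),
keeps the first word character of every word and drops the rest,
including any separators:

```r
# Assumed input; chosen so the first letters spell out the acronym
astr <- "There is more than one way to do it"

# (\\w) captures the first character of a word, \\w* consumes the rest of
# the word, and \\W* consumes any following non-word characters
acronym <- gsub("(\\w)\\w*\\W*", "\\1", astr)
print(acronym)
```

Because \\w* may match nothing, one-letter words are handled as well.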

You might be interested in Regular Expressions Cookbook from O'Reilly
(publisher not author) or http://www.regular-expressions.info/

I usually bumble along knowing there are better ways to do whatever I am doing.

Michael



[R] Hmisc summarize() with "" level in by variable

2009-06-13 Thread Michael Erickson
I was using summarize() in a data set in which one of the levels of
the by variable was "".  The summary statistic was consistently off by
one level and the "" level was not in the output data frame.  I tried
to report it as a bug, but I could not log into the Hmisc bug
reporting website to do so.  I searched for this in the email
archives.  If it's there, I failed to find it.  Should I try to pursue
this as a bug, or am I using summarize incorrectly?  Here is my
example along with the output:

> tst1 <- data.frame(a=factor(c("", "A", "B", "C")),
+   x=1:4)
> tst1
  a x
1   1
2 A 2
3 B 3
4 C 4
> with(tst1, summarize(x, by=llist(a), FUN=mean))
  a x
1 A 1
2 B 2
3 C 3
> with(tst1, aggregate(x, by=list(a), FUN=mean))
  Group.1 x
1         1
2       A 2
3       B 3
4       C 4
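
A possible way to sidestep the empty level, sketched in base R only: give
the "" level an explicit label before summarizing.  The label "(blank)"
is an arbitrary choice, and this does not check whether summarize() then
behaves correctly; it only shows the relabeling step and the expected
group means via aggregate().

```r
tst1 <- data.frame(a = factor(c("", "A", "B", "C")), x = 1:4)

# Rename the empty-string level so every level has a visible label
levels(tst1$a)[levels(tst1$a) == ""] <- "(blank)"

# Group means computed with base R for comparison
res <- aggregate(x ~ a, data = tst1, FUN = mean)
print(res)
```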

> sessionInfo()
R version 2.9.0 (2009-04-17)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Hmisc_3.6-0

loaded via a namespace (and not attached):
[1] cluster_1.11.13 grid_2.9.0  lattice_0.17-22


Michael



[R] latex(Hmisc): cgroup + rownames shifts column names

2009-04-23 Thread Michael Erickson
I have submitted this as a bug
(http://biostat.mc.vanderbilt.edu/trac/Hmisc/ticket/29) but I am
wondering if anyone else has seen it or perhaps developed a
workaround.  I could certainly fix the LaTeX by hand, but I am using
this inside Sweave, so it is a bit cumbersome.  The exact same code
used to work fine, but something changed underneath it.  Even so,
perhaps I am using the latex() command incorrectly.

When I do a latex table with rownames but without cgroup, everything
is fine, but when I use cgroup (as in the second case), the rownames
column gets the first column's name and subsequent column names are
shifted. This worked fine for me a year ago, so something has changed
in my setup or in Hmisc. I have another machine with Hmisc_3.4-3
installed, and its cgroup()-ed table was correct as well.  I included
the output from sessionInfo() at the bottom.

test.df <- data.frame(row.names=letters[1:4], col1=1:4, col2=4:1, col3=4:7)

latex(test.df, file="", table.env=FALSE, center="none")

% latex.default(test.df, file = "", table.env = FALSE, center = "none") %
\begin{tabular}{lrrr}
\hline\hline
%% Comment added by ME -- this next line is the one that is fine without cgroup
\multicolumn{1}{l}{test.df}&\multicolumn{1}{c}{col1}&
\multicolumn{1}{c}{col2}&\multicolumn{1}{c}{col3}\tabularnewline
\hline
a&$1$&$4$&$4$\tabularnewline
b&$2$&$3$&$5$\tabularnewline
c&$3$&$2$&$6$\tabularnewline
d&$4$&$1$&$7$\tabularnewline
\hline
\end{tabular}

latex(test.df, file="", n.cgroup=c(1,2), cgroup=c("","95\\% Conf. Limits"),
table.env=FALSE, center="none")

% latex.default(test.df, file = "", n.cgroup = c(1, 2), cgroup = c("", "95\\% Conf. Limits"), table.env = FALSE, center = "none")
%
\begin{tabular}{lrcrr}
\hline\hline
\multicolumn{1}{l}{\bfseries test.df}&\multicolumn{1}{c}{\bfseries }&
\multicolumn{1}{c}{\bfseries }&
\multicolumn{2}{c}{\bfseries 95\% Conf. Limits}\tabularnewline
\cline{4-5}
%% Comment added by ME -- this next line is the one that is broken with cgroup
\multicolumn{1}{l}{col1}&\multicolumn{1}{c}{}&
\multicolumn{1}{c}{col2}&\multicolumn{1}{c}{col3}&
\multicolumn{1}{c}{col1}\tabularnewline
\hline
a&$1$&&$4$&$4$\tabularnewline
b&$2$&&$3$&$5$\tabularnewline
c&$3$&&$2$&$6$\tabularnewline
d&$4$&&$1$&$7$\tabularnewline
\hline
\end{tabular}

I am attaching a .pdf of the two tables.  (I hope it goes through.)

sessionInfo()

R version 2.8.1 (2008-12-22) i486-pc-linux-gnu

locale: 
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] Hmisc_3.5-2

loaded via a namespace (and not attached): [1] cluster_1.11.13
grid_2.9.0 lattice_0.17-22 tcltk_2.9.0 [5] tools_2.9.0

I hope this is enough information to make the problem clear and that I
haven't missed something obvious.  Finally, I will subscribe to the
help list (I usually just search the archives), but I would appreciate
replies sent directly to me as well!

Thanks!

Michael


x.pdf
Description: Adobe PDF document