[R] test deviation from a binomial distribution - lack of 50:50

2007-04-23 Thread parn
Dear R-users,

I have a data set where each observation consists of a number of trials
(n.trials) that varies between 5 and 7, 6 being most common. Each trial
can take either of two outcomes, success or failure.

A dummy data set:
n.trials - sample(5:7, 50, replace=T, prob=c(0.2, 0.6, 0.2))
success - rbinom(50, n.trials, p=0.5)
failure - n.trials - success

I know I could test for a deviation from 50:50 success:failure in one or
the other direction using a glm with binomial errors. However, I
suspect that in my 'real' data set the outcome 50:50 is
underrepresented, not due to a skew in one particular direction, but
rather that within each observation there are either many successes or
many failures. Although I did not manage to create a dummy data set
with these properties, which would be the proper way in R to test for a
'lack of 50:50 outcome' using the simple dummy data above as a starting
point?

Thanks in advance!

Henrik

--

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mcmcsamp-CI instead of P-values - references?

2006-10-03 Thread Henrik Parn
Dear all,

Given the discussions and issues of d.f., p-values and mcmcsamp-CIs in 
mixed models, I wonder if anyone could recommend one or two papers (or 
other citable sources for that sake) that summarizes the arguments 
for/against P-values/mcCIs.

I have followed the discussions on R-help and have learnt a lot (in my 
naive non-statistician way). With all respect to all great 
contributions, I may need published articles (if there are any), that 
could be suitable to use in the Methods-section of a manuscript, to 
motivate a no-p-values-but-some-nice-mcmcsampCIs-instead-approach!

Thanks in advance!

Henrik


-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] equivalent of model.tables for an lm.object?

2006-09-27 Thread Henrik Parn
Dear all,

I run a linear model with three significant explanatory variabels
x1: a factor with 4 levels
x2 and x3: factors with two levels each
x4: continuous

model - lm(y ~ x1 + x2 * x3 + x4)

The data is not perfectly balanced between the different 
factor-combinations and I use treatment contrasts.

With an aov.object, I assume I could have used model.tables(aov.object, 
type = means, se = TRUE), to get the means and se for all factor 
combinations.

In an lm.object like mine, I calculate the means 'manually' from the 
Estimates (for sure it could be done with a script, but fair enough). 
For the standard error of the means, I started out using formulas of a 
variance of a sum of two variables, but I messed things up with the 
interaction. Is there a way to calculate the standard error of the means 
from Estimates and Std.Error (or other information) from the lm.object?


Thanks in advance for any advice!

Best regards,

Henrik


-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmer applied to a wellknown (?) example: summary

2006-09-01 Thread Henrik Parn
Dear all,

I wish to thank Christoph Buser and John Wilkinson for their input, and 
especially John his examples and for for pointing me to the thread 
'Doubt about nested aov output' where the rat-example was hiding...:

http://search.gmane.org/?query=Doubt+about+nested+aov+outputemail=group=gmane.comp.lang.r.generalsort=revdateDEFAULTOP=andxP=doubt.nested.aov.output.xFILTERS=Gcomp.lang.r.general---A

In this great thread you find a description not only on using aov on 
this nested rat-data, but also how to handle it with lmer, e.g. model 
specification and coding of grouping levels.

Best regards,

Henrik


-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmer applied to a wellknown (?) example

2006-08-30 Thread Henrik Parn
Dear all,

During my pre-R era I tried (yes, tried) to understand mixed models by 
working through the 'rat example' in Sokal and Rohlfs Biometry (2000) 
3ed p 288-292. The same example was later used by Crawley (2002) in his 
Statistical Computing p 363-373 and I have seen the same data being used 
elsewhere in the litterature.

Because this example is so thoroughly described, I thought it would be 
very interesting to analyse it also using lmer and to see how the 
different approaches and outputs differs - from the more or less manual 
old-school (?) approach in Sokal, aov in Crawley, and to mixed models by 
lmer.

In the example, three treatments (Treatment) with two rats (Rat) each 
(i.e six unique rats in total). Three liver preparations (Liver) are 
taken from each rat (i.e 18 unique liver preparations), and two glycogen 
readings (Glycogen) are taken from each liver preparation (36 readings).

We want to test if treatments has affected the glycogen levels. The 
readings are nested in preparation and the preparations nested in rats.

The data can be found here (or on p. 289 in Sokal):
http://www.bio.ic.ac.uk/research/mjcraw/statcomp/data/rats.txt
//
I was hoping to use the rat example as some kind of reference on my way 
to understand mixed models and using lmer. However, first I wish someone 
could check my suggested models!

My suggestions:

attach(rats)
rats$Glycogen - as.numeric(Glycogen)
rats$Treatment - as.factor(Treatment)
rats$Rat - as.factor(Rat)
rats$Liver - as.factor(Liver)
str(rats)

model1 - lmer(Glycogen ~ Treatment + (1|Liver) + (1|Rat), data=rats)
summary(model1)

Was that it?

I also tried to make the 'liver-in-rat' nesting explicit  (as suggested 
in 'Examples from...')
 
model2 - lmer(Glycogen ~ Treatment + (1|Rat:Liver) + (1|Rat), data=rats)
summary(model2)

but then the random effects differs from model1.

Does the non-unique coding of rats and preparations in the original data 
set mess things up? Do I need to recode the ids to unique levels...

rats$rat2 - as.factor(rep(1:6, each=6))
rats$liver2 - as.factor(rep(1:18, each=2))
str(rats)

...and then:

model3 - lmer(Glycogen ~ Treatment + (1|liver2) + (1|rat2), data=rats)
# or maybe
model3 - lmer(Glycogen ~ Treatment + (1|rat2:liver2) + (1|rat2), data=rats)
 

Can anyone help me to get this right! Thanks in advance!

P.S.
Thanks to all contributors to lme/lmer topic on the list (yes, I have 
searched for 'lmer nested'...) and also the examples provided by John 
Fox' 'Linear mixed models, Appendix to An R and S-PLUS companion...' and 
Douglas Bates' 'Examples from Multilevel Software...' and R-news 5/1. 
Very helpful, but as usually I bet I missed something...Sorry.

Regards,

Henrik 

-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmList and missing values

2006-08-18 Thread Henrik Parn
Dear all,

I have a question on handling of missing values in lmList. My data set 
have continuous predictor and response, x and y, and a grouping variable 
group.id. All these variables have NAs and the data set also has several 
other variables that also contains NAs.

To create the lmList-object seems to work fine:
y.list - lmList(y ~ x | group.id, data=mydata, na.action=na.omit)

However, when I try to apply functions on the object such as coef or 
intervals it fails:

  coef(y.list)
Error in !unlist(lapply(coefs, is.null)) :
invalid argument type


If I beforehand select only the variables (still including missing 
values) used as arguments in lmList...
 
mydata2 - mydata[ , c(x, y, group)]

...and use mydata2 in lmList as above, coef and intervals works fine.

In order to provide a reproducible example, I made a small test data 
set. This data set contained the same pattern of NAs as in my real data 
set, i.e. NAs in both used and unused variables. The example turned out 
to be not very illustrative though, because strange enough 
'na.action=na.omit' works just fine on the small test data set so I 
don't bother to include it here...

I have not encountered any problems to apply other functions, such as lm 
or lme, to my data set.

Any idea what causes the error?

A related problem seems to have occured before, although it is not clear 
if different na.action options was tried in this case:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/14293.html


Thanks in advance for any help!

Best regards,

Henrik

-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fixed effects following lmer and mcmcsamp - which to present?

2006-08-08 Thread Henrik Parn
Dear all,

I am running a mixed model using lmer. In order to obtain CI of 
individual coefficients I use mcmcsamp. However, I need advice which 
values that are most appropriate to present in result section of a 
paper. I have not used mixed models and lmer so much before so my 
question is probably very naive. However, to avoid to much problems with 
journal editors and referees addicted to p-values, I would appreciate 
advice of which values of the output for the fixed factor that would be 
most appropriate to present in a result section, in order to convince 
them of the p-value free 'lmer-mcmcsamp'-approach!

Using the example from the help page on lmer:

fm1 - lmer(Reaction ~ Days + (Days|Subject), sleepstudy)

...I obtain the following for 'Days':


summary(fm1)
...
Estimate Std. Error  t value

Days 10.4673 1.54586.771


...and from mcmcsamp:

summary(mcmcsamp(fm1 , n = 1))

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

Mean SD Naive SE Time-series SE
Days 10.4695 1.7354 0.017354   0.015921

2. Quantiles for each variable:
2.5%   25%   50%   75%97.5%
Days  7.02279.3395   10.4712   11.5719   13.957



The standard way of presenting coefficients following a 'non-lmer' 
output is often (beta=..., SE=..., statistic=..., P=...). What would be 
the best equivalent in a 'lmer-mcmcsamp-context'? (beta=..., CI=...) is 
a good start I believe. But which beta? And what else?

I assume that the a 95% CI in this case would be 7.0227-13.957 (please, 
do correct me I have completely misunderstood!). But which would be the 
corresponding beta? 10.4673?, 10.4695? 10.4712? Is the t-value worth 
presenting or is it 'useless' without corresponding degrees of freedom 
and P-value? If I present the mcmcsamp-CI, does it make sense to present 
any of the three SE obtained in the output above? BTW, I have no idea 
what Naive SE, Time-series SE means. Could not find much in help and 
pdfs to coda or Matrix, or in Google.

Thanks in advance for any advice and hints to help-texts I have missed!


Best regards,

Henrik

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] paired t-test. Need to rearrange data?

2006-08-06 Thread Henrik Parn
Dear all,

I have received some data arranged like this:

# original data
id - rep(letters[1:6], each=2)
time - as.factor(rep(1:2, 6))
y - as.vector(replicate(6, c(rnorm(n=1, mean=1), rnorm(n=1, mean=2
test.data - data.frame(id, time, y)
test.data

I would like to perform a paired t-test of the y-values at time=1 
against those at time=2, with samples paired by their id. Is it 
necessary to arrange the data in a format like this:   

# rearranged data
id - letters[1:6]
y1 - replicate(6, rnorm(n=1, mean=1)) # y-value at time = 1
y2 - replicate(6, rnorm(n=1, mean=2)) #  y-value at time = 2
test.data2 - data.frame(id, y1, y2)
test.data2

...in order to perform a paired t-test?
t.test(y1, y2, paired=T)

If yes, which is the most convenient way to rearrange the data?
Or is it possible to apply the paired t-test function to the original 
data set?

And a side question: In my examples, I suppose can I use set.seed to 
reproduce the 'rnorm-values' created in the 'original data' also in my 
the 'rearranged data'. Can someone give me a hint of how to apply the 
same 'seed' to all the rnorms?


Thanks a lot in advance!


Henrik

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] efficient way to make NAs of empty cells in a factor (or character)

2006-08-03 Thread Henrik Parn
Dear all,

I have some csv-files (originating from Excel-files) containing empty 
cells. In my example file I have four variables of different classes, 
each with some empty cells in the original csv-file:

  test - read.csv2(test.csv, dec=.)

  test
  id id2  x   y
1  a  1  NA
2  b   e NA 2.2
3  f  3 3.3
4  c   g  4 4.4


  class(test$id)
[1] factor
  class(test$id2)
[1] factor
  class(test$x)
[1] integer
  class(test$y)
[1] numeric

In the help text of read.csv2 you can read 'Blank fields are also 
considered to be missing values in logical, integer, numeric and complex 
fields.'. Thus, empty cells in a factor (or a character I assume) is not 
considered as missing values but an own level:

  is.na(test$id)
[1] FALSE FALSE FALSE FALSE
  levels(test$id)
[1]   a b c

When I work with my real (larger) dataset I would like to use functions 
like 'is.na' and '!is.na' on factors. Now I wonder if there is an R 
alternativ to do 'search (for empty cells) and replace (with NA)' in Excel?

I have tried a modification of Uwe Ligges suggestion on missing value 
posted 2 Aug:
  is.na(test[test==]) - TRUE

...but it did not work on the data set:

Error in [-.data.frame(`*tmp*`, test == , value = c(NA, NA, NA, NA :
rhs is the wrong length for indexing by a logical matrix


However it worked fine when applied to a single vector:

  is.na(test$id[test$id==]) - TRUE
  test$id
[1] abNA c  
Levels:  a b c

  is.na(test$id)
[1] FALSE FALSE  TRUE FALSE

Is there a more efficient way to fill empty cells in all my factors in R 
or should I just do it in advance in Excel by 'search and replace'?

Thanks in advance!

-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] break axis using plotrix

2006-07-16 Thread Henrik Parn
Dear all,

I am trying to plot some data with differing range in y-values with 
type=b, adding error bars and break the y-axis into two parts, one 
lower part from 12 to 20, and one upper part from 34 to 40.

I have tried to follow the basic ideas from the script provided here by 
Jim Lemon:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56487.html


My attempt looks like this:

###
# x-values
x - 1:4

# small y-values with corresponding standard errors
meansarr - c(14.9, 18.2, 14.5, 18.3)
searr - c(0.47, 1.27, 1.22, 0.49)

# large values
meanslay - c(36.4, 39.0, 35.3, 38.6)
selay - c(0.51, 0.34, 0.57, 0.40)

library(plotrix)

# plot small values
plot(x, meansarr, ylim=c(12, 30), axes=F, type=b, xlab=, ylab=Day)
arrows(x, meansarr-searr, x, meansarr+searr, code = 3, angle = 90, 
length = 0.03)
box()

# x-axis
axis(1, tck=0.01, las=1, at=1:4,
labels=c(1998, 1999, 2002, 2003), mgp=c(3, 0.5, 0))

# y-axis
axis(2,at=c(12, 14, 16, 18, 20, 24, 26, 28, 
30),labels=c(12,14,16,18,20, 34,36,38,40))


# break axis
axis.break(2, 22, style=zigzag)

# add large values to same plot
par(new=TRUE)
plot(x, meanslay, ylim=c(30, 40), type=b, xlab=, ylab=Day, axes=F)
arrows(x, meanslay-selay, x, meanslay+selay, code = 3, angle = 90, 
length = 0.03)





As you can see, I have problems adding the larger y-values - they end up 
in the wrong place in the graph. I suppose Jim's warning 'just be 
careful that the ylim= and labels= arguments match up' is relevant here, 
but I don't manage to fix it...


Can anyone help me to plot the large y-values on the right 
placeaccording to the y-axis labeling?
I am using R 2.3.1 and WinXp.

Thanks a lot in advance!

Henrik


-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] more documentation on lmer?

2006-04-09 Thread Henrik Parn
Dear Bill,

You might check Faraway's 'Extending the Linear Model with R: 
Generalized Linear, Mixed Effects and Nonparametric Regression Models' 
(2006). Without having read the book properly, at least I noticed that 
package lme4 and the function lmer is used in the examples.

Best regards,
Henrik


Hello. Is there any documentation on the lmer function in the lme4 
package beyond what was published in the May 2005 R News (vol.5/1)? As 
well, has the nonlinear version of lmer appeared yet?  

Bill Shipley

-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] dist() between groups of data

2006-02-23 Thread Henrik Parn
Dear all,

I have a data set with x and y positions of nests. Each nest can be 
grouped according to a factor z. I have made a small dummy data set, 
where the grouping variable z is 0 or 1:

x - runif(6)
y - runif(6)
z - rep(c(0,1), each=3)
xyz - cbind(x,y,z)
xyz
 x   y z
[1,] 0.7176801 0.760944688 0
[2,] 0.2377928 0.080622524 0
[3,] 0.9450770 0.022039470 0
[4,] 0.8725492 0.971996677 1
[5,] 0.6356198 0.001569859 1
[6,] 0.8949670 0.066044377 1


First of all, I suppose that this is the correct way to calculate the 
Euclidean distance between /all/ nests:

xy - cbind(x,y)
d.all - dist(xy)


However, I would like to obtain a distance matrix for distances between 
nests belonging to certain groups. For example calculate all distances 
from nests where z equals something, to nests where z equals something 
else.

Is it best to split the data in two matrices according to the value of z 
in the two groups to be compared, and then try to follow the suggestions 
by Gabor Grothendieck* Date:* Wed 07 Jan 2004 - 15:16:37 EST , or are 
there other ways that can be applied straight to one matrix?


Thanks in advance!

Henrik











-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] SciViews-R_0.8-9 Console problem

2005-11-30 Thread Henrik Parn
Dear R users,

I successfully installed SciViews the other day. However, when I try to 
run it now,  the command/script window does not appear. Strange. Well, 
actually I see a tendency to a script window (in the lower part of the 
sciview window where it is suppose to be) during start up, but when the 
program is entirely open, the script window is gone.

I have tried to uninstall and reinstall latest versions available of R, 
Sciviews and Tinn-R several times, but it does not solve the problem. I 
have tried both the 'Detailed installation of Sciviews-R' and 'Manually 
installing additional R packages'.

Another 'new' problem I did not have before is the default size of the 
Sciviews window - both the upper and lower part 'disappears' outside my 
screen when starting up. And in the upper 'R-window' two prompts appear 
with maybe 6-7 lines in between. However, as soon as I hit a key the 
lower of these to prompt disappears. But still no script editor.

I suspect I have missed something fundamental...but what?

Any suggestions that can help me is appreciated!

Thanks!

Henrik

PS I have tried to mail to [EMAIL PROTECTED], but the mail just bounces.

I have WinXP, PIII, 512 RAM

R.Version()
$platform
[1] i386-pc-mingw32
$arch
[1] i386
$os
[1] mingw32
$system
[1] i386, mingw32
$status
[1] 
$major
[1] 2
$minor
[1] 2.0
$year
[1] 2005
$month
[1] 10
$day
[1] 06
$svn rev
[1] 35749
$language
[1] R

capabilities(tcltk)
tcltk TRUE

SciViews version: SciViews-R_0.8-9Setup.exe

 From the SciViews R console:
search()

[1] .GlobalEnvpackage:Rcmdr package:car [4] 
package:svGUI package:svViews   package:svIO[7] 
package:svMiscpackage:R2HTMLpackage:tcltk   [10] 
package:methods   package:stats package:graphics
[13] package:grDevices package:utils package:datasets
[16] TempEnv   RcmdrEnv  Autoloads   [19] 
package:base

-- 

Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html