Re: [R] Dissimilarity Analysis

2007-06-20 Thread stephen cox
Hi Birgit - looks like you have a few issues here.

Birgit Lemcke birgit.lemcke at systbot.uzh.ch writes:

 
 Hello you all!
 
 I am a completely new user of R and I have a problem to solve.
 I am using Mac OS X on a PowerBook.
 
 I have a table that looks like this:
 
 species X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14  
 X15 X16 X17 X18 X19 X20 X21
 1Anth_cap1  1  0  0  1  0  1  0  0  1   0   0   0   0   0
 0   0   1   0   0   0   1
 2   Anth_crin1  1  0  0  1  0  1  0  0  1   0   1   0   0   0
 0   0   0   1   0   0   1
 3Anth_eck1  1  0  0  1  0  1  0  0  1   0   0   0   0   0
 0   0   0   1   0   0   1
 4   Anth_gram1  1  0  0  1  0  1  0  0  1  NA  NA  NA  NA   0
 0   0   0   1   0   0   0
 5   Anth_insi1  1  0  0  1  0  1  0  0  1   0   0   0   1   0
 0   0   0   1   0   0   1
 
 All columns  are binary coded characters.
 The Import was done by this
 
 Test-read.table(TestRFemMalBivariat1.csv,header = TRUE, sep = ;)

First - you need to transpose the matrix to have species as columns.  You can do
this with:

d2 = data.frame(t(Test[,-1]))
colnames(d2) = Test[,1]  #now use d2


 
 Now I try to perform a similarity analysis with the dsvdis function  
 of the labdsv package with the sorensen-Index.
 
 My first question is if all zeros in my table are seen as missing  
 values and if it islike that how can I change without turning zero  
 into other numbers?

no - the zeros are valid observations.  the na's are missing data.


   DisTest-dsvdis(Test, index = sorensen)
 
 But I always get back this error message:
 
 Warnung in symbol.For(dsvdis) :'symbol.For' is not needed: please  
 remove it
 Fehler in dsvdis(Test, index = sorensen) :
   NA/NaN/Inf in externem Funktionsaufruf (arg 1)
 Zusätzlich: Warning message:
 NAs durch Umwandlung erzeugt



Second - you have an issue with missing data.  It looks like dsvdis does not
like the NA's - so you must make a decision about what to do.  Delete that
species, delete that site, or whatever...

Finally - the warning over symbol.For is an issue with the labdsv library itself
- nothing you are doing wrong.  The results will still be valid - but the use of
symbol.For is something that will eventually need to be changed in the labdsv
library.

hth,

stephen

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Rearranging Capture History Data in R

2007-06-11 Thread cox
What code can i use to convert a table like this:

Tag#Date
1   1
2   1
3   1
4   1
2   2
4   2
1   3
2   3
4   4

Into one like this:

Tag 1 2 3 4 #Date header
1   1 0 0 1
2   1 1 1 0
3   1 0 0 0
4   1 1 0 1

Thanks,


Ben Cox
Research Assistant (M.S.)
Montana Cooperative Fishery Research Unit
301 Lewis Hall
Montana State University
Bozeman, MT 59717
(406)994-6643

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Qualitative Data??(String command)

2006-10-27 Thread Davia Cox
I am using the read.table function to load an Excel data set into R.
It has a few variables with very long qualitative (free response
typically in sentences) response that I would like to keep, but would
like to limit the length of the response that R shows. Is there some
sort of string or column width command I can include in the read.table
function to limit the length of words used in the variables that have
long responses?? I do know which variable names are the ones with the
qualitative responses. Will the Rcmdr program do this for me? I know
STATA has this function but I don't have it on my computer to use.

Davia


-- 
Davia S. Cox
Image creates desire. You will what you imagine.
~J.G. Gallimore~

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression - confidence intervals

2006-02-08 Thread Cox, Stephen
Please forgive a rather naïve question... 

Could someone please give a quick explanation for the differences in conf 
intervals achieved via confint.glm (based on profile liklihoods) and the 
intervals achieved using the Design library.

For example, the intervals in the following two outputs are different.

library(Design)
x = rnorm(100)
y = gl(2,50)
d = data.frame(x = x, y = y)
dd = datadist(d); options(datadist = 'dd')
m1 = lrm(y~x, data = d)
summary(m1)

m2 = glm(y~x, family = binomial, data = d)
confint(m2)

I have spent time trying to figure this out via archives, but have not had much 
luck.

Regards

Stephen

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Outputing dataframe or vector from within a user defined function

2005-12-20 Thread Andrew Cox
Hi,
I have written a user defined function that carries
out a monte carlo simulation and outputs various
stats. I would like to access and store the simulation
data (a two column local variable) from outside the
function I have written. How can I output the data
from the function as new variable(not simply print
it), accessible outside
the function? 

I am probably going to find that this is really
simple, yet I have not been able to find the answer in
(usual places, manuals, 2 x Books and mailing lists)

Many Thanks 
Andy Cox

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Outputing dataframe or vector from within a user defined function

2005-12-19 Thread Andrew Cox
Hi,
I have written a user defined function that carries
out a monte carlo simulation and outputs various
stats. I would like to access and store the simulation
data (a two column local variable) from outside the
function I have written. How can I output the data
from the function as new variable, accessible outside
the function? 

I am probably going to find that this is really
simple, yet I have not been able to find the answer in
(usual places, manuals, 2 x Books and mailing lists)

Many Thanks 
Andy Cox

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Question

2005-12-13 Thread Davia Cox
Hello,

I have a problem that I am trying to solve and I am not sure how to  
do it in R.

Suppose, that 16 numbers are choosen at random from 0 to 9, what's  
the probability that their average will be between 4 and 6. I typed  
the following code:

set.seed(100)
sample(0:9, 16, replace =TRUE)
[1] 3 2 5 0 4 4 8 3 5 1 6 8 2 3 7 6

Is what I got, however I realize the set.seed function locks in the  
number I get every time.
My question is in order to run a true random sample, wouldn't I have  
to use the runif function? And then deliminate the sample to show the  
numbers that lie between 4 and 6? If that's the case, how do I do that?


Davia S. Cox
517-575-8031 cell
[EMAIL PROTECTED]

Human potential, though not always apparent, is there waiting to be  
discovered and invited forth. -William W. Purkey

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Question

2005-12-13 Thread Davia Cox
Please disregard this message and don't post it to the web. I found  
the answer.
Thanks

Davia S. Cox
517-575-8031 cell
[EMAIL PROTECTED]

Human potential, though not always apparent, is there waiting to be  
discovered and invited forth. -William W. Purkey




On Dec 13, 2005, at 6:20 AM, Davia Cox wrote:

 Hello,

 I have a problem that I am trying to solve and I am not sure how to  
 do it in R.

 Suppose, that 16 numbers are choosen at random from 0 to 9, what's  
 the probability that their average will be between 4 and 6. I typed  
 the following code:

 set.seed(100)
 sample(0:9, 16, replace =TRUE)
[1] 3 2 5 0 4 4 8 3 5 1 6 8 2 3 7 6

 Is what I got, however I realize the set.seed function locks in the  
 number I get every time.
 My question is in order to run a true random sample, wouldn't I  
 have to use the runif function? And then deliminate the sample to  
 show the numbers that lie between 4 and 6? If that's the case, how  
 do I do that?


 Davia S. Cox
 517-575-8031 cell
 [EMAIL PROTECTED]

 Human potential, though not always apparent, is there waiting to  
 be discovered and invited forth. -William W. Purkey





__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] lmer and glmmPQL

2005-12-07 Thread Cox, Stephen
Thanks for the reply Doug!

A follow up question and comment ...

1) If I understand correctly, looking at a simple situation in which
SITES are nested in ZONES, the following should be similar.  However,
despite the same F values, the p-value from lmer is 1/2 the other
methods.  Why is this true?

 anova(lmer(RICH ~ ZONE + (1|SITE:ZONE), data))
Analysis of Variance Table
 Df Sum Sq Mean Sq  Denom F value  Pr(F)  
ZONE  4   97.824.5 6610.0  2.1046 0.07753 .

 # make the nesting explicit
 data$SinZ = with(data, ZONE:SITE)[drop=TRUE]
 anova(lme(RICH ~ ZONE, data, random = ~1 | SinZ))
numDF denDF   F-value p-value
(Intercept) 1  6600 100.38331  .0001
ZONE410   2.10459   0.155

 summary(aov(RICH ~ ZONE + Error(SITE:ZONE), data))

Error: SITE:ZONE
  Df Sum Sq Mean Sq F value Pr(F)
ZONE   4  296697417  2.1046  0.155
Residuals 10  352433524   


2) I think the anova problems with lmer may also apply to poisson.
Compare the following (which includes a covariate).  Based on the
parameter estimates, the covariate should be significant.  However, its
anova p-value is .998:

 lmer(RICH ~ ZONE + lANPP + (1|SITE:ZONE), family = poisson, data)
Generalized linear mixed model fit using PQL 
Formula: RICH ~ ZONE + lANPP + (1 | SITE:ZONE) 
   Data: data 
 Family: poisson(log link)
  AIC  BIClogLik deviance
 9700.252 9754.628 -4842.126 9684.252
Random effects:
 GroupsNameVarianceStd.Dev. 
  SITE:ZONE (Intercept)0.069493 0.26361 
# of obs: 6615, groups: SITE:ZONE, 15

Estimated scale (compare to 1)  1.183970 

Fixed effects:
  Estimate Std. Error z value Pr(|z|)
(Intercept)  1.5169605  0.1533564  9.8917   2e-16 ***
ZONE20.4034169  0.2156956  1.8703  0.06144 .  
ZONE3   -0.1772011  0.2158736 -0.8209  0.41173
ZONE4   -0.2368290  0.2158431 -1.0972  0.27254
ZONE5   -0.1011186  0.2158114 -0.4686  0.63939
lANPP0.2201926  0.0081857 26.8995   2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

 anova(lmer(RICH ~ ZONE + lANPP + (1|SITE:ZONE), family = poisson,
data))
Analysis of Variance Table
  DfSum Sq   Mean Sq Denom   F value Pr(F)
ZONE   4 2.809e-05 7.022e-06  6609 4.298e-06 1.
lANPP  1 5.229e-06 5.229e-06  6609 3.200e-06 0.9986

Thanks again for any insight you may be able to provide!!



 

 -Original Message-
 From: Douglas Bates [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, December 07, 2005 8:28 AM
 To: Cox, Stephen
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] lmer and glmmPQL
 
 On 12/5/05, Cox, Stephen [EMAIL PROTECTED] wrote:
  I have been looking into both of these approaches to conducting a 
  GLMM, and want to make sure I understand model 
 specification in each.  
  In particular - after looking at Bates' Rnews article and searching 
  through the help archives, I am unclear on the 
 specification of nested 
  factors in lmer.  Do the following statements specify the same mode 
  within each approach?
 
  m1 = glmmPQL(RICH ~ ZONE, family = poisson, data, random = ~ YEAR | 
  SITE / QUADRAT)
  m2 = lmer(RICH ~ ZONE +(YEAR|SITE)+ (YEAR|QUADRAT), family 
 = poisson,
  data)
 
 If you want to ensure that QUADRAT is nested within SITE then 
 use the interaction operator explicitly
 
 m2 - lmer(RICH ~ ZONE +(YEAR|SITE)+ (YEAR|SITE:QUADRAT), 
 family = poisson, data)
 
 For the grouping factors nested versus non-nested depends on 
 the coding.  If QUADRAT has a distinct level for each 
 SITE:QUADRAT combination then the nesting will automatically 
 be detected.  However, if the nesting is implicit (that is, 
 if levels of QUADRAT are repeated at different SITES) then it 
 is necessary to use the interaction operator.  There is no 
 harm in using the interaction operator when the nesting is explicit.
 
  As a follow up - what would be the most appropriate model formula 
  (using glmmPQL syntax) to specify both a nested facor and repeated 
  observations?  Specifically, I am dealing with experimental 
 data with 
  three factors.  ZONE is a fixed effect.  Three sites (SITE) 
 are nested 
  within each ZONE.  Multiple quadrats within each SITE are measured 
  across multiple years.  I want to represent the nesting of 
 SITE within 
  ZONE and allow for repeated observations within each 
 QUADRAT over time 
  (the YEAR | QUADRAT random effect).  -- I am assuming that 
 glmmPQL is 
  the best option at this point because of recent discussion on Rhelp 
  about issues associated with the Matrix package used in lmer (i.e., 
  the anova results do not seem to match parameter tests).
 
 
 I believe the anova problems only occur with a binomial response. 
 They are caused by my failure to use the prior.weights appropriately. 
 For a Poisson model this should not be a problem.
 
  Any information would be very much appreciated!
 
  Regards
 
  Stephen
 
  __
  R-help@stat.math.ethz.ch mailing

[R] lmer and glmmPQL

2005-12-05 Thread Cox, Stephen
I have been looking into both of these approaches to conducting a GLMM,
and want to make sure I understand model specification in each.  In
particular - after looking at Bates' Rnews article and searching through
the help archives, I am unclear on the specification of nested factors
in lmer.  Do the following statements specify the same mode within each
approach?

m1 = glmmPQL(RICH ~ ZONE, family = poisson, data, random = ~ YEAR | SITE
/ QUADRAT)
m2 = lmer(RICH ~ ZONE +(YEAR|SITE)+ (YEAR|QUADRAT), family = poisson,
data)

As a follow up - what would be the most appropriate model formula (using
glmmPQL syntax) to specify both a nested facor and repeated
observations?  Specifically, I am dealing with experimental data with
three factors.  ZONE is a fixed effect.  Three sites (SITE) are nested
within each ZONE.  Multiple quadrats within each SITE are measured
across multiple years.  I want to represent the nesting of SITE within
ZONE and allow for repeated observations within each QUADRAT over time
(the YEAR | QUADRAT random effect).  -- I am assuming that glmmPQL is
the best option at this point because of recent discussion on Rhelp
about issues associated with the Matrix package used in lmer (i.e., the
anova results do not seem to match parameter tests).

Any information would be very much appreciated!

Regards

Stephen

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] read.spss problem

2005-11-28 Thread Davia Cox
Hello,
I am having trouble reading an spss file into R. I have reset my  
working directory to the folder where this file is stored. This is  
what I've typed into R and the error message I received:

+ getwd()
[1] /Users/daviacox/Graduate School/PLS 801
  read.spss(norwil.spss)
Error in read.spss(norwil.spss) : error reading portable-file  
dictionary
In addition: Warning message:
Expected variable count record

Please help, I don't have SPSS on my computer (I am re-analyzing some  
data for a statistics project) so I can't alter the file that way.

Thanks again for your help.

Davia

Davia S. Cox
Global Urban Studies Program
Graduate Assistant/Doctoral Student
305 Berkey Hall
East Lansing, MI 48825
517-353-5987 Office
517-353-6680 Fax
[EMAIL PROTECTED]

One of the penalties for refusing to participate in politics is that  
you
end up being governed by your inferiors.
  -Plato

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Fieller's Conf Limits and EC50's

2005-07-13 Thread Stephen B. Cox
Folks

I have modified an existing function to calculate 'ec/ld/lc' 50 values 
and their associated Fieller's confidence limits.  It is based on 
EC50.calc (writtien by John Bailer)  - but also borrows from the dose.p 
(MASS) function.  My goal was to make the original EC50.calc function 
flexible with respect to 1) probability at which to calculate the 
expected dose, and 2) the link function.  I would appreciate comments 
about the validity of doing so!  In particular - I want to make sure 
that the confidence limit calculations are still valid when changing the 
link function.

ec.calc-function(obj,conf.level=.95,p=.5) {

 # calculates confidence interval based upon Fieller's thm.
 # modified version of EC50.calc found in PB Fig 7.22
 # now allows other link functions, using the calculations
 # found in dose.p (MASS)
 # SBC 19 May 05

call - match.call()

 coef = coef(obj)
 vcov = summary.glm(obj)$cov.unscaled
 b0-coef[1]
 b1-coef[2]
 var.b0-vcov[1,1]
 var.b1-vcov[2,2]
 cov.b0.b1-vcov[1,2]
 alpha-1-conf.level
 zalpha.2 - -qnorm(alpha/2)
 gamma - zalpha.2^2 * var.b1 / (b1^2)
 eta = family(obj)$linkfun(p)  #based on calcs in VR's dose.p

 EC50 - (eta-b0)/b1

 const1 - (gamma/(1-gamma))*(EC50 + cov.b0.b1/var.b1)

 const2a - var.b0 + 2*cov.b0.b1*EC50 + var.b1*EC50^2 -
gamma*(var.b0 - cov.b0.b1^2/var.b1)

 const2 - zalpha.2/( (1-gamma)*abs(b1) )*sqrt(const2a)

 LCL - EC50 + const1 - const2
 UCL - EC50 + const1 + const2

 conf.pts - c(LCL,EC50,UCL)
 names(conf.pts) - c(Lower,EC50,Upper)

 return(conf.pts,conf.level,call=call)
 }


Thanks

Stephen Cox

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] To many NA's from mean(..., na.rm=T) when a column is all NA's

2005-06-13 Thread Jim Robison-Cox
Dear R-help folks,

I am seeing unexpected behaviour from the function mean
with option na.rm =TRUE (which is removing a whole column of a data frame
or matrix.

example:

testcase - data.frame( x = 1:3, y = rep(NA,3))

mean(testcase[,1], na.rm=TRUE)
[1] 2
mean(testcase[,2], na.rm = TRUE)
[1] NaN

  OK, so far that seems sensible.  Now I'd like to compute both means at
once:

  lapply(testcase, mean, na.rm=T)   ## this works
$x
[1] 2

$y
[1] NaN

  But I thought that this would also work:

apply(testcase, 2, mean, na.rm=T)
 x  y
NA NA
Warning messages:
1: argument is not numeric or logical: returning NA in:
mean.default(newX[, i], ...)
2: argument is not numeric or logical: returning NA in:
mean.default(newX[, i], ...)

 Summary:
  If I have a data frame or a matrix where one entire column is NA's,
mean(x, na.rm=T) works on that column, returning NaN, but fails using
apply, in that apply returns NA for ALL columns.
  lapply works fine on the data frame.

  If you wonder why I'm building data frames with columns that could be
all missing -- they arise as output of a simulation.  The fact that the
entire column is missing is informative in itself.


  I do wonder if this is a bug.

Thanks,
Jim

Jim Robison-Cox   
Department of Math Sciences  ||   phone: (406)994-5340
2-214 Wilson Hall \   BZN, MT |   FAX:   (406)994-1789
Montana State University   |  *___|
Bozeman, MT 59717-2400  \_|  e-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] To many NA's from mean(..., na.rm=T) when a column is all NA's

2005-06-13 Thread Jim Robison-Cox
I see.
   So apply() first changes the dataframe to a matrix, which made it
a matrix of text values, then mean made no sense for any column, so I got
all NA's.

Thanks, Peter,

Jim

On 13 Jun 2005, Peter Dalgaard wrote:

 Jim Robison-Cox [EMAIL PROTECTED] writes:

   Summary:
If I have a data frame or a matrix where one entire column is NA's,
  mean(x, na.rm=T) works on that column, returning NaN, but fails using
  apply, in that apply returns NA for ALL columns.
lapply works fine on the data frame.
 
If you wonder why I'm building data frames with columns that could be
  all missing -- they arise as output of a simulation.  The fact that the
  entire column is missing is informative in itself.
 
 
I do wonder if this is a bug.

 It isn't...

 Cutting a long story short:

  testcase - data.frame( x = 1:3, y = rep(NA,3))
  as.matrix(testcase)
   x   y
 1 1 NA
 2 2 NA
 3 3 NA
  testcase - data.frame( x = 1:3, y = as.numeric(rep(NA,3)))
  as.matrix(testcase)
   x  y
 1 1 NA
 2 2 NA
 3 3 NA
  apply(testcase,2,mean,na.rm=T)
   x   y
   2 NaN




Jim Robison-Cox   
Department of Math Sciences  ||   phone: (406)994-5340
2-214 Wilson Hall \   BZN, MT |   FAX:   (406)994-1789
Montana State University   |  *___|
Bozeman, MT 59717-2400  \_|  e-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html