Re: [R] Dissimilarity Analysis
Hi Birgit - looks like you have a few issues here. Birgit Lemcke birgit.lemcke at systbot.uzh.ch writes: Hello you all! I am a completely new user of R and I have a problem to solve. I am using Mac OS X on a PowerBook. I have a table that looks like this: species X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 1Anth_cap1 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 2 Anth_crin1 1 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1 3Anth_eck1 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 4 Anth_gram1 1 0 0 1 0 1 0 0 1 NA NA NA NA 0 0 0 0 1 0 0 0 5 Anth_insi1 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 All columns are binary coded characters. The Import was done by this Test-read.table(TestRFemMalBivariat1.csv,header = TRUE, sep = ;) First - you need to transpose the matrix to have species as columns. You can do this with: d2 = data.frame(t(Test[,-1])) colnames(d2) = Test[,1] #now use d2 Now I try to perform a similarity analysis with the dsvdis function of the labdsv package with the sorensen-Index. My first question is if all zeros in my table are seen as missing values and if it islike that how can I change without turning zero into other numbers? no - the zeros are valid observations. the na's are missing data. DisTest-dsvdis(Test, index = sorensen) But I always get back this error message: Warnung in symbol.For(dsvdis) :'symbol.For' is not needed: please remove it Fehler in dsvdis(Test, index = sorensen) : NA/NaN/Inf in externem Funktionsaufruf (arg 1) Zusätzlich: Warning message: NAs durch Umwandlung erzeugt Second - you have an issue with missing data. It looks like dsvdis does not like the NA's - so you must make a decision about what to do. Delete that species, delete that site, or whatever... Finally - the warning over symbol.For is an issue with the labdsv library itself - nothing you are doing wrong. The results will still be valid - but the use of symbol.For is something that will eventually need to be changed in the labdsv library. hth, stephen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Rearranging Capture History Data in R
What code can i use to convert a table like this: Tag#Date 1 1 2 1 3 1 4 1 2 2 4 2 1 3 2 3 4 4 Into one like this: Tag 1 2 3 4 #Date header 1 1 0 0 1 2 1 1 1 0 3 1 0 0 0 4 1 1 0 1 Thanks, Ben Cox Research Assistant (M.S.) Montana Cooperative Fishery Research Unit 301 Lewis Hall Montana State University Bozeman, MT 59717 (406)994-6643 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Qualitative Data??(String command)
I am using the read.table function to load an Excel data set into R. It has a few variables with very long qualitative (free response typically in sentences) response that I would like to keep, but would like to limit the length of the response that R shows. Is there some sort of string or column width command I can include in the read.table function to limit the length of words used in the variables that have long responses?? I do know which variable names are the ones with the qualitative responses. Will the Rcmdr program do this for me? I know STATA has this function but I don't have it on my computer to use. Davia -- Davia S. Cox Image creates desire. You will what you imagine. ~J.G. Gallimore~ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Logistic regression - confidence intervals
Please forgive a rather naïve question... Could someone please give a quick explanation for the differences in conf intervals achieved via confint.glm (based on profile liklihoods) and the intervals achieved using the Design library. For example, the intervals in the following two outputs are different. library(Design) x = rnorm(100) y = gl(2,50) d = data.frame(x = x, y = y) dd = datadist(d); options(datadist = 'dd') m1 = lrm(y~x, data = d) summary(m1) m2 = glm(y~x, family = binomial, data = d) confint(m2) I have spent time trying to figure this out via archives, but have not had much luck. Regards Stephen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Outputing dataframe or vector from within a user defined function
Hi, I have written a user defined function that carries out a monte carlo simulation and outputs various stats. I would like to access and store the simulation data (a two column local variable) from outside the function I have written. How can I output the data from the function as new variable(not simply print it), accessible outside the function? I am probably going to find that this is really simple, yet I have not been able to find the answer in (usual places, manuals, 2 x Books and mailing lists) Many Thanks Andy Cox __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Outputing dataframe or vector from within a user defined function
Hi, I have written a user defined function that carries out a monte carlo simulation and outputs various stats. I would like to access and store the simulation data (a two column local variable) from outside the function I have written. How can I output the data from the function as new variable, accessible outside the function? I am probably going to find that this is really simple, yet I have not been able to find the answer in (usual places, manuals, 2 x Books and mailing lists) Many Thanks Andy Cox __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Question
Hello, I have a problem that I am trying to solve and I am not sure how to do it in R. Suppose, that 16 numbers are choosen at random from 0 to 9, what's the probability that their average will be between 4 and 6. I typed the following code: set.seed(100) sample(0:9, 16, replace =TRUE) [1] 3 2 5 0 4 4 8 3 5 1 6 8 2 3 7 6 Is what I got, however I realize the set.seed function locks in the number I get every time. My question is in order to run a true random sample, wouldn't I have to use the runif function? And then deliminate the sample to show the numbers that lie between 4 and 6? If that's the case, how do I do that? Davia S. Cox 517-575-8031 cell [EMAIL PROTECTED] Human potential, though not always apparent, is there waiting to be discovered and invited forth. -William W. Purkey __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Question
Please disregard this message and don't post it to the web. I found the answer. Thanks Davia S. Cox 517-575-8031 cell [EMAIL PROTECTED] Human potential, though not always apparent, is there waiting to be discovered and invited forth. -William W. Purkey On Dec 13, 2005, at 6:20 AM, Davia Cox wrote: Hello, I have a problem that I am trying to solve and I am not sure how to do it in R. Suppose, that 16 numbers are choosen at random from 0 to 9, what's the probability that their average will be between 4 and 6. I typed the following code: set.seed(100) sample(0:9, 16, replace =TRUE) [1] 3 2 5 0 4 4 8 3 5 1 6 8 2 3 7 6 Is what I got, however I realize the set.seed function locks in the number I get every time. My question is in order to run a true random sample, wouldn't I have to use the runif function? And then deliminate the sample to show the numbers that lie between 4 and 6? If that's the case, how do I do that? Davia S. Cox 517-575-8031 cell [EMAIL PROTECTED] Human potential, though not always apparent, is there waiting to be discovered and invited forth. -William W. Purkey __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] lmer and glmmPQL
Thanks for the reply Doug! A follow up question and comment ... 1) If I understand correctly, looking at a simple situation in which SITES are nested in ZONES, the following should be similar. However, despite the same F values, the p-value from lmer is 1/2 the other methods. Why is this true? anova(lmer(RICH ~ ZONE + (1|SITE:ZONE), data)) Analysis of Variance Table Df Sum Sq Mean Sq Denom F value Pr(F) ZONE 4 97.824.5 6610.0 2.1046 0.07753 . # make the nesting explicit data$SinZ = with(data, ZONE:SITE)[drop=TRUE] anova(lme(RICH ~ ZONE, data, random = ~1 | SinZ)) numDF denDF F-value p-value (Intercept) 1 6600 100.38331 .0001 ZONE410 2.10459 0.155 summary(aov(RICH ~ ZONE + Error(SITE:ZONE), data)) Error: SITE:ZONE Df Sum Sq Mean Sq F value Pr(F) ZONE 4 296697417 2.1046 0.155 Residuals 10 352433524 2) I think the anova problems with lmer may also apply to poisson. Compare the following (which includes a covariate). Based on the parameter estimates, the covariate should be significant. However, its anova p-value is .998: lmer(RICH ~ ZONE + lANPP + (1|SITE:ZONE), family = poisson, data) Generalized linear mixed model fit using PQL Formula: RICH ~ ZONE + lANPP + (1 | SITE:ZONE) Data: data Family: poisson(log link) AIC BIClogLik deviance 9700.252 9754.628 -4842.126 9684.252 Random effects: GroupsNameVarianceStd.Dev. SITE:ZONE (Intercept)0.069493 0.26361 # of obs: 6615, groups: SITE:ZONE, 15 Estimated scale (compare to 1) 1.183970 Fixed effects: Estimate Std. Error z value Pr(|z|) (Intercept) 1.5169605 0.1533564 9.8917 2e-16 *** ZONE20.4034169 0.2156956 1.8703 0.06144 . ZONE3 -0.1772011 0.2158736 -0.8209 0.41173 ZONE4 -0.2368290 0.2158431 -1.0972 0.27254 ZONE5 -0.1011186 0.2158114 -0.4686 0.63939 lANPP0.2201926 0.0081857 26.8995 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 anova(lmer(RICH ~ ZONE + lANPP + (1|SITE:ZONE), family = poisson, data)) Analysis of Variance Table DfSum Sq Mean Sq Denom F value Pr(F) ZONE 4 2.809e-05 7.022e-06 6609 4.298e-06 1. lANPP 1 5.229e-06 5.229e-06 6609 3.200e-06 0.9986 Thanks again for any insight you may be able to provide!! -Original Message- From: Douglas Bates [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 07, 2005 8:28 AM To: Cox, Stephen Cc: r-help@stat.math.ethz.ch Subject: Re: [R] lmer and glmmPQL On 12/5/05, Cox, Stephen [EMAIL PROTECTED] wrote: I have been looking into both of these approaches to conducting a GLMM, and want to make sure I understand model specification in each. In particular - after looking at Bates' Rnews article and searching through the help archives, I am unclear on the specification of nested factors in lmer. Do the following statements specify the same mode within each approach? m1 = glmmPQL(RICH ~ ZONE, family = poisson, data, random = ~ YEAR | SITE / QUADRAT) m2 = lmer(RICH ~ ZONE +(YEAR|SITE)+ (YEAR|QUADRAT), family = poisson, data) If you want to ensure that QUADRAT is nested within SITE then use the interaction operator explicitly m2 - lmer(RICH ~ ZONE +(YEAR|SITE)+ (YEAR|SITE:QUADRAT), family = poisson, data) For the grouping factors nested versus non-nested depends on the coding. If QUADRAT has a distinct level for each SITE:QUADRAT combination then the nesting will automatically be detected. However, if the nesting is implicit (that is, if levels of QUADRAT are repeated at different SITES) then it is necessary to use the interaction operator. There is no harm in using the interaction operator when the nesting is explicit. As a follow up - what would be the most appropriate model formula (using glmmPQL syntax) to specify both a nested facor and repeated observations? Specifically, I am dealing with experimental data with three factors. ZONE is a fixed effect. Three sites (SITE) are nested within each ZONE. Multiple quadrats within each SITE are measured across multiple years. I want to represent the nesting of SITE within ZONE and allow for repeated observations within each QUADRAT over time (the YEAR | QUADRAT random effect). -- I am assuming that glmmPQL is the best option at this point because of recent discussion on Rhelp about issues associated with the Matrix package used in lmer (i.e., the anova results do not seem to match parameter tests). I believe the anova problems only occur with a binomial response. They are caused by my failure to use the prior.weights appropriately. For a Poisson model this should not be a problem. Any information would be very much appreciated! Regards Stephen __ R-help@stat.math.ethz.ch mailing
[R] lmer and glmmPQL
I have been looking into both of these approaches to conducting a GLMM, and want to make sure I understand model specification in each. In particular - after looking at Bates' Rnews article and searching through the help archives, I am unclear on the specification of nested factors in lmer. Do the following statements specify the same mode within each approach? m1 = glmmPQL(RICH ~ ZONE, family = poisson, data, random = ~ YEAR | SITE / QUADRAT) m2 = lmer(RICH ~ ZONE +(YEAR|SITE)+ (YEAR|QUADRAT), family = poisson, data) As a follow up - what would be the most appropriate model formula (using glmmPQL syntax) to specify both a nested facor and repeated observations? Specifically, I am dealing with experimental data with three factors. ZONE is a fixed effect. Three sites (SITE) are nested within each ZONE. Multiple quadrats within each SITE are measured across multiple years. I want to represent the nesting of SITE within ZONE and allow for repeated observations within each QUADRAT over time (the YEAR | QUADRAT random effect). -- I am assuming that glmmPQL is the best option at this point because of recent discussion on Rhelp about issues associated with the Matrix package used in lmer (i.e., the anova results do not seem to match parameter tests). Any information would be very much appreciated! Regards Stephen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] read.spss problem
Hello, I am having trouble reading an spss file into R. I have reset my working directory to the folder where this file is stored. This is what I've typed into R and the error message I received: + getwd() [1] /Users/daviacox/Graduate School/PLS 801 read.spss(norwil.spss) Error in read.spss(norwil.spss) : error reading portable-file dictionary In addition: Warning message: Expected variable count record Please help, I don't have SPSS on my computer (I am re-analyzing some data for a statistics project) so I can't alter the file that way. Thanks again for your help. Davia Davia S. Cox Global Urban Studies Program Graduate Assistant/Doctoral Student 305 Berkey Hall East Lansing, MI 48825 517-353-5987 Office 517-353-6680 Fax [EMAIL PROTECTED] One of the penalties for refusing to participate in politics is that you end up being governed by your inferiors. -Plato __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Fieller's Conf Limits and EC50's
Folks I have modified an existing function to calculate 'ec/ld/lc' 50 values and their associated Fieller's confidence limits. It is based on EC50.calc (writtien by John Bailer) - but also borrows from the dose.p (MASS) function. My goal was to make the original EC50.calc function flexible with respect to 1) probability at which to calculate the expected dose, and 2) the link function. I would appreciate comments about the validity of doing so! In particular - I want to make sure that the confidence limit calculations are still valid when changing the link function. ec.calc-function(obj,conf.level=.95,p=.5) { # calculates confidence interval based upon Fieller's thm. # modified version of EC50.calc found in PB Fig 7.22 # now allows other link functions, using the calculations # found in dose.p (MASS) # SBC 19 May 05 call - match.call() coef = coef(obj) vcov = summary.glm(obj)$cov.unscaled b0-coef[1] b1-coef[2] var.b0-vcov[1,1] var.b1-vcov[2,2] cov.b0.b1-vcov[1,2] alpha-1-conf.level zalpha.2 - -qnorm(alpha/2) gamma - zalpha.2^2 * var.b1 / (b1^2) eta = family(obj)$linkfun(p) #based on calcs in VR's dose.p EC50 - (eta-b0)/b1 const1 - (gamma/(1-gamma))*(EC50 + cov.b0.b1/var.b1) const2a - var.b0 + 2*cov.b0.b1*EC50 + var.b1*EC50^2 - gamma*(var.b0 - cov.b0.b1^2/var.b1) const2 - zalpha.2/( (1-gamma)*abs(b1) )*sqrt(const2a) LCL - EC50 + const1 - const2 UCL - EC50 + const1 + const2 conf.pts - c(LCL,EC50,UCL) names(conf.pts) - c(Lower,EC50,Upper) return(conf.pts,conf.level,call=call) } Thanks Stephen Cox __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] To many NA's from mean(..., na.rm=T) when a column is all NA's
Dear R-help folks, I am seeing unexpected behaviour from the function mean with option na.rm =TRUE (which is removing a whole column of a data frame or matrix. example: testcase - data.frame( x = 1:3, y = rep(NA,3)) mean(testcase[,1], na.rm=TRUE) [1] 2 mean(testcase[,2], na.rm = TRUE) [1] NaN OK, so far that seems sensible. Now I'd like to compute both means at once: lapply(testcase, mean, na.rm=T) ## this works $x [1] 2 $y [1] NaN But I thought that this would also work: apply(testcase, 2, mean, na.rm=T) x y NA NA Warning messages: 1: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...) 2: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...) Summary: If I have a data frame or a matrix where one entire column is NA's, mean(x, na.rm=T) works on that column, returning NaN, but fails using apply, in that apply returns NA for ALL columns. lapply works fine on the data frame. If you wonder why I'm building data frames with columns that could be all missing -- they arise as output of a simulation. The fact that the entire column is missing is informative in itself. I do wonder if this is a bug. Thanks, Jim Jim Robison-Cox Department of Math Sciences || phone: (406)994-5340 2-214 Wilson Hall \ BZN, MT | FAX: (406)994-1789 Montana State University | *___| Bozeman, MT 59717-2400 \_| e-mail: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] To many NA's from mean(..., na.rm=T) when a column is all NA's
I see. So apply() first changes the dataframe to a matrix, which made it a matrix of text values, then mean made no sense for any column, so I got all NA's. Thanks, Peter, Jim On 13 Jun 2005, Peter Dalgaard wrote: Jim Robison-Cox [EMAIL PROTECTED] writes: Summary: If I have a data frame or a matrix where one entire column is NA's, mean(x, na.rm=T) works on that column, returning NaN, but fails using apply, in that apply returns NA for ALL columns. lapply works fine on the data frame. If you wonder why I'm building data frames with columns that could be all missing -- they arise as output of a simulation. The fact that the entire column is missing is informative in itself. I do wonder if this is a bug. It isn't... Cutting a long story short: testcase - data.frame( x = 1:3, y = rep(NA,3)) as.matrix(testcase) x y 1 1 NA 2 2 NA 3 3 NA testcase - data.frame( x = 1:3, y = as.numeric(rep(NA,3))) as.matrix(testcase) x y 1 1 NA 2 2 NA 3 3 NA apply(testcase,2,mean,na.rm=T) x y 2 NaN Jim Robison-Cox Department of Math Sciences || phone: (406)994-5340 2-214 Wilson Hall \ BZN, MT | FAX: (406)994-1789 Montana State University | *___| Bozeman, MT 59717-2400 \_| e-mail: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html