Re: [R] question about capscale (vegan)
Hi Gavin,

I have been analyzing real data (sorry, but I am not allowed to post these data here) and what I got was this:

    mydistmat_f.cap <- capscale(distmat_f ~ F + L + F:L, mfactors_frame)
    Warning messages:
    1: some of the first 30 eigenvalues are < 0 in: cmdscale(X, k = k, eig = TRUE, add = add)
    2: NaNs produced in: sqrt(ev)

    mydistmat_f.cap
    Call:
    capscale(formula = distmat_f ~ F + L + F:L, data = mfactors_frame)

                  Inertia Rank
    Total          0.3758
    Constrained    0.2110    4
    Unconstrained  0.1648    4
    Inertia is squared distance
    Some constraints were aliased because they were collinear (redundant)

    Eigenvalues for constrained axes:
         CAP1      CAP2      CAP3      CAP4
    1.679e-01 2.954e-02 1.349e-02 1.233e-05

    Eigenvalues for unconstrained axes:
         MDS1      MDS2      MDS3      MDS4
    1.388e-01 2.601e-02 4.076e-05 2.064e-07

So, from these results I can tell that there are 4 axes that explain 0.1648 of the total variance and another 4 axes that explain 0.2110 of the total variance. But I don't understand the difference between constrained and unconstrained.

    anova(mydistmat_f.cap)
    Permutation test for capscale under direct model

    Model: capscale(formula = distmat_f ~ F + L + F:L, data = mfactors_frame)
             Df  Var      F N.Perm  Pr(F)
    Model     4 0.21 1.2798 400.00 0.0875 .
    Residual  4 0.16
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    summary(anova(mydistmat_f.cap))
           Df         Var               F             N.Perm        Pr(F)
     Min.   :4   Min.   :0.1648   Min.   :1.280   Min.   :200   Min.   :0.12
     1st Qu.:4   1st Qu.:0.1764   1st Qu.:1.280   1st Qu.:200   1st Qu.:0.12
     Median :4   Median :0.1879   Median :1.280   Median :200   Median :0.12
     Mean   :4   Mean   :0.1879   Mean   :1.280   Mean   :200   Mean   :0.12
     3rd Qu.:4   3rd Qu.:0.1994   3rd Qu.:1.280   3rd Qu.:200   3rd Qu.:0.12
     Max.   :4   Max.   :0.2110   Max.   :1.280   Max.   :200   Max.   :0.12
                                  NA's   :1.000   NA's   :  1   NA's   :1.00

Next, I would like to see the sums of squares of the anova, to check them against another analysis that we performed, but I can't find them in the anova output. I am also wondering whether there is any way to identify the main effects, factor effects and interaction in this anova analysis.

I would be very grateful if you could help me to understand these results.

Thank you very much,
Alicia

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] question about capscale (vegan)
On Mon, 2006-11-27 at 15:37 +0100, Alicia Amadoz wrote:
> Hi Gavin,
>
> I have been analyzing real data (sorry but I am not allowed to post
> these data here) and what I got was this,
>
> mydistmat_f.cap <- capscale(distmat_f ~ F + L + F:L, mfactors_frame)

I believe you can write that formula as:

    distmat_f ~ F * L

> Warning messages:
> 1: some of the first 30 eigenvalues are < 0 in: cmdscale(X, k = k, eig = TRUE, add = add)
> 2: NaNs produced in: sqrt(ev)

Sorry, I don't know enough about this method to know whether this is a problem you should worry about or not. You should read up on the method some more to decide whether the first warning is something to be worried about. IIRC, negative eigenvalues are to be expected with this method, as they are handled explicitly by capscale. Since this warning comes from cmdscale(), I suspect it is a helpful feature of that function which you don't need to worry about when it is used inside capscale().

> mydistmat_f.cap
> Call:
> capscale(formula = distmat_f ~ F + L + F:L, data = mfactors_frame)
>
>               Inertia Rank
> Total          0.3758
> Constrained    0.2110    4
> Unconstrained  0.1648    4
> Inertia is squared distance
> Some constraints were aliased because they were collinear (redundant)
>
> Eigenvalues for constrained axes:
>      CAP1      CAP2      CAP3      CAP4
> 1.679e-01 2.954e-02 1.349e-02 1.233e-05
>
> Eigenvalues for unconstrained axes:
>      MDS1      MDS2      MDS3      MDS4
> 1.388e-01 2.601e-02 4.076e-05 2.064e-07
>
> So, by these results I can tell that there are 4 axes that explain
> 0.1648 of the total variance and another 4 axes that explain 0.2110 of
> the total variance. But I don't understand the difference between
> constrained and unconstrained.

The constrained axes are linear combinations of your explanatory variables (F, L and F:L), so this is the part of your genomic data that is explained by those explanatory factors. The unconstrained part is the remaining variance that is not explained; its axes are MDS (principal coordinates) axes. So you can explain c. 56% of the variance in your genomic data (0.2110 out of 0.3758) with F, L, and F:L.
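The "c. 56%" figure can be checked by hand from the inertia table in the printed capscale result. The following is only a small arithmetic sketch; the variable names are made up for illustration and the numbers are copied from the output quoted in the message.

```r
# Inertia values copied from the printed capscale() result in the thread.
total_inertia         <- 0.3758
constrained_inertia   <- 0.2110
unconstrained_inertia <- 0.1648

# Share of the total inertia carried by the constrained (explained) axes
# and by the unconstrained (residual) axes.
prop_constrained   <- constrained_inertia / total_inertia
prop_unconstrained <- unconstrained_inertia / total_inertia

round(100 * c(constrained = prop_constrained,
              unconstrained = prop_unconstrained), 1)
```

The two parts add up to the total inertia, which is why the constrained share can be read as the proportion of variance explained by F, L and F:L.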
Note the warning about aliased constraints: this means that at least one variable in the model (including interactions) is completely correlated with another variable (or, I think, a combination of variables) and is therefore redundant. Type alias(mydistmat_f.cap) to see which coefficients are aliased, and ?alias to see what this means.

> anova(mydistmat_f.cap)
> Permutation test for capscale under direct model
>
> Model: capscale(formula = distmat_f ~ F + L + F:L, data = mfactors_frame)
>          Df  Var      F N.Perm  Pr(F)
> Model     4 0.21 1.2798 400.00 0.0875 .
> Residual  4 0.16
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> summary(anova(mydistmat_f.cap))
>        Df         Var               F             N.Perm        Pr(F)
>  Min.   :4   Min.   :0.1648   Min.   :1.280   Min.   :200   Min.   :0.12
>  1st Qu.:4   1st Qu.:0.1764   1st Qu.:1.280   1st Qu.:200   1st Qu.:0.12
>  Median :4   Median :0.1879   Median :1.280   Median :200   Median :0.12
>  Mean   :4   Mean   :0.1879   Mean   :1.280   Mean   :200   Mean   :0.12
>  3rd Qu.:4   3rd Qu.:0.1994   3rd Qu.:1.280   3rd Qu.:200   3rd Qu.:0.12
>  Max.   :4   Max.   :0.2110   Max.   :1.280   Max.   :200   Max.   :0.12
>                               NA's   :1.000   NA's   :  1   NA's   :1.00
>
> Then, I want to know the sum of squares of anova to check with other
> analysis that we performed but I can't see them by the output of anova.
> Besides, I am wondering if there is any manner to identify the main
> effects, factor effects and interaction in this anova analysis.
>
> I would be very grateful if you could help me to understand these
> results.

There isn't a summary method for anova.cca, which is why summary() only gives you quantiles of each column of the anova table. In any case, this anova isn't working on sums of squares, but on other measures of variance. It is a permutation test: it simply works out by brute force how likely you are to get a model explaining 56% of the total variance, given your sample size and model complexity, under a null/random model.

It sounds like you haven't fully grasped the fundamentals of the methods you are employing, and I would strongly advise you to do some more reading on them. I can, at best, only guide you, as I am not that familiar with the technique myself. A good start would be the references in ?capscale; then search for papers that cite Anderson & Willis and that use the methodology.

> Thank you very much,
> Alicia

HTH

G

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC ENSIS, UCL Geography,    [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
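The "brute force" idea behind a permutation test, as described in the reply above, can be illustrated with a minimal base-R sketch. This is not vegan's implementation: anova.cca uses its own statistic and permutation scheme. The simulated data, the helper f_stat() and the number of permutations are all assumptions made purely for illustration.

```r
set.seed(1)

# Simulated toy data (an assumption for illustration): one response, one 3-level factor.
g <- factor(rep(c("a", "b", "c"), each = 5))
y <- rnorm(15) + as.numeric(g)

# f_stat() is a hypothetical helper: the ordinary F statistic of y ~ g.
f_stat <- function(y, g) anova(lm(y ~ g))[["F value"]][1]

f_obs <- f_stat(y, g)

# Null distribution: shuffle the response so any link with g is broken,
# then recompute the statistic each time.
n_perm <- 199
f_perm <- replicate(n_perm, f_stat(sample(y), g))

# Permutation p-value: how often a shuffled data set does at least as well
# as the observed one (the observed value counts as one permutation).
p_value <- (sum(f_perm >= f_obs) + 1) / (n_perm + 1)
p_value
```

Because the p-value is estimated from random shuffles, repeated runs give slightly different values, which is consistent with the two different Pr(F) values in the outputs quoted in the thread.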
Re: [R] question about capscale (vegan)
Hello,

Thank you for your help. I have tried to perform the analysis I wanted with example data (not my real data, because I can't provide them here). What I tried is this:

    matrix
         [,1] [,2] [,3]
    [1,] 0.00 0.13 0.59
    [2,] 0.13 0.00 0.55
    [3,] 0.59 0.55 0.00

    dist_mat
         1    2
    2 0.13
    3 0.59 0.55

    # here, the distance matrix is calculated from the percentage of
    # differing nucleic acids between two sequences; R is not used to compute it.

The original data would be like this:

       n1 n2 n3 n4 n5 n6 n7 n8 n9 n10 n11 n12
    m1  A  C  G  T  A  G  C  T  A   C   T   A
    m2  G  C  T  A  T  G  C  T  A   C   T   A
    m3  G  A  G  T  A  G  C  T  A   C   T   A

    factors_frame
      time region    city
    1 2006 europe  london
    2 2005 africa nairobi
    3 2005 europe   paris

    my.cap <- capscale(dist_mat ~ time + region + time:region + region:city + time:region:city, factors_frame)
    my.cap
    Call:
    capscale(formula = dist_mat ~ time + region + time:region + region:city + time:region:city, data = factors_frame)

                Inertia Rank
    Total         0.445
    Constrained   0.445    2
    Inertia is squared distance
    Some constraints were aliased because they were collinear (redundant)

    Eigenvalues for constrained axes:
       CAP1    CAP2
    0.42978 0.01522

    anova(my.cap)
    Error in `names<-.default`(`*tmp*`, value = "Residual") :
      attempt to set an attribute on NULL

I am still concerned about the 'comm' argument, since I don't understand how important it could be for my type of data, or what it refers to in my data. Also, what I am really interested in is performing a factorial anova with another nested factor (the model I have provided above), and as you can see R gives an error that I don't understand either.

Thank you in advance for your help.

Regards,
Alicia
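The distance described in Alicia's message (the proportion of positions at which two aligned sequences differ) can be sketched in base R as follows. This is only an illustration: p_distance() is a hypothetical helper, the three toy sequences are the ones shown in the message, and the resulting values need not match the illustrative 3 x 3 matrix given there.

```r
# Three aligned toy sequences from the message, one row per sequence.
seqs <- rbind(
  m1 = strsplit("ACGTAGCTACTA", "")[[1]],
  m2 = strsplit("GCTATGCTACTA", "")[[1]],
  m3 = strsplit("GAGTAGCTACTA", "")[[1]]
)

# p_distance() is a hypothetical helper: proportion of differing positions.
p_distance <- function(a, b) mean(a != b)

# Fill a symmetric matrix of pairwise distances.
n <- nrow(seqs)
d <- matrix(0, n, n, dimnames = list(rownames(seqs), rownames(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- p_distance(seqs[i, ], seqs[j, ])
  }
}

# Convert to a "dist" object, the form used on the left-hand side
# of the capscale() formula in the thread.
dist_mat <- as.dist(d)
dist_mat
```

A matrix computed outside R can be brought in the same way: read it into a square numeric matrix and pass it through as.dist().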
Re: [R] question about capscale (vegan)
On Fri, 2006-11-17 at 12:18 +0100, Alicia Amadoz wrote:
> Hello,
>
> Thank you for your help. I have tried to perform the analysis I wanted
> with data of example, I mean not real data because I can't provide it
> here. So, what I have tried is this,

Hi Alicia,

It would have been more helpful if you'd included the actual commands to generate each object, but thanks for including an example.

    library(vegan)   # provides capscale()

    ## rebuild the example distance matrix
    dat <- matrix(c(0.00, 0.13, 0.59,
                    0.13, 0.00, 0.55,
                    0.59, 0.55, 0.00), ncol = 3)
    dist.mat <- as.dist(dat)
    dist.mat
    #      1    2
    # 2 0.13
    # 3 0.59 0.55

    ## the three explanatory factors
    time   <- as.factor(c(2006, 2005, 2005))
    region <- as.factor(c("europe", "africa", "europe"))
    city   <- as.factor(c("london", "nairobi", "paris"))
    factors.frame <- data.frame(time, region, city)

    my.cap <- capscale(dist.mat ~ time + region + time:region + region:city + time:region:city, factors.frame)
    my.cap

So, stop here. Look at the output. You can extract 2 constrained axes that explain 100% of the variance in your data. This causes my.cap$CA to be NULL, which is why when you do:

    anova(my.cap)

you get this error message:

    Error in `names<-.default`(`*tmp*`, value = "Residual") :
      attempt to set an attribute on NULL

The error has nothing to do with providing comm or not (I think), as I don't see how this would alter my.cap$CA. In any case, comm is used to generate species scores, and if you look at summary(my.cap) you will see that you do have species scores (though their meaning may be hard to understand if no comm is provided; see ?capscale).

I hesitate to call this a bug in capscale() or permutest.cca(), which is where the error comes from, by the way:

    traceback()
    5: `names<-.default`(`*tmp*`, value = "Residual")
    4: `names<-`(`*tmp*`, value = "Residual")
    3: permutest.cca(object, step, ...)
    2: anova.cca(my.cap)
    1: anova(my.cap)

but anova.cca doesn't seem to handle situations where there isn't an unconstrained component. I've CC'd Jari Oksanen, the author of vegan, to ensure he sees this.

This error is related to the specific dummy problem you sent. Do you get this error when you run the analysis on your full data set? If so, you might want to consider removing some constraints, as your model isn't really constrained anymore. As the number of constraints approaches the number of sites, the constraint on the ordination drops away and you are back to a Principal Coordinates Analysis (IIRC) of your dissimilarity matrix.

> anova(my.cap)
> Error in `names<-.default`(`*tmp*`, value = "Residual") :
>   attempt to set an attribute on NULL
>
> Then, I am still concerned about 'comm' argument since I don't
> understand how important could it be for my type of data and I don't
> understand to what it referes in my data. Another thing, is that what I
> am really interested in is to perform a factorial anova with another
> factor nested (the model I have provided above), and as you can see R
> gives an error that I don't understand either.

As for your original data: by the looks of it, you wouldn't be able to use that as the argument to comm. It would need to be numeric, recoded etc. before you could use it, and I'm not sure of the best way to do that.

But in this instance, if you are interested in the samples and how they relate to one another, constrained by your factors_frame, then you don't need comm: you can proceed without it and not bother displaying species scores.

If you are interested in how the samples relate to one another, and also in how the nucleic acids relate to one another and to the samples, constrained by your factors_frame, then you will need to recode that example matrix into something numeric, and even then it may not be possible with the way capscale is written.

Hope this helps,

G

> Thank you for your help in advance.
>
> Regards,
> Alicia
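The point about the number of constraints approaching the number of sites can be checked on the toy example without fitting any ordination. The sketch below uses only base R and is an illustration of the counting argument, not of what capscale() does internally: it builds the design matrix for the same formula and compares its rank with the number of samples.

```r
# The three explanatory factors from the toy example in the thread.
time   <- factor(c(2006, 2005, 2005))
region <- factor(c("europe", "africa", "europe"))
city   <- factor(c("london", "nairobi", "paris"))
factors.frame <- data.frame(time, region, city)

# Design matrix for the right-hand side of the model formula.
X <- model.matrix(~ time + region + time:region + region:city + time:region:city,
                  data = factors.frame)

n_samples  <- nrow(X)      # number of samples (sites)
model_rank <- qr(X)$rank   # linearly independent columns, intercept included

# Degrees of freedom left over for residual (unconstrained) variation.
resid_df <- n_samples - model_rank
c(samples = n_samples, rank = model_rank, residual_df = resid_df)
```

With three samples the design matrix already has full rank (the intercept plus two independent constraints), so nothing is left for a residual component. That matches the two constrained axes carrying all of the inertia in the printed result.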
Re: [R] question about capscale (vegan)
Hello Gavin,

Thank you very much for your help. I'm sorry I forgot to include all commands that I used but next time I will try to write all of them. I will try with my real data and see how it goes. I think I finally have understood how capscale works with this kind of data. Thank you.

Regards,
Alicia
Re: [R] question about capscale (vegan)
On Fri, 2006-11-17 at 12:26, Gavin Simpson wrote:
> So, stop here. Look at the output. You can extract 2 constrained axes
> that explain 100% of the variance in your data. This causes my.cap$CA to
> be NULL
> [...]
> anova.cca doesn't seem to handle situations where there isn't an
> unconstrained component. I've CC'd Jari Oksanen, the author of vegan to
> insure he sees this.

Dear y'all,

I agree with this analysis: you have no residual (unconstrained) variation, and this means that you cannot have a significance test. I have always known this, but I haven't cared about the issue: you ask for an impossible analysis and get an error message. The only thing that could be called a bug is the text of the error message, and I may change that. After that you still cannot perform anova when there is no residual variation, but the error message would be different.

You have two roads to go if you still want to have an analysis like this:

1. As Gavin suggested, just reduce the number of constraints so that your model has an unconstrained component, and you will be able to run the tests.

2. Perform an unconstrained analysis (cmdscale, prcomp, princomp, or rda in this case), fit the environmental variables to this solution and analyse the significance of the fitted vectors. All of this is doable using the envfit() function in vegan.

Cheers, jari oksanen
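Jari's second suggestion (an unconstrained ordination followed by envfit()) can be sketched as follows. This is a hedged illustration only: it uses the varespec/varechem example data shipped with vegan rather than the poster's data, and the choice of variables (N, P, K), two axes and 999 permutations are assumptions made for the example.

```r
library(vegan)

# Example data shipped with vegan (not the poster's data).
data(varespec)
data(varechem)

# Unconstrained ordination: principal coordinates of a Bray-Curtis
# dissimilarity matrix, keeping two axes.
d   <- vegdist(varespec, method = "bray")
pco <- cmdscale(d, k = 2)

# Fit explanatory variables onto the ordination and assess them by permutation.
fit <- envfit(pco, varechem[, c("N", "P", "K")], permutations = 999)
fit
```

Factors can be passed to envfit() in the same data frame as continuous variables; they are then fitted as class centroids rather than vectors.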
[R] question about capscale (vegan)
Hello,

I am interested in using the capscale function of the vegan package for R. I already have a dissimilarity matrix and I intend to use it as the 'distance' argument. But then I don't know what kind of data must go in the 'comm' argument. I don't understand what type of data is meant by 'species scores' and 'community data frame', since my data are nucleic distances between different sequences.

I would be very grateful if you could help me with this in any way. Thank you in advance for your help.

Regards,
Alicia
Re: [R] question about capscale (vegan)
Hi Alicia,

On 11/16/06, Alicia Amadoz wrote:
> 'comm' argument. I don't understand what type of data must be referred
> as 'species scores' and 'community data frame' since my data refer to
> nucleic distances between different sequences.

comm would be the original data from which you calculated the dissimilarity matrix, so that scores can be calculated for the individual variables. These analyses were designed for use with vegetation data in the form of a matrix with sites as rows and species as columns, containing some measure of abundance for each species at each site.

If you don't have an original data frame, that is, your data come only in the form of distances, you will need a different implementation of constrained ordination. Alternatively, you could possibly modify the function to skip the species scores step.

Sarah

--
Sarah Goslee
http://www.stringpage.com
http://www.astronomicum.com
http://www.functionaldiversity.org
Re: [R] question about capscale (vegan)
Sorry, one additional note: You don't need to specify comm to use capscale. Ignore what I said about modifying the function.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] question about capscale (vegan)
On Thu, 2006-11-16 at 17:25 +0100, Alicia Amadoz wrote:
> Hello, I am interested in using the capscale function of vegan package
> of R. I already have a dissimilarity matrix and I am intended to use it
> as 'distance' argument. But then, I don't know what kind of data must be
> in 'comm' argument. I don't understand what type of data must be
> referred as 'species scores' and 'community data frame' since my data
> refer to nucleic distances between different sequences.

No, that is all wrong. Read ?capscale more closely! It says that you need to use the formula to describe the model.

distance is used to tell capscale which distance coefficient to use if the LHS of the model formula is a community matrix. Argument comm is used to tell capscale where to find the species matrix that will be used to determine species scores in the analysis, *if* the LHS of the formula is a distance matrix. comm isn't used if the LHS is a data frame, and distance is ignored if the LHS is a distance matrix.

As you don't provide a reproducible example of your problem, I will use the inbuilt example from ?capscale:

    library(vegan)

    ## load some data
    data(varespec)
    data(varechem)

Now, if you want to fit a capscale model using the raw species data, you would describe the model like so:

    vare.cap <- capscale(varespec ~ N + P + K + Condition(Al), data = varechem, distance = "bray")
    vare.cap

In the above, the LHS of the formula is a data frame, so capscale looks to argument distance for the name of the coefficient used to turn it into a distance matrix. The terms on the RHS of the formula are variables looked up in the object assigned to the data argument.

Now let's alter this to start with a dissimilarity/distance matrix instead. The exact complement of the above would be:

    dist.mat <- vegdist(varespec, method = "bray")
    vare.cap2 <- capscale(dist.mat ~ N + P + K + Condition(Al), data = varechem, comm = varespec)
    vare.cap2

To explain the above example: first create the Bray-Curtis distance matrix (dist.mat), then use it on the LHS of the formula. When capscale now wants to calculate the species scores of the analysis, it will look to argument comm for the data to use in the calculation, which in this case we specify as the original species matrix varespec.

As for what species scores are, this is a throwback to the origins of the package and the methods included: all of this is related to ecology and mainly vegetation analysis (hence vegan). For species scores, read variable scores. The distance matrix (however calculated) describes how similar your individual sites (read samples) are to one another. You can also display information about the variables used to determine those distances/similarities, and this is what is meant by species scores. The columns of whatever data you used to generate the distance matrix provide the information used to generate the species scores.

If some of this still isn't clear, email the list with the commands used to generate your distance matrix in R and I'll have a go at explaining this with reference to your data/example.

> I would be very grateful if you could help me with this fact in any
> manner. Thank you in advance for your help.
>
> Regards,
> Alicia

HTH

G
Re: [R] question about capscale (vegan)
Nice catch, Gavin - I missed that part of the original post.

The nucleic distances need to be included as the left-hand side of the formula, not as the distance argument. comm is still optional, but it's not a good idea to omit it if there's any way you can provide the original data. From the help:

    If this is not supplied, the ``species scores'' are the axes of
    initial metric scaling ('cmdscale') and may be confusing.

I don't know if it's true in this case, but there are applications where there is no data matrix: the distances themselves are the original data. I don't know offhand of any other constrained ordination functions in R that will easily accommodate a precalculated distance matrix, but I expect there are some somewhere. My usual approach is to use metric or nonmetric multidimensional scaling plus vector fitting, but the assumptions behind that are different from those underlying a constrained ordination.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
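The case discussed above, with a precalculated distance matrix on the left-hand side of the formula and no comm argument, can be sketched as below. This is an illustrative example under stated assumptions: it reuses the varespec/varechem data from ?capscale in place of the poster's nucleic distances, and the constraints N, P and K are chosen only for the example.

```r
library(vegan)

# Example data from ?capscale, standing in for a precalculated distance matrix.
data(varespec)
data(varechem)
dist.mat <- vegdist(varespec, method = "bray")

# Distance matrix on the left-hand side; comm is simply left out.
mod <- capscale(dist.mat ~ N + P + K, data = varechem)
mod

# Site (sample) scores remain available for interpretation and plotting.
site_scores <- scores(mod, display = "sites")
head(site_scores)
```

Without comm, any "species scores" reported for such a model refer to the axes of the initial metric scaling, as the help text quoted above warns, so it is usually safer to interpret only the site scores and the constraints.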