[R-sig-eco] Problem in Illustrating ZI GLMM
-0.206450.15959 -1.294 0.1958 basal_area_m2 -0.711710.14941 -4.763 1.90e-06 *** tree_density 0.333220.15064 2.212 0.0270 * das_mm:dif_alt_dossel_m 0.556600.12694 4.385 1.16e-05 *** elevation_m:hand_m -0.334240.13648 -2.449 0.0143 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 -- Dr. Alexandre F. Souza Professor Associado Departamento de Ecologia/CB Universidade Federal do Rio Grande do Norte Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] GLMM: AIC worsens but BIC improves
Dear all, First I would like to say thank you for replying to my last posts, your replies have halped me a lot and complemented the formal sources I have been consulting. I would like very much your opinion on a further issue below, if possible. I am using glmmTMB to find the fixed and random variables that influence whether açai palms produce fruit or not in the Amazon. So far so good, I ended upt with a much simpler model that basically contains the year (2 years only), habitat type (3 levels, Grupos_Finais below) and density of adult palms (R_dens below). This is model 8. When I try to further simplify this model reducing it to only year + habitat or year alone, I get an worsened AIC but an improved BIC... When I fit model 8 with REML, the significance of adult palm density disappear (I have been making model comparison manually with ML untill now). Together, these results make me pend to exclude habitat type and adult density, but I am intrigued with worsened AIC. What do you think? modelo8.glmm = glmmTMB(reproducao_bin ~ 0 + Grupos_Finais + ano_medida + R_dens_total + (1|numero) + (1 |bloco/parcela), family = binomial, data = dados) > summary(modelo8.glmm) Family: binomial ( logit ) Formula: reproducao_bin ~ 0 + Grupos_Finais + ano_medida + R_dens_total + (1 | numero) + (1 | bloco/parcela) Data: dados AIC BIC logLik deviance df.resid 1665.5 1707.6 -824.8 1649.5 1420 Random effects: Conditional model: GroupsNameVariance Std.Dev. numero(Intercept) 3.0944 1.7591 parcela:bloco (Intercept) 0.7850 0.8860 bloco (Intercept) 0.7421 0.8615 Number of obs: 1428, groups: numero, 762; parcela:bloco, 147; bloco, 10 Conditional model: Estimate Std. Error z value Grupos_FinaisDisturbed Unflooded Forests -0.014900.57713 -0.026 Grupos_FinaisFlooded Open Forests 1.923200.46866 4.104 Grupos_FinaisOld Growth Unflooded Forests 1.113600.72798 1.530 ano_medida2017-1.735440.18708 -9.276 R_dens_total -0.037490.01542 -2.430 Pr(>|z|) Grupos_FinaisDisturbed Unflooded Forests0.9794 Grupos_FinaisFlooded Open Forests 4.07e-05 *** Grupos_FinaisOld Growth Unflooded Forests 0.1261 ano_medida2017 < 2e-16 *** R_dens_total0.0151 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 modelo9.glmm = glmmTMB(reproducao_bin ~ 0 + Grupos_Finais + ano_medida + (1|numero) + (1 |bloco/parcela), family = binomial, data = dados) modelo10.glmm = glmmTMB(reproducao_bin ~ ano_medida + (1|numero) + (1 |bloco/parcela), family = binomial, data = dados) > AIC(modelo8.glmm,modelo9.glmm,modelo10.glmm) df AIC modelo8.glmm 8 1665.501 modelo9.glmm 7 1669.163 modelo10.glmm 5 1671.159 > BIC(modelo8.glmm,modelo9.glmm,modelo10.glmm) df BIC modelo8.glmm 8 1707.613 modelo9.glmm 7 1706.011 modelo10.glmm 5 1697.479 Perhaps report bot AIC and BIC and decide for the simpler model?... Thank you very much for any inputs, Cheers, Alexandre -- Dr. Alexandre F. Souza Professor Associado Departamento de Ecologia/CB Universidade Federal do Rio Grande do Norte Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] How to set Manually the Reference Level in glmmTMB
Dear all, I am running a glmm with glmmTMB in which the fixed part has two factor varialbes, year (ano_medida) and habitat type (Grupos_Finais below). Since I have only two years I do not mind that the model takes the first year as a baseline. However, no habitat type is a control or a reference, and I would like to see results for all levels of this variable. I would like to know how to manually do this, but I am only finding in the internet resources to change the reference level, not to cancel it. Could anyone help me with this? modelo2.glmm = glmmTMB(reproducao_bin ~ ano_medida + Basal_Area_m2*Grupos_Finais + R_dens_total*Grupos_Finais + (1|numero) + (1 |bloco/transecto/parcela), family = binomial, data = dados) My data: > str(dados) 'data.frame': 1428 obs. of 12 variables: $ numero : int 4006 4009 4010 4020 4022 4024 4025 4027 4035 4036 ... $ bloco : int 1 1 1 1 1 1 1 1 1 1 ... $ transecto : int 1 1 1 1 1 1 1 1 1 1 ... $ parcela : int 1 1 1 1 1 1 1 1 1 1 ... $ Tree_Height_m_Fuste_Parc: num 8.32 8.32 8.32 8.32 8.32 ... $ Grupos_Finais : Factor w/ 3 levels "Disturbed Unflooded Forests",..: 1 1 1 1 1 1 1 1 1 1 ... $ ano_medida : Factor w/ 2 levels "2016","2017": 1 1 1 1 1 1 1 1 1 1 ... $ Basal_Area_m2 : num 0.182 0.182 0.182 0.182 0.182 ... $ das_mm : num 185 161 167 114 164 ... $ reproducao : int 3 0 0 0 0 0 0 0 0 0 ... $ reproducao_bin : int 1 0 0 0 0 0 0 0 0 0 ... $ R_dens_total: int 31 31 31 31 31 31 31 31 31 31 ... > Many thanks! Alexandre -- Dr. Alexandre F. Souza Professor Associado Departamento de Ecologia/CB Universidade Federal do Rio Grande do Norte Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] GLMM: Should Integers be Levels of Dr. Alexandre F. Souza Prof. Associado a Factor?
Hello all, Judging by some sources, it seems to me that before running a GLMM I should make all categorical variables into factors, including numerical ones that are integers like binary response variables, plot numbers, block levels, sampling year and the ID of individual measured plants, even if these variables are correctly identified as integers as in the example below. Is this understanding correct in your opinion? Thank you very much for any reply, Alexandre > str(dados) 'data.frame': 1428 obs. of 12 variables: $ numero : int 4006 4009 4010 4020 4022 4024 4025 4027 4035 4036 ... $ bloco : int 1 1 1 1 1 1 1 1 1 1 ... $ transecto : int 1 1 1 1 1 1 1 1 1 1 ... $ parcela : int 1 1 1 1 1 1 1 1 1 1 ... $ Tree_Height_m_Fuste_Parc: num 8.32 8.32 8.32 8.32 8.32 ... $ Grupos_Finais : Factor w/ 3 levels "Disturbed Unflooded Forests",..: 1 1 1 1 1 1 1 1 1 1 ... $ ano_medida : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ... $ Basal_Area_m2 : num 0.182 0.182 0.182 0.182 0.182 ... $ das_mm : num 185 161 167 114 164 ... $ reproducao : int 3 0 0 0 0 0 0 0 0 0 ... $ reproducao_bin : int 1 0 0 0 0 0 0 0 0 0 ... $ R_dens_total: int 31 31 31 31 31 31 31 31 31 31 ... -- Dr. Alexandre F. Souza Professor Associado Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] ErrError: Non-numeric argument to binary operator, despite values being numeric
Hello all, In the following code I'm attempting to perform a logistic GLMM with a binary dependent variable (whether palms reproduced or not), a fixed part (with habitat types, plot basal area and year or measurement) and a random part with measured individuals (numero) nested within plots (parcelas) nested within blocks (blocos). However, I am getting an error message telling me Error in Formula + (1 | numero) : non-numeric argument to binary operator I tried to repeat the random parte with the entire nested structure and with each of its components alone, and the message is the same (it is repeated for each random component when I try them alone), despite de fact that all of them are numeric/integers. I tried to change them to dados$numero = as.numeric(dados.numero) but got the same message. There are no NAs in the data, I excluded them all: str(dados) 'data.frame': 1428 obs. of 12 variables: $ numero : num 4006 4009 4010 4020 4022 ... $ bloco : int 1 1 1 1 1 1 1 1 1 1 ... $ transecto : int 1 1 1 1 1 1 1 1 1 1 ... $ parcela : int 1 1 1 1 1 1 1 1 1 1 ... $ Tree_Height_m_Fuste_Parc: num 8.32 8.32 8.32 8.32 8.32 ... $ Grupos_Finais : Factor w/ 3 levels "Disturbed Unflooded Forests",..: 1 1 1 1 1 1 1 1 1 1 ... $ ano_medida : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ... $ Basal_Area_m2 : num 0.182 0.182 0.182 0.182 0.182 ... $ das_mm : num 185 161 167 114 164 ... $ reproducao : int 3 0 0 0 0 0 0 0 0 0 ... $ reproducao_bin : int 1 0 0 0 0 0 0 0 0 0 ... $ R_dens_total: int 31 31 31 31 31 31 31 31 31 31 ... dados$Grupos_Finais = factor(dados$Grupos_Finais) dados$numero = as.numeric(dados$numero) # Model beyond idea for further selection Formula = formula(reproducao_bin ~ ano_medida + Basal_Area_m2*Grupos_Finais + R_dens_total*Grupos_Finais) # glm model fixed part only runs ok modelo1.glm = glm(Formula, family = binomial, data = dados) modelo2.glmm = glmer(Formula + (1 |blocos/transectos/parcela/numero), family = binomial, data = dados) Error in Formula + (1 | numero) : non-numeric argument to binary operator modelo2.glmm = glmer(Formula + (1 |numero),family = binomial, data = dados) Error in Formula + (1 | numero) : non-numeric argument to binary operator > modelo2.glmm = glmer(Formula + (1 |parcela),family = binomial, data = dados) Error in Formula + (1 | parcela) : non-numeric argument to binary operator > modelo2.glmm = glmer(Formula + (1 |bloco),family = binomial, data = dados) Error in Formula + (1 | bloco) : non-numeric argument to binary operator > modelo2.glmm = glmer(Formula + (1 |transecto),family = binomial, data = dados) Error in Formula + (1 | transecto) : non-numeric argument to binary operator As you can see there are no nonumeric arguments in the random part of the formula... Thank you in advance for any suggestions, Alexandre -- Dr. Alexandre F. Souza Professor Associado Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] On The Choice of a Classification Approach
Hello, I am trying to find a method to cluster species based on their quantitative traits and at the same time obtain threshold value for each node in the decision tree. My difficulty is that my dependent variable is the list of species names, each species appearing as a single line with no repetition. All explanatory variables are quantitative. As far as I understood, classification trees need a dependent variable with repeated levels as in the iris dataset, in which each species appears several times. All the examples employing classification trees I found use a dependent variable, but I do not have one except for the species names. MRT uses a species by location matrix as dependent variable, and traditional hierarchical cluster analysis do cluster species but do not use quantitative data to that aim, nor produce threshold values. I can run a non-hierarquical cluster analysis like kmeans, but these do not generate threshold values. My concern is that without threshold values any classification I produce will be restricted to the studied species and will not be applicable to different species that can be found in the studied region, what would be a strong limitation to the use of such classification. Thank you very much in advance for any ideas. Regards, Alexandre -- Dr. Alexandre F. Souza Professor Associado Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Do You Have Anything Against Using ISO DATA?
Dear R-Sig Colleagues, I am performing a bioregionalization of the Cerrado woody flora in central Brazil. I am following the steps Produce a dissimilarity matrix from species x site matrix Perform a NMDS on the dissimilarity matrix Interpolate each NMDS axis with Kriging Partition the stacked interpolated rasters with K-means I am willing to change the k-means by the ISO DATA method implemented in ArcGis, which takes an initial number of groups (the same used for the k-means classification). The ISODATA algorithm has some further refinements by splitting and merging of clusters. Clusters are merged if either the number of members (pixel) in a cluster is less than a certain threshold or if the centers of two clusters are closer than a certain threshold. Clusters are split into two different clusters if the cluster standard deviation exceeds a predefined value and the number of members (pixels) is twice the threshold for the minimum number of members. It implements a set of rule-of-thumb procedures that have been incorporated into an iterative classification algorithm. Many of the steps used in the algorithm are based on the experience obtained through experimentation. The ISODATA algorithm is a modification of the k-means clustering algorithm and is supposed to overcome the disadvantages of k-means. The classification produced preserved the main patterns found in kmeans while reducing the number of groups. However, I have not found anything in the literature of bioregionalization using ISO DATA, only kmeans. Do you know of any reasons I should not use ISO DATA for this aim? Thank you very much in advance for any inputs, Best, Alexandre -- Dr. Alexandre F. Souza Professor Associado Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Indicator Species in Presence-Absence Data
Dear all, I have a large matrix of 680 localities x 6000 species with presence-absence data divided into 42 ecoregions. I used the multipatt function of the indicspecies package to identify indicator species of each ecoregion. indic = multipatt(dados3[,2:ncol(dados3)], dados3[,1], func = "IndVal.g", duleg = TRUE, control = how(nperm = 999)) Results returned many indicator species for some ecoregions but many ecoregions with zero indicator species. I am in doubt wether the function options I made are adequate to run the analyses for presence-absence data since in the tutorial and in the function examples abundance data are used. I consulted the articles by De Cáceres 2009 and 2010 which mention the possibility of indicator species for presence-absence data but it does not become clear to me if they are automatically implemented in the function or if specific options are needed. An important information is that the ecoregions were defined through interpolation of beta dissimilarity metrics (simpson index) and k-means classification. Is it possible that they are different but do not have indicator species? Thanks for any thoughts, Sincerely, Alexandre -- Dr. Alexandre F. Souza Professor Associado Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] How to subsample a community dataset
Hi everyone, I have two community matrices, one which is presence-absence and the other which contains abundances of hundreds of species in hundreds of localities. Because localities had different sizes, I would like to standardize the number of species per locality before performing beta-diversity compositional analyses. Are you aware of any packages or functions that perform this task? Thank you very much in advance, Sincerely, Alexandre -- Dr. Alexandre F. Souza Professor Associado Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br https://www.youtube.com/user/alexfadigas http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Error in Kriging Procedure
Hello, I am new to interpolating techiques, and I am getting a persistent erro message I have not been able to resolve. I am trying to krig spatialized ordination scores into a raster of the Brazilian Atlantic Forest, as you can see below. I previously run a NMDS with 5 dimensions on species presence-absence data and saved the nmds scores. I send in attachment the atlantic forest shapefile and the nmds scores, to make the code more reproducible. # Convert NMDS-scores to SpatialPoints objects coords = as.data.frame(coords) names(coords) = c("longitude","latitude") nmds_scores<-cbind(coords, nmds_scores) coordinates(nmds_scores)<-~longitude+latitude # Convert 'nmds_scores' dataframe to SpatialPointsDataframe wgs84<-"+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" # WGS84 Coordinate Reference System (CRS) crs(nmds_scores)<-wgs84 # Set the original CRS (WGS84) new_crs<-"+proj=robin +lon_0=0 +x_0=0 +y_0=0 +a=6371000 +b=6371000 +units=m +no_defs" # Transform CRS to Robinson projection (it is necessary to have projected data for interpolation) nmds_scores<-spTransform(nmds_scores, new_crs) # Read the shapefile of Brazilian Atlantic Forest (BAF) and rasterize it BAF_limit<-readOGR("C:/Users/Augusto/Documents/MESTRADO/Projeto CAATINGA/Analise k-means/shapecaa") BAF_limit<-spTransform(BAF_limit, wgs84) # Change the CRS from SIRGAS2000 to WGS84 r_ext<-extent(BAF_limit) r<-raster() # Create a empty raster to receive data res(r)<-0.0417 # Define raster resolution in decimal degrees (= 10 arc-min) r<-crop(r, r_ext) # Define spatial extent of the raster (same as BAF) BAF_raster<-rasterize(BAF_limit, r, field=1, background=NA) # Get the spatial coordinates of BAF pixels get_coords<-rasterToPoints(BAF_raster) get_coords<-as.data.frame(get_coords[, c(1:2)]) get_coords<-cbind(get_coords, NA, NA, NA) names(get_coords)<-c("longitude", "latitude", "NMDS1", "NMDS2", "NMDS3") coordinates(get_coords)<-~longitude+latitude # Convert 'get_coords' dataframe to SpatialPointsDataframe crs(get_coords)<-wgs84 # Set the original CRS (WGS84) get_coords<-spTransform(get_coords, new_crs) # Transform CRS to Robinson projection gModel_ok <- gstat(NULL, id='NMDS1_kri', formula= NMDS1 ~ 1, locations=coords, model=data.fit) G0y_ok <- interpolate(BAF_raster, gModel_ok, xyOnly=FALSE) Error in as(data, "data.frame") : no method or default for coercing “NULL” to “data.frame” Do anyone has any idea of what may be wrong? I tried to empty the raster changing all vaues to zero but did not work. Thanks a log in advance, Alexandre -- Dr. Alexandre F. Souza Professor Adjunto IV Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Simpson distances on abundances?
Dear friends, The Simpson dissimilarity index calculated with the function recluster.dist of your package recluster can be calculated based on a matrix of abundances instead of one of presence/absence? I searched for this piece of information on Baselga 2010, Dapporto 2013, Dapporto 2014 and Lennon 2001 but could not find it. Thanks in advance for any ideas, Alexandre -- Dr. Alexandre F. Souza Professor Adjunto IV Chefe do Departamento de Ecologia Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.esferacientifica.com.br http://www.docente.ufrn.br/alexsouza orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza> [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Is stepacross a must?
-- Forwarded message -- From: Alexandre F. Souza Date: 2018-06-23 7:04 GMT-03:00 Subject: Is stepacross a must? To: Lista de discussao R-sig-ecology Dear friends, I am in deep doubt regarding whether or not to apply vegan'a stepacross function to my dissimilarity matrix. The matrix is a simpson dissimilarity matrix with 665 rows reflecting beta diversity between localities along the brazilian atlantic florest, with over 4000 species most of which are singletons. They were retained for biological reasons. We are appying a community modelling approach with k-means in order to find the best number of clusters this dataset can be divided. The problem is that we are obtaining quite different results with or without applying the stepacross function. According to Tuomisto et al. 2012: ..."the stepacross method or extended dissimilarities (Williamson 1978, De’ath 1999), originally developed to improve the performance of ordination. Th e aim is to fi nd the shortest path between sites that do not share any species by using intermediate sites as stepping-stones. The process replaces all saturated dissimilarity values (or any dissimilarities larger than a specifi ed threshold value) with new values that can exceed unity. The new values can be as large as is necessary to reflect the amount of compositional heterogeneity in the dataset, so using extended dissimilarities instead of the original ones in Mantel tests and multiple regression on dissimilarity matrices can be expected to improve the performance of these methods." I reproduce the code below. Without stepacross we obtain dissimilarities by large dominated by 1, and as many as 25 groups. With stepacross they are more unimodal since the =1 distances were allowed to have higher values, but as few as 6 groups! Furthermore, the final number of groups varies more between runs with stepacross than without it. With stepacross it seems almost to alternate between 6 and 11 groups, with 6 being more frequent. What do you thin it would be wiser to do? I am leaning towards the stepacross but I am not that sure. The help page of the function mentions that in cases with high beta-diversity datasets stepacross must be used. Thanks for any thoughts, Alexandre simpdist = recluster.dist(sp, dist = "simpson") simp.step = stepacross(simpdist) dissimilarity_matrix<-as.matrix(simp.step) # Perform an ordination via Principal Coordinate Analysis (PCoA) with correction for negative eigenvalues [add = T] ordination_output<-cmdscale(dissimilarity_matrix, k=(nrow(dissimilarity_matrix)-1), eig = F, add = T, x.ret = FALSE) pcoa_scores<-as.data.frame(ordination_output[[1]]) # Extract the PCoA scores for each site # Get values of the evaluation criterion (Within-Groups SS) for k varying from 2 to 'number of sites minus 1' sum_of_squares <- foreach(i = 2:(ncol(dissimilarity_matrix)-1), .combine = 'cbind') %dopar% { # cbind the results of foreach classification<-kmeans(pcoa_scores, i, iter.max=1000, nstart=100)#usar inter.max=1000 e nstart=100 classification$tot.withinss} # the Within-Groups SS for k=i y<-t(as.matrix(sum_of_squares)) # convert Within-Group SS to a vector (y for piecewise regressions) x<-c(2:(ncol(dissimilarity_matrix)-1)) # number of clusters (x for piecewise regressions) # Create an empty matrix to receive the values of 'max-k entered a priori' and the respective 'optimal k selected' initial_kmax<-c(4:ncol(dissimilarity_matrix)) # At least 4 points to produce two lines (L-method) cluster_results<-cbind(initial_kmax, c(NA)) # Dataframe to receive values of the breakpoints # Perform multiple piecewise regression varying the max-k entered a priori from 4 to 'number of sites' for (j in 4:ncol(dissimilarity_matrix)){ # the maximum number of 'k' established a priori y_selected<-y[1:j] # Limit the number of starting points (y axis) x_selected<-x[1:j] # Limit the number of starting points (x axis) find_the_knee <- foreach(i = 2:(j-1), # calculate the RSE for all possible breakpoints (i) when starting with k-max = j .combine = 'cbind') %dopar% { # piecewise_k<-lm(y_selected~x_selected*(x_selected=i)) summary(piecewise_k)[6]} # print the RSE for each piecewise regression find_the_knee<-c(do.call("cbind",find_the_knee)) # convert the previous output (list) to vector knee_found<-as.data.frame(cbind(c(2:(j-1)), find_the_knee)) # combine position of breakpoint and RSE value names(knee_found)<-c("breakpoint", "RSE") # rename knee_found<-na.omit(knee_found) # remove NaN values k_number<-knee_found[which(knee_found$RSE==min(knee_found$RSE)),1] # select breakpoint that minimizes RSE cluster_results[(j-3),2]<-k_number} # print the maximum
[R-sig-eco] Deviance in Multinomial Logistic Regression
Hello, I am using the following code to perform model selection on a categorical variable with 9 levels corresponding to floristic regions. Explanatory variables are climatic, soil, and MEMs describing spatial structure. I am using AIC and wAIC to choose models. As far as I understood, the Deviance should be smaller in the worse model, which would be similar to the null model containing just the intercept, and larger in the best model. However, the Deviance statistic, however, is increasing from the best model to the worse model. Do anyone have any idas about what am I doing or reading wrong? Many thanks in advance, any ideas will help # Verify the total deviance (null model) Null_model<-multinom(subregion~1, data=variables) Total_deviance<-Null_model[[25]] # store the total deviance # Prepare the models to be tested formlist <- list() # Empty list to receive nested formulas in forward selection formlist[[1]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # Climatic/soil Productivity formlist[[2]]<-as.formula("subregion~ElevR+ElevCV+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # Topography formlist[[3]]<-as.formula("subregion~HFP+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # Human footprint formlist[[4]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+ElevR+ElevCV+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # Productivity + Topography formlist[[5]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+HFP+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # Productivity + HFP formlist[[6]]<-as.formula("subregion~HFP+ElevR+ElevCV+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # HFP + Topography formlist[[7]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+ElevR+ElevCV+HFP+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # Total Contemporary Environment formlist[[8]]<-as.formula("subregion~MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16") # Space only # Prepare empty dataframe to receive models' results MLRresults<-as.data.frame(matrix(nrow=8, ncol=4)) colnames(MLRresults) <- (c("Model", "Deviance", "AIC", "AICc")) for (i in 1:8){ # the remaining variables to be forward selected formula<-formlist[[i]] MLR_test<-multinom(formula, data=variables) MLRresults[i,1]<-paste(formlist[i]) # store the 'model formula' MLRresults[i,2]<-(Total_deviance-MLR_test[[19]])/Total_deviance # store the % of deviance explained MLRresults[i,3]<-MLR_test[[26]] # store the AIC value MLRresults[i,4]<-AICc(MLR_test)} # store the AICc value wAICc<-Weights(MLRresults$AICc) # AIC weigth MLRresults<-cbind(MLRresults, wAICc) # combine results in a single table View(MLRresults[order(MLRresults$wAICc, decreasing=T), ]) # print the output table from the model selection procedure The above code yields MODEL DEVIANCE AICAICc wAIC 8 subregion ~ MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.5025134 738.7133 910.5673 9.07e-01 3 subregion ~ HFP + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.5393317 718.0280 933.7467 9.260806e-06 2 subregion ~ ElevR + ElevCV + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.5574922 716.9467 985.8879 4.409071e-17 6 subregion ~ HFP + ElevR + ElevCV + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.5805342 710.7362 1044.5544 8.036491e-30 1 subregion ~ AMT + APP + CEC + Sand + TAR + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.6736414 648.9077 1161.1251 3.909003e-55 5 subregion ~ AMT + APP + CEC + Sand + TAR + HFP + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.6818203 658.3141 1294.6032 4.051928e-84 4 subregion ~ AMT + APP + CEC + Sand + TAR + ElevR + ElevCV + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.6943008 663.2007 1458.1197 1.260428e-119 7 subregion ~ AMT + APP + CEC + Sand + TAR + ElevR + ElevCV + HFP + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16 0.7016101 673.5208 1675.9823 6.197945e-167 -- Dr. Alexandre F. Souza Professor Adjunto IV Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://esferacientifica.wixsite.com/esferacientifica http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Error in optim function
Dear friends, I am getting a persistent error message when trying to estimate lotka-volterra competition coefficients from data on two populations of paramecium using the optim function. Do anyone knows how could I solve this? The error message is Error in eval(substitute(expr), data, enclos = parent.frame()) : numeric 'envir' arg not of length one Debug function (data, expr, ...) eval(substitute(expr), data, enclos = parent.frame()) and the code is library(deSolve) LotVolt = function(t,n,parms) { with(parms,{ dn = r*n*(1-LV%*%n) return(list(dn)) }) } LVmse = function(parms) { out = as.data.frame(lsoda(n0, times, LotVolt, parms)) # run ode mse = mean((out[,2:3]-nTrue)^2) # calculate mean squared error between simulated and original data return(mse) # return mean squared error } dados = cbind(c(2,4,7,25,14,43,81,140,180,224,240,204,375,370),c(2,4,6,13,41,74,195,164,160,240,230,215,230,215)) nTrue = as.matrix(dados) s = 2 #number of species r = c(0.7816, 0.6283) #vector of species intrinsic growths K = c(559.686, 202.4931) #vector of species carrying capacities n0 = c(n=nTrue[1,]) alpha0 = diag(1,s,s) #alpha is matrix of species competition coefficients, ones on diagonal a12guess = 0.5 a21guess = 0.5 alpha0[2,1] = a12guess alpha0[1,2] = a21guess tf = dim(nTrue)[1] # time times = 1:tf # vector of times to run over LV0 = diag(1/K)%*%alpha0 #modified lotka-volterra matrix is species competitions divided by carrying capacities optimOut = optim(LV0, LVmse) # give optim initial guess and function parms = optimOut$par # extract parameters nSim = as.data.frame(lsoda(n0,times,logInt,parms)) # run ode to get simulated population Any assistence would help. Many thanks in advance, Alexandre -- Dr. Alexandre F. Souza Professor Adjunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Number of Groups in SpeciesMix
Dear friends, I am willing to apply the SAM analytical framework to a dataset of plant species in coastal Brazil using the SpeciesMix package. The SpeciesMix package fits Species Archtype Models, a special type of finite mixture of regression model motivated by the analysis of multi-species data. In appying function clusterSelect, which helps in defining the best number of species groups G, I would like to confirm if my understanding is correct: the formula reported there as "obs ~ 1 + x" in clusters <- clusterSelect(obs~1+x,dat1$pa,dat,G=2:5,em.refit=2) is a generic formulation not directly related to the species (dat1$pa) or environmental (dat) data, isn't it? So in principle I should use this same formulation as well, understanding that obs stands for the whole species data matrix, 1 for the presence of a constant, and x for the whole explanatory dataset? I tried to apply obs~1+x but it returns an error message, however. I am kind of blocked here so any thoughts could help... Sincerely, Alexandre -- Dr. Alexandre F. Souza Professor Adjunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] How to fit a logistic discrete function!
Dear friends, Sorry for the somewhat lengthy message, but I would really like to hear your opinion on this. I am organizing an undergraduation population ecology course and would like to work the logistic equation as proposed by Bellows 1981 and presented by Begon et al. 2006 in their Ecology textbook: Nt+1 =(Nt*R)/((1+a*Nt)^b) Where R is lambda, the discrete growth rate (R = Nt+1/Nt), and a and b are constants. The beauty of this equation, in contrast to other more traditional formulatons, is because it bypasses the support capacity K, which is hard to know in advance, and makes it a function of a and R as K = (R - 1)/a. So K becomes a consequence of population properties and not an unrealistic prerequisite. Furthermore, a is meaningful in its own right, measuring the per capita susceptibility to crowding: the larger the value of a, the greater the effect of density on the actual rate of increase in the population. Finally, b is thought to portray undercompensation (b < 1), perfect compensation (b = 1), scramble-like overcompensation (b > 1) or even density independence (b = 0). However, because the the equation is an iterative one, I am having a hard time figuring out how to fit it to an empirical dataset made by a time series of population counts. I believe this is key to make it more concrete for the students, to show how to make it practical and not merely a theoretical construct. I could not find this equation used much often, and could not find an example of how fitting it. Does anyone have any suggestions? Thank your very much in advance, Alexandre -- Dr. Alexandre F. Souza Professor Adjunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Summary of the Answers I got for my post "How to incorporate spatial autocorrelation in multivariate GLM "
*groundfishes. **James T. Thorson1*, Andrew O. Shelton2, Eric J. Ward2, and Hans J. Skaug. **ICES Journal of Marine Science; doi:10.1093/icesjms/fsu243* ** *The importance of spatial models for estimating the strength of **density dependence.**JAMES T. THORSON,1,6 HANS J. SKAUG,2 KASPER KRISTENSEN,3 ANDREW O., HELTON,4 ERIC J.WARD,4 JOHN H. HARMS,1 **AND JAMES A. BENANTE. **Ecology, 96(5), 2015, pp. 1202–1212. * -- Dr. Alexandre F. Souza Professor Adjunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Problem in Search and Association Loop
Hello Fabrice, hello Hanna, Thank you very much for your responses to my question. Both your suggestions were helpful and worked nicely. Best wishes, Alexandre 2015-09-05 10:52 GMT-03:00 Fabrice Dessaint <fabrice.dessa...@dijon.inra.fr> : > > Hello Alexandre, > > I am writing a loop to compare scientific plant names from one data frame > in a second data frame containing a smaller set of potentially different > plant names, and then to copy the associated abundances of the sp already > present in the first database, and add the new species to the bottom. > > > Perhaps, you can use merge() function ? > > > base <- data.frame( > + > family=c("Burseraceae","Anacardiaceae","Euphorbiaceae","Anacardiaceae","Apocynaceae", > + "Sapotaceae","Malpighiaceae","Polygonaceae","Burseraceae"), > + > genus=c("Coccoloba","Tapirira","Pogonophora","Thyrsodium","Himatanthus","Pouteria", > + "Byrsonima","Coccoloba","Protium"), > + epithet=c("cf. > cordifolia","guianensis","schomburgkiana","spruceanum","phagedaenicus", > + "coriacea","sericea","cf. cordifolia","heptaphyllum"), > + abundance=c(224,112,146,115,47,71,25,29,37)) > > > > new.base <- data.frame( > + genus=c("Coccoloba","Protium","Bowdichia","Sclerolobium","Ocotea"), > + epithet=c("cf. > cordifolia","heptaphyllum","virgilioides","densiflorum","glomerata"), > + abundance=c(29,37,15,15,16)) > > > > merge.data.frame(base,new.base,by=c("genus", "epithet"), all=TRUE) > genusepithetfamily abundance.x abundance.y > 1 Byrsonimasericea Malpighiaceae 25 NA > 2 Coccoloba cf. cordifolia Burseraceae 224 29 > 3 Coccoloba cf. cordifolia Polygonaceae 29 29 > 4 Himatanthus phagedaenicus Apocynaceae 47 NA > 5 Pogonophora schomburgkiana Euphorbiaceae 146 NA > 6 Pouteria coriaceaSapotaceae 71 NA > 7 Protium heptaphyllum Burseraceae 37 37 > 8 Tapirira guianensis Anacardiaceae 112 NA > 9Thyrsodium spruceanum Anacardiaceae 115 NA > 10Bowdichia virgilioidesNA 15 > 11 Ocotea glomerataNA 16 > 12 SclerolobiumdensiflorumNA 15 > > > Fabrice > > --- > Fabrice Dessaint > > UMR 1347 Agroécologie > INRA/AgroSup/uB > 17 rue Sully, > BP 86510 F-21065 Dijon Cedex > Tel: +33 (0)3 80 69 31 83 > ——— > ResearchID: http://www.researcherid.com/rid/F-1102-2014 > ORCID: http://orcid.org/-0001-9135-598X > ResearchGate: http://www.researchgate.net/profile/Fabrice_Dessaint > > > > > > > -- Dr. Alexandre F. Souza Professor Adjunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Problem in Search and Association Loop
Dear friends, I am writing a loop to compare scientific plant names from one data frame in a second data frame containing a smaller set of potentially different plant names, and then to copy the associated abundances of the sp already present in the first database, and add the new species to the bottom. I am receiving, however, an error message saying that factor levels are different. Can someone give me a hint on what can be happening? Here you have the code The base dataframe > base family genus epithetabundance 1 Burseraceae Coccoloba cf. cordifolia 224 2 AnacardiaceaeTapirira guianensis 112 3 Euphorbiaceae Pogonophora schomburgkiana 146 4 Anacardiaceae Thyrsodium spruceanum 115 5 Apocynaceae Himatanthus phagedaenicus 47 6SapotaceaePouteria coriacea 71 7 Malpighiaceae Byrsonimasericea 25 8 Polygonaceae Coccoloba cf. cordifolia 29 9 Burseraceae Protium heptaphyllum 37 The new data data.frame > new.data genussp.newabundance 1Coccoloba cf. cordifolia 29 2 Protium heptaphyllum 37 3Bowdichia virgilioides 15 4 Sclerolobiumdensiflorum 15 5 Ocotea glomerata 16 The Output table output= matrix(nrow = 10, ncol = 3) colnames(output) = c("genus", "epithet", "abundance") output for (i in 1:nrow(base)){ for (j in 1:nrow(new.data)){ if ((base[i,2] == new.data[j,1]) & (base[i,3] == new.data[j,2])){ output[i,1] = base[i,2] output[i,2] = base[i,3] output[i,3] = new.data[j,2] } } } This is just the beginning of the code I plan. I stopped due to the error message. Thank you very much in advance, Alexandre -- Dr. Alexandre F. Souza Professor Adjunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Understanding of P-values from rankings
Hi, For a simulation of 1000 runs and a threshold of 0.05: For a two-tailed test all ranks 25 or 975 would be significant For a one-tailed test all ranks 50 would be significant (considering a less-than one-tailed hypothesis). I think I got it now. Thank you, Best, Alexandre 2015-04-08 14:50 GMT-03:00 Alexandre F. Souza alexsouza.cb.ufrn...@gmail.com: Dear friends, I have fun a series of null models of tree species traits (e.g., height) in permanent forest plots using the code found in Kraft's web page ( http://life.umd.edu/biology/kraftlab/Code_files/trait_tests.R). Null distributions of trait metrics were generated by creating 999 null communities of equal richness to the sample quadrat by drawing species at random from our entire trait database, weighted by plot-wide occurrence (the number of quadrats in which each species is found). For each of 245 plots, the output consists in the following output: test.richness 11 reps 999 mean_rank 972 obs_mean 0.850482 null_mean_mean 0.816427 null_mean_sd 0.019774 Where mean_rank means the ranking of the observed mean. Now I need to obtain a P-value from the ranking information. I understand that a one-tailed P-value would be = observed rank/(number of runs + 1), and two-tailed P values should be twice this value, as found in the manual of Phylocom. Having 999 runs this gives almost equal Ps as the ranks of the means, with P = 0.972 for a one-tailed test in this case. However, a two-tailed test would yield a P-value 1.0 in this case. How do I use the rank of the mean with the number of runs to obtain a one-tailed and two-tailed value? Sorry for the introductory level of the question, but it is for a good cause. Best, Alexandre -- Dr. Alexandre F. Souza Professor Ajunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza -- Dr. Alexandre F. Souza Professor Ajunto III Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Problem with Capscale and Phylobetadiversity distance matrix
Hello everyone, I have a problem trying to use vegan's capscale() on a phylobetadiversity distance matrix, reproduced below using the example: library(psych) # For the PCA test involved # Opens a full matrix of 245 x 245 phylobetadiversity distances bnti.ses = read.csv(file.choose(), header = T, row.names=1); dim(bnti.ses); # this gives the dimensions bnti.ses[1:5,1:5]; # this gives a look at the first 5 rows and columns # Test PCA scores test = data.frame(replicate(10,sample(0:100,245,rep=TRUE))) test pca = principal(teste, nfactors = 3, covar = FALSE, scores = TRUE) meta.pca.scores = pca$scores rownames(meta.pca.scores)=rownames(bnti.ses) ## dbRDA # betaNTI standard effect size bnti.ses = bnti.ses + abs(min(bnti.ses)); bnti.ses = bnti.ses/max(bnti.ses); range(bnti.ses); identical(rownames(bnti.ses),colnames(bnti.ses)); # must be TRUE to proceed identical(rownames(bnti.ses),rownames(meta.pca.scores)); # must be TRUE to proceed # Here is where the error occurs bnti.ses.dbrda.int = capscale(as.dist(bnti.ses) ~ 1,data=meta.pca.scores); bnti.ses.dbrda.full = capscale(as.dist(bnti.ses) ~ .,data=meta.pca.scores); bnti.ses.select.rsq = ordiR2step(bnti.ses.dbrda.int, scope = formula(bnti.ses.dbrda.full),direction = both,pstep=1, perm.max = 10); bnti.ses.select.rsq; I get the message: Error in formula.default(object, env = baseenv()) : invalid formula I tried the same code using the varespec and varechem example files present in capscale's help and it worked. I checked the dimensions of the phylobetadiversity matrix and they are ok (245 x 245). I tried to omit the row names but the problem persists. For me the very last test that should have worked was bnti.ses = data.frame(replicate(245,sample(0:100,245,rep=TRUE))) rownames(bnti.ses)=rownames(meta.pca.scores)colnames(bnti.ses)=rownames(bnti.ses) and then working this artificial bnti.ses through the subsequent standardization and capscale steps. However, I got the same error mesage with this example. Am I missing something, or is there a workaround? Thanks, Alexandre -- Dr. Alexandre F. Souza Professor Ajunto II Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza -- Dr. Alexandre F. Souza Professor Ajunto II Universidade Federal do Rio Grande do Norte CB, Departamento de Ecologia Campus Universitário - Lagoa Nova 59072-970 - Natal, RN - Brasil lattes: lattes.cnpq.br/7844758818522706 http://www.docente.ufrn.br/alexsouza [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology