[R-sig-eco] Problem in Illustrating ZI GLMM

2024-05-14 Thread Alexandre F. Souza
   -0.206450.15959  -1.294   0.1958

basal_area_m2   -0.711710.14941  -4.763 1.90e-06 ***

tree_density 0.333220.15064   2.212   0.0270 *

das_mm:dif_alt_dossel_m  0.556600.12694   4.385 1.16e-05 ***

elevation_m:hand_m  -0.334240.13648  -2.449   0.0143 *

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-- 
Dr. Alexandre F. Souza
Professor Associado
Departamento de Ecologia/CB
Universidade Federal do Rio Grande do Norte
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] GLMM: AIC worsens but BIC improves

2024-03-28 Thread Alexandre F. Souza
Dear all,

First I would like to say thank you for replying to my last posts, your
replies have halped me a lot and complemented the formal sources I have
been consulting.

I would like very much your opinion on a further issue below, if possible.

I am using glmmTMB to find the fixed and random variables that influence
whether açai palms produce fruit or not in the Amazon. So far so good, I
ended upt with a much simpler model that basically contains the year (2
years only), habitat type (3 levels, Grupos_Finais below) and density of
adult palms (R_dens below). This is model 8.

When I try to further simplify this model reducing it to only year +
habitat or year alone, I get an worsened AIC but an improved BIC... When I
fit model 8 with REML, the significance of adult palm density disappear (I
have been making model comparison manually with ML untill now). Together,
these results make me pend to exclude habitat type and adult density, but I
am intrigued with worsened AIC.

What do you think?

modelo8.glmm = glmmTMB(reproducao_bin ~ 0 + Grupos_Finais + ano_medida +
R_dens_total + (1|numero) + (1 |bloco/parcela), family = binomial, data =
dados)

> summary(modelo8.glmm)
 Family: binomial  ( logit )
Formula:
reproducao_bin ~ 0 + Grupos_Finais + ano_medida + R_dens_total +
(1 | numero) + (1 | bloco/parcela)
Data: dados

 AIC  BIC   logLik deviance df.resid
  1665.5   1707.6   -824.8   1649.5 1420

Random effects:

Conditional model:
 GroupsNameVariance Std.Dev.
 numero(Intercept) 3.0944   1.7591
 parcela:bloco (Intercept) 0.7850   0.8860
 bloco (Intercept) 0.7421   0.8615
Number of obs: 1428, groups:  numero, 762; parcela:bloco, 147; bloco, 10

Conditional model:
  Estimate Std. Error z value
Grupos_FinaisDisturbed Unflooded Forests  -0.014900.57713  -0.026
Grupos_FinaisFlooded Open Forests  1.923200.46866   4.104
Grupos_FinaisOld Growth Unflooded Forests  1.113600.72798   1.530
ano_medida2017-1.735440.18708  -9.276
R_dens_total  -0.037490.01542  -2.430
  Pr(>|z|)
Grupos_FinaisDisturbed Unflooded Forests0.9794
Grupos_FinaisFlooded Open Forests 4.07e-05 ***
Grupos_FinaisOld Growth Unflooded Forests   0.1261
ano_medida2017 < 2e-16 ***
R_dens_total0.0151 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

modelo9.glmm = glmmTMB(reproducao_bin ~ 0 + Grupos_Finais + ano_medida +
(1|numero) + (1 |bloco/parcela), family = binomial, data = dados)

modelo10.glmm = glmmTMB(reproducao_bin ~ ano_medida + (1|numero) + (1
|bloco/parcela), family = binomial, data = dados)

> AIC(modelo8.glmm,modelo9.glmm,modelo10.glmm)  df  AIC
modelo8.glmm   8 1665.501
modelo9.glmm   7 1669.163
modelo10.glmm  5 1671.159


> BIC(modelo8.glmm,modelo9.glmm,modelo10.glmm)  df  BIC
modelo8.glmm   8 1707.613
modelo9.glmm   7 1706.011
modelo10.glmm  5 1697.479


Perhaps report bot AIC and BIC and decide for the simpler model?... Thank
you very much for any inputs,

Cheers,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Associado
Departamento de Ecologia/CB
Universidade Federal do Rio Grande do Norte
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] How to set Manually the Reference Level in glmmTMB

2024-03-28 Thread Alexandre F. Souza
Dear all,

I am running a glmm with glmmTMB in which the fixed part has two factor
varialbes, year (ano_medida) and habitat type (Grupos_Finais below). Since
I have only two years I do not mind that the model takes the first year as
a baseline. However, no habitat type is a control or a reference, and I
would like to see results for all levels of this variable. I would like to
know how to manually do this, but I am only finding in the internet
resources to change the reference level, not to cancel it. Could anyone
help me with this?

modelo2.glmm = glmmTMB(reproducao_bin ~ ano_medida +
Basal_Area_m2*Grupos_Finais + R_dens_total*Grupos_Finais + (1|numero) + (1
|bloco/transecto/parcela), family = binomial, data = dados)

My data:
> str(dados)
'data.frame': 1428 obs. of  12 variables:
 $ numero  : int  4006 4009 4010 4020 4022 4024 4025 4027
4035 4036 ...
 $ bloco   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ transecto   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ parcela : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Tree_Height_m_Fuste_Parc: num  8.32 8.32 8.32 8.32 8.32 ...
 $ Grupos_Finais   : Factor w/ 3 levels "Disturbed Unflooded
Forests",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ano_medida  : Factor w/ 2 levels "2016","2017": 1 1 1 1 1 1
1 1 1 1 ...
 $ Basal_Area_m2   : num  0.182 0.182 0.182 0.182 0.182 ...
 $ das_mm  : num  185 161 167 114 164 ...
 $ reproducao  : int  3 0 0 0 0 0 0 0 0 0 ...
 $ reproducao_bin  : int  1 0 0 0 0 0 0 0 0 0 ...
 $ R_dens_total: int  31 31 31 31 31 31 31 31 31 31 ...
>
Many thanks!

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Associado
Departamento de Ecologia/CB
Universidade Federal do Rio Grande do Norte
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] GLMM: Should Integers be Levels of Dr. Alexandre F. Souza Prof. Associado a Factor?

2024-03-26 Thread Alexandre F. Souza
 Hello all,

Judging by some sources, it seems to me that before running a GLMM I should
make all categorical variables into factors, including numerical ones that
are integers like binary response variables, plot numbers, block levels,
sampling year and the ID of individual measured plants, even if these
variables are correctly identified as integers as in the example below. Is
this understanding correct in your opinion?

Thank you very much for any reply,

Alexandre


> str(dados)
'data.frame': 1428 obs. of  12 variables:
 $ numero  : int  4006 4009 4010 4020 4022 4024 4025 4027
4035 4036 ...
 $ bloco   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ transecto   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ parcela : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Tree_Height_m_Fuste_Parc: num  8.32 8.32 8.32 8.32 8.32 ...
 $ Grupos_Finais   : Factor w/ 3 levels "Disturbed Unflooded
Forests",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ano_medida  : int  2016 2016 2016 2016 2016 2016 2016 2016
2016 2016 ...
 $ Basal_Area_m2   : num  0.182 0.182 0.182 0.182 0.182 ...
 $ das_mm  : num  185 161 167 114 164 ...
 $ reproducao  : int  3 0 0 0 0 0 0 0 0 0 ...
 $ reproducao_bin  : int  1 0 0 0 0 0 0 0 0 0 ...
 $ R_dens_total: int  31 31 31 31 31 31 31 31 31 31 ...

-- 
Dr. Alexandre F. Souza
Professor Associado
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] ErrError: Non-numeric argument to binary operator, despite values being numeric

2024-03-26 Thread Alexandre F. Souza
 Hello all,

In the following code I'm attempting to perform a logistic GLMM with a
binary dependent variable (whether palms reproduced or not), a fixed part
(with habitat types, plot basal area and year or measurement) and a random
part with measured individuals (numero) nested within plots (parcelas)
nested within blocks (blocos). However, I am getting an error message
telling me

Error in Formula + (1 | numero) : non-numeric argument to binary operator

I tried to repeat the random parte with the entire nested structure and
with each of its components alone, and the message is the same (it is
repeated for each random component when I try them alone), despite de fact
that all of them are numeric/integers. I tried to change them to
dados$numero = as.numeric(dados.numero) but got the same message.

There are no NAs in the data, I excluded them all:

str(dados)
'data.frame': 1428 obs. of  12 variables:
 $ numero  : num  4006 4009 4010 4020 4022 ...
 $ bloco   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ transecto   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ parcela : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Tree_Height_m_Fuste_Parc: num  8.32 8.32 8.32 8.32 8.32 ...
 $ Grupos_Finais   : Factor w/ 3 levels "Disturbed Unflooded
Forests",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ano_medida  : int  2016 2016 2016 2016 2016 2016 2016 2016
2016 2016 ...
 $ Basal_Area_m2   : num  0.182 0.182 0.182 0.182 0.182 ...
 $ das_mm  : num  185 161 167 114 164 ...
 $ reproducao  : int  3 0 0 0 0 0 0 0 0 0 ...
 $ reproducao_bin  : int  1 0 0 0 0 0 0 0 0 0 ...
 $ R_dens_total: int  31 31 31 31 31 31 31 31 31 31 ...

dados$Grupos_Finais = factor(dados$Grupos_Finais)

dados$numero = as.numeric(dados$numero)

# Model beyond idea for further selection
Formula = formula(reproducao_bin ~ ano_medida + Basal_Area_m2*Grupos_Finais
+ R_dens_total*Grupos_Finais)

# glm model fixed part only runs ok
modelo1.glm = glm(Formula, family = binomial, data = dados)

modelo2.glmm = glmer(Formula + (1 |blocos/transectos/parcela/numero),
family = binomial, data = dados)
Error in Formula + (1 | numero) : non-numeric argument to binary operator
modelo2.glmm = glmer(Formula + (1 |numero),family = binomial, data = dados)
Error in Formula + (1 | numero) : non-numeric argument to binary operator
> modelo2.glmm = glmer(Formula + (1 |parcela),family = binomial, data =
dados)
Error in Formula + (1 | parcela) :
  non-numeric argument to binary operator
> modelo2.glmm = glmer(Formula + (1 |bloco),family = binomial, data = dados)
Error in Formula + (1 | bloco) : non-numeric argument to binary operator
> modelo2.glmm = glmer(Formula + (1 |transecto),family = binomial, data =
dados)
Error in Formula + (1 | transecto) :
  non-numeric argument to binary operator


As you can see there are no nonumeric arguments in the random part of the
formula...

Thank you in advance for any suggestions,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Associado
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] On The Choice of a Classification Approach

2021-11-01 Thread Alexandre F. Souza
 Hello,

I am trying to find a method to cluster species based on their quantitative
traits and at the same time obtain threshold value for each node in the
decision tree. My difficulty is that my dependent variable is the list of
species names, each species appearing as a single line with no repetition.
All explanatory variables are quantitative. As far as I understood,
classification trees need a dependent variable with repeated levels as in
the iris dataset, in which each species appears several times. All the
examples employing classification trees I found use a dependent variable,
but I do not have one except for the species names. MRT uses a species by
location matrix as dependent variable, and traditional hierarchical cluster
analysis do cluster species but do not use quantitative data to that aim,
nor produce threshold values. I can run a non-hierarquical cluster analysis
like kmeans, but these do not generate threshold values. My concern is that
without threshold values any classification I produce will be restricted to
the studied species and will not be applicable to different species that
can be found in the studied region, what would be a strong limitation to
the use of such classification.

Thank you very much in advance for any ideas.

Regards,

Alexandre


-- 
Dr. Alexandre F. Souza
Professor Associado
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Do You Have Anything Against Using ISO DATA?

2021-01-20 Thread Alexandre F. Souza
Dear R-Sig Colleagues,

I am performing a bioregionalization of the Cerrado woody flora in central
Brazil.

I am following the steps

Produce a dissimilarity matrix from species x site matrix
Perform a NMDS on the dissimilarity matrix
Interpolate each NMDS axis with Kriging
Partition the stacked interpolated rasters with K-means

I am willing to change the k-means by the ISO DATA method implemented in
ArcGis, which takes an initial number of groups (the same used for the
k-means classification). The ISODATA algorithm has some further refinements
by splitting and merging of clusters. Clusters are merged if either the
number of members (pixel) in a cluster is less than a certain threshold or
if the centers of two clusters are closer than a certain threshold.
Clusters are split into two different clusters if the cluster standard
deviation exceeds a predefined value and the number of members (pixels) is
twice the threshold for the minimum number of members. It implements a set
of rule-of-thumb procedures that have been incorporated into an iterative
classification algorithm. Many of the steps used in the algorithm are based
on the experience obtained through experimentation. The ISODATA algorithm
is a modification of the k-means clustering algorithm and is supposed to
overcome the disadvantages of k-means.

The classification produced preserved the main patterns found in kmeans
while reducing the number of groups.

However, I have not found anything in the literature of bioregionalization
using ISO DATA, only kmeans. Do you know of any reasons I should not use
ISO DATA for this aim?

Thank you very much in advance for any inputs,

Best,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Associado
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Indicator Species in Presence-Absence Data

2019-11-23 Thread Alexandre F. Souza
Dear all,

I have a large matrix of 680 localities x 6000 species with
presence-absence data divided into 42 ecoregions.  I used the multipatt
function of the indicspecies package to identify indicator species of each
ecoregion.

indic = multipatt(dados3[,2:ncol(dados3)], dados3[,1], func = "IndVal.g",
duleg = TRUE, control = how(nperm = 999))

Results returned many indicator species for some ecoregions but many
ecoregions with zero indicator species. I am in doubt wether the function
options I made are adequate to run the analyses for presence-absence data
since in the tutorial and in the function examples abundance data are used.
I consulted the articles by De Cáceres 2009 and 2010 which mention the
possibility of indicator species for presence-absence data but it does not
become clear to me if they are automatically implemented in the function or
if specific options are needed.

An important information is that the ecoregions were defined through
interpolation of beta dissimilarity metrics (simpson index) and k-means
classification. Is it possible that they are different but do not have
indicator species?

Thanks for any thoughts,

Sincerely,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Associado
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] How to subsample a community dataset

2019-11-19 Thread Alexandre F. Souza
Hi everyone,

I have two community matrices, one which is presence-absence and the other
which contains abundances of hundreds of species in hundreds of localities.
Because localities had different sizes, I would like to standardize the
number of species per locality before performing beta-diversity
compositional analyses. Are you aware of any packages or functions that
perform this task?

Thank you very much in advance,

Sincerely,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Associado
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Error in Kriging Procedure

2018-12-14 Thread Alexandre F. Souza
Hello,

I am new to interpolating techiques, and I am getting a persistent erro
message I have not been able to resolve. I am trying to krig spatialized
ordination scores into a raster of the Brazilian Atlantic Forest, as you
can see below. I previously run a NMDS with 5 dimensions on species
presence-absence data and saved the nmds scores. I send in attachment the
atlantic forest shapefile and the nmds scores, to make the code more
reproducible.

# Convert NMDS-scores to SpatialPoints objects
coords = as.data.frame(coords)
names(coords) = c("longitude","latitude")
nmds_scores<-cbind(coords, nmds_scores)
coordinates(nmds_scores)<-~longitude+latitude # Convert 'nmds_scores'
dataframe to SpatialPointsDataframe
wgs84<-"+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" #
WGS84 Coordinate Reference System (CRS)
crs(nmds_scores)<-wgs84 # Set the original CRS (WGS84)
new_crs<-"+proj=robin +lon_0=0 +x_0=0 +y_0=0 +a=6371000 +b=6371000 +units=m
+no_defs" # Transform CRS to Robinson projection (it is necessary to have
projected data for interpolation)
nmds_scores<-spTransform(nmds_scores, new_crs)


# Read the shapefile of Brazilian Atlantic Forest (BAF) and rasterize it
BAF_limit<-readOGR("C:/Users/Augusto/Documents/MESTRADO/Projeto
CAATINGA/Analise k-means/shapecaa")
BAF_limit<-spTransform(BAF_limit, wgs84) # Change the CRS from SIRGAS2000
to WGS84
r_ext<-extent(BAF_limit)
r<-raster() # Create a empty raster to receive data
res(r)<-0.0417 # Define raster resolution in decimal degrees (= 10
arc-min)
r<-crop(r, r_ext) # Define spatial extent of the raster (same as BAF)
BAF_raster<-rasterize(BAF_limit, r, field=1, background=NA)

# Get the spatial coordinates of BAF pixels
get_coords<-rasterToPoints(BAF_raster)
get_coords<-as.data.frame(get_coords[, c(1:2)])
get_coords<-cbind(get_coords, NA, NA, NA)
names(get_coords)<-c("longitude", "latitude", "NMDS1", "NMDS2", "NMDS3")
coordinates(get_coords)<-~longitude+latitude # Convert 'get_coords'
dataframe to SpatialPointsDataframe
crs(get_coords)<-wgs84 # Set the original CRS (WGS84)
get_coords<-spTransform(get_coords, new_crs) # Transform CRS to Robinson
projection

gModel_ok <- gstat(NULL, id='NMDS1_kri',
   formula= NMDS1 ~ 1,
   locations=coords, model=data.fit)

G0y_ok <- interpolate(BAF_raster, gModel_ok, xyOnly=FALSE)

Error in as(data, "data.frame") :
  no method or default for coercing “NULL” to “data.frame”


Do anyone has any idea of what may be wrong? I tried to empty the raster
changing all vaues to zero but did not work.

Thanks a log in advance,

Alexandre
-- 
Dr. Alexandre F. Souza
Professor Adjunto IV
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Simpson distances on abundances?

2018-07-26 Thread Alexandre F. Souza
 Dear friends,

The Simpson dissimilarity index calculated with the function recluster.dist
of your package recluster can be calculated based on a matrix of abundances
instead of one of presence/absence? I searched for this piece of
information on Baselga 2010, Dapporto 2013, Dapporto 2014 and Lennon 2001
but could not find it.

Thanks in advance for any ideas,

Alexandre


-- 
Dr. Alexandre F. Souza
Professor Adjunto IV
Chefe do Departamento de Ecologia
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
http://www.docente.ufrn.br/alexsouza
orcid.org/-0001-7468-3631 <http://www.docente.ufrn.br/alexsouza>

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Is stepacross a must?

2018-06-23 Thread Alexandre F. Souza
-- Forwarded message --
From: Alexandre F. Souza 
Date: 2018-06-23 7:04 GMT-03:00
Subject: Is stepacross a must?
To: Lista de discussao R-sig-ecology 


Dear friends,

I am in deep doubt regarding whether or not to apply vegan'a stepacross
function to my dissimilarity matrix. The matrix is a simpson dissimilarity
matrix with 665 rows reflecting beta diversity between localities along the
brazilian atlantic florest, with over 4000 species most of which are
singletons. They were retained for biological reasons.

We are appying a community modelling approach with k-means in order to find
the best number of clusters this dataset can be divided. The problem is
that we are obtaining quite different results with or without applying the
stepacross function. According to Tuomisto et al. 2012:

..."the stepacross
method or extended dissimilarities (Williamson 1978,
De’ath 1999), originally developed to improve the performance
of ordination. Th e aim is to fi nd the shortest path
between sites that do not share any species by using intermediate
sites as stepping-stones. The process replaces all saturated
dissimilarity values (or any dissimilarities larger than
a specifi ed threshold value) with new values that can exceed
unity. The new values can be as large as is necessary to reflect the amount
of compositional heterogeneity in the dataset,
so using extended dissimilarities instead of the original
ones in Mantel tests and multiple regression on dissimilarity
matrices can be expected to improve the performance of
these methods."

I reproduce the code below. Without stepacross we obtain dissimilarities by
large dominated by 1, and as many as 25 groups.
With stepacross they are more unimodal since the =1 distances were allowed
to have higher values, but as few as 6 groups! Furthermore, the final
number of groups varies more between runs with stepacross than without it.
With stepacross it seems almost to alternate between 6 and 11 groups, with
6 being more frequent.

What do you thin it would be wiser to do? I am leaning towards the
stepacross but I am not that sure. The help page of the function
mentions that in cases with high beta-diversity datasets stepacross must be
used.

Thanks for any thoughts,

Alexandre

simpdist = recluster.dist(sp, dist = "simpson")
simp.step = stepacross(simpdist)
dissimilarity_matrix<-as.matrix(simp.step)

# Perform an ordination via Principal Coordinate Analysis (PCoA) with
correction for negative eigenvalues [add = T]
ordination_output<-cmdscale(dissimilarity_matrix,
k=(nrow(dissimilarity_matrix)-1), eig = F, add = T, x.ret = FALSE)
pcoa_scores<-as.data.frame(ordination_output[[1]]) # Extract the PCoA
scores for each site

# Get values of the evaluation criterion (Within-Groups SS) for k varying
from 2 to 'number of sites minus 1'
sum_of_squares <- foreach(i = 2:(ncol(dissimilarity_matrix)-1),
  .combine = 'cbind')  %dopar% { # cbind the
results of foreach
classification<-kmeans(pcoa_scores, i,
iter.max=1000, nstart=100)#usar inter.max=1000 e nstart=100
classification$tot.withinss} # the
Within-Groups SS for k=i
y<-t(as.matrix(sum_of_squares)) # convert Within-Group SS to a vector (y
for piecewise regressions)
x<-c(2:(ncol(dissimilarity_matrix)-1)) # number of clusters (x for
piecewise regressions)

# Create an empty matrix to receive the values of 'max-k entered a priori'
and the respective 'optimal k selected'
initial_kmax<-c(4:ncol(dissimilarity_matrix)) # At least 4 points to
produce two lines (L-method)
cluster_results<-cbind(initial_kmax, c(NA)) # Dataframe to receive values
of the breakpoints

# Perform multiple piecewise regression varying the max-k entered a priori
from 4 to 'number of sites'
for (j in 4:ncol(dissimilarity_matrix)){ # the maximum number of 'k'
established a priori
  y_selected<-y[1:j] # Limit the number of starting points (y axis)
  x_selected<-x[1:j] # Limit the number of starting points (x axis)
  find_the_knee <- foreach(i = 2:(j-1), # calculate the RSE for all
possible breakpoints (i) when starting with k-max = j
   .combine = 'cbind')  %dopar% { #

piecewise_k<-lm(y_selected~x_selected*(x_selected=i))
 summary(piecewise_k)[6]} # print the RSE for
each piecewise regression
  find_the_knee<-c(do.call("cbind",find_the_knee)) # convert the previous
output (list) to vector
  knee_found<-as.data.frame(cbind(c(2:(j-1)), find_the_knee)) # combine
position of breakpoint and RSE value
  names(knee_found)<-c("breakpoint", "RSE") # rename
  knee_found<-na.omit(knee_found) # remove NaN values
  k_number<-knee_found[which(knee_found$RSE==min(knee_found$RSE)),1] #
select breakpoint that minimizes RSE
  cluster_results[(j-3),2]<-k_number} # print the maximum

[R-sig-eco] Deviance in Multinomial Logistic Regression

2017-06-01 Thread Alexandre F. Souza
Hello,

I am using the following code to perform model selection on a categorical
variable with 9 levels corresponding to floristic regions. Explanatory
variables are climatic, soil, and MEMs describing spatial structure. I am
using AIC and wAIC to choose models. As far as I understood, the Deviance
should be smaller in the worse model, which would be similar to the null
model containing just the intercept, and larger in the best model. However,
the Deviance statistic, however, is increasing from the best model to the
worse model. Do anyone have any idas about what am I doing or reading wrong?

Many thanks in advance, any ideas will help


# Verify the total deviance (null model)

Null_model<-multinom(subregion~1, data=variables)
Total_deviance<-Null_model[[25]] # store the total deviance

# Prepare the models to be tested
formlist <- list() # Empty list to receive nested formulas in forward
selection
formlist[[1]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# Climatic/soil Productivity
formlist[[2]]<-as.formula("subregion~ElevR+ElevCV+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# Topography
formlist[[3]]<-as.formula("subregion~HFP+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# Human footprint
formlist[[4]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+ElevR+ElevCV+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# Productivity + Topography
formlist[[5]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+HFP+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# Productivity + HFP
formlist[[6]]<-as.formula("subregion~HFP+ElevR+ElevCV+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# HFP + Topography
formlist[[7]]<-as.formula("subregion~AMT+APP+CEC+Sand+TAR+ElevR+ElevCV+HFP+MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# Total Contemporary Environment
formlist[[8]]<-as.formula("subregion~MEM2+MEM1+MEM3+MEM4+MEM5+MEM27+MEM24+MEM15+MEM10+MEM26+MEM16")
# Space only


# Prepare empty dataframe to receive models' results

MLRresults<-as.data.frame(matrix(nrow=8, ncol=4))
colnames(MLRresults) <- (c("Model", "Deviance", "AIC", "AICc"))
for (i in 1:8){ # the remaining variables to be forward selected
  formula<-formlist[[i]]
  MLR_test<-multinom(formula, data=variables)
  MLRresults[i,1]<-paste(formlist[i]) # store the 'model formula'
  MLRresults[i,2]<-(Total_deviance-MLR_test[[19]])/Total_deviance # store
the % of deviance explained
  MLRresults[i,3]<-MLR_test[[26]] # store the AIC value
  MLRresults[i,4]<-AICc(MLR_test)} # store the AICc value

wAICc<-Weights(MLRresults$AICc) # AIC weigth
MLRresults<-cbind(MLRresults, wAICc) # combine results in a single table
View(MLRresults[order(MLRresults$wAICc, decreasing=T), ]) # print the
output table from the model selection procedure

The above code yields

MODEL
  DEVIANCE  AICAICc wAIC





8
subregion ~ MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 +
MEM10 + MEM26 + MEM16
0.5025134
738.7133
910.5673
9.07e-01
3
subregion ~ HFP + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15
+ MEM10 + MEM26 + MEM16
0.5393317
718.0280
933.7467
9.260806e-06
2
subregion ~ ElevR + ElevCV + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27 +
MEM24 + MEM15 + MEM10 + MEM26 + MEM16
0.5574922
716.9467
985.8879
4.409071e-17
6
subregion ~ HFP + ElevR + ElevCV + MEM2 + MEM1 + MEM3 + MEM4 + MEM5 + MEM27
+ MEM24 + MEM15 + MEM10 + MEM26 + MEM16
0.5805342
710.7362
1044.5544
8.036491e-30
1
subregion ~ AMT + APP + CEC + Sand + TAR + MEM2 + MEM1 + MEM3 + MEM4 + MEM5
+ MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16
0.6736414
648.9077
1161.1251
3.909003e-55
5
subregion ~ AMT + APP + CEC + Sand + TAR + HFP + MEM2 + MEM1 + MEM3 + MEM4
+ MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16
0.6818203
658.3141
1294.6032
4.051928e-84
4
subregion ~ AMT + APP + CEC + Sand + TAR + ElevR + ElevCV + MEM2 + MEM1 +
MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16
0.6943008
663.2007
1458.1197
1.260428e-119
7
subregion ~ AMT + APP + CEC + Sand + TAR + ElevR + ElevCV + HFP + MEM2 +
MEM1 + MEM3 + MEM4 + MEM5 + MEM27 + MEM24 + MEM15 + MEM10 + MEM26 + MEM16
0.7016101
673.5208
1675.9823
6.197945e-167

-- 
Dr. Alexandre F. Souza
Professor Adjunto IV
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://esferacientifica.wixsite.com/esferacientifica
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[R-sig-eco] Error in optim function

2016-01-19 Thread Alexandre F. Souza
Dear friends,

I am getting a persistent error message when trying to estimate
lotka-volterra competition coefficients from data on two populations of
paramecium using the optim function. Do anyone knows how could I solve
this? The error message is

Error in eval(substitute(expr), data, enclos = parent.frame()) :
  numeric 'envir' arg not of length one

Debug
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())

and the code is

library(deSolve)

LotVolt = function(t,n,parms) {
  with(parms,{
dn = r*n*(1-LV%*%n)
return(list(dn))
  })
}

LVmse = function(parms) {
  out = as.data.frame(lsoda(n0, times, LotVolt, parms)) # run ode
  mse = mean((out[,2:3]-nTrue)^2) # calculate mean squared error between
simulated and original data
  return(mse) # return mean squared error
}

dados =
cbind(c(2,4,7,25,14,43,81,140,180,224,240,204,375,370),c(2,4,6,13,41,74,195,164,160,240,230,215,230,215))
nTrue = as.matrix(dados)
s = 2 #number of species
r = c(0.7816, 0.6283) #vector of species intrinsic growths
K = c(559.686, 202.4931) #vector of species carrying capacities
n0 = c(n=nTrue[1,])
alpha0 = diag(1,s,s)  #alpha is matrix of species competition coefficients,
ones on diagonal
a12guess = 0.5
a21guess = 0.5
alpha0[2,1] = a12guess
alpha0[1,2] = a21guess
tf = dim(nTrue)[1] # time
times = 1:tf # vector of times to run over
LV0 = diag(1/K)%*%alpha0 #modified lotka-volterra matrix is species
competitions divided by carrying capacities

optimOut = optim(LV0, LVmse) # give optim initial guess and function
parms = optimOut$par # extract parameters
nSim = as.data.frame(lsoda(n0,times,logInt,parms)) # run ode to get
simulated population

Any assistence would help.

Many thanks in advance,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Adjunto III
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[R-sig-eco] Number of Groups in SpeciesMix

2016-01-11 Thread Alexandre F. Souza
Dear friends,

I am willing to apply the SAM analytical framework to a dataset of plant
species in coastal Brazil using the SpeciesMix package. The SpeciesMix
package fits Species Archtype Models, a special type of finite mixture of
regression model motivated by the analysis of multi-species data.

In appying function clusterSelect, which helps in defining the best number
of species groups G, I would like to confirm if my understanding is
correct: the formula reported there as "obs ~ 1 + x" in

clusters <- clusterSelect(obs~1+x,dat1$pa,dat,G=2:5,em.refit=2)

is a generic formulation not directly related to the species (dat1$pa) or
environmental (dat) data, isn't it? So in principle I should use this same
formulation as well, understanding that obs stands for the whole species
data matrix, 1 for the presence of a constant, and x for the whole
explanatory dataset?

I tried to apply obs~1+x but it returns an error message, however.

I am kind of blocked here so any thoughts could help...

Sincerely,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Adjunto III
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[R-sig-eco] How to fit a logistic discrete function!

2016-01-09 Thread Alexandre F. Souza
Dear friends,

Sorry for the somewhat lengthy message, but I would really like to hear
your opinion on this.

I am organizing an undergraduation population ecology course and would like
to work the logistic equation as proposed by Bellows 1981 and presented by
Begon et al. 2006 in their Ecology textbook:

Nt+1 =(Nt*R)/((1+a*Nt)^b)

Where R is lambda, the discrete growth rate (R = Nt+1/Nt), and a and b are
constants.

The beauty of this equation, in contrast to other more traditional
formulatons, is because it bypasses the support capacity K, which is hard
to know in advance, and makes it a function of a and R as K = (R - 1)/a. So
K becomes a consequence of population properties and not an unrealistic
prerequisite. Furthermore, a is meaningful in its own right, measuring the
per capita susceptibility to crowding: the larger the value of a, the
greater the effect of density on the actual rate of increase in the
population. Finally, b is thought to portray undercompensation (b < 1),
perfect compensation (b = 1), scramble-like overcompensation (b > 1) or
even density independence (b = 0).

However, because the the equation is an iterative one, I am having a hard
time figuring out how to fit it to an empirical dataset made by a time
series of population counts. I believe this is key to make it more concrete
for the students, to show how to make it practical and not merely a
theoretical construct.

I could not find this equation used much often, and could not find an
example of how fitting it.

Does anyone have any suggestions?

Thank your very much in advance,

Alexandre




-- 
Dr. Alexandre F. Souza
Professor Adjunto III
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[R-sig-eco] Summary of the Answers I got for my post "How to incorporate spatial autocorrelation in multivariate GLM "

2015-09-12 Thread Alexandre F. Souza

*groundfishes. **James T. Thorson1*, Andrew O. Shelton2, Eric J. Ward2, and
Hans J. Skaug. **ICES Journal of Marine Science; doi:10.1093/icesjms/fsu243*

**

*The importance of spatial models for estimating the strength of **density
dependence.**JAMES T. THORSON,1,6 HANS J. SKAUG,2 KASPER KRISTENSEN,3
ANDREW O., HELTON,4 ERIC J.WARD,4 JOHN H. HARMS,1 **AND JAMES A.
BENANTE. **Ecology,
96(5), 2015, pp. 1202–1212. *


-- 
Dr. Alexandre F. Souza
Professor Adjunto III
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Problem in Search and Association Loop

2015-09-05 Thread Alexandre F. Souza
Hello Fabrice, hello Hanna,

Thank you very much for your responses to my question. Both your
suggestions were helpful and worked nicely.

Best wishes,

Alexandre

2015-09-05 10:52 GMT-03:00 Fabrice Dessaint <fabrice.dessa...@dijon.inra.fr>
:

>
> Hello Alexandre,
>
> I am writing a loop to compare scientific plant names from one data frame
> in a second data frame containing a smaller set of potentially different
> plant names, and then to copy the associated abundances of the sp already
> present in the first database, and add the new species to the bottom.
>
>
> Perhaps, you can use merge() function ?
>
> > base <- data.frame(
> +
> family=c("Burseraceae","Anacardiaceae","Euphorbiaceae","Anacardiaceae","Apocynaceae",
> + "Sapotaceae","Malpighiaceae","Polygonaceae","Burseraceae"),
> +
> genus=c("Coccoloba","Tapirira","Pogonophora","Thyrsodium","Himatanthus","Pouteria",
> + "Byrsonima","Coccoloba","Protium"),
> + epithet=c("cf.
> cordifolia","guianensis","schomburgkiana","spruceanum","phagedaenicus",
> + "coriacea","sericea","cf. cordifolia","heptaphyllum"),
> + abundance=c(224,112,146,115,47,71,25,29,37))
> >
> > new.base <- data.frame(
> + genus=c("Coccoloba","Protium","Bowdichia","Sclerolobium","Ocotea"),
> + epithet=c("cf.
> cordifolia","heptaphyllum","virgilioides","densiflorum","glomerata"),
> + abundance=c(29,37,15,15,16))
> >
> > merge.data.frame(base,new.base,by=c("genus", "epithet"), all=TRUE)
>   genusepithetfamily abundance.x abundance.y
> 1 Byrsonimasericea Malpighiaceae  25  NA
> 2 Coccoloba cf. cordifolia   Burseraceae 224  29
> 3 Coccoloba cf. cordifolia  Polygonaceae  29  29
> 4   Himatanthus  phagedaenicus   Apocynaceae  47  NA
> 5   Pogonophora schomburgkiana Euphorbiaceae 146  NA
> 6  Pouteria   coriaceaSapotaceae  71  NA
> 7   Protium   heptaphyllum   Burseraceae  37  37
> 8  Tapirira guianensis Anacardiaceae 112  NA
> 9Thyrsodium spruceanum Anacardiaceae 115  NA
> 10Bowdichia   virgilioidesNA          15
> 11   Ocotea  glomerataNA  16
> 12 SclerolobiumdensiflorumNA  15
>
>
> Fabrice
>
> ---
> Fabrice Dessaint
>
> UMR 1347 Agroécologie
> INRA/AgroSup/uB
> 17 rue Sully,
> BP 86510 F-21065 Dijon Cedex
> Tel: +33 (0)3 80 69 31 83
> ———
> ResearchID: http://www.researcherid.com/rid/F-1102-2014
> ORCID: http://orcid.org/-0001-9135-598X
> ResearchGate: http://www.researchgate.net/profile/Fabrice_Dessaint
>
>
>
>
>
>
>


-- 
Dr. Alexandre F. Souza
Professor Adjunto III
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[R-sig-eco] Problem in Search and Association Loop

2015-09-04 Thread Alexandre F. Souza
Dear friends,

I am writing a loop to compare scientific plant names from one data frame
in a second data frame containing a smaller set of potentially different
plant names, and then to copy the associated abundances of the sp already
present in the first database, and add the new species to the bottom.

I am receiving, however, an error message saying that factor levels are
different. Can someone give me a hint on what can be happening?

Here you have the code

The base dataframe

> base
family   genus epithetabundance
1   Burseraceae   Coccoloba cf. cordifolia 224
2 AnacardiaceaeTapirira guianensis 112
3 Euphorbiaceae Pogonophora schomburgkiana 146
4 Anacardiaceae  Thyrsodium spruceanum 115
5   Apocynaceae Himatanthus  phagedaenicus  47
6SapotaceaePouteria   coriacea  71
7 Malpighiaceae   Byrsonimasericea  25
8  Polygonaceae   Coccoloba cf. cordifolia  29
9   Burseraceae Protium   heptaphyllum  37

The new data data.frame

> new.data
genussp.newabundance
1Coccoloba cf. cordifolia   29
2  Protium   heptaphyllum   37
3Bowdichia   virgilioides   15
4 Sclerolobiumdensiflorum   15
5   Ocotea  glomerata   16

The Output table


output= matrix(nrow = 10, ncol = 3)
colnames(output) = c("genus", "epithet", "abundance")
output


for (i in 1:nrow(base)){
  for (j in 1:nrow(new.data)){
if ((base[i,2] == new.data[j,1]) & (base[i,3] == new.data[j,2])){
  output[i,1] = base[i,2]
  output[i,2] = base[i,3]
  output[i,3] = new.data[j,2]
}
  }
}

This is just the beginning of the code I plan. I stopped due to the error
message.

Thank you very much in advance,

Alexandre

-- 
Dr. Alexandre F. Souza
Professor Adjunto III
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[R-sig-eco] Understanding of P-values from rankings

2015-04-08 Thread Alexandre F. Souza
Hi,

For a simulation of 1000 runs and a threshold of 0.05:

For a two-tailed test all ranks  25 or  975 would be significant

For a one-tailed test all ranks  50 would be significant (considering a
less-than one-tailed hypothesis).

I think I got it now.

Thank you,

Best,

Alexandre



2015-04-08 14:50 GMT-03:00 Alexandre F. Souza 
alexsouza.cb.ufrn...@gmail.com:


 Dear friends,

 I have fun a series of null models of tree species traits (e.g., height)
 in permanent forest plots using the code found in Kraft's web page (
 http://life.umd.edu/biology/kraftlab/Code_files/trait_tests.R). Null
 distributions of trait metrics were generated by creating 999 null
 communities of equal richness to the sample quadrat by drawing species at
 random from our entire trait database, weighted by plot-wide occurrence
 (the number of quadrats in which each species is found).

 For each of 245 plots, the output consists in the following output:

 test.richness 11 reps 999 mean_rank 972 obs_mean 0.850482 null_mean_mean
 0.816427 null_mean_sd 0.019774
 Where mean_rank means the ranking of the observed mean.

 Now I need to obtain a P-value from the ranking information. I understand
 that a one-tailed P-value would be = observed rank/(number of runs + 1),
 and two-tailed P values should be twice this value, as found in the manual
 of Phylocom. Having 999 runs this gives almost equal Ps as the ranks of the
 means, with P = 0.972 for a one-tailed test in this case. However, a
 two-tailed test would yield a P-value  1.0 in this case.

 How do I use the rank of the mean with the number of runs to obtain a
 one-tailed and two-tailed value?

 Sorry for the introductory level of the question, but it is for a good
 cause.

 Best,

 Alexandre

 --
 Dr. Alexandre F. Souza
 Professor Ajunto III
 Universidade Federal do Rio Grande do Norte
 CB, Departamento de Ecologia
 Campus Universitário - Lagoa Nova
 59072-970 - Natal, RN - Brasil
 lattes: lattes.cnpq.br/7844758818522706
 http://www.docente.ufrn.br/alexsouza




-- 
Dr. Alexandre F. Souza
Professor Ajunto III
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

[R-sig-eco] Problem with Capscale and Phylobetadiversity distance matrix

2015-03-18 Thread Alexandre F. Souza
Hello everyone,

I have a problem trying to use vegan's capscale() on a phylobetadiversity
distance matrix, reproduced below using the example:

library(psych)  # For the PCA test involved

# Opens a full matrix of 245 x 245 phylobetadiversity distances
bnti.ses = read.csv(file.choose(), header = T, row.names=1);
dim(bnti.ses); # this gives the dimensions
bnti.ses[1:5,1:5]; # this gives a look at the first 5 rows and columns

# Test PCA scores
test = data.frame(replicate(10,sample(0:100,245,rep=TRUE)))
test
pca = principal(teste, nfactors = 3, covar = FALSE, scores = TRUE)
meta.pca.scores = pca$scores
rownames(meta.pca.scores)=rownames(bnti.ses)


## dbRDA

# betaNTI standard effect size


bnti.ses = bnti.ses + abs(min(bnti.ses));

bnti.ses = bnti.ses/max(bnti.ses);

range(bnti.ses);


identical(rownames(bnti.ses),colnames(bnti.ses)); # must be TRUE to proceed

identical(rownames(bnti.ses),rownames(meta.pca.scores)); # must be TRUE to
proceed


# Here is where the error occurs
bnti.ses.dbrda.int = capscale(as.dist(bnti.ses) ~ 1,data=meta.pca.scores);

bnti.ses.dbrda.full = capscale(as.dist(bnti.ses) ~ .,data=meta.pca.scores);

bnti.ses.select.rsq = ordiR2step(bnti.ses.dbrda.int, scope =
formula(bnti.ses.dbrda.full),direction = both,pstep=1, perm.max =
10); bnti.ses.select.rsq;

I get the message:

Error in formula.default(object, env = baseenv()) : invalid formula


I tried the same code using the varespec and varechem example files present
in capscale's help and it worked. I checked the dimensions of the
phylobetadiversity matrix and they are ok (245 x 245). I tried to omit the
row names but the problem persists.

For me the very last test that should have worked was

bnti.ses = data.frame(replicate(245,sample(0:100,245,rep=TRUE)))

rownames(bnti.ses)=rownames(meta.pca.scores)colnames(bnti.ses)=rownames(bnti.ses)


and then working this artificial bnti.ses through the subsequent
standardization and capscale steps. However, I got the same error mesage
with this example.

Am I missing something, or is there a workaround?

Thanks,

Alexandre
-- 
Dr. Alexandre F. Souza
Professor Ajunto II
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza



-- 
Dr. Alexandre F. Souza
Professor Ajunto II
Universidade Federal do Rio Grande do Norte
CB, Departamento de Ecologia
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.docente.ufrn.br/alexsouza

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology