Re: [R-sig-eco] Package bio.infer: get.taxonomic() hangs R after editing taxa list

2012-12-12 Thread Eduard Szöcs

Hi Rich,

Have a look at our recently released taxize package.
However, we have some problems with the ITIS API, too. (Maybe some server 
issues?)

See https://github.com/ropensci/taxize_/issues/72

However, with taxize you can also query the NCBI taxonomy browser as an 
alternative:


df <- read.table(header = TRUE, as.is = TRUE, text = '
          SVN              Taxon CountValue
1 WP220110711  Zaitzevia.parvula        484
2 WP220110711           Tvetenia        109
3 WP220110711        Tubificidae       1054
4 WP220110711            Sweltsa         11
5 WP220110711 Suwallia.pallidula         32
6 WP220110711     Stempellinella         11')
# replace the dots in the taxon names with spaces
df$Taxon <- gsub("\\.", " ", df$Taxon)

require(taxize)

#query itis
classification(get_tsn(df$Taxon))
### This may fail sometimes. 'Connection reset by peer'
### we are working on it:
### see https://github.com/ropensci/taxize_/issues/72

# query ncbi taxonomy-browser instead
classification(get_uid(df$Taxon))
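
If the ITIS query fails intermittently, one option is to fall back to NCBI automatically. The following is only a minimal sketch: first_success() is a hypothetical helper, not part of taxize, and it treats any error (or a NULL result) from one lookup as a reason to try the next one.

```r
# Hypothetical helper (not part of taxize): try each lookup function in turn
# and return the first result that neither raises an error nor is NULL.
first_success <- function(x, lookups) {
  for (name in names(lookups)) {
    result <- tryCatch(lookups[[name]](x), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(source = name, result = result))
    }
  }
  stop("all lookups failed")
}

# Usage with taxize (needs network access, so it is left commented out):
# library(taxize)
# first_success(df$Taxon, list(
#   itis = function(taxa) classification(get_tsn(taxa)),
#   ncbi = function(taxa) classification(get_uid(taxa))
# ))
```

The returned list records which source answered, so you can see afterwards whether the classification came from ITIS or from NCBI.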


Hope this helps,

Eduard



On 12/11/2012 11:08 PM, Rich Shepard wrote:

On Mon, 10 Dec 2012, Sarah Goslee wrote:


... get.taxonomic() no longer has an outputFile argument.


  I've read the get.taxonomic() description in bio.infer.pdf and it works
... to a point. There are 3 taxa currently not in the ITIS database (a
freshwater worm, a midge, and one chironomid for which I can't track down
the genus). The first two are going through the validation process and will
be added to ITIS Real Soon Now. In the meantime, I don't know how to
proceed. So I've copied Lester Yuan on this message.

  I tell get.taxonomic() that I'm finished editing and it flashes a Tcl/Tk
window then lists the taxa not in the database, but does not return the R
prompt:


jerall <- get.taxonomic(bioinfer)

The following taxa are not in ITIS:
EISENIELLA
RADOTANYPUS
ZIATZEVIA

  I've not found a key combination that lets me gracefully exit R (or emacs
for that matter); I can only kill the process. Nothing I've seen in the help
files appears relevant.

  Any suggestions?

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology





Re: [R-sig-eco] Vegan metaMDS: unusual first run stress values with large data set

2012-12-12 Thread Jari Oksanen
Hello R-Community,

First my thanks to Ewan Isherwood who turned our attention to this issue and 
sent his data file to us for analysing the situation. 

It seems that the default convergence criteria are too slack in monoMDS(), which 
was the ordination engine of metaMDS() in this case. The good news is that you can 
change those criteria by adding the argument 'sfgrmin' to the metaMDS() call (this 
is documented in ?monoMDS). The following command seems to work:

 PSU.NMDS <- metaMDS(PSU.sp, k = 2, sfgrmin = 1e-7, distance = "jaccard")


The default was 'sfgrmin = 1e-5', which was so slack that the iteration stopped 
early and did not really converge close to the solution. With this option you can 
find that the correct stress is of magnitude 0.029 which is much lower than 
reported below. Moreover, the stresses of one-dimensional and two-dimensional 
solutions are very close to each other. (There was one outlier (P1763E) which 
only had one species (CHICRA) that occurred only in four other sites and 
distorted the results.)
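
To see how 'sfgrmin' affects a single run, you can call monoMDS() directly with a fixed starting configuration. This is only a sketch under assumptions: it uses the 'dune' data that ships with vegan as a stand-in for the PSU data, and the seed and the random starting matrix 'init' are arbitrary choices for illustration. On a small data set both settings may well end up at about the same stress.

```r
library(vegan)

# 'dune' ships with vegan; it stands in for the PSU data, which is not available here
data(dune)
d <- vegdist(dune, method = "jaccard")

# Use the same random starting configuration for both runs,
# so that only the convergence criterion differs
set.seed(42)
init <- matrix(runif(nrow(dune) * 2), ncol = 2)

loose <- monoMDS(d, y = init, k = 2, sfgrmin = 1e-5)  # slack criterion
tight <- monoMDS(d, y = init, k = 2, sfgrmin = 1e-7)  # stricter criterion

# Compare the final stress of the two runs
c(loose = loose$stress, tight = tight$stress)
```

If the stricter criterion gives a clearly lower stress from the same start, the slack run stopped before it had converged.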

I advise *against* using 'zerodist = "add"': it is not needed with monoMDS. 
Identical (distance = 0) sites will have identical scores if you do not use 
this argument. Using 'zerodist = "add"' is only necessary with MASS::isoMDS(), 
which is unable to handle zero distances.

We have changed the default of 'sfgrmin' in http://www.r-forge.r-project.org/ 
so that you should not see this problem in the next vegan releases.

Cheers, Jari Oksanen

On 05/12/2012, at 21:15 PM, Ewan Isherwood wrote:

 Hello, R-Community! This is the first time writing to this group and
 indeed the first time using a mailing list, so please bear with me if
 I’ve done something wrong.
 
 I have a large species x site matrix (89 x 4831) that I want to
 ordinate using metaMDS in the Vegan (2.0-5) package in R (2.15.2). If
 I run this data frame using the Jaccard index in two or more
 dimensions (k > 1), the first run (run=0) has a relatively low stress
 value and the other 20 runs are much higher and have very low
 deviation. However, k=1 seems to work fine. Furthermore, a
 stress/scree plot reveals a pyramid-like shape, where the k=1 lowest
 stress value is low, increases rapidly for k=2 then decreases slowly
 as k increases.
 
 Dimensions  Stress
 1           0.1382185
 2           0.1939509
 3           0.1695375
 4           0.155221
 5           0.1406408
 6           0.1294149
 
 I’ve tried this with a small iteration of this data and this issue
 arises at k > 2 rather than at k > 1 as it is here. Anyway, this is the
 input and output:
 
 library(vegan)
 library(MASS)
 PSU <- read.table("PSU.txt", header = TRUE, sep = "")
 PSU.sp <- PSU[, 22:110]
 PSU.NMDS <- metaMDS(PSU.sp, k = 4, zerodist = "add", distance = "jaccard")
 
 Square root transformation
 Wisconsin double standardization
 Zero dissimilarities changed into  0.0006657301
 Run 0 stress 0.155221
 Run 1 stress 0.2548103
 Run 2 stress 0.255434
 Run 3 stress 0.2551382
 … (Up to run 20 where run 1 through run 20 have all very similar stress 
 values.)
 
 Call:
 metaMDS(comm = PSU.sp, distance = "jaccard", k = 4, zerodist = "add")
 
 global Multidimensional Scaling using monoMDS
 
 Data: wisconsin(sqrt(PSU.sp))
 Distance: jaccard
 
 Dimensions: 4
 Stress: 0.155221
 Stress type 1, weak ties
 No convergent solutions - best solution after 20 tries
 Scaling: centring, PC rotation, halfchange scaling
 Species: expanded scores based on ‘wisconsin(sqrt(PSU.sp))’
 
 Now, again, with k=1 this does not happen – the solution looks like
 any other regular NMDS run. There are no blank values in the data as
 they are all numbers between 0 and 100 corresponding to % cover, and
 every row and column sum is greater than 0. There are many sites with
 the same species configurations, hence the zerodist, but omitting this
 makes no difference to the problem at hand. The NMDS works fine if I
 use a subset of the data, but I have not subsetted and tested all of
 it. Other metric (Euclidean) and nonmetric (Bray) dissimilarity
 indices result in the same effect. I’ve chosen k=4 here because of the
 (marginal) elbow in the stress plot, but the data itself actually
 looks pretty good at any k value. Even though the output is
 reasonable, I am concerned that hitting the best solution by a large
 amount on the first run means something is messing up, and this
 concern is amplified by the strange pyramid shaped stress plot.
 Because metaMDS uses random starts, I don't see how this output is
 possible. I've scoured the help files and archives of this list and I
 am really now at a loss to explain this.
 
 Thank you in advance for your time and consideration!
 
 Ewan
 

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
