Re: [R-sig-eco] The final result of TWINSPAN

2011-04-27 Thread Dave Roberts

Dear Zoltan,

   Thanks for the note.  The R function I wrote does in fact follow the 
Roleček et al protocol, and that's partly what motivated the idea to 
write it up. Lubomír Tichý, Petr Smilauer, and Laco Mucina have all 
contributed information in the development, but I've still been stymied 
by the lack of solid information on the actual algorithm.


   I think it is quite possible to write a function that operates on 
the principle of TWINSPAN, following Roleček et al, but writing a 
function that exactly matches the output from the commercial package may 
prove to be too much trouble.


Thanks, Dave

Zoltan Botta-Dukat wrote:

Dear Dave,

This modified version of TWINSPAN may be interesting for you when you 
compare methods:


Jan Roleček, Lubomír Tichý, David Zelený, Milan Chytrý (2009). Modified 
TWINSPAN classification in which the hierarchy respects cluster 
heterogeneity. Journal of Vegetation Science 20(4): 596–602.
http://onlinelibrary.wiley.com/doi/10./j.1654-1103.2009.01062.x/abstract 



Zoltan


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




Re: [R-sig-eco] The final result of TWINSPAN

2011-04-26 Thread Dave Roberts

Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up 
an R version of TWINSPAN.  It's far from a polished package at this 
point, but the code does run.  One of the interesting features is that 
you can partition a PCO or NMDS in addition to the traditional CA. To be 
clear, I am not a TWINSPAN fan either, but I wanted it for a methods 
paper I was working on.


The problem is that I based the code on Hill, Bunch & Shaw (1975,
J of Ecol 63:597-613) which is what I had available.  Apparently the 
algorithm in the commercial TWINSPAN is significantly modified from the 
original, but I couldn't find a description of the actual algorithm 
anywhere in the literature.  It is probably described in the User Manual 
of the software, but I was not sufficiently motivated to chase down a 
copy.  I do have a copy of the FORTRAN code, but it was apparently 
written in FORTRAN II, and is basically inscrutable, even to an old 
FORTRAN dog like me.


So, if somebody has a clear description of the actual algorithm 
(and I think it is disturbing that I could not find one), it would be 
possible to code it up in native R.  The alternative, to write a wrapper 
for the original FORTRAN code is not a trivial task.  I gave it a couple 
of days and gave up.


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

On 04/14/2011 01:57 AM, Jari Oksanen wrote:

On 14/04/11 10:37 AM, Yong Zhang <2010202...@njau.edu.cn> wrote:


Dear all,

I conducted the two-way indicator species analysis using TWINSPAN program, and
following is the final result:

  0111
  00011011
  011000111
   01001001

To clarify my analysis: I want to classify the above 24 sampling sites
into 3 major groups based on 7 biotic metrics. The samples can be named
site1 to site24, from left to right. I set the cut levels to 0, 2, 5, 10,
20, the maximum level of divisions to 6, and the maximum group size for
division to 3.

Now, my question is whether my settings are correct, and how should I classify
these sites into 3 groups according to this final result?

Dear Yong Zhang,

This is not an R issue, because there is no TWINSPAN in R. However, the
answer to your question is that, strictly speaking, you cannot group your data
into three major groups with TWINSPAN. TWINSPAN is a bisection method: the
first division gives you two groups, and the second splits each of these
into two, so the next choice is to have four groups. However, in
this case one of the groups was so small (3 plots were split off from the
others in the first division, and then these were split into groups of 2 plots
and 1 plot) that you can probably ignore the second division of the small group.
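
The bisection logic above can be sketched in a few lines of Python. This is only an illustration, not a parser for TWINSPAN output: the per-site division codes are hypothetical (only the 3-plot group split into 2 and 1 comes from the message; the sizes 11 and 10 are made up), and the names `groups_at_level` and `prune_small_splits` are invented for this example.

```python
from collections import Counter

def groups_at_level(codes, level):
    """Label each site by the first `level` digits of its division code."""
    return [code[:level] for code in codes]

def prune_small_splits(labels, min_size):
    """Undo a division when either of its two children has fewer than
    `min_size` sites, by collapsing both children back into their parent."""
    sizes = Counter(labels)
    collapse = {lab[:-1] for lab, n in sizes.items()
                if n < min_size and len(lab) > 1}
    return [lab[:-1] if len(lab) > 1 and lab[:-1] in collapse else lab
            for lab in labels]

# Hypothetical division codes for 24 sites: the first digit is the first
# division, the second digit is the second division.
codes = ["00"] * 11 + ["01"] * 10 + ["10"] * 2 + ["11"] * 1

level1 = groups_at_level(codes, 1)      # two groups
level2 = groups_at_level(codes, 2)      # four groups
pruned = prune_small_splits(level2, 3)  # second division of the small group ignored

print(Counter(level1))
print(Counter(level2))
print(Counter(pruned))
```

With these made-up codes, the pruned labels form three groups ("00", "01" and "1"), which is the kind of result obtained by ignoring the second division of the small group.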

If your goal was as vague as wanting to classify 24 sites into 3 major
groups, you could do better than use TWINSPAN: what's the problem with proper
classification methods in R? Moreover, have you checked that your biotic
metrics suit the pseudospecies cut level concept of TWINSPAN?
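
To illustrate why the scale of the metrics matters, here is a minimal Python sketch of the pseudospecies idea using the cut levels 0, 2, 5, 10, 20 from the question. It assumes that pseudospecies level k is present when the value is positive and at least the k-th cut level; the exact boundary handling in TWINSPAN itself is an assumption here, and the function name is invented for the example.

```python
def pseudospecies(abundance, cut_levels=(0, 2, 5, 10, 20)):
    """Return the 1-based pseudospecies levels present for one abundance.

    Assumption: level k is present when the abundance is positive and
    greater than or equal to the k-th cut level.
    """
    if abundance <= 0:
        return []
    return [k for k, cut in enumerate(cut_levels, start=1) if abundance >= cut]

print(pseudospecies(18))   # a percentage-cover style value
print(pseudospecies(0.7))  # a metric that ranges between 0 and 1
print(pseudospecies(0))    # absent
```

Under this assumption a value of 18 is represented by levels 1 to 4, whereas a metric that never exceeds 1 only ever reaches level 1, so the cut levels reduce it to presence/absence.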

Cheers, jari oksanen



Re: [R-sig-eco] The final result of TWINSPAN

2011-04-26 Thread Jari Oksanen
On 27/04/11 00:40 AM, Dave Roberts <dvr...@ecology.msu.montana.edu> wrote:
 
  Earlier this year on an (undoubtedly ill-advised) lark I coded up
 an R version of TWINSPAN.  It's far from a polished package at this
 point, but the code does run.  One of the interesting features is that
 you can partition a PCO or NMDS in addition to the traditional CA. To be
 clear, I am not a TWINSPAN fan either, but I wanted it for a methods
 paper I was working on.
 
  The problem is that I based the code on Hill, Bunch & Shaw (1975,
 J of Ecol 63:597-613) which is what I had available.  Apparently the
 algorithm in the commercial TWINSPAN is significantly modified from the
 original, but I couldn't find a description of the actual algorithm
 anywhere in the literature.  It is probably described in the User Manual
 of the software, but I was not sufficiently motivated to chase down a
 copy.  I do have a copy of the FORTRAN code, but it was apparently
 written in FORTRAN II, and is basically inscrutable, even to an old
 FORTRAN dog like me.
 
  So, if somebody has a clear description of the actual algorithm
 (and I think it is disturbing that I could not find one), it would be
 possible to code it up in native R.  The alternative, to write a wrapper
 for the original FORTRAN code is not a trivial task.  I gave it a couple
 of days and gave up.

Dave,

Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the
implementation is more complicated. Martin Kent and Paddy Coker do a great
job of explaining the twists in their book (Vegetation Description and
Analysis: A Practical Approach). If I remember correctly, the TWINSPAN
manual was also more detailed, but I lost it somewhere when I moved around
(for the kids: it was a bunch of paper; PDF was not yet invented when
TWINSPAN was published).

I don't think that the actual TWINSPAN is easily extended beyond CA. Each
step is a two-stage one-dimensional ordination on a current subset, where
the first stage selects indicators and the second stage is polarized for the
indicator species. The final split is based on site ordination and
indicators are secondary (which we see in misclassifications if you try to
use the provided key for the data that was classified in TWINSPAN). The
polarization stage is particularly challenging when working with
dissimilarities (PCO, NMDS).
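
As a rough illustration of the "one-dimensional ordination on a current subset" idea, the following Python sketch computes a first correspondence-analysis axis by reciprocal averaging and splits the sites at zero (the weighted centroid). It is deliberately incomplete: the indicator selection and the polarized second stage described above are not implemented, the split rule is an assumption, and the small data table and function names are hypothetical.

```python
def ca_axis1(table, n_iter=200):
    """Site scores on the first CA axis, found by reciprocal averaging.

    `table` is a list of site rows holding non-negative abundances; every
    row and column is assumed to have a positive total.
    """
    n_sites, n_species = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_sites)) for j in range(n_species)]
    grand = sum(row_tot)
    site = [float(i) for i in range(n_sites)]  # arbitrary starting scores
    for _ in range(n_iter):
        # Species scores: abundance-weighted means of the site scores.
        species = [sum(table[i][j] * site[i] for i in range(n_sites)) / col_tot[j]
                   for j in range(n_species)]
        # Site scores: abundance-weighted means of the species scores.
        site = [sum(table[i][j] * species[j] for j in range(n_species)) / row_tot[i]
                for i in range(n_sites)]
        # Remove the trivial constant solution, then rescale to unit variance
        # (both weighted by the site totals).
        mean = sum(w * s for w, s in zip(row_tot, site)) / grand
        site = [s - mean for s in site]
        norm = (sum(w * s * s for w, s in zip(row_tot, site)) / grand) ** 0.5
        site = [s / norm for s in site]
    return site

def bisect(scores):
    """Split site indices into those below zero and those at or above zero."""
    negative = [i for i, s in enumerate(scores) if s < 0]
    positive = [i for i, s in enumerate(scores) if s >= 0]
    return negative, positive

# Hypothetical table: 6 sites (rows) by 4 species (columns).
table = [
    [5, 3, 0, 0],
    [4, 4, 1, 0],
    [3, 5, 0, 1],
    [0, 1, 4, 5],
    [1, 0, 5, 3],
    [0, 0, 3, 4],
]
scores = ca_axis1(table)
print(bisect(scores))
```

Applying the same two functions again to each resulting subset would give the next level of the hierarchy. Nothing in this sketch depends on dissimilarities, which is where the polarization stage would become difficult.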

I don't think that the FORTRAN I have is completely impenetrable. I think
the largest problem is the design principle: R code should run silently and
return a result, but TWINSPAN prints as it goes along and returns only a part
of the result. Incorporating that in R would require stripping most PRINT and
WRITE statements and having subroutines return useful data directly.

I also wrote a small, fun test of the TWINSPAN principle, where the splitting
and pre-defined pseudospecies were replaced with a regression tree split.
I'll send you a copy of that and the FORTRAN (IV, I think) code I have in a
separate message.
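
The test code mentioned above is not shown in this thread. Purely as an illustration of what a regression-tree style split could look like in place of fixed cut levels, here is a hypothetical Python sketch: it picks the threshold on one species' abundance that minimises the summed squared error of the ordination scores in the two resulting halves. The data values and the function name are made up.

```python
def best_split(x, y):
    """Return (threshold, error) for the split on x that minimises the
    summed squared error of y within the two halves, or None if x is constant."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    levels = sorted(set(x))
    # Candidate thresholds lie halfway between consecutive distinct x values.
    for low, high in zip(levels, levels[1:]):
        cut = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = (cut, err)
    return best

# Hypothetical abundances of one species and site scores on an ordination axis.
abundance = [0, 1, 2, 8, 12, 15]
axis_score = [-1.1, -0.9, -1.0, 0.8, 1.0, 1.2]
print(best_split(abundance, axis_score))
```

For these values the chosen threshold falls between abundances 2 and 8, so the cut is derived from the data rather than fixed in advance.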

Cheers, Jari Oksanen



Re: [R-sig-eco] The final result of TWINSPAN

2011-04-14 Thread Gavin Simpson
Dear Yong,

This *is* a list about R. Your question has *nothing* to do with R.
Please ask such questions elsewhere, like the ORDNEWS list.

On Thu, 2011-04-14 at 15:37 +0800, Yong Zhang wrote:
<snip />
 I conducted the two-way indicator species analysis using TWINSPAN
 program, and following is the final result:

Being painfully aware of the output TWINSPAN generates, I'm certain this
isn't all that TWINSPAN output, but I presume it is the binary indicator
for the groups/splits from the output?

  0111
  00011011
  011000111   
   01001001 
 
 I have to certify my analysis, I want to classify the above 24
 sampling sites into 3 major groups based on 7 biotic metrics. The name
 of my 24 samples could be site1 to site24, from the left to the right,
 and I set the cut levels 0, 2, 5, 10, 20,  the maximum level of
 divisions: 6, and maximum group size for division:3 .  

Then you are out of luck, unless you use some other means of pruning back
divisions. TWINSPAN implements a binary split process, and without other
intervention you get 2, 4, 8, 16 groups etc. You /can/ post-process
the results of TWINSPAN using another DOS application to merge lower
nodes of certain, specific branches into higher nodes to get different
numbers of groups than 2, 4, 8, ..., but I forget the name of this DOS
application at the moment. I used to teach a computer class using this,
so I have the details somewhere and will see if I can hunt those old
notes out.

My interpretation of the above would be that you could just ignore the
split that cuts the 3 extreme right samples into two groups so you have
groups consisting of the first 11 samples, the next 10 in another group,
and the final 3 samples in a group. But that is without seeing any of
the other output, so I don't know if the CA clustering technique used is
doing silly things splitting your main group of samples - i.e. are there
samples close to the origin but on opposite sides that are similar to
one another but which have been pushed into separate groups?
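
One simple way to look for that situation is sketched below in Python, assuming the site scores on the splitting axis are available and that the split was made at zero. The scores, the tolerance and the function name are all hypothetical, and closeness on a single axis is only a rough stand-in for similarity.

```python
def borderline_pairs(scores, tol):
    """Pairs of sites (i, j) on opposite sides of zero whose axis scores
    differ by at most `tol`."""
    negative = [(i, s) for i, s in enumerate(scores) if s < 0]
    positive = [(j, s) for j, s in enumerate(scores) if s >= 0]
    return [(i, j) for i, si in negative for j, sj in positive if sj - si <= tol]

# Hypothetical axis scores for six sites.
scores = [-1.4, -0.9, -0.05, 0.04, 0.8, 1.3]
print(borderline_pairs(scores, tol=0.2))
```

Any pair reported here sits close together on the axis yet ends up in different groups, so it is worth checking against the full species data before trusting the division.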

Hopefully the above helps, but please direct further and future requests
for help with non-R applications to more appropriate lists.

G

 Now, my question is whether my setting is correct? And how should I
 classify these sites into 3 groups according to this final result?
 
 Thanks in advance for your time and suggestion.
 
 Kind wishes,
 
 Yong 
 
 
 2011-04-14 
 
 
 
 ZHANG Yong
 Lab of aquatic insects & stream ecology
 Dept. of Entomology, Nanjing Agricultural University
 Nanjing, 210095,China 
 Phone number:  (+86) -25-84395241
 E-mail:2010202...@njau.edu.cn

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
