On 27/04/11 00:40 AM, "Dave Roberts" <dvr...@ecology.msu.montana.edu> wrote:

>
> Earlier this year on an (undoubtedly ill-advised) lark I coded up
> an R version of TWINSPAN. It's far from a polished package at this
> point, but the code does run. One of the interesting features is that
> you can partition a PCO or NMDS in addition to the traditional CA. To be
> clear, I am not a TWINSPAN fan either, but I wanted it for a methods
> paper I was working on.
>
> The problem is that I based the code on Hill, Bunch & Shaw (1975,
> J of Ecol 63:597-613) which is what I had available. Apparently the
> algorithm in the commercial TWINSPAN is significantly modified from the
> original, but I couldn't find a description of the actual algorithm
> anywhere in the literature. It is probably described in the User Manual
> of the software, but I was not sufficiently motivated to chase down a
> copy. I do have a copy of the FORTRAN code, but it was apparently
> written in FORTRAN II, and is basically inscrutable, even to an old
> FORTRAN dog like me.
>
> So, if somebody has a clear description of the actual algorithm
> (and I think it is disturbing that I could not find one), it would be
> possible to code it up in native R. The alternative, to write a wrapper
> for the original FORTRAN code is not a trivial task. I gave it a couple
> of days and gave up.
Dave,

Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the implementation is more complicated. Martin Kent and Paddy Coker do a great job of explaining the twists in their book ("Vegetation Description and Analysis: A Practical Approach"). If I remember correctly, the TWINSPAN manual was also more detailed, but I lost it somewhere when I moved around (for the kids: it was a bunch of paper; PDF had not yet been invented when TWINSPAN was published).

I don't think that the actual TWINSPAN is easily extended beyond CA. Each step is a two-stage one-dimensional ordination on the current subset, where the first stage selects indicators and the second stage is polarized for the indicator species. The final split is based on the site ordination, and the indicators are secondary (which we see as misclassifications if you try to use the provided key on the same data that was classified in TWINSPAN). The polarization stage is particularly challenging when working with dissimilarities (PCO, NMDS).

I don't think that the FORTRAN I have is completely impenetrable. I think the largest problem is the design principle: R code should run silently and return a result, but TWINSPAN prints as it goes along and returns only a part of the result. Incorporating that in R would require stripping most PRINT and WRITE statements and having the subroutines return useful data directly.

I also wrote a small funny test on the TWINSPAN principle, where the splitting and the pre-defined pseudospecies were replaced with a regression tree split. I'll send you a copy of that and the FORTRAN (IV, I think) code I have in a separate message.

Cheers, Jari Oksanen

_______________________________________________
R-sig-ecology mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
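For readers who want to see what "a one-dimensional ordination on the current subset" followed by a split can look like, here is a minimal sketch in plain Python. It is only the bare general idea (first CA axis by reciprocal averaging, a cut at the weighted centroid, then a crude species preference score), and it is not the algorithm of the actual TWINSPAN program: there are no pseudospecies, no polarized second stage and no refinement of the split. The function names (first_ca_axis, split_at_centroid, preference), the preference score (a difference of relative frequencies) and the toy table are all assumptions made up for this illustration.

```python
def first_ca_axis(table, n_iter=200):
    """First non-trivial CA axis of a sites-by-species table, found by
    reciprocal averaging. Assumes every row and column total is > 0."""
    n_sites, n_species = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_sites)) for j in range(n_species)]
    grand = sum(row_tot)
    # Arbitrary, non-constant starting scores for the sites.
    site = [float(i) for i in range(n_sites)]
    for _ in range(n_iter):
        # Species scores are weighted averages of the site scores ...
        species = [sum(table[i][j] * site[i] for i in range(n_sites)) / col_tot[j]
                   for j in range(n_species)]
        # ... and site scores are weighted averages of the species scores.
        site = [sum(table[i][j] * species[j] for j in range(n_species)) / row_tot[i]
                for i in range(n_sites)]
        # Centre and rescale using row totals as weights; centring removes
        # the trivial constant solution.
        mean = sum(w * s for w, s in zip(row_tot, site)) / grand
        site = [s - mean for s in site]
        norm = (sum(w * s * s for w, s in zip(row_tot, site)) / grand) ** 0.5
        site = [s / norm for s in site]
    return site


def split_at_centroid(scores):
    """Dichotomy of the sites at the (weighted) centroid, which is 0 here."""
    neg = [i for i, s in enumerate(scores) if s < 0]
    pos = [i for i, s in enumerate(scores) if s >= 0]
    return neg, pos


def preference(table, neg, pos):
    """Assumed indicator score: relative frequency of each species in the
    positive group minus its relative frequency in the negative group."""
    prefs = []
    for j in range(len(table[0])):
        f_neg = sum(table[i][j] > 0 for i in neg) / len(neg)
        f_pos = sum(table[i][j] > 0 for i in pos) / len(pos)
        prefs.append(f_pos - f_neg)
    return prefs


# Made-up abundances: 6 sites (rows) by 5 species (columns).
table = [
    [5, 3, 0, 0, 1],
    [4, 4, 1, 0, 1],
    [3, 5, 0, 0, 2],
    [0, 1, 4, 3, 1],
    [0, 0, 5, 4, 2],
    [1, 0, 3, 5, 1],
]

scores = first_ca_axis(table)
neg, pos = split_at_centroid(scores)
prefs = preference(table, neg, pos)
print("groups:", neg, pos)
print("preference scores:", [round(p, 2) for p in prefs])
```

The sign of a CA axis is arbitrary, so which group ends up "negative" carries no meaning. A hierarchical classification would repeat the same step within each of the two groups; the polarized second stage described above is exactly the part this sketch leaves out.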