Re: [R-sig-eco] The final result of TWINSPAN
Thanks Jari,

My original thought was to write a wrapper for the original FORTRAN code by replacing the file read of data with data passed from R, and then bringing in the results in a list. That would allow sizing the arrays at run-time and eliminating fixed array sizes. I have a copy of the FORTRAN code that Petr Smilauer modified for simplified input/output, and that helped. Still, it ultimately appeared pretty messy (but still might be the best route), so I tried separating out the subroutines and calling them individually from R. From there I tried replacing some of the subroutines with native R to lower overhead. But in the end I just couldn't understand the code well enough to make it work.

So then I thought I should write a totally transparent version in native R, even if it doesn't replicate the original. On the downside, people can say it's not correct; on the upside, it's open source and people can evaluate it and modify it as they see fit.

So, if there is interest I might post the code and examples on my web page and let somebody else take it and run with it.

Dave

Jari Oksanen wrote:
> On 27/04/11 00:40 AM, "Dave Roberts" wrote:
> > Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on.
> >
> > The problem is that I based the code on Hill, Bunch & Shaw (1975, J of Ecol 63:597-613) which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me.
> >
> > So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code, is not a trivial task. I gave it a couple of days and gave up.
>
> Dave,
>
> Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the implementation is more complicated. Martin Kent and Paddy Coker do a great job of explaining the twists in their book ("Vegetation Description and Analysis: A Practical Approach"). If I remember correctly, the TWINSPAN manual also was more detailed, but I lost it somewhere when I moved around (for the kids: it was a bunch of paper: pdf was not yet invented when TWINSPAN was published).
>
> I don't think that the actual TWINSPAN is easily extended beyond CA. Each step is a two-stage one-dimensional ordination on a current subset, where the first stage selects indicators and the second stage is polarized for the indicator species. The final split is based on site ordination and indicators are secondary (which we see in misclassifications if you try to use the provided key for the data that was classified in TWINSPAN). The polarization stage is particularly challenging when working with dissimilarities (PCO, NMDS).
>
> I don't think that the FORTRAN I have is completely impenetrable. I think the largest problem is the design principle: R code should run silently and return a result, but TWINSPAN prints as it goes and returns only a part of the result. Incorporating that in R would require stripping most PRINT and WRITE statements and having the subroutines return useful data directly.
>
> I also wrote a small, funny test of the TWINSPAN principle, in which the splitting and pre-defined pseudospecies were replaced with a regression tree split. I'll send you a copy of that and the FORTRAN (IV, I think) code I have in a separate message.
>
> Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] The final result of TWINSPAN
Dear Zoltan,

Thanks for the note. The R function I wrote does in fact follow the Roleček et al. protocol, and that's partly what motivated the idea to write it up. Lubomír Tichý, Petr Smilauer, and Laco Mucina have all contributed information in the development, but I've still been stymied by the lack of solid information on the actual algorithm. I think it is quite possible to write a function that operates on the principle of TWINSPAN, following Roleček et al., but writing a function that exactly matches the output from the commercial package may prove to be too much trouble.

Thanks, Dave

Zoltan Botta-Dukat wrote:
> Dear Dave,
>
> This modified version of TWINSPAN may be interesting for you when you compare methods:
>
> Jan Roleček, Lubomír Tichý, David Zelený, Milan Chytrý (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. Journal of Vegetation Science 20(4): 596–602.
> http://onlinelibrary.wiley.com/doi/10./j.1654-1103.2009.01062.x/abstract
>
> Zoltan
>
> On 2011.04.26 at 23:40, Dave Roberts wrote:
> > Dear List,
> >
> > Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on.
> >
> > The problem is that I based the code on Hill, Bunch & Shaw (1975, J of Ecol 63:597-613) which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me.
> >
> > So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code, is not a trivial task. I gave it a couple of days and gave up.
Re: [R-sig-eco] The final result of TWINSPAN
Dear Dave,

This modified version of TWINSPAN may be interesting for you when you compare methods:

Jan Roleček, Lubomír Tichý, David Zelený, Milan Chytrý (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. Journal of Vegetation Science 20(4): 596–602.
http://onlinelibrary.wiley.com/doi/10./j.1654-1103.2009.01062.x/abstract

Zoltan

On 2011.04.26 at 23:40, Dave Roberts wrote:
> Dear List,
>
> Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on.
>
> The problem is that I based the code on Hill, Bunch & Shaw (1975, J of Ecol 63:597-613) which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me.
>
> So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code, is not a trivial task. I gave it a couple of days and gave up.
Re: [R-sig-eco] The final result of TWINSPAN
On 27/04/11 00:40 AM, "Dave Roberts" wrote:
>
> Earlier this year on an (undoubtedly ill-advised) lark I coded up
> an R version of TWINSPAN. It's far from a polished package at this
> point, but the code does run. One of the interesting features is that
> you can partition a PCO or NMDS in addition to the traditional CA. To be
> clear, I am not a TWINSPAN fan either, but I wanted it for a methods
> paper I was working on.
>
> The problem is that I based the code on Hill, Bunch & Shaw (1975,
> J of Ecol 63:597-613) which is what I had available. Apparently the
> algorithm in the commercial TWINSPAN is significantly modified from the
> original, but I couldn't find a description of the actual algorithm
> anywhere in the literature. It is probably described in the User Manual
> of the software, but I was not sufficiently motivated to chase down a
> copy. I do have a copy of the FORTRAN code, but it was apparently
> written in FORTRAN II, and is basically inscrutable, even to an old
> FORTRAN dog like me.
>
> So, if somebody has a clear description of the actual algorithm
> (and I think it is disturbing that I could not find one), it would be
> possible to code it up in native R. The alternative, to write a wrapper
> for the original FORTRAN code is not a trivial task. I gave it a couple
> of days and gave up.

Dave,

Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the implementation is more complicated. Martin Kent and Paddy Coker do a great job of explaining the twists in their book ("Vegetation Description and Analysis: A Practical Approach"). If I remember correctly, the TWINSPAN manual also was more detailed, but I lost it somewhere when I moved around (for the kids: it was a bunch of paper: pdf was not yet invented when TWINSPAN was published).

I don't think that the actual TWINSPAN is easily extended beyond CA. Each step is a two-stage one-dimensional ordination on a current subset, where the first stage selects indicators and the second stage is polarized for the indicator species. The final split is based on site ordination and indicators are secondary (which we see in misclassifications if you try to use the provided key for the data that was classified in TWINSPAN). The polarization stage is particularly challenging when working with dissimilarities (PCO, NMDS).

I don't think that the FORTRAN I have is completely impenetrable. I think the largest problem is the design principle: R code should run silently and return a result, but TWINSPAN prints as it goes and returns only a part of the result. Incorporating that in R would require stripping most PRINT and WRITE statements and having the subroutines return useful data directly.

I also wrote a small, funny test of the TWINSPAN principle, in which the splitting and pre-defined pseudospecies were replaced with a regression tree split. I'll send you a copy of that and the FORTRAN (IV, I think) code I have in a separate message.

Cheers, Jari Oksanen
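The ordination-and-split part of the step Jari describes can be sketched roughly as below. This is only an illustration in Python, not the TWINSPAN algorithm: it computes first-axis correspondence-analysis site scores by reciprocal averaging and divides the sites at the weighted centroid (zero), and it leaves out pseudospecies, indicator selection, and the polarized second ordination altogether. The function names, the split at zero, and the small site-by-species table are assumptions made for the example.

```python
def first_ca_axis(table, iterations=200):
    """First-axis CA site scores by reciprocal averaging.

    `table` is a list of rows (sites) of non-negative abundances (species);
    every row and every column must have a positive total.
    """
    n_sites, n_species = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_sites)) for j in range(n_species)]
    grand = sum(row_tot)
    site = [float(i) for i in range(n_sites)]  # arbitrary non-constant start
    for _ in range(iterations):
        # Species scores: abundance-weighted mean of the site scores.
        species = [
            sum(table[i][j] * site[i] for i in range(n_sites)) / col_tot[j]
            for j in range(n_species)
        ]
        # Site scores: abundance-weighted mean of the species scores.
        site = [
            sum(table[i][j] * species[j] for j in range(n_species)) / row_tot[i]
            for i in range(n_sites)
        ]
        # Remove the trivial constant solution (weighted centring) ...
        mean = sum(w * s for w, s in zip(row_tot, site)) / grand
        site = [s - mean for s in site]
        # ... and rescale so the scores do not shrink towards zero.
        scale = (sum(w * s * s for w, s in zip(row_tot, site)) / grand) ** 0.5
        site = [s / scale for s in site]
    return site


def split_sites(table):
    """Divide site indices into two groups at the centroid of the first axis."""
    scores = first_ca_axis(table)
    negative = [i for i, s in enumerate(scores) if s < 0]
    positive = [i for i, s in enumerate(scores) if s >= 0]
    return negative, positive


# Hypothetical table: 6 sites (rows) by 5 species (columns) along one gradient.
table = [
    [5, 3, 0, 0, 0],
    [4, 4, 1, 0, 0],
    [3, 5, 2, 0, 0],
    [0, 1, 4, 3, 1],
    [0, 0, 2, 5, 3],
    [0, 0, 1, 3, 5],
]
negative, positive = split_sites(table)
print(negative, positive)
```

Applying `split_sites` again to the rows of each half would give the usual 2, 4, 8, ... hierarchy of a bisection method.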
Re: [R-sig-eco] The final result of TWINSPAN
Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on.

The problem is that I based the code on Hill, Bunch & Shaw (1975, J of Ecol 63:597-613) which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me.

So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code, is not a trivial task. I gave it a couple of days and gave up.

--
David W. Roberts                office 406-994-4548
Professor and Head              FAX 406-994-3190
Department of Ecology           email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

On 04/14/2011 01:57 AM, Jari Oksanen wrote:
> On 14/04/11 10:37 AM, "Yong Zhang" <2010202...@njau.edu.cn> wrote:
> > Dear all,
> >
> > I conducted the two-way indicator species analysis using TWINSPAN program, and following is the final result:
> >
> > 0111
> > 00011011
> > 011000111
> > 01001001
> >
> > I have to certify my analysis, I want to classify the above 24 sampling sites into 3 major groups based on 7 biotic metrics. The name of my 24 samples could be site1 to site24, from the left to the right, and I set the cut levels 0, 2, 5, 10, 20, the maximum level of divisions: 6, and maximum group size for division: 3.
> >
> > Now, my question is whether my setting is correct? And how should I classify these sites into 3 groups according to this final result?
>
> Dear Yong Zhang,
>
> This is not an R issue, because there is no TWINSPAN in R. However, the answer to your question is that strictly speaking you cannot group your data into three major groups with TWINSPAN. TWINSPAN is a bisection method so that first division gives you two groups, and second splits each of these into two groups so that the next choice is to have four groups. However, in this case one of the groups was so small (3 plots were split off from the others in the first division, and then these were split into groups of 2 plots and 1 plot) that you probably can ignore the second division of the small group.
>
> If your goal was as vague as wanting to classify 24 sites into 3 major groups you could do better than use TWINSPAN: what's the problem with proper classification methods in R? Moreover, have you checked that your "biotic metrics" suit the pseudospecies cut level concept of TWINSPAN?
>
> Cheers, jari oksanen
Re: [R-sig-eco] The final result of TWINSPAN
Dear Yong,

This *is* a list about R. Your question has *nothing* to do with R. Please ask such questions elsewhere, like the ORDNEWS list.

On Thu, 2011-04-14 at 15:37 +0800, Yong Zhang wrote:
> I conducted the two-way indicator species analysis using TWINSPAN
> program, and following is the final result:

Being painfully aware of the output TWINSPAN generates, I'm certain this isn't all of the TWINSPAN output, but I presume it is the binary indicator for the groups/splits from the output?

> 0111
> 00011011
> 011000111
> 01001001
>
> I have to certify my analysis, I want to classify the above 24
> sampling sites into 3 major groups based on 7 biotic metrics. The name
> of my 24 samples could be site1 to site24, from the left to the right,
> and I set the cut levels 0, 2, 5, 10, 20, the maximum level of
> divisions: 6, and maximum group size for division: 3.

Then you are out of luck, without using some other means of pruning back divisions. TWINSPAN implements a binary split process, and without other intervention you get 2, 4, 8, 16 groups etc.

You /can/ post-process the results of TWINSPAN using another DOS application to merge lower nodes of certain, specific branches into higher nodes to get different numbers of groups than 2, 4, 8, ..., but I forget the name of this DOS application at the moment. I used to teach a computer class using this, so I have the details somewhere and will see if I can hunt those old notes out.

My interpretation of the above would be that you could just ignore the split that cuts the 3 extreme right samples into two groups, so you have groups consisting of the first 11 samples, the next 10 in another group, and the final 3 samples in a group. But that is without seeing any of the other output, so I don't know if the CA clustering technique used is doing silly things splitting your main group of samples, i.e. are there samples close to the origin but on opposite sides that are similar to one another but which have been pushed into separate groups?

Hopefully the above helps, but please direct further and future requests for help with non-R applications to more appropriate lists.

G

> Now, my question is whether my setting is correct? And how should I
> classify these sites into 3 groups according to this final result?
>
> Thanks in advance for your time and suggestion.
>
> Kind wishes,
>
> Yong
>
> 2011-04-14
>
> ZHANG Yong
> Lab of aquatic insects & stream ecology
> Dept. of Entomology, Nanjing Agricultural University
> Nanjing, 210095, China
> Phone number: (+86) -25-84395241
> E-mail: 2010202...@njau.edu.cn

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
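Gavin's suggestion, ignoring the split of the three right-hand samples so that the groups hold 11, 10 and 3 sites, amounts to cutting the hierarchy at level 2 while leaving one branch undivided. A minimal sketch in Python follows; the per-site split codes are made up to match those group sizes (they are not the poster's actual output), and `groups_at_level` and `keep_whole` are hypothetical names.

```python
from collections import Counter


def groups_at_level(codes, level, keep_whole=()):
    """Label each site by the first `level` digits of its binary split code.

    Branch prefixes listed in `keep_whole` (no longer than `level` digits)
    are reported as a single group instead of being subdivided further.
    """
    labels = []
    for code in codes:
        label = code[:level]
        for branch in keep_whole:
            if len(branch) <= level and code.startswith(branch):
                label = branch
                break
        labels.append(label)
    return labels


# Hypothetical split codes for site1..site24: first digit is the first
# division, second digit the second division within that branch.
codes = ["00"] * 11 + ["01"] * 10 + ["10"] * 2 + ["11"]

four_groups = groups_at_level(codes, 2)
three_groups = groups_at_level(codes, 2, keep_whole=("1",))

print(Counter(four_groups))   # sizes 11, 10, 2, 1
print(Counter(three_groups))  # sizes 11, 10, 3
```

The same truncation with `level=3` would give up to eight groups, which is why a bisection method cannot return three groups without this kind of pruning.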
Re: [R-sig-eco] The final result of TWINSPAN
On 14/04/11 10:37 AM, "Yong Zhang" <2010202...@njau.edu.cn> wrote:
> Dear all,
>
> I conducted the two-way indicator species analysis using TWINSPAN program, and
> following is the final result:
>
> 0111
> 00011011
> 011000111
> 01001001
>
> I have to certify my analysis, I want to classify the above 24 sampling sites
> into 3 major groups based on 7 biotic metrics. The name of my 24 samples could
> be site1 to site24, from the left to the right, and I set the cut levels 0, 2,
> 5, 10, 20, the maximum level of divisions: 6, and maximum group size for
> division: 3.
>
> Now, my question is whether my setting is correct? And how should I classify
> these sites into 3 groups according to this final result?

Dear Yong Zhang,

This is not an R issue, because there is no TWINSPAN in R. However, the answer to your question is that strictly speaking you cannot group your data into three major groups with TWINSPAN. TWINSPAN is a bisection method so that first division gives you two groups, and second splits each of these into two groups so that the next choice is to have four groups. However, in this case one of the groups was so small (3 plots were split off from the others in the first division, and then these were split into groups of 2 plots and 1 plot) that you probably can ignore the second division of the small group.

If your goal was as vague as wanting to classify 24 sites into 3 major groups you could do better than use TWINSPAN: what's the problem with proper classification methods in R? Moreover, have you checked that your "biotic metrics" suit the pseudospecies cut level concept of TWINSPAN?

Cheers, jari oksanen
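The closing question about pseudospecies can be made concrete with a small sketch of how cut levels turn one abundance value into several presence/absence "pseudospecies". It is written in Python for illustration; the rule used here (a level is reached when the value is positive and at least the cut level) is an assumption about the convention, and the function name and sample values are hypothetical.

```python
def pseudospecies(abundance, cut_levels=(0, 2, 5, 10, 20)):
    """Return one 0/1 indicator per cut level for a single abundance value."""
    return [1 if abundance > 0 and abundance >= cut else 0 for cut in cut_levels]


print(pseudospecies(7))    # reaches levels 0, 2 and 5
print(pseudospecies(25))   # reaches all five levels
print(pseudospecies(0.8))  # a metric on a 0-1 scale only ever reaches level 0
```

With the cut levels quoted above, any variable that never exceeds 2 collapses to plain presence/absence, which is one way a "biotic metric" can fail to suit cut levels designed for cover or abundance values.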