Hi Aidan,

I think you are on the right email list. I'm not R-core, but this looks like an interesting and significant contribution to base R.

I'm not sure what the original dendrapply looks like in terms of code style (variable names, whitespace formatting, etc.), but in my experience it is important that a code contribution make minimal changes to the existing style.

Did you hear about the R Project Sprint 2023?
https://contributor.r-project.org/r-project-sprint-2023/
Your work falls into the "new developments" category, so I think you could apply for that funding to participate.

Toby
On Fri, Feb 24, 2023 at 3:47 AM Lakshman, Aidan H <ah...@pitt.edu> wrote:

> Hi everyone,
>
> My apologies if this isn’t the right place to submit this; I’m new to the
> R-devel community and still figuring out what is where.
>
> If people want to skip my writeup and just look at the code, I’ve made a
> repository for it here:
> https://github.com/ahl27/new_dendrapply/tree/master. I’m not quite sure
> how to integrate it into a fork of R-devel; the package structure is
> different from what I’m used to.
>
> I had written a slightly improved version of dendrapply for one of my
> research projects, and my advisor encouraged me to submit it to the R
> project. It took me longer than I expected, but I’ve finally gotten my
> implementation to be a drop-in replacement for `stats::dendrapply`. The man
> page for `stats::dendrapply` says “The implementation is somewhat
> experimental and suggestions for enhancements (or nice examples of usage)
> are very welcome,” so I figured this had the potential to be a worthwhile
> contribution. I wanted to send it out to R-devel to see if this was
> something worth pursuing as an enhancement to R.
>
> The implementation I have is written in C, which I understand implies an
> increased burden of maintenance over pure R code. However, it does come
> with the following benefits:
>
> - Completely eliminates recursion, so no memory overhead from function
>   calls or possibility of stack overflows (this was a major issue reported
>   on some of the functions in one of our Bioconductor packages that
>   previously used `dendrapply`).
> - Modest runtime improvement, around 2x on my computer (2021 MBP, 32GB
>   RAM). I’m relatively confident this could be optimized more.
> - Seemingly significant reduction in memory usage; I’m still working on a
>   robust benchmark. Suggestions for the best way to do that are welcome.
> - Support for applying functions with an inorder traversal (as in
>   `stats::dendrapply`) as well as using a postorder traversal.
>
> This implementation was tested manually as well as by running all the unit
> tests in `dendextend`, which comprises a lot of applications of
> `dendrapply`.
>
> The postorder traversal would be a significant new functionality for
> dendrapply, as it would allow functions that use the child nodes to
> execute correctly. A toy example of this is something like:
> ```
> exFunc <- function(x){
>   attr(x, 'newA') <- 'a'
>   if(is.null(attr(x, 'leaf'))){
>     cat(attr(x[[1]], 'newA'), attr(x[[2]], 'newA'))
>     cat('\n')
>   }
>   x
> }
>
> dendrapply(dend, exFunc)
> ```
>
> With the current version of dendrapply, this prints nothing, but the
> postorder traversal version will print ‘a’ twice for each internal branch.
> If this would be a worthwhile addition, I can refactor the code for brevity
> and add a `how=c("in.order", "post.order")` argument, with the default value
> “in.order” to maintain backwards compatibility. A preorder traversal
> version should also be possible; I just haven’t gotten to it yet.
>
> I think the runtime could be optimized more as well.
>
> Thank you in advance for looking at my code and offering feedback; I’m
> excited at the possibility of helping contribute to the R project! I’m
> happy to discuss more either here, on GitHub, or on the R Contributors
> Slack.
>
> Sincerely,
> Aidan Lakshman
>
> -----------------------
> Aidan Lakshman (he/him)<https://www.ahl27.com/>
> Doctoral Candidate, Wright Lab<https://www.wrightlabscience.com/>
> University of Pittsburgh School of Medicine
> Department of Biomedical Informatics
> ah...@pitt.edu
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
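
For readers following the thread, the difference between the two visiting orders discussed in the quoted message can be sketched in plain R. This is only an illustration under stated assumptions: `postorder_apply` and `tag_node` are hypothetical names, not part of `stats` or of the linked repository, and the helper is recursive for brevity, whereas the proposed C implementation is described as non-recursive. The toy data are made up for the example.

```r
# Hypothetical helper (not part of stats): apply FUN to every node of a
# dendrogram, visiting all children before their parent (post-order).
# Recursive for brevity; this is NOT the C implementation from the thread.
postorder_apply <- function(node, FUN) {
  if (!is.leaf(node)) {
    for (i in seq_along(node)) {
      # Process each child first so FUN later sees the updated children.
      node[[i]] <- postorder_apply(node[[i]], FUN)
    }
  }
  FUN(node)
}

# Hypothetical node function in the spirit of the toy example in the thread:
# every node gets the attribute 'newA', and each internal node records how
# many of its children already carried that attribute when it was visited.
tag_node <- function(x) {
  if (!is.leaf(x)) {
    attr(x, "tagged_children") <- sum(vapply(
      seq_along(x),
      function(i) !is.null(attr(x[[i]], "newA")),
      logical(1)
    ))
  }
  attr(x, "newA") <- "a"
  x
}

# Toy dendrogram with four leaves (made-up data).
dend <- as.dendrogram(hclust(dist(c(1, 2, 6, 7))))

post <- postorder_apply(dend, tag_node)
attr(post, "tagged_children")  # 2: both children were tagged before the root

# For comparison, the built-in function applied to the same input:
pre <- dendrapply(dend, tag_node)
attr(pre, "tagged_children")
```

With a parent-first order, as the quoted message describes for the current `dendrapply`, the root is visited before its children have been tagged, so the last line should report 0 rather than 2.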