Thanks for your reply!

> To answer your implicit question, VECTOR_ELT() unclasses the nodes
> because it doesn't go through the stats:::`[[.dendrogram` method,
> instead dereferencing the data pointer directly.

That's roughly what I had suspected; I appreciate the clarification.

To your point on other *apply functions, I wasn't actually aware of that
implementation, but it's definitely a smarter way to do it. I'll try later
today/tomorrow to incorporate that method; it seems much better and more
future-proof than my approach. I definitely agree with you about cases
where unclass(node)[[i]] is invalid. It may be slightly slower due to having to
rely on R method dispatch, but I think the benefits outweigh the drawbacks in
this case.
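
As a rough illustration of the difference (a sketch at the R level using the small dendrogram from reg-tests-1c.R, not the actual C code), going through `[[` dispatches to the dendrogram method, while indexing the unclassed node reads the stored element directly. The variable names `via_method` and `direct` are just for illustration:

```r
# Small three-leaf dendrogram, as in reg-tests-1c.R
D <- as.dendrogram(hclust(dist(cbind(setNames(c(0, 1, 4), LETTERS[1:3])))))

# `[[` dispatches to stats:::`[[.dendrogram`, which assigns the class
# to the returned child node
via_method <- D[[2]]
inherits(via_method, "dendrogram")

# Indexing the unclassed node skips the method, similar in spirit to
# what VECTOR_ELT() does in C; compare the class reported here
direct <- unclass(D)[[2]]
class(direct)
```

Comparing the two class results should show whether a given function would see a classed or unclassed child under each approach.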

> Would you mind telling me more about the following case?

> > if(!(inherits(res,c('dendrogram', 'list')))){
> >   res1 <- lapply(unclass(node), \(x) x)
> > }

> If you're looking to improve the performance, there might be a way to
> avoid the wrapper and this lapply(unclass(node), identity) call in it.

This was a product of trying to get performance to be the same as in the
current method; I agree that it's probably not the best way to do this. The
use case is when you apply a function to the dendrogram that doesn't return a
dendrogram object. One example is the one from reg-tests-1c.R:

```
D <- as.dendrogram(hclust(dist(cbind(setNames(c(0,1,4), LETTERS[1:3])))))

dendrapply(D, labels)

# Expected result:
#
# [[1]]
# "C"
#
# [[2]]
# [[2]][[1]]
# "A"
#
# [[2]][[2]]
# "B"
#
# [[3]]
# "C"
```

Applying labels to the root node returns c("C", "A", "B"), and if we convert
that to a list, we get a length 3 list of length 1 character vectors. However,
when traversing the dendrogram pre-order, this would break things, since then
the first entry of the node is no longer a dendrogram object; it's been
replaced by a character vector. I had written it this way with the unclass so
that I could replace entries that needed to be evaluated at child nodes with
child nodes. For example, in this instance, after evaluating the function at
the root, the tree would look like:

```
[[1]]
<unclassed D[[1]]>

[[2]]
<unclassed D[[2]]>

[[3]]
"B"
```
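
The length mismatch behind this can be checked directly. The following is a small sketch, assuming the same `D` as in the reg-tests-1c.R example; it only inspects the result of applying labels at the root and does not reproduce the dendrapply internals:

```r
# Same three-leaf dendrogram as in the example above
D <- as.dendrogram(hclust(dist(cbind(setNames(c(0, 1, 4), LETTERS[1:3])))))

res <- labels(D)      # character vector of all leaf labels under the root
res <- as.list(res)   # one list element per label

length(res)           # number of labels returned at the root
length(D)             # number of child nodes of the root
```

The root has fewer children than labels, so only the leading entries of the result are replaced by child nodes and the remainder is left as returned by the function.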

To answer the question on why there's an lapply(..., identity) call, I think I
ended up doing it this way because I was having some issues with not getting
the elements to populate correctly from the dendrogram. Looking back on it now,
there's definitely an easier way to do this that isn't so hard to understand
code-wise:
```
if(!is.leaf(node)){
  if(!is.list(res)){
    res <- as.list(res)
  }
  res[seq_along(node)] <- node
}
```
That should perform almost identically and make more sense, with the added
benefit that it doesn't unclass the child nodes, so (when I also incorporate
the other fix you suggested) we shouldn't have any unexpected behavior from
functions relying on a hypothetical `subclass-of-dendrogram`. This
implementation is also slightly faster because it has no lapply call and uses
is.list() instead of inherits(...).

Result after applying to root node with this approach:
```
[[1]]
D[[1]]

[[2]]
D[[2]]

[[3]]
"B"
```
Classes of D[[1]] and D[[2]] are preserved for future evaluations.

Thanks for pointing this out; I'll incorporate this into the code when I check
the `[[` case later. If you have any other questions/comments/suggestions I
would love to hear them! Happy to clarify further as well if I didn't answer
your questions fully.

Sincerely,
Aidan

-----------------------
Aidan Lakshman (he/him)<https://www.ahl27.com/>
Doctoral Candidate, Wright Lab<https://www.wrightlabscience.com/>
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ah...@pitt.edu
(724) 612-9940


From: Ivan Krylov <krylov.r...@gmail.com>
Date: Thursday, March 2, 2023 at 09:47
To: Lakshman, Aidan H <ah...@pitt.edu>
Cc: R-devel@r-project.org <R-devel@r-project.org>
Subject: Re: [Rd] `dendrapply` Enhancements
Dear Aidan Lakshman,

To answer your implicit question, VECTOR_ELT() unclasses the nodes
because it doesn't go through the stats:::`[[.dendrogram` method,
instead dereferencing the data pointer directly.

Other *apply functions in base R create a call to the `[[` operator,
letting the language dispatch the generic call, allowing the method to
assign a class to the return value. The following example is taken from
src/main/apply.c:do_lapply():

// prepare a call to FUN(X[[i]], ...)

    SEXP isym = install("i");
    SEXP tmp = PROTECT(lang3(R_Bracket2Symbol, X, isym));
    SEXP R_fcall = PROTECT(lang3(FUN, tmp, R_DotsSymbol));
    MARK_NOT_MUTABLE(R_fcall);

// inside the loop: evaluate the call

        tmp = R_forceAndCall(R_fcall, 1, rho);

Not sure which way is faster, but it may make sense to try, and it's
probably more correct in (contrived) cases where unclass(node)[[i]] is
invalid because it relies on a hypothetical `[[.subclass-of-dendrogram`
to restore some invariants.

Would you mind telling me more about the following case?

> if(!(inherits(res,c('dendrogram', 'list')))){
>  res1 <- lapply(unclass(node), \(x) x)
> }

If you're looking to improve the performance, there might be a way to
avoid the wrapper and this lapply(unclass(node), identity) call in it.

--
Best regards,
Ivan
