We need to find the right compromise between bloating Rcpp, which is already quite huge:

wc src/* inst/include/**/** inst/include/* 2> /dev/null | tail -n1
   66180  784183 6425152 total

and supporting things that are generic enough.

I can see things like union and setdiff being generic enough (we already have unique btw).
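
To make the idea concrete, here is a minimal sketch of hash-based match(), unique(), union() and setdiff() in plain C++, using only the standard library. The names match_sketch, unique_sketch, union_sketch and setdiff_sketch are hypothetical and are not Rcpp functions. The sketch assumes string vectors only and returns 0 where R's match() would return NA.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper names; none of these are part of Rcpp.

// Like R's match(x, table): for each element of x, the 1-based position of
// its first occurrence in table, or 0 where R would return NA.
std::vector<int> match_sketch(const std::vector<std::string>& x,
                              const std::vector<std::string>& table) {
    // Positions are returned as int, so the table must not be longer than that.
    assert(table.size() <=
           static_cast<std::size_t>(std::numeric_limits<int>::max()));
    std::unordered_map<std::string, int> first_pos;
    for (std::size_t i = 0; i < table.size(); ++i) {
        // emplace keeps an existing entry, so the first occurrence wins.
        first_pos.emplace(table[i], static_cast<int>(i) + 1);
    }
    std::vector<int> out;
    out.reserve(x.size());
    for (const auto& value : x) {
        auto it = first_pos.find(value);
        out.push_back(it == first_pos.end() ? 0 : it->second);
    }
    return out;
}

// Like R's unique(): keep first occurrences, in their original order.
std::vector<std::string> unique_sketch(const std::vector<std::string>& x) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& value : x) {
        if (seen.insert(value).second) out.push_back(value);
    }
    return out;
}

// Like R's union(x, y), which is unique(c(x, y)).
std::vector<std::string> union_sketch(std::vector<std::string> x,
                                      const std::vector<std::string>& y) {
    x.insert(x.end(), y.begin(), y.end());
    return unique_sketch(x);
}

// Like R's setdiff(x, y): unique elements of x that do not occur in y.
std::vector<std::string> setdiff_sketch(const std::vector<std::string>& x,
                                        const std::vector<std::string>& y) {
    std::unordered_set<std::string> drop(y.begin(), y.end());
    std::vector<std::string> out;
    for (const auto& value : unique_sketch(x)) {
        if (drop.count(value) == 0) out.push_back(value);
    }
    return out;
}
```

All four functions do a single pass over their inputs with hashed lookups, which is the property that makes a fast match() the building block for the others.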


Then for other things, being in another package is not that bad.
And after all, this is what Rcpp is really about: giving others the tools.

Romain

On 15/11/12 20:07, Søren Højsgaard wrote:
Dear list

I am not sure if Hadley's remark below was an invitation to make a 
"wishlist", but I'll take the risk:

1) I have made several packages related to graphical models for multivariate data. Much 
of the work in these packages is "bookkeeping": operations on sets of subsets of a 
finite set of variables, so these packages make heavy use of union(), setdiff(), 
etc., and these functions all rely heavily on match(). The same applies to unique(), which is 
also based on match(). It would be very nice to have these in C++ form. With a C++ 
version of match(), these should be low-hanging fruit.

2) Also of relevance to the graphical model packages is a C++ version of 
aperm() for permuting the dimensions of an array.

3) There are operations on such arrays which I imagine could be conveniently 
implemented in the Rcpp framework. Consider a 2x2x2 contingency table with dimnames 
a,b,c. Call this table n(a,b,c). The all-two-factor log-linear model will have 
generators (a,b)(a,c)(c,b). Iterative proportional fitting works as follows: 
Let m(a,b,c) denote the array of fitted values (at the current iteration). 
Then the update for the (c,b) generator is

  m(a,b,c) <- m(a,b,c) * n(c,b) / m(c,b)

To do this one must have
  marginalization: n(a,b,c) -> n(b,c)
  permutation: n(b,c) -> n(c,b)
  division: n(c,b)/m(c,b)
  multiplication: m(a,b,c) * ( n(c,b)/m(c,b) )
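
The steps listed above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: Table3, margin_bc and ipf_update_bc are hypothetical names, the table is fixed at three dimensions, and storage is column-major as in an R array. The margins are kept in (b,c) order, so the permutation to (c,b) reduces to how the margin is indexed and needs no separate step here. Cells whose fitted margin is zero are left untouched, which is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal 3-way table, stored column-major like an R array:
// element (a, b, c) lives at a + A * (b + B * c).
struct Table3 {
    std::size_t A, B, C;
    std::vector<double> v;
    Table3(std::size_t A_, std::size_t B_, std::size_t C_, double fill = 0.0)
        : A(A_), B(B_), C(C_), v(A_ * B_ * C_, fill) {}
    double& at(std::size_t a, std::size_t b, std::size_t c) {
        return v[a + A * (b + B * c)];
    }
    double at(std::size_t a, std::size_t b, std::size_t c) const {
        return v[a + A * (b + B * c)];
    }
};

// Marginalization n(a,b,c) -> n(b,c): sum over a. The result is B x C,
// column-major, so entry (b, c) is at b + B * c.
std::vector<double> margin_bc(const Table3& t) {
    std::vector<double> out(t.B * t.C, 0.0);
    for (std::size_t c = 0; c < t.C; ++c)
        for (std::size_t b = 0; b < t.B; ++b)
            for (std::size_t a = 0; a < t.A; ++a)
                out[b + t.B * c] += t.at(a, b, c);
    return out;
}

// One IPF update for the (b,c) generator:
//   m(a,b,c) <- m(a,b,c) * n(b,c) / m(b,c)
void ipf_update_bc(Table3& m, const Table3& n) {
    assert(m.A == n.A && m.B == n.B && m.C == n.C);
    const std::vector<double> n_bc = margin_bc(n);
    const std::vector<double> m_bc = margin_bc(m);
    for (std::size_t c = 0; c < m.C; ++c) {
        for (std::size_t b = 0; b < m.B; ++b) {
            const double denom = m_bc[b + m.B * c];
            // Assumption: a zero fitted margin leaves the cells unchanged.
            if (denom == 0.0) continue;
            const double ratio = n_bc[b + m.B * c] / denom;  // division
            for (std::size_t a = 0; a < m.A; ++a)
                m.at(a, b, c) *= ratio;                       // multiplication
        }
    }
}
```

After one call to ipf_update_bc, the (b,c) margin of m agrees with the (b,c) margin of n. A full fit would cycle through the corresponding updates for the other generators until the fitted values stop changing.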

I am aware that iterative proportional fitting is already implemented in 
loglin, but there are other kinds of (graphical) models where similar updates 
are needed. In connection with message passing in Bayesian networks, one 
operation often needed is

  m(a,b,c) <- n(a,b) * n(c,b)

which will result in an array with dimensions (a,b,c). All of this stuff is 
implemented in the gRbase package as R functions, and it would be very 
convenient to have these operations as C++ functions. In the gRbase 
implementation it is required that the arrays have dimnames, and I guess it 
must be so in C++ as well.
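
A rough sketch of the table product n(a,b) * n(c,b) in plain C++, aligning the two arrays by variable name. NamedTable, find_var, lookup and table_mult are hypothetical names, not gRbase or Rcpp API. The sketch assumes each dimension carries a variable name (standing in for the names of the dimnames), that shared variables have the same number of levels in the same order, and that storage is column-major as in R.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical named array: vars[i] is the variable name of dimension i and
// dims[i] its number of levels; values are column-major like an R array.
struct NamedTable {
    std::vector<std::string> vars;
    std::vector<std::size_t> dims;
    std::vector<double> v;
};

// Position of `name` in `vars`, or vars.size() if it is absent.
std::size_t find_var(const std::vector<std::string>& vars,
                     const std::string& name) {
    for (std::size_t i = 0; i < vars.size(); ++i)
        if (vars[i] == name) return i;
    return vars.size();
}

// Value of t at the cell selected by a full index `idx` over `vars`.
// The variables of t are looked up by name, so their order does not matter.
double lookup(const NamedTable& t, const std::vector<std::string>& vars,
              const std::vector<std::size_t>& idx) {
    std::size_t offset = 0, stride = 1;
    for (std::size_t i = 0; i < t.vars.size(); ++i) {
        const std::size_t j = find_var(vars, t.vars[i]);
        assert(j < vars.size());
        offset += idx[j] * stride;
        stride *= t.dims[i];
    }
    return t.v[offset];
}

// Table product: the result ranges over the variables of x followed by the
// variables of y that x does not have.
NamedTable table_mult(const NamedTable& x, const NamedTable& y) {
    NamedTable out;
    out.vars = x.vars;
    out.dims = x.dims;
    for (std::size_t i = 0; i < y.vars.size(); ++i) {
        const std::size_t j = find_var(x.vars, y.vars[i]);
        if (j == x.vars.size()) {
            out.vars.push_back(y.vars[i]);
            out.dims.push_back(y.dims[i]);
        } else {
            assert(x.dims[j] == y.dims[i]);  // shared variables must agree
        }
    }
    std::size_t total = 1;
    for (std::size_t d : out.dims) total *= d;
    out.v.resize(total);

    std::vector<std::size_t> idx(out.dims.size(), 0);
    for (std::size_t cell = 0; cell < total; ++cell) {
        out.v[cell] = lookup(x, out.vars, idx) * lookup(y, out.vars, idx);
        // Advance idx like an odometer, first dimension fastest.
        for (std::size_t k = 0; k < idx.size(); ++k) {
            if (++idx[k] < out.dims[k]) break;
            idx[k] = 0;
        }
    }
    return out;
}
```

Because both operands are indexed by name, passing a table over (a,b) and a table over (c,b) gives a result over (a,b,c) without permuting either input first. The name lookup inside the inner loop keeps the sketch short; a real implementation would precompute the strides once.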

I am perfectly aware that I should program these facilities in C++ using Rcpp, but I just 
can't resist mentioning these wishes, in case they are "almost there" in C++.

Best regards
Søren


Hmmm - see http://cran.r-project.org/web/packages/fastmatch/index.html

Hadley

PS.  Would you be interested in a list of R functions that, from a quick skim of 
the R sources, I think could be much, much faster if implemented in Rcpp?


--
RStudio / Rice University
http://had.co.nz/



--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
