On Tue, Sep 27, 2011 at 8:52 PM, Dirk Eddelbuettel wrote:
>
> Hi Michael,
>
> Thanks for posting here. There is a lot of meat in this post, so I'll try to
> be brief. It is also late, so my concentration may not be at full tilt.
>
> I think in principle dlopen() should work. I never looked at exactly how R's
> own dyn.load() is implemented but I suspect it uses dlopen() so you may get
> this to work. dlopen() can have issues across platforms. Do you need
> this to work "wherever", ie Linux, OS X and even on that other unspeakable
> platform? Much harder -- but as R has done the abstracting of it, I'd try as
> much as possible to lean on R and its dynamic extensions. That simply
> works...
>
> Also, it is sometimes good to remember that Rcpp is after all 'just' glue
> between R and C++. It alters neither R nor C++, it 'just' gets them a little
> closer together which is A Good Thing (TM) in my book. With that, I often
> attempt to build proofs of concept in C++ alone (ie work out header creation,
> compile, link, and here for you dlopen() ...) before trying to do it from R.
>
> On 27 September 2011 at 20:18, Michael Malecki wrote:
> | Already a colleague distills my long post:
> |
> | What we want is three R functions, that
> |
> | A. call a C++ program to read a Stan graphical model specification
> | and generate C++ code for a class extending an abstract base class
> | as well as factory methods to create/destroy instances,
> |
> | B. compile the C++ code into a dynamically linkable library, using
> | header libraries Stan, Boost and Eigen, the versions of which we need
> | to control, and
> |
> | C. create an instance of the class implemented in the generated
> | C++ code (using a pre-specified factory method returning the base
> | class type), pass it data from R, run it, return the results to R.
>
> Looks good.
>
> | Our current prototype is set up to
> |
> | 1. use RCpp to call the C++ program that reads the model and
> | generates C++ code,
>
> "use Rcpp to call the C++ program" is not correct language.
>
> Rcpp is an interface between C++ and R data structures. It is not a running /
> working data broker, or compiler, or ...
>
> You can use Rcpp to interface your Stan, Boost, Eigen, ... headers. Rcpp
> modules can help with interfaces (but is not yet the most robust solution,
> though very promising) otherwise you can do it by hand too.
>
> | 2. exec an external compiler like g++ or clang++ on the generated code, and
> |
> | 3. use RCpp to call a C++ program that uses dlopen() to load the lib
> | created in 2, pass data from R to the C++ program and pass data
> | from C++ back to R.
>
> 2. and 3. are pretty close to what inline does for Rcpp. inline can be
> extended to other headers as we have done for Armadillo, (parts of) GSL,
> and Eigen.
>
> | Is there a cleaner way than dlopen() to link from within R cross
> | platform?
>
> R's own dyn.load() which is what library() does, and which is what inline's
> cfunction() and cxxfunction() do.
>
> | If we use dlopen(), can we dlopen()/dlclose() multiple times
> | without leaking memory to support repeated A/B/C steps during
> | model development and fitting?
>
> I see no reason why not.
>
> | Can we dlopen()/dlclose() and then dlopen() a function with
> | the same name? Or should we generate new factory/class namespaces
> | each time?
>
> Isn't dlopen() a C function? Protecting namespaces is good practice anyway.
>
> | On Tue, Sep 27, 2011 at 7:18 PM, Michael Malecki wrote:
> |
> | 1 Some questions about R and dynamic libraries. What I describe below
> | appears quite nonstandard, but I think makes sense. We could be way off, in
> | which case we'd like to know now; or there could just be some pitfalls that
> | others might have encountered that we would benefit from knowing about.
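On the repeated dlopen()/dlclose() question: a minimal, POSIX-only sketch (Linux/OS X; Windows would need LoadLibrary/GetProcAddress/FreeLibrary instead, which is the portability work R's dyn.load() already does) that ties each handle to a C++ object, so every load is matched by exactly one dlclose() even when an exception is thrown. The class name `SharedLibrary` is hypothetical, and this is not part of R's API. dlclose() only drops a reference count; whether the code is actually unmapped at that point is up to the platform. On older glibc versions this needs linking with `-ldl`.

```cpp
#include <cstddef>
#include <dlfcn.h>
#include <stdexcept>
#include <utility>

// Hypothetical RAII owner of one dlopen() handle: the destructor calls
// dlclose(), so load/unload cycles during model development cannot leak
// handles.
class SharedLibrary {
public:
    // path == nullptr opens the running program itself (POSIX behaviour).
    explicit SharedLibrary(const char* path)
        : handle_(dlopen(path, RTLD_NOW | RTLD_LOCAL)) {
        if (handle_ == nullptr) {
            const char* err = dlerror();
            throw std::runtime_error(err ? err : "dlopen failed");
        }
    }
    ~SharedLibrary() {
        if (handle_ != nullptr) dlclose(handle_);
    }

    // One owner per handle: no copies, but moving is allowed.
    SharedLibrary(const SharedLibrary&) = delete;
    SharedLibrary& operator=(const SharedLibrary&) = delete;
    SharedLibrary(SharedLibrary&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}

    // Look up a symbol in this handle only and cast it to type T
    // (typically a function-pointer type).
    template <typename T>
    T symbol(const char* name) const {
        dlerror();  // clear any stale error state
        void* sym = dlsym(handle_, name);
        const char* err = dlerror();
        if (err != nullptr) throw std::runtime_error(err);
        return reinterpret_cast<T>(sym);
    }

private:
    void* handle_;
};
```

Because RTLD_LOCAL keeps each library's symbols out of the global namespace and lookups go through a specific handle, two generated libraries that both export a function with the same name should not collide, which bears on the "same name" question above.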
> |
> | 1.1 Overview: we are developing a package (Stan, as in Ulam) to perform
> | Hamiltonian MCMC sampling from densities we can write down in C++, or can
> | use a graphical modeling language (something that looks somewhat like BUGS
> | describing nodes and their distributions).
> |
> | The graphical model is used to generate C++ code that depends on Boost,
> | Eigen, and Stan, which implements automatic differentiation of these
> | high-dimension hard-to-sample densities. A stan::mcmc::hmc sampler is
> | instantiated on a model, which is a stan::mcmc::prob_grad_ad, and a
> | stan::mcmc::sample contains a std::vector<double> real_params, a
> | std::vector<int> int_params, and a double log_prob.
> |
> | That's all well and good, but this is definitely an unusual thing to be
> | doing from R.
> |
> | Inline has been suggested, but this would mean that each model would have
> | to contain Rcpp hooks into R, to get data in and samples out. We would
> | rather have a standalone stan model compiled as a shared object, that Rstan
> | would interact with.
>
> inline is a hook to get C, C++ or Fortran code into R. We adapted it to also
> be workable with Rcpp, but inline does not impose Rcpp if you do not want
> Rcpp. So in that sense what you write is not correct.
>
> | Rstan would implement, for a concrete example, an R_data_reader class, with
> | a public virtual method values("varname") returning a std::vector<double>
> | via the handy Rcpp::environment::env["varname"] operator (and dimensions of
> | real and integer parameters, using the sometimes-tricky but slick
> | Rcpp::RObjects).
> |
> | 1.2 Rstan would interact with compiled stan shared objects via dlopen (or
> | its windowsy friend), with Rstan::sample being a call to the model's sample
> | method loaded with sample exposed to Rstan by dlsym(obj, "sample"). Is this
> | not done in R package compiled code because it is especially OS-dependent,
> | (and how much are we talking?)
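Steps A and C and the dlsym(obj, "sample") idea in 1.2 rest on one pattern: generated code implements a fixed abstract base class and exports factory functions with C linkage, so a loader can find them by an unmangled name. The sketch below assumes hypothetical names (`model_base`, `create_model`, `destroy_model`, namespace `generated_v1`) and a toy standard-normal density; it is not Stan's actual `prob_grad_ad` interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interface shared by the loader and every generated model.
class model_base {
public:
    virtual ~model_base() = default;
    virtual std::size_t num_params() const = 0;
    // Unnormalised log density at the given parameter values.
    virtual double log_prob(const std::vector<double>& params) const = 0;
};

// --- what a code generator might emit for one model ---
// A per-generation namespace keeps C++ class names distinct across versions.
namespace generated_v1 {
// Toy model: independent standard normals, log p(x) = -0.5 * sum(x_i^2).
class normal_model : public model_base {
public:
    explicit normal_model(std::size_t n) : n_(n) {}
    std::size_t num_params() const override { return n_; }
    double log_prob(const std::vector<double>& x) const override {
        double lp = 0.0;
        for (double xi : x) lp -= 0.5 * xi * xi;
        return lp;
    }

private:
    std::size_t n_;
};
}  // namespace generated_v1

// Factory functions with C linkage: their symbol names are not mangled, so
// dlsym(handle, "create_model") can find them. Creating and destroying the
// object inside the same library keeps new/delete on the same side of the
// shared-object boundary.
extern "C" model_base* create_model(std::size_t n) {
    return new generated_v1::normal_model(n);
}

extern "C" void destroy_model(model_base* m) {
    delete m;
}
```

C-linkage names are not affected by the C++ namespace, so `create_model` is the same symbol in every generated library; loading each library with RTLD_LOCAL and looking the factory up through that library's own handle avoids clashes. A loader could hold the result as `std::unique_ptr<model_base, void (*)(model_base*)>(create(n), destroy)` so the matching destroy function is always called.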
>
> I would always use packages as a first instance. They work. End of story.
> Prove to the world why you need more ...
>
> | Is there some function in an R header that we don't know about that wraps
> | dlopen in a less fragile way? We do not intend to expose any of the
> | compiled stan .so methods directly to R – they would all be stan objects
> | (like a vector of samples described above), and we'd use Rcpp to wrap the
> | returns to R.
>
> dlopen is not referenced in the 'Writing R Extensions' manual, and AFAIK not
> part of the API. So the question is malformed.
>
> | We want to do this because it means stan shared objects can then interact
> | with python or something else (or dump files in and out) through reader
> | classes.
> |
> | 2 Some Questions for the Rcpp crowd
> |
> | 2.1 We find no instances of this being done because most estimation is
> | static model-fitting where only the data change. But if we want to change a
> | model, we have a new sampler to compile, but as far as R is concerned, the
> | only thing new is its name. The methods R needs to interact with are going
> | to be the same (see the reader class above), and separating this way keeps
> | R headers out of stan proper and only in the Rstan io. So first: does this
> | sound reasonable given the description above?
>
> I think you want to watch a bit more closely what Doug Bates is doing in
> lme4eigen which seems to me (as an innocent bystander) to be related to
> sophisticated model updates.
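The reader-class design in 1.1 and 2.1 keeps R headers out of the model: the model depends only on an abstract reader, and each front end (R, Python, files) supplies its own implementation. A minimal sketch, with hypothetical names (`data_reader`, `map_reader`, `sum_of`); an R-side reader would implement the same interface on top of Rcpp objects, which is not shown here.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the only thing model code sees.
class data_reader {
public:
    virtual ~data_reader() = default;
    // Flattened values of the named variable.
    virtual std::vector<double> values(const std::string& name) const = 0;
    // Dimensions of the named variable (empty for a scalar).
    virtual std::vector<std::size_t> dims(const std::string& name) const = 0;
};

// In-memory implementation, e.g. for tests or a file-based front end.
class map_reader : public data_reader {
public:
    void set(const std::string& name, std::vector<double> vals,
             std::vector<std::size_t> shape) {
        data_[name] = entry{std::move(vals), std::move(shape)};
    }
    std::vector<double> values(const std::string& name) const override {
        return lookup(name).vals;
    }
    std::vector<std::size_t> dims(const std::string& name) const override {
        return lookup(name).shape;
    }

private:
    struct entry {
        std::vector<double> vals;
        std::vector<std::size_t> shape;
    };
    const entry& lookup(const std::string& name) const {
        auto it = data_.find(name);
        if (it == data_.end()) throw std::out_of_range("no variable: " + name);
        return it->second;
    }
    std::map<std::string, entry> data_;
};

// Model-side code depends only on data_reader, never on R.
double sum_of(const data_reader& r, const std::string& name) {
    double s = 0.0;
    for (double v : r.values(name)) s += v;
    return s;
}
```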
It seems to me that this is an overly complex approach but that may just be my only having had one cup of coffee so far. To me this feels like trying to take a particular approach that works for separately compiled C++ programs and wedging it into R in some way. It may be possible but what you will end up with may not be pretty.

What Dirk is referring to in lme4Eigen (the capitalization is slightly different from what he wrote) is the combination of reference class objects in R and C++ objects from particular classes. The S4 classes in R and the more commonly used S3 method dispatch mechanism are a different design from what C++ or Java programmers are accustomed to. The recently introduced reference classes (which are not overly well documented; start with ?setRefClass in an R session) store references to data members and incorporate methods as part of the class. If you change a data member in an object from such a class, the data member is changed in the object itself, not in a copy of that object.

In lme4Eigen I have objects representing statistical models for which the parameter estimates are those that optimize a criterion, say maximizing the likelihood or the posterior density. When the reference class instance is constructed it does little more than generate a corresponding object in a C++ class and return an "external pointer" to that object, which is one of the data members of the R reference class. So, in other words, there is an R object that, for all intents and purposes, just holds a pointer to an instance of a C++ class. All the action (setting a new value of the parameters, evaluating the objective function, etc.) takes place in the C++ class instance but becomes visible to R through methods defined on the R reference class object.
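The design just described (an R object that only holds a pointer, with all state and computation in the C++ instance) can be mimicked in plain C++ to show the reference semantics involved. In R the pointer would be an external pointer (R_MakeExternalPtr, or Rcpp's XPtr) whose lifetime R's garbage collector manages; the `std::shared_ptr` below is a swapped-in stand-in for that, and `quadratic_model` and `model_handle` are hypothetical names with a toy objective.

```cpp
#include <memory>
#include <vector>

// Hypothetical C++ model: the only place the parameter state lives.
class quadratic_model {
public:
    void set_params(const std::vector<double>& theta) { theta_ = theta; }
    const std::vector<double>& params() const { return theta_; }
    // Toy objective to minimise: sum of squared deviations from 1.
    double objective() const {
        double f = 0.0;
        for (double t : theta_) f += (t - 1.0) * (t - 1.0);
        return f;
    }

private:
    std::vector<double> theta_;
};

// Stand-in for the R reference-class object: it holds only a pointer to the
// C++ instance and forwards every call. Copying the handle copies the
// pointer, not the model, so a change made through one copy is visible
// through all of them (reference semantics, as with setRefClass).
class model_handle {
public:
    model_handle() : ptr_(std::make_shared<quadratic_model>()) {}
    void set_params(const std::vector<double>& theta) { ptr_->set_params(theta); }
    const std::vector<double>& params() const { return ptr_->params(); }
    double objective() const { return ptr_->objective(); }

private:
    std::shared_ptr<quadratic_model> ptr_;
};
```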
There is a native mechanism in Rcpp, called Rcpp modules, that does this implicitly, but I ended up rolling my own mechanism for this because the internals of Rcpp modules got too complicated for me and there are some subtle issues related to serializing/unserializing R objects that, for me at least, were difficult to address in Rcpp modules.

When you speak of graphical models I begin to think of MCMC. If that is indeed your goal, I think this would be a good mechanism because you can have hooks in R to query and update the state of an instance of a C++ class representing the state of the chain. However, as I said, I am just starting on the second cup of coffee and it is not unlikely that I missed the point entirely.

> | 2.2 The inline package has been suggested, which would take our stan C++
> | code and compile it with R CMD SHLIB. If we generated stan-C++ code that
> | contained Rcpp headers and methods directly, we could inline it, right?
>
> Inline is orthogonal to how you lay out or organise your code. It is a helper.
> It doesn't impose anything, really -- it "merely" makes Rcpp easier for
> experimentation as it already did for C, C++ and Fortran without the added
> Rcpp API.
>
> | 2.3 R CMD SHLIB would have to be aware of our other headers (Eigen, boost,
> | stan). For certain version dependencies (eigen has incremented 7 version
> | numbers, sometimes breaking backward compatibility, since we started), we
> | plan to distribute stan with these libraries local. But more generally, is
> | there a reason for or (as we are inclined) against using R CMD SHLIB
> | (inline) to build stan shared objects?
>
> Just distribute source and people rebuild. Boost has changed way more often,
> and people have gotten used to the need for rebuilds.
>
> Dirk
>
> | 3 I (or one of my much more C++ savvy colleagues) will be happy to provide
> | more details.
> |
> | Thanks for your input!
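The MCMC suggestion above (a C++ object holding the chain state, with hooks to query and update it) might look roughly like the sketch below. It assumes hypothetical names throughout and swaps in a random-walk Metropolis step on a standard-normal target in place of Stan's Hamiltonian sampler, purely to keep the example short; a front end would keep a pointer to the object and call only the hook methods.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical chain object: all sampler state lives here.
class metropolis_chain {
public:
    metropolis_chain(std::vector<double> init, double step, unsigned seed)
        : x_(std::move(init)), step_(step), rng_(seed), lp_(log_prob(x_)) {}

    // Update hook: advance the chain by n iterations.
    void run(std::size_t n) {
        std::normal_distribution<double> jump(0.0, step_);
        std::uniform_real_distribution<double> unif(0.0, 1.0);
        for (std::size_t i = 0; i < n; ++i) {
            std::vector<double> prop = x_;
            for (double& p : prop) p += jump(rng_);
            const double lp_prop = log_prob(prop);
            // Accept with probability min(1, exp(lp_prop - lp_)).
            if (std::log(unif(rng_)) < lp_prop - lp_) {
                x_ = std::move(prop);
                lp_ = lp_prop;
                ++accepted_;
            }
            ++iter_;
        }
    }

    // Update hook: retune the proposal scale between runs.
    void set_step(double step) { step_ = step; }

    // Query hooks.
    const std::vector<double>& state() const { return x_; }
    double current_log_prob() const { return lp_; }
    std::size_t iterations() const { return iter_; }
    double acceptance_rate() const {
        return iter_ == 0 ? 0.0 : static_cast<double>(accepted_) / iter_;
    }

private:
    // Toy target: independent standard normals (unnormalised log density).
    static double log_prob(const std::vector<double>& x) {
        double lp = 0.0;
        for (double xi : x) lp -= 0.5 * xi * xi;
        return lp;
    }

    std::vector<double> x_;
    double step_;
    std::mt19937 rng_;
    double lp_;
    std::size_t iter_ = 0;
    std::size_t accepted_ = 0;
};
```

Keeping the random-number generator inside the object means a front end can stop, inspect, retune, and resume the chain without the chain's state ever being copied out.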
> |
> | Michael Malecki
> |
> | ----------------------------------------------------------------------
> | _______________________________________________
> | Rcpp-devel mailing list
> | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> --
> New Rcpp master class for R and C++ integration is scheduled for
> San Francisco (Oct 8), more details / reg.info available at
> http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php
