Hi Dirk,

thanks for the quick answer and for the many suggestions and corrections you gave! I now have a better idea of how to design the package.

On Sep 6, 2013, at 2:20 PM, Dirk Eddelbuettel <[email protected]> wrote:

>
> On 6 September 2013 at 13:46, Simon Zehnder wrote:
> | Dear Rcpp-Users and Rcpp-Devels,
> |
> | this goes especially to Dirk and Romain, the developers of RcppBDT.
>
> Well it's mostly me for the scope of it, with numerous invaluable assists
> from Romain. The released version is far behind the SVN version;
> unfortunately the SVN version is far from release-ready.
>
Next time I will know better. Looking forward to the release.

> | I am right now writing on a package for market microstructure data -
> | usually large tick datasets with trade times and security symbols.
>
> Interesting. I do that for a living too.
>
Well, when doing research with tick data it is sometimes a pain in the ass to match trades with the last quotes or to match spot prices with futures prices. In MM, research relies a lot on this and it consumes most of the time. So I am trying to construct a package that can do most of it - and fast (my idea is to use OpenMP in C++ for ordering and filtering). Furthermore, the most widely used tick data in research are either NYSE/WRDS or, for bonds, TRACE (regarding the spot markets). So the package should also deal with the special formats of these to make things easier. There is a package 'highfrequency' which does something similar, but it is not appropriate for TRACE data.

> | I read the Rcpp Book about Modules and when starting as usual with S4
> | classes in R, the Modules came into my mind. As I am operating on datasets
> | with usually around 1 million rows I am wondering, if maybe the implementation
> | via Modules is the better (better in regard to performance) one - in
>
> That is not usually the motivation for modules.
>
> "Straight up" functions, coded via inline or attributes, will be as fast.
>
With that, my decision is made: S4 classes in R. I know these very well by now.

> | comparison to the usual S4 class implementation directly in R. With Modules
>
> "The usual S4 class implementation"?
>
> I have done R for over a decade and I still hardly use S4, so "the usual" is,
> errmm, "unusual".
>
That is true. The S4 class system is not very close to OOP in C++ or Java and there are a lot of limitations, etc. It does give me a good way to structure my code, though. By "usual" I meant: writing S4 classes in R - not defining them in C++. As far as I understood from the Modules chapter of your book, S4 classes are built automatically when Modules are defined? Please correct me if I am wrong.

> | I am able to define all functions on the datasets in C++ - which I expect
> | to be faster. Sorting the data and filtering the data in regard to
> | dates/times are of course one of the main tasks to be covered.
>
> I have some trouble with the logic of your argument, but accept the end
> result that Boost Date.Time is good for dates and times. :)
>
It's all about performance. Sorry for being imprecise. I expect sorting and filtering data by dates/times in C++ to be faster than doing it in R with POSIXlt/POSIXct (at least for larger datasets).

> | In RcppBDT I read in the DESCRIPTION file, that the Boost Header Files for
> | Date.Time must be included.
>
> "On the system on which RcppBDT is to be compiled" -- different from where it
> is used (Windows, say). _No run-time depends_
>
Ah, the binaries that can be loaded for each system ...

> | As I have to choose one library for Date/Time formats in C++, boost just
> | seems so appropriate. But for usage in the Market Microstructure community
> | it is impossible to expect them to install Boost on their system.
>
> Sorry but one has nothing to do with the other.
>
True, I just want my colleagues and other researchers in the field to be able to use it very easily. You give me the answer below.

> Also please look at the CRAN package BH -- it _provides_ Boost headers for
> this very purpose. Several packages already use it.
>
> | So, I would like to provide Boost already within the package.
>
> Just don't do it. Seriously. Use a "Depends: BH"
>
That is perfect! Thanks for this valuable information.

> | As everything you two do makes sense, I think I haven't yet grasped the
> | reason why Boost is not provided in RcppBDT right alongside. Is there
> | something which restricts me from doing this?
>
> It's inefficient. We don't ship the headers of the C library either.
>
> It's just a Depends.
>
> Better to hand-off to the system, and with R, we can (at least for pure
> template headers) via the BH package we created.
>
> | I am very thankful for thoughts and opinions on my idea and my question.
>
> Sure, no problem.
>
> Dirk
>
> --
> Dirk Eddelbuettel | [email protected] | http://dirk.eddelbuettel.com

So, at the end: Thanks again for your valuable comments and tips.

Best

Simon

_______________________________________________
Rcpp-devel mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
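
Editorial sketch, not part of the original exchange: the thread mentions matching each trade with the last quote at or before it as the time-consuming step. One way this could look in plain C++ is sorting the quotes by time and then running a binary search per trade. The `Quote` and `Trade` structs, the integer timestamps, and the function names `sort_quotes` and `match_last_quote` are all assumptions made for illustration; they are not taken from RcppBDT, BH, or any package named above, and Boost Date.Time is deliberately left out so the example needs only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record types. Timestamps are assumed to be plain integers
// (for example microseconds since midnight), not Boost Date.Time objects.
struct Quote { std::int64_t time; double bid; double ask; };
struct Trade { std::int64_t time; double price; };

// Order quotes by time; stable_sort keeps the input order of quotes
// that share the same timestamp.
void sort_quotes(std::vector<Quote>& quotes) {
    std::stable_sort(quotes.begin(), quotes.end(),
                     [](const Quote& a, const Quote& b) { return a.time < b.time; });
}

// For each trade, return the index of the last quote with
// quote.time <= trade.time, or -1 if no such quote exists.
// Precondition: quotes are sorted by time.
std::vector<std::ptrdiff_t> match_last_quote(const std::vector<Trade>& trades,
                                             const std::vector<Quote>& quotes) {
    assert(std::is_sorted(quotes.begin(), quotes.end(),
                          [](const Quote& a, const Quote& b) { return a.time < b.time; }));
    std::vector<std::ptrdiff_t> idx;
    idx.reserve(trades.size());
    for (const Trade& t : trades) {
        // upper_bound finds the first quote strictly later than the trade,
        // so the quote just before it is the last one at or before the trade.
        auto it = std::upper_bound(quotes.begin(), quotes.end(), t.time,
                                   [](std::int64_t time, const Quote& q) { return time < q.time; });
        idx.push_back(it == quotes.begin() ? -1 : (it - quotes.begin()) - 1);
    }
    return idx;
}
```

Each trade is handled independently of the others, so the loop over trades is the kind of step that the OpenMP idea from the thread might target; whether that pays off for a given dataset would have to be measured.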
