On 2/26/2015 18:59, Dirk Eddelbuettel wrote:
On 26 February 2015 at 18:35, Matt D. wrote:
| Which incidentally brings me to the advice I usually give in these situations:
| unless you're absolutely dependent on the "features" of `Rcpp::NumericVector`
| just forget about it and replace all uses with the standard container
| `std::vector<double>`.

Note that this means you will always force a copy on the way in, and on the
way out.  That is a guaranteed performance penalty.

So with this you guarantee that someone else will always be able to write
faster code.  That said, I too like std::vector<>, but I also like arma::vec,
and those are (in the recent versions) lightweight.
Sure!
For the general case, and in particular for the use cases that never run into the problem under discussion and so never need `clone`: fair enough -- this is the usual argument for shallow (over deep) copy semantics (or even for copy-on-write as the in-between). (In the case under consideration the copy is not avoidable anyway, since `clone` already makes it.)

In general, it's certainly a reasonable point that there is a trade-off to be made -- user-friendliness against the potential extra copies (not sure whether this has ever been measured -- as in counting the cases of `clone`-less existing code-bases where this was the actual performance bottleneck).
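
To make the shallow-versus-deep distinction concrete, here is a minimal sketch in plain C++ (no Rcpp involved). `SharedVec` and both helper functions are hypothetical names made up for this illustration; the class only mimics a handle whose copies share one buffer, and its `clone` member is merely analogous in spirit to the `clone` discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a proxy type: copying the handle shares the buffer.
class SharedVec {
public:
    explicit SharedVec(std::size_t n, double value = 0.0)
        : data_(std::make_shared<std::vector<double>>(n, value)) {}

    double& operator[](std::size_t i) { return (*data_)[i]; }
    const double& operator[](std::size_t i) const { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }

    // Explicit deep copy: allocates a new buffer with the same contents.
    SharedVec clone() const {
        SharedVec out(0);
        out.data_ = std::make_shared<std::vector<double>>(*data_);
        return out;
    }

private:
    std::shared_ptr<std::vector<double>> data_;
};

// Looks like pass-by-value, but the caller's data changes,
// because only the handle is copied, not the buffer.
inline void set_first_shallow(SharedVec v) {
    if (v.size() > 0) v[0] = 123.0;
}

// Same operation after an explicit deep copy; the caller's data is untouched.
inline SharedVec set_first_cloned(SharedVec v) {
    SharedVec u = v.clone();
    if (u.size() > 0) u[0] = 123.0;
    return u;
}
```

With `SharedVec w(3, 1.0);`, calling `set_first_cloned(w)` leaves `w[0]` at 1, while calling `set_first_shallow(w)` changes it to 123, even though both functions take their argument by value.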

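However, the current behavior still violates the principle of least surprise (POLS), even for users coming from pure R, where modifying a function's argument leaves the caller's object untouched:
> f = function(v) { u = v; if (length(u) > 1) { u[1] = 123 }; u }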
> w = rep(1, 3)
> f(w)
[1] 123   1   1
> w
[1] 1 1 1

In the "general" scenario it's not really user-friendly to abandon R (as well as C++) semantics by default.
Perhaps there's another solution -- continuing with the proxy aspect:

| found" at the moment -- and, as mentioned in another reply, you're apparently
| expected to Google around to find methods for solving problems you wouldn't

We have called these objects "proxy models" since almost certainly 2010.  This
is referenced in the standard introductory paper (published peer-reviewed in
JSS in 2011) and included as a vignette in the package.

If you ignore the available documentation, then you may indeed have to "google
at random" as you claim.  I'd call that a self-inflicted wound.
Sure. At the same time, to give a somewhat related example, `std::vector<bool>` has been known to be a proxied container since at least 1998 -- when the original (pre-standard) STL implementation was partially adopted, with the choice to specialize it into what used to be a `bit_vector` instead of following the usual container requirements (documentation for an earlier, pre-standard implementation is available): https://www.sgi.com/tech/stl/bit_vector.html

Just the same, as of 2013 programmers still weren't 100% clear on the implications:
http://stackoverflow.com/questions/17794569/why-is-vectorbool-not-a-stl-container
I imagine blaming these programmers for "ignoring" ISO/IEC 14882, and advising them to use a search engine after failing to read it in its entirety, is certainly _an_ approach. After all, this design was also chosen with optimization in mind (albeit with space-efficient allocation as the goal).

At the same time, nowadays the design choice made for `std::vector<bool>` is referred to (variably) as "totally broken", a "defect", a "mistake", or an "abomination":
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2160.html
https://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=98
http://isocpp.org/blog/2012/11/on-vectorbool
http://www.gotw.ca/publications/mill09.htm

Today `std::vector<bool>` is a textbook example of premature optimization in design (http://www.gotw.ca/gotw/050.htm) -- with all the usual caveats:

"std::vector<bool> forces a specific optimization on all users by enshrining it in the standard. That's not a good idea; different users have different requirements, and now all users of vector<bool> must pay the performance penalty even if they don't want or need the space savings.

Bottom line: If you care more about speed than you do about size, you shouldn't use std::vector<bool>. Instead, you should hack around this optimization by using a std::vector<char> or the like instead, which is unfortunate but still the best you can do."
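
For anyone who hasn't been bitten by this yet, a minimal sketch of the proxy surprise in standard C++ (the two function names are made up for this illustration): `operator[]` on `std::vector<bool>` returns a proxy object rather than `bool&`, so a variable declared with `auto` aliases the container element instead of copying it.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// For ordinary element types, operator[] yields a real reference.
static_assert(
    std::is_same<decltype(std::declval<std::vector<char>&>()[0]), char&>::value,
    "vector<char> hands out char&");

// For bool, it yields a proxy object instead of bool&.
static_assert(
    !std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value,
    "vector<bool> does not hand out bool&");

// `auto` deduces the proxy type, so `first` refers to the stored bit.
inline bool auto_copy_aliases_bool() {
    std::vector<bool> flags(3, false);
    auto first = flags[0];  // proxy object, not a bool
    first = true;           // writes through to flags[0]
    return flags[0];
}

// With the std::vector<char> workaround, `auto` deduces a plain char copy.
inline bool auto_copy_aliases_char() {
    std::vector<char> flags(3, 0);
    auto first = flags[0];  // independent copy of the element
    first = 1;              // changes only the local copy
    (void)first;            // silence "set but not used" warnings
    return flags[0] != 0;
}
```

The first function returns `true` and the second returns `false`: the same-looking line `auto first = flags[0];` means two different things depending on the element type.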

Perhaps more self-documenting code could help users by having, say, `Rcpp::NumericVectorView` (or `Rcpp::NumericVectorProxy`) used for view (proxy) purposes -- and sticking with the default (expected by R -- as well as C++ -- programmers) for `Rcpp::NumericVector`?
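
A rough sketch of what such a naming split could look like, in plain C++ and purely as an illustration: neither type below is part of Rcpp, `NumericVector` here is just an alias for `std::vector<double>`, and `NumericVectorView`, `scaled` and `scale_in_place` are hypothetical names. It says nothing about how this would be implemented on top of SEXPs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not Rcpp's actual types.
using NumericVector = std::vector<double>;  // owning, value semantics

// Non-owning view: reference semantics, and the type name says so.
class NumericVectorView {
public:
    explicit NumericVectorView(NumericVector& v)
        : data_(v.data()), size_(v.size()) {}
    double& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    double* data_;
    std::size_t size_;
};

// Works on its own copy; the caller's vector is untouched.
inline NumericVector scaled(NumericVector v, double k) {
    for (double& x : v) x *= k;
    return v;
}

// Mutates the caller's data through the view.
inline void scale_in_place(NumericVectorView v, double k) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] *= k;
}
```

Because the view's constructor is `explicit`, a mutating call has to be spelled `scale_in_place(NumericVectorView(w), 2.0)`, so the aliasing is visible at the call site, whereas `scaled(w, 2.0)` behaves the way R and C++ programmers expect of a value.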

(Alternatively, making `f(Rcpp::NumericVector & v)` signify the need for mutation, while keeping the expected copied-value behavior for `f(Rcpp::NumericVector v)`; or is implementing this inherently blocked by the way Rcpp has to interoperate with R through SEXPs? Similarly for `f(std::vector<double> & v)` vs `f(std::vector<double> v)` vs `f(const std::vector<double> & v)`?).
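
For reference, this is what the three `std::vector<double>` signatures mean between plain C++ functions (the function names are made up for this sketch). It deliberately does not show what happens when such functions are called from R through Rcpp, which is exactly the open question above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// By value: the function works on its own copy.
inline std::vector<double> with_first_set(std::vector<double> v) {
    if (!v.empty()) v[0] = 123.0;
    return v;
}

// By non-const reference: the signature announces that the argument is mutated.
inline void set_first(std::vector<double>& v) {
    if (!v.empty()) v[0] = 123.0;
}

// By const reference: read-only access, with no copy and no mutation.
inline double total(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}
```

Given `std::vector<double> w(3, 1.0);`, the call `with_first_set(w)` returns a modified copy and leaves `w` alone, `set_first(w)` changes `w` itself, and `total(w)` only reads it.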

As it stands, despite its name, `Rcpp::NumericVector` isn't really a numeric vector. As you rightly point out, it is a view (or a proxy). This is surprising for a type named `Rcpp::NumericVector`. I don't think it's unreasonable for users to ask questions, given the source of the astonishment, just as it isn't surprising to see users confused about `std::vector<bool>` some decades after its behavior was standardized. Rcpp is a relatively young project; perhaps this will change over time...

The trade-off in the general case of "what's the good default" seems to pit copy-optimization against user-friendliness; the current (reference semantics) approach rests on the unstated assumptions that regular users will know about `Rcpp::NumericVector` being different (and about the need to `clone`) and that performance experts won't be capable of optimizing their code if this isn't done for them.

Perhaps leaving regular users with a regularly behaving `Rcpp::NumericVector` by default -- while leaving performance experts the option to use `Rcpp::NumericVectorView` on an as-needed basis -- would cut the amount of help required in the first place?

Granted: to an extent this is all academic -- chances are there is some code somewhere relying on this and this ship has sailed (http://xkcd.com/1172/) (unless there's a potential for redesign / backward incompatibility in the future).

That being said, as for the "what to do by default" advice: for anyone who finds themselves needing to `clone`, `std::vector<double>` seems like the safer, better documented option.

Best,

Matt

Dirk


_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
