On 2/26/2015 18:59, Dirk Eddelbuettel wrote:
On 26 February 2015 at 18:35, Matt D. wrote:
| Which incidentally brings me to the advice I usually give in these situations:
| unless you're absolutely dependent on the "features" of `Rcpp::NumericVector`
| just forget about it and replace all uses with the standard container
| `std::vector<double>`.
Note that this means you will always force a copy on the way in, and on the
way out. That is a guaranteed performance penalty.
So with this you guarantee that someone else will always be able to write
faster code. That said, I too like std::vector<>, but I also like arma::vec,
and those are (in the recent versions) lightweight.
Sure!
For the general case, and in particular for use cases that neither run
into the problem discussed here nor need `clone`: fair enough -- this is
the usual point made when discussing the advantages of shallow (over
deep) copy semantics (or even the copy-on-write in-between).
(In the case under consideration the copy is not avoidable, since
`clone` already makes it.)
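To make the shallow-vs-deep distinction concrete, here is a minimal
sketch in plain standard C++. It only mimics the behaviour under
discussion and does not use Rcpp; `SharedVec`, `deep_clone`,
`set_first_shallow` and `set_first_deep` are hypothetical names invented
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle type with shallow-copy semantics: copying the handle
// shares the underlying storage instead of duplicating the elements.
// This is an illustration only, not Rcpp code.
struct SharedVec {
    std::shared_ptr<std::vector<double>> storage;

    explicit SharedVec(std::vector<double> values)
        : storage(std::make_shared<std::vector<double>>(std::move(values))) {}

    double& operator[](std::size_t i) { return (*storage)[i]; }
    std::size_t size() const { return storage->size(); }
};

// Hypothetical deep copy, playing the role that `clone` plays in this thread.
SharedVec deep_clone(const SharedVec& v) {
    return SharedVec(*v.storage);  // copies every element
}

// Shallow: the parameter is a copy of the handle only, so the write is
// visible to the caller.
void set_first_shallow(SharedVec v) {
    assert(v.size() > 0);
    v[0] = 123.0;
}

// Deep: clone first, then modify; the caller's data stays intact, at the
// cost of one full copy.
SharedVec set_first_deep(const SharedVec& v) {
    SharedVec u = deep_clone(v);
    assert(u.size() > 0);
    u[0] = 123.0;
    return u;
}
```

Calling `set_first_shallow(w)` changes `w[0]` for the caller, while
`set_first_deep(w)` leaves `w` alone and pays for exactly one copy, which
is the copy that cannot be avoided once a clone is needed anyway.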
In general, it's certainly a reasonable point that there is a trade-off
to be made -- user-friendliness against the potential extra copies (not
sure whether this has ever been measured -- as in counting the cases of
`clone`-less existing code-bases where this was the actual performance
bottleneck).
However, it still violates the principle of least surprise (POLS) even
for the users coming from pure R:
> f = function(v) { u = v; if (length(u) > 1) { u[1] = 123 }; u }
> w = rep(1, 3)
> f(w)
[1] 123 1 1
> w
[1] 1 1 1
In the "general" scenario it's not really user-friendly to abandon R
(as well as C++) semantics by default.
Perhaps there's another solution -- continuing with the proxy aspect:
| found" at the moment -- and, as mentioned in another reply, you're apparently
| expected to Google around to find methods for solving problems you wouldn't
We have called these objects "proxy models" since almost certainly 2010. This
is referenced in the standard introductory paper (published peer-reviewed in
JSS in 2011) and included as a vignette in the package.
If you ignore the available documentation, then you may indeed have to "google
at random" as you claim. I'd call that a self-inflicted wound.
Sure. At the same time, to give a somewhat related example,
`std::vector<bool>` has been known to be a proxied container since at
least 1998 -- when the original (pre-standard) STL implementation was
partially adopted, with the choice to specialize it into what used to be
a `bit_vector` instead of following the usual container requirements
(with some earlier / pre-standard implementations available):
https://www.sgi.com/tech/stl/bit_vector.html
Just the same, as of 2013 programmers still weren't 100% clear on the
implications:
http://stackoverflow.com/questions/17794569/why-is-vectorbool-not-a-stl-container
I imagine blaming these programmers for "ignoring" ISO/IEC 14882 and
advising them to use a search engine after the failure to read it in its
entirety is certainly _an_ approach.
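For anyone who hasn't run into it, the proxy behaviour of
`std::vector<bool>` can be checked in a few lines of standard C++. This
is only a small sketch; the function names are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// The type produced by operator[] on a non-const vector.
using IntElem  = decltype(std::declval<std::vector<int>&>()[0]);
using BoolElem = decltype(std::declval<std::vector<bool>&>()[0]);

// An ordinary container hands out a real reference to its element...
static_assert(std::is_same<IntElem, int&>::value,
              "vector<int>::operator[] yields int&");
// ...whereas vector<bool> hands out a proxy object, not bool&.
static_assert(!std::is_same<BoolElem, bool&>::value,
              "vector<bool>::operator[] does not yield bool&");
static_assert(std::is_same<BoolElem, std::vector<bool>::reference>::value,
              "vector<bool>::operator[] yields vector<bool>::reference");

// With vector<int>, `auto` deduces int: `first` is an independent copy.
int write_through_auto_int() {
    std::vector<int> values(3, 0);
    assert(!values.empty());
    auto first = values[0];
    first = 1;            // modifies the local copy only
    (void)first;
    return values[0];     // still 0
}

// With vector<bool>, `auto` deduces the proxy: the same-looking code
// writes into the container.
bool write_through_auto_bool() {
    std::vector<bool> flags(3, false);
    assert(!flags.empty());
    auto first = flags[0];
    first = true;         // writes through the proxy into flags
    return flags[0];      // now true
}
```

The two functions have the same shape, yet one leaves the container
untouched and the other modifies it, which is the kind of surprise being
discussed here.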
After all, this design was also chosen with optimization in mind
(albeit with space-efficient allocation as the goal).
At the same time, nowadays the design choice made for
`std::vector<bool>` is referred to (variably) as "totally broken", a
"defect", a "mistake", or an "abomination":
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2160.html
https://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=98
http://isocpp.org/blog/2012/11/on-vectorbool
http://www.gotw.ca/publications/mill09.htm
Today `std::vector<bool>` is a textbook example of premature
optimization in design (http://www.gotw.ca/gotw/050.htm) -- with all the
usual caveats:
"std::vector<bool> forces a specific optimization on all users by
enshrining it in the standard. That's not a good idea; different users
have different requirements, and now all users of vector<bool> must pay
the performance penalty even if they don't want or need the space savings.
Bottom line: If you care more about speed than you do about size, you
shouldn't use std::vector<bool>. Instead, you should hack around this
optimization by using a std::vector<char> or the like instead, which is
unfortunate but still the best you can do."
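The workaround from the quote can be sketched as follows: with
`std::vector<char>` each flag is a real, addressable element, so
references and contiguous access behave like any other container. The
helpers `count_set` and `mark_even_and_count` are hypothetical names
used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts non-zero flags in a contiguous buffer.
// A std::vector<bool> could not be passed here, since it exposes no
// pointer to individual bool elements.
std::size_t count_set(const char* flags, std::size_t n) {
    assert(flags != nullptr || n == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i] != 0) ++count;
    }
    return count;
}

// Sets every even-indexed flag and returns how many flags are set.
std::size_t mark_even_and_count(std::size_t n) {
    std::vector<char> flags(n, 0);
    for (std::size_t i = 0; i < n; i += 2) {
        char& flag = flags[i];  // a genuine reference to the element
        flag = 1;
    }
    return count_set(flags.data(), flags.size());
}
```

The price is one byte per flag instead of one bit, which is exactly the
trade-off the quoted passage says users should be free to make for
themselves.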
Perhaps more self-documenting code could attempt to help the users
by having, say, `Rcpp::NumericVectorView` (or
`Rcpp::NumericVectorProxy`) used for view (proxy) purposes -- and
sticking with the default (expected by R -- as well as C++ --
programmers) for `Rcpp::NumericVector`?
(Alternatively, making `f(Rcpp::NumericVector & v)` signify the need for
mutation, while keeping the expected copied-value behavior for
`f(Rcpp::NumericVector v)`; or is implementing this inherently blocked
by the way Rcpp has to interoperate with R through SEXPs? Similarly for
`f(std::vector<double> & v)` vs `f(std::vector<double> v)` vs `f(const
std::vector<double> & v)`?).
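In plain C++ the three signatures already carry that meaning, as the
sketch below shows for `std::vector<double>`. Whether Rcpp could honour
the same distinction at the R boundary is the open question above; the
code makes no claim about that, and the function names are invented for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// By value: the function works on a private copy; the caller's vector
// is untouched.
std::vector<double> scaled_copy(std::vector<double> v, double factor) {
    for (double& x : v) x *= factor;
    return v;
}

// By non-const reference: the signature itself announces in-place mutation.
void scale_in_place(std::vector<double>& v, double factor) {
    for (double& x : v) x *= factor;
}

// By const reference: read-only access, no copy and no mutation.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double total = 0.0;
    for (double x : v) total += x;
    return total / static_cast<double>(v.size());
}
```

A reader of the call site can tell from the parameter type alone which
of the three behaviours to expect, without consulting any documentation.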
As it stands, despite its name, `Rcpp::NumericVector` isn't really a
numeric vector. As you rightly point out, it is a view (or a proxy).
This is surprising for a type named `Rcpp::NumericVector`. I don't think
it's unreasonable for the users to ask questions given the source of the
astonishment. Just as it isn't surprising to see users confused about
`std::vector<bool>` some decades after its behavior has been
standardized. Rcpp is a relatively young project; perhaps this will
change over time...
The trade-off in the general case of "what's the good default" seems to
be pitting copy-optimization against user-friendliness; the current
(reference semantics) approach rests on the unstated assumptions that
regular users will know about `Rcpp::NumericVector` being different (and
about the need to `clone`) and that performance experts won't be capable
of optimizing their code if this isn't done for them.
Perhaps leaving regular users with a regularly behaving
`Rcpp::NumericVector` by default -- while leaving performance experts
the option to use `Rcpp::NumericVectorView` on an as-needed basis -- would
cut the amount of help required in the first place?
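As a rough sketch of that split in standard C++ (not a proposal for the
actual Rcpp implementation): `NumericView` below is a hypothetical
non-owning view standing in for the suggested `Rcpp::NumericVectorView`,
and `std::vector<double>` stands in for the regularly behaving default.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: the type name states that writes go
// through to storage owned by someone else. Not an existing Rcpp class.
class NumericView {
public:
    explicit NumericView(std::vector<double>& owner)
        : data_(owner.data()), size_(owner.size()) {}

    double& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }

private:
    double* data_;
    std::size_t size_;
};

// Default path: plain value semantics; the caller's data is never modified.
std::vector<double> centered(std::vector<double> v) {
    if (v.empty()) return v;
    double total = 0.0;
    for (double x : v) total += x;
    const double m = total / static_cast<double>(v.size());
    for (double& x : v) x -= m;
    return v;
}

// Opt-in path: the caller explicitly builds a view, so in-place
// modification (and the saved copy) is visible at the call site.
void center_in_place(NumericView v) {
    if (v.size() == 0) return;
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    const double m = total / static_cast<double>(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) v[i] -= m;
}
```

With the constructor marked `explicit`, a call has to be written as
`center_in_place(NumericView(data))`, so nobody ends up with reference
semantics by accident.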
Granted: to an extent this is all academic -- chances are there is some
code somewhere relying on this and this ship has sailed
(http://xkcd.com/1172/) (unless there's a potential for redesign /
backward incompatibility in the future).
That being said, as for the "what to do by default" advice, for anyone
who finds themselves needing to `clone` -- `std::vector<double>` seems
like the safer, better documented option.
Best,
Matt
Dirk
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel