That is similar to a path I've followed in Rcpp11/Rcpp14. What's really missing in R is API access to strings, e.g. testing two CHARSXP for equality, comparing them, ...
This causes all sorts of problems with dplyr.

Romain

> On 16 Dec 2014, at 06:00, Jeroen Ooms <[email protected]> wrote:
>
>> On Thu, Dec 11, 2014 at 12:24 PM, Jeroen Ooms <[email protected]>
>> wrote:
>> I'm interfacing a C++ library which assumes strings are UTF-8. However
>> strings from R can have various encodings. It's not clear to me how I
>> need to account for that in Rcpp.
>
> Follow-up on this: from what I have found, there is currently no
> string type that is unambiguous across platforms and locales (other
> than the actual STRSXP). If the native locale uses UTF-8 then all is
> fine, but we cannot assume that in R. Here is a little script that
> illustrates the various combinations I tried and the results on
> Windows: https://gist.github.com/jeroenooms/9edf97f873f17a4ce5d3.
>
> Assuming that each of these cases is intended behavior, perhaps we
> could introduce an additional string type, e.g. Rcpp::UTF8String. The
> mapping from STRSXP to Rcpp::UTF8String would use
> translateCharUTF8(STRING_ELT(x, 0)) and the mapping from Rcpp::UTF8String
> back to STRSXP would use SET_STRING_ELT(out, 0, mkCharCE(olds,
> CE_UTF8)). That would allow for defining C++ functions operating on
> UTF-8 strings which will work as expected across platforms and locales.
> _______________________________________________
> Rcpp-devel mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
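
To make the proposed round trip concrete, here is a minimal standalone sketch of the idea. It compiles without R headers, so it only models the behavior and is not the R or Rcpp API. Everything in it is hypothetical and for illustration only: TaggedString stands in for a CHARSXP with an encoding mark, latin1_to_utf8 stands in for the re-encoding that translateCharUTF8 would perform on a Latin-1 string, and to_tagged stands in for mkCharCE(..., CE_UTF8). Only Latin-1 and UTF-8 are modelled; a real implementation would also have to handle the native encoding.

```cpp
#include <string>

// Hypothetical encoding tag, loosely modelled on R's encoding marks.
enum class Encoding { Latin1, UTF8 };

// Hypothetical stand-in for a CHARSXP: raw bytes plus a declared encoding.
struct TaggedString {
    std::string bytes;
    Encoding enc;
};

// Re-encode Latin-1 bytes as UTF-8. Bytes below 0x80 are copied unchanged;
// bytes 0x80..0xFF map to the two-byte UTF-8 form of the same code point.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical UTF8String: whatever encoding comes in, the stored bytes
// are UTF-8, so C++ code behind this type never sees another encoding.
class UTF8String {
public:
    // Inbound mapping (models translateCharUTF8 on one string element).
    explicit UTF8String(const TaggedString& s)
        : data_(s.enc == Encoding::UTF8 ? s.bytes : latin1_to_utf8(s.bytes)) {}

    const std::string& str() const { return data_; }

    // Outbound mapping (models mkCharCE(..., CE_UTF8)): the bytes go back
    // explicitly marked as UTF-8 rather than as the native encoding.
    TaggedString to_tagged() const { return TaggedString{data_, Encoding::UTF8}; }

private:
    std::string data_;
};
```

The point of the design is that conversion happens exactly once at each boundary: a function taking a UTF8String can hand str() to a UTF-8 library directly, and the result is always marked UTF-8 on the way back regardless of the platform locale.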
