Yes, encoding is something we have not dealt with yet.
This is not high on my priority list, but there are ways to alter that list, e.g. by sponsoring the development of that particular feature through funding, or even crowdfunding if enough people are interested in having the feature and are willing to pay for it.
Otherwise this will have to wait until someone with the skills develops it.

Romain

On 01/08/13 18:57, Ned Harding wrote:
Just to follow up in case anyone other than me is using Unicode in R:

Rcpp does not support Unicode, or really any encoding other than 7-bit ASCII. Internally R marks every string with an encoding, typically UTF-8, Latin-1 or ASCII. When using as<string>, Rcpp just copies the bytes over, ignoring the encoding. This means that if you take a string that was UTF-8 and then later wrap it again, the encoding info is lost and the characters get corrupted. In particular, never use Rcpp::as<std::wstring>, because the string gets widened without being converted to Unicode.

If you want (or need) to support Unicode text in an R plugin, you need to use Rf_translateCharUTF8(…) to get a string. Regardless of what encoding it was originally, R will make sure it is encoded as UTF-8. In order to set a string into an R object you have to use the corresponding Rf_mkCharLenCE(p, len, CE_UTF8) function, which tells R that the data you have is UTF-8.

Ned.

From: rcpp-devel-boun...@lists.r-forge.r-project.org [mailto:rcpp-devel-boun...@lists.r-forge.r-project.org] On Behalf Of Ned Harding
Sent: Wednesday, June 26, 2013 11:54 AM
To: rcpp-devel@lists.r-forge.r-project.org
Subject: [Rcpp-devel] Unicode on windows

I am having issues with the wide string conversion to and from Rcpp. When taking in a string from R that is encoded as UTF-8, I would expect as<wstring> to have converted the UTF-8 to a wide string. Instead, it is just widening all the characters and leaving the UTF-8 encoding. I have no issue with UTF-8, but my issue is that Rcpp doesn't seem to be able to tell me what encoding the source is, so I don't know if I should convert or not.

Similarly, I would expect that wrap<wstring> would produce a UTF-8 encoded SEXP, but instead the encoding in R comes back "unknown" and the data can't print.

See the C++ and R sources below, along with the output.
C++ function
----------------------------------------
RcppExport SEXP TestWide(SEXP _strIn)
{
    // Convert the incoming R string and dump each wide character in hex.
    std::wstring strIn = Rcpp::as<std::wstring>(_strIn);
    for (const wchar_t *p = strIn.c_str(); *p; ++p)
        Rprintf("%x\n", *p);

    // Note: a hex escape consumes every following hex digit, so
    // \x02a5c is the single character 0x2a5c, not 0x02a5 followed by 'c'.
    std::wstring str = L"a\x02a5c";
    return Rcpp::wrap(str);
}

R Script
----------------------------------------
test <- "a\u02a5b"
a <- .Call("TestWide", test, PACKAGE = "AlteryxRDataX")
print(Encoding(a))
print(a)

R Output
----------------------------------------
R version 3.0.0 (2013-04-03) - x86_64
rgeos version: 0.2-16, (SVN revision 389)
GEOS runtime version: 3.3.6-CAPI-1.7.6
Polygon checking: TRUE
61
ffca
ffa5
62
"unknown"
"a?"

Thanks,

Ned Harding
Alteryx
CTO
3825 Iris Avenue, Suite 150
Boulder, CO 80301
Phone: 720-259-0541
eMail: n...@alteryx.com <mailto:n...@alteryx.com>

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
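For readers who want to see why the output above reads 61, ffca, ffa5, 62, here is a minimal standalone sketch that contrasts byte-by-byte widening with actual UTF-8 decoding. It does not use Rcpp or the R API, and it is not Rcpp's implementation: the function names widen_bytes and decode_utf8 are hypothetical, the 16-bit unit and the signed char cast are assumptions chosen to mimic a Windows build, and the decoder assumes well-formed input (it does not reject overlong or truncated sequences). Inside an R extension, the UTF-8 bytes would come from Rf_translateCharUTF8 as described in the follow-up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: widen each byte on its own, with no decoding.
// Assumption: char is signed and the wide unit is 16 bits (as on Windows),
// so a byte such as 0xCA is sign-extended to 0xFFCA.
std::vector<std::uint16_t> widen_bytes(const std::string& s) {
    std::vector<std::uint16_t> out;
    for (char c : s)
        out.push_back(static_cast<std::uint16_t>(static_cast<signed char>(c)));
    return out;
}

// Hypothetical helper: decode UTF-8 into Unicode code points.
// Minimal sketch; assumes well-formed input apart from stray lead bytes.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)              { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x06) { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0x0E) { cp = b & 0x0F; extra = 2; }
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; extra = 3; }
        else                       { cp = 0xFFFD;   extra = 0; }  // invalid lead byte
        ++i;
        // Fold in the low six bits of each continuation byte.
        for (; extra > 0 && i < s.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

With the UTF-8 bytes of "a\u02a5b" (written in C++ as "a\xCA\xA5" "b", split so the hex escape does not swallow the 'b'), widen_bytes yields the four units 0x61, 0xFFCA, 0xFFA5, 0x62, which matches the reported output, while decode_utf8 yields the three code points 0x61, 0x2A5, 0x62 that a Unicode-aware conversion would be expected to produce.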
-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
R Graph Gallery: http://gallery.r-enthusiasts.com
blog: http://blog.r-enthusiasts.com
|- http://bit.ly/13SrjxO : highlight 0.4.2
`- http://bit.ly/10X94UM : Mobile version of the graph gallery