Yes, encoding is something we have not dealt with yet.

This is not high on my priority list, but there are ways to change that, e.g. sponsoring the development of this particular feature through funding, or through crowdfunding if enough people want the feature and are willing to pay for it.

Otherwise this will have to wait until someone with the skills develops it.

Romain

On 01/08/13 18:57, Ned Harding wrote:
Just to follow up in case anyone other than me is using Unicode in R:
Rcpp does not support Unicode, or really any encoding other than 7-bit
ASCII.  Internally R marks every string with an encoding, typically
UTF-8, Latin-1 or ASCII.  When using as<string>, Rcpp just copies the
bytes over, ignoring the encoding.  This means that if you take a string
that was UTF-8 and later wrap it again, the encoding info is lost and
the characters get corrupted.  In particular, never use
Rcpp::as<std::wstring>, because each byte gets widened without the
string being decoded into Unicode characters.
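
To make the difference concrete, here is a minimal standalone sketch (standard C++ only, no Rcpp) contrasting byte-wise widening with actual UTF-8 decoding. The names widen_bytes and decode_utf8 are made up for illustration. widen_bytes is only an assumption about what the conversion amounts to, chosen because it is consistent with the ffca / ffa5 values in the test output further down; it is not taken from the Rcpp sources.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: widen every byte on its own, with no decoding.
// Where char is signed, bytes >= 0x80 are sign-extended on conversion.
std::wstring widen_bytes(const std::string& bytes) {
    return std::wstring(bytes.begin(), bytes.end());
}

// Hypothetical helper: decode UTF-8 into Unicode code points.
// A sketch only: it checks lead and continuation bytes but does not
// reject overlong forms or surrogate code points.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= s.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the UTF-8 bytes of "a\u02a5b" (61 CA A5 62), widen_bytes returns four elements, one per byte, whereas decode_utf8 returns the three code points U+0061, U+02A5 and U+0062.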

If you want (or need) to support Unicode text in an R plugin, you need
to use Rf_translateCharUTF8(…) to get a string.  Regardless of what
encoding it was in originally, R will make sure it is encoded as UTF-8.
To set a string into an R object, you have to use the corresponding
Rf_mkCharLenCE(p, len, CE_UTF8) function – which tells R that the data
you have is UTF-8.
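
Going the other way, the bytes handed to Rf_mkCharLenCE(p, len, CE_UTF8) have to be real UTF-8. Below is a minimal sketch of producing them from Unicode code points, again in standard C++ only. encode_utf8 is a hypothetical helper, and the R call itself is left out because it needs the R headers.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encode Unicode code points as UTF-8 bytes.
std::string encode_utf8(const std::u32string& code_points) {
    std::string out;
    for (char32_t cp : code_points) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate code point");
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp <= 0x10FFFF) {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            throw std::invalid_argument("code point above U+10FFFF");
        }
    }
    return out;
}
```

With a result named utf8, utf8.data() and utf8.size() would play the roles of p and len in the call above. Note that the literal U"a\u02a5c" is three characters, because \u always takes exactly four hex digits.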

Ned.

From: rcpp-devel-boun...@lists.r-forge.r-project.org
[mailto:rcpp-devel-boun...@lists.r-forge.r-project.org] On Behalf Of
Ned Harding
Sent: Wednesday, June 26, 2013 11:54 AM
To: rcpp-devel@lists.r-forge.r-project.org
Subject: [Rcpp-devel] Unicode on windows

I am having issues with the wide string conversion to and from Rcpp.
When taking in a string from R that is encoded as UTF-8, I would expect
as<wstring> to convert the UTF-8 to a wide string.  Instead, it just
widens each byte and leaves the UTF-8 encoding in place.  I have no
issue with UTF-8; my issue is that Rcpp doesn’t seem to be able to tell
me what encoding the source is in, so I don’t know whether I should
convert or not.

Similarly, I would expect wrap<wstring> to produce a UTF-8 encoded
SEXP, but instead the encoding in R comes back as “unknown” and the data
can’t be printed.  See the C++ and R sources below, along with the output.

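C++ function
----------------------------------------
#include <Rcpp.h>

RcppExport SEXP TestWide(SEXP _strIn)
{
    std::wstring strIn = Rcpp::as<std::wstring>(_strIn);

    // Print each wide character as hex to show what as<> produced.
    for (const wchar_t *p = strIn.c_str(); *p; ++p)
        Rprintf("%x\n", *p);

    // Note: a \x escape consumes every following hex digit, so this
    // literal is 'a' followed by the single character U+2A5C.
    std::wstring str = L"a\x02a5c";
    return Rcpp::wrap(str);
}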

R Script
----------------------------------------
test <- "a\u02a5b"
a <- .Call("TestWide", test, PACKAGE = "AlteryxRDataX")
print(Encoding(a))
print(a)

R Output
----------------------------------------
R version 3.0.0 (2013-04-03) - x86_64
rgeos version: 0.2-16, (SVN revision 389)
GEOS runtime version: 3.3.6-CAPI-1.7.6
Polygon checking: TRUE
61
ffca
ffa5
62
"unknown"
"a?"

Thanks,

Ned Harding
Alteryx, CTO


_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel



--
Romain Francois
Professional R Enthusiast
