On 10/31/19 12:58 AM, Toby Hocking wrote:
Hi all, I am getting an "invalid multibyte string" error from one of my
examples when it is run on solaris, which results in check FAILURE:
https://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/nc-00check.html

To fix this I guess I could just delete this example, but is there any
easy/known fix? I searched the r-devel and r-package-devel lists and I did
not find any relevant threads.

I also see that the same package on r-hub solaris is a check PASS:
https://builder.r-hub.io/status/nc_2019.10.19.tar.gz-8b46d2a02a6340bcb313eeec96e404f3

I was expecting that CRAN and r-hub solaris builds should report the same
results. What could be the difference? Is this a bug in CRAN or in r-hub?

The configuration of the CRAN check machine is given at https://cran.r-project.org/web/checks/check_flavors.html#r-patched-solaris-x86 (see the Details section). I cannot reproduce the problem on a Solaris machine I have access to, but that machine has yet another configuration, so I am not surprised.

The problem is that during substring(), the C library function mbrtowc() fails to convert a multibyte sequence to a wide character; that conversion is needed to know how many bytes each character uses. Without being able to reproduce the failure I am not sure why it happens: maybe the runtime library does not support Emoji, but of course there could be a bug in R, too. From the previous issue you ran into with Emoji, we know that the machine (compiler runtime) does not declare that wchar_t is Unicode.
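To make the failure mode concrete, here is a minimal C sketch of walking a multibyte string with mbrtowc() and of checking whether the compiler runtime declares wchar_t to be Unicode. It is not R's actual substring() code; the function names count_mb_chars() and wchar_is_declared_unicode() are made up for illustration, and the result depends on the locale that is active when it runs.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters of a multibyte string in the
   current locale. Returns -1 when mbrtowc() rejects a byte sequence, which
   is the kind of failure behind an "invalid multibyte string" error. */
long count_mb_chars(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);      /* start in the initial shift state */

    size_t remaining = strlen(s);
    long count = 0;

    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, s, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return -1;                    /* invalid or incomplete sequence */
        if (used == 0)
            break;                        /* NUL reached; not expected here */

        s += used;                        /* 'used' bytes made one character */
        remaining -= used;
        count++;
    }
    return count;
}

/* Hypothetical helper: 1 if the implementation defines __STDC_ISO_10646__,
   i.e. declares that wchar_t values are ISO 10646 (Unicode) code points. */
int wchar_is_declared_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A caller would normally run setlocale(LC_ALL, "") first, so that a UTF-8 locale from the environment is in effect; running count_mb_chars() on the offending string on the failing machine would show whether the C library itself rejects it.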

Clearly, by using Emoji you are stress-testing R, packages, external libraries and the OS libraries: these characters need surrogate pairs in UTF-16, but a lot of old code was written before they even existed, and it carries all the problems of wchar_t.
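For reference, the surrogate-pair arithmetic mentioned above can be sketched in a few lines of C. The function names needs_surrogates() and to_surrogates() are only illustrative and are not taken from R or from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: code points above U+FFFF (up to U+10FFFF) do not fit
   in a single 16-bit UTF-16 unit and must be split into two surrogates. */
int needs_surrogates(uint32_t cp)
{
    return cp >= 0x10000 && cp <= 0x10FFFF;
}

/* Illustrative helper: split a supplementary code point into its UTF-16
   high and low surrogates. */
void to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(needs_surrogates(cp));

    uint32_t v = cp - 0x10000;                 /* 20 bits remain */
    *high = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
}
```

For example, U+1F600 (a common Emoji) splits into the pair 0xD83D, 0xDE00. Code that assumes one 16-bit wchar_t per character will see two units there and can cut the character in half.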

Pragmatically, for these reasons I would avoid using Emoji in production systems. If you instead want to stress-test R or libraries to find out where surrogate pairs are still not handled properly, it would be better to look for reproducible examples on systems you have access to and can debug on your end. Some of these problems could also be found simply by code inspection, though. We could then fix them in places where it is easy, or at least document them in the code.

Best
Tomas


______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
