[R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

Schuhmacher, Dominic Thu, 18 May 2023 00:17:09 -0700

Dear list,

I have a package
https://github.com/dschuhmacher/kanjistat
whose very purpose is working with Japanese kanji characters (in UTF-8
encoding). Such characters play an essential role in the data sets, examples,
tests, the vignette and the .Rd files.


My package checks fine with devtools::check on my system and on GitHub Actions,
using the workflow set up by usethis::use_github_action_check_standard().
However, I would like to release the package on CRAN, and running
R CMD check --as-cran causes me several headaches, mainly related to producing
PDF documents via LaTeX, since it is not easy to get LaTeX to typeset
Japanese; see https://www.overleaf.com/learn/latex/Japanese

For the vignette, I can set the following in the YAML header of the R Markdown file
  pdf_document:
    latex_engine: lualatex
    includes:
      in_header: preamble.tex
and in the file preamble.tex
\usepackage{luatexja}
\usepackage{microtype}
This gives me a PDF vignette that looks and checks fine (except that the
above-mentioned GitHub Actions don't seem to find lualatex, which is why the PDF
output is commented out in the main branch on GitHub).

Unfortunately, I fail to find a similar solution for the pdf manual. R CMD 
check yields
--------------
checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! Package inputenc Error: Unicode character 冷 (U+51B7)
(inputenc) not set up for use with LaTeX.
[and many more of the same]
* checking PDF version of manual without index ... ERROR
--------------
It seems that the PDF manual is generated by first producing a texinfo file and
then running texi2dvi. From
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Inserting-Unicode.html
I take it that texinfo does not support Japanese... Is there any way to avoid
texinfo and use lualatex (with a preamble) instead? If not, is there a way to
keep the UTF-8 encoded characters in the HTML help (which I think is very
useful for the user!) and still produce a PDF that passes the check, e.g. by
automatically replacing the kanji characters with their codepoints (or even a
generic placeholder symbol) when generating the PDF manual?
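For illustration, the codepoint replacement suggested above could look like the following minimal base-R sketch. `to_codepoints()` is a hypothetical helper, not part of kanjistat, base R or the Rd tools, and how its output would be routed into the PDF manual only (leaving the HTML help untouched) is left open here.

```r
# Hypothetical helper, for illustration only (not part of kanjistat or base R):
# replace every non-ASCII character by a "U+XXXX" placeholder and leave ASCII
# characters untouched. NA inputs stay NA.
to_codepoints <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cps <- utf8ToInt(enc2utf8(s))  # integer Unicode code points of s
    out <- vapply(cps, function(cp) {
      if (cp < 128L) intToUtf8(cp) else sprintf("U+%04X", cp)
    }, character(1))
    paste(out, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# "\u51b7" is the kanji from the LaTeX error above, written as an ASCII escape
to_codepoints(c("\u51b7", "a\u51b7b", "plain"))
# expected: c("U+51B7", "aU+51B7b", "plain")
```

Writing the kanji as `\u51b7` escapes keeps the R source file itself ASCII-only.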

Any thoughts and suggestions on this would be greatly appreciated! I hope that
the remaining problems in R CMD check are then acceptable to the CRAN team,
given the nature of my package. They are:

1. Examples and tests fail if the check is not run in a UTF-8 locale.

2. checking data for non-ASCII characters ... NOTE
   Note: found 111752 marked UTF-8 strings
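Regarding point 1, one possible approach (assuming CRAN accepts such code being skipped outside UTF-8 locales, which is not confirmed here) is to guard the kanji-dependent examples and tests with a locale check. `has_utf8_locale()` is a hypothetical helper; `l10n_info()` is base R, and its `"UTF-8"` element reports whether the current locale uses UTF-8.

```r
# Hypothetical helper (illustration only): TRUE if the current locale uses UTF-8.
# l10n_info() is base R; its "UTF-8" element reports exactly this.
has_utf8_locale <- function() isTRUE(l10n_info()[["UTF-8"]])

# In an example or test script, run the kanji-dependent code only in a
# UTF-8 locale instead of letting it fail elsewhere.
if (has_utf8_locale()) {
  kanji <- "\u51b7"  # ASCII escape for the kanji from the LaTeX error
  stopifnot(nchar(kanji, type = "chars") == 1L)
} else {
  message("Skipping kanji examples: the current locale is not UTF-8.")
}
```

Inside a testthat test, the same condition could be passed to `testthat::skip_if_not()`, e.g. `skip_if_not(has_utf8_locale(), "not a UTF-8 locale")`.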

Many thanks,
Dominic Schuhmacher




______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
