Re: [R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

2023-05-18 Thread Schuhmacher, Dominic
Hi Ivan,

Thanks for the extensive answer. Both
PDFLATEX=xelatex R CMD Rd2pdf .
and
PDFLATEX=lualatex R CMD Rd2pdf .
work for me (for the whole package doc).

And yes, in both cases I need to inject code in the preamble of the .tex file. 
In fact for lualatex (which I prefer from my experience with the vignette)
PDFLATEX=lualatex RD2PDF_INPUTENC='inputenc}\usepackage{luatexja' R CMD Rd2pdf .
generates the desired manual with the correct characters.
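Why the quoted value works can be seen as plain string pasting. Below is a minimal Python sketch. It assumes, without checking R's sources here, that Rd2pdf writes a preamble line of the form \usepackage[utf8]{<value of RD2PDF_INPUTENC>}, with 'inputenc' as the default. The helper name inputenc_line is hypothetical.

```python
import os


def inputenc_line(env=None, options="utf8"):
    """Return the input-encoding line assumed to be written by Rd2pdf.

    Assumption for illustration: the package name is taken verbatim from
    RD2PDF_INPUTENC, defaulting to 'inputenc'. R's real template may differ.
    """
    if env is None:
        env = os.environ
    package = env.get("RD2PDF_INPUTENC", "inputenc")
    # No escaping happens here, so a '}' inside the value ends the
    # \usepackage argument early and the rest becomes new LaTeX code.
    return "\\usepackage[" + options + "]{" + package + "}"


print(inputenc_line({}))
print(inputenc_line({"RD2PDF_INPUTENC": "inputenc}\\usepackage{luatexja"}))
```

Under that assumption, the second call yields \usepackage[utf8]{inputenc}\usepackage{luatexja}. Closing the brace early is what smuggles luatexja into the preamble.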

It is unfortunate that the font styles for the keywords are not carried over 
from Rd.sty. This could possibly be fixed with some \renewcommands (which, 
luckily, go into the preamble ;-), but that would probably be too much of a 
hack even for my taste...

By the way, what is the recommended way of setting environment variables like 
PDFLATEX and (possibly) RD2PDF_INPUTENC in a package (if this is something that 
is allowed on CRAN)? If it is a Makevars file, where do I put it, directly into 
the man folder?

Best regards,
Dominic




> On 18. May 2023, at 13:21, Ivan Krylov  wrote:
> [...]

Re: [R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

2023-05-18 Thread Ivan Krylov
On Wed, 17 May 2023 12:05:49 + "Schuhmacher, Dominic" wrote:

> checking PDF version of manual ... WARNING
> LaTeX errors when creating PDF version.
> This typically indicates Rd problems.
> LaTeX errors found:
> ! Package inputenc Error: Unicode character 冷 (U+51B7)
> (inputenc) not set up for use with LaTeX.

I see you'd like to use Kanji characters in your R documentation (not
only a vignette). There are some workarounds for Cyrillic alphabets
(which work if you set a special environment variable), but many more
hurdles would have to be cleared for CJK support, and I'm not
sure that CRAN will accept the result even if you overcome them on your
own machine.

1. You might need to switch the LaTeX engine from the default of
pdflatex. (XeLaTeX in particular seems to have much better Unicode
support.) Both the texi2dvi shell script and R's emulation of it
understand the PDFLATEX environment variable (thank you Martin for
mentioning this!), but I'm not sure there is a way to require an
environment variable to be set for all invocations of R CMD INSTALL.
Anyway, as Overleaf says, pdflatex can support CJK, but in a less
convenient manner.

2. For pdflatex, it's possible to use \usepackage{CJKutf8}. The
required Debian packages are latex-cjk-japanese-wadalab (fonts) and
latex-cjk-common (CJKutf8.sty itself). There's no way to require these
packages to be installed on machines where your package's PDF
documentation might be built.

3. Once the packages are installed and you can compile an example *.tex
file containing Kanji, it's time to get R's PDF documentation system to
use these packages. You need to insert \usepackage{CJKutf8} in the
document's preamble, which Rd \out{} markup cannot reach because its
output only appears in the document body. I don't
see a way to convince Rd2pdf to do that, but there's a terrible hack to
do that using a LaTeX injection from an undocumented environment
variable.

4. All uses of CJK characters need to be wrapped in
\begin{CJK}{utf8}{min} ... \end{CJK}. Thankfully, this at least can be
achieved in Rd using \if{latex}{\out{\begin{CJK}{utf8}{min}}} and can
be wrapped in an Rd macro using \newcommand in man/macros/whatever.Rd.

Unfortunately, I couldn't find a way to wrap the \examples{} section in
\begin{CJK}...\end{CJK}, so CJK characters cannot be used there.
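A minimal Python sketch of points 2 to 4, assuming the CJKutf8 setup described above. One helper builds a small test document for checking that pdflatex and the CJK fonts work. The other wraps runs of Japanese characters in the CJK environment, which is what an Rd macro would have to do for the LaTeX output. The function names and the chosen Unicode ranges are illustrative assumptions, not part of R or of the CJK package.

```python
import re

# Assumed ranges: hiragana, katakana and the basic CJK unified ideographs.
# Real text may need more (extension blocks, full-width punctuation, ...).
JAPANESE_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")


def wrap_cjk(text, encoding="utf8", family="min"):
    """Wrap every run of Japanese characters in a CJK environment."""
    begin = "\\begin{CJK}{%s}{%s}" % (encoding, family)
    end = "\\end{CJK}"
    # A function replacement avoids re.sub's backslash-escape processing.
    return JAPANESE_RUN.sub(lambda m: begin + m.group(0) + end, text)


def cjk_test_document(body):
    """Return a minimal pdflatex document that loads CJKutf8."""
    return "\n".join([
        "\\documentclass{article}",
        "\\usepackage{CJKutf8}",
        "\\begin{document}",
        wrap_cjk(body),
        "\\end{document}",
        "",
    ])


print(wrap_cjk("cold: 冷"))
```

Saving cjk_test_document("...") as a UTF-8 encoded .tex file and running pdflatex on it is a quick way to confirm that the fonts from point 2 are installed before touching Rd2pdf. The {utf8}{min} arguments follow the message above.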

To summarise, the Rd file from

Re: [R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

2023-05-18 Thread Uwe Ligges




On 18.05.2023 10:03, Martin Maechler wrote:

Schuhmacher, Dominic
 on Wed, 17 May 2023 12:05:49 + writes:


 > [...]

 > Unfortunately, I fail to find a similar solution for the
 > pdf manual. R CMD check yields
 > --
 > checking PDF version of manual ... WARNING LaTeX errors
 > when creating PDF version.  This typically indicates Rd
 > problems.  LaTeX errors found: ! Package inputenc Error:
 > Unicode character 冷 (U+51B7) (inputenc) not set up for
 > use with LaTeX.  [and many more of the same] * checking
 > PDF version of manual without index ... ERROR

Can you send me a minimal example package with these characters in an Rd 
file?

Best,
Uwe Ligges
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




Re: [R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

2023-05-18 Thread Martin Maechler
> Schuhmacher, Dominic 
> on Wed, 17 May 2023 12:05:49 + writes:

> [...]
> It seems that the pdf manual is generated by first
> producing a texinfo file and then running texi2dvi. From
> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Inserting-Unicode.html
> I take the message that texinfo does not do Japanese... Is
> there any way to work around the use of texinfo and use
> lualatex (with a preamble) instead? If not, is there a way
> to keep the UTF-8 encoded characters in the html help (I
> think this is very useful for the user!) and still produce
> a pdf that passes the check, e.g. by replacing the kanji
> characters automatically by their codepoints (or even a
> generic placeholder symbol) when generating the pdf
> manual?

I cannot help much more,
but be assured that texinfo is *not* used in the process.
It's just a "historical coincidence" that texi2dvi, a "simple"
shell script, typically comes from texinfo (the "software
package"): in Linux distributions, the texi2dvi command
(a shell script, see above) is provided by the 'texinfo'
(Debian/Ubuntu/...) package.

man texi2dvi tells you about a slew of environment variables,
notably PDFLATEX, TEX, etc., and I guess you can just set one of
these to 'lualatex'. Of course, lualatex must then be
findable on the CRAN servers, but I'd bet that is the case.
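A rough Python sketch of the lookup described here. The engine name is taken from PDFLATEX, falling back to pdflatex. A separate check confirms that the engine is findable on the PATH. The functions are hypothetical stand-ins, neither the texi2dvi script nor R's tools::texi2dvi. They only build the command line rather than run it, and the file name Rd2.tex is a placeholder.

```python
import os
import shutil


def latex_command(tex_file, env=None):
    """Return the command a texi2dvi-like driver would run (sketch only)."""
    if env is None:
        env = os.environ
    # Engine name from PDFLATEX, falling back to plain pdflatex.
    engine = env.get("PDFLATEX", "pdflatex")
    return [engine, tex_file]


def engine_available(env=None):
    """True if the selected engine can be found on the PATH."""
    engine = latex_command("dummy.tex", env)[0]
    return shutil.which(engine) is not None


print(latex_command("Rd2.tex", {"PDFLATEX": "lualatex"}))
```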

Best,
Martin





[R-pkg-devel] Intrinsic UTF-8 use in aspired CRAN package

2023-05-18 Thread Schuhmacher, Dominic
Dear list,

I have a package 
https://github.com/dschuhmacher/kanjistat
whose very purpose depends on working with Japanese kanji characters (in UTF-8 
encoding). Such characters appear vitally in the data sets, examples, tests, 
the vignette and the .Rd files.

My package checks fine with devtools::check on my system and via GitHub Actions 
produced with usethis::use_github_action_check_standard().
However, I would like to release the package on CRAN, and running R CMD check 
--as-cran gives me a number of headaches, mainly related to the production of 
PDF documents via LaTeX, as it does not seem easy to convince LaTeX to 
typeset Japanese; see https://www.overleaf.com/learn/latex/Japanese

For the vignette, I can set in the R Markdown file
  pdf_document:
    latex_engine: lualatex
    includes:
      in_header: preamble.tex
and in the file preamble.tex
  \usepackage{luatexja}
  \usepackage{microtype}
This gives me a PDF vignette that looks and checks fine (except that the 
above-mentioned GitHub Actions don't seem to find lualatex, which is why the 
PDF output is commented out in the main branch on GitHub).

Unfortunately, I cannot find a similar solution for the PDF manual. R CMD 
check yields
--
checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! Package inputenc Error: Unicode character 冷 (U+51B7)
(inputenc) not set up for use with LaTeX.
[and many more of the same]
* checking PDF version of manual without index ... ERROR
--
It seems that the PDF manual is generated by first producing a texinfo file and 
then running texi2dvi. From
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Inserting-Unicode.html
I take the message that texinfo does not do Japanese... Is there any way to 
work around the use of texinfo and use lualatex (with a preamble) instead? If 
not, is there a way to keep the UTF-8 encoded characters in the HTML help (I 
think this is very useful for the user!) and still produce a PDF that passes 
the check, e.g. by automatically replacing the kanji characters with their 
code points (or even a generic placeholder symbol) when generating the PDF 
manual?
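The proposed fallback, replacing kanji by their code points or a placeholder in the PDF version only, amounts to a simple text transformation. Below is a hypothetical Python helper sketching that transformation. It is not an existing R or Rd2pdf option, and how it would be hooked into the PDF build is left open.

```python
def to_codepoints(text, placeholder=None):
    """Replace every non-ASCII character by its code point (or a placeholder).

    ASCII characters are kept unchanged, so LaTeX markup in the text survives.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif placeholder is not None:
            out.append(placeholder)
        else:
            out.append("U+%04X" % ord(ch))
    return "".join(out)


print(to_codepoints("冷 means cold"))            # U+51B7 means cold
print(to_codepoints("冷たい", placeholder="?"))  # ???
```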

Any thoughts and suggestions on this would be greatly appreciated! I think/hope 
then that the remaining problems in R CMD check are acceptable to the CRAN team 
given the nature of my package. They are:

1. Examples and tests fail if the check is not run in a UTF-8 locale.

2. checking data for non-ASCII characters ... NOTE
   Note: found 111752 marked UTF-8 strings
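For context on the NOTE: R never marks ASCII strings with a declared encoding (see ?Encoding). Only strings containing characters such as kanji end up marked as UTF-8, so the count roughly corresponds to the number of non-ASCII strings in the data sets. The Python analogue below is for illustration only and is not what R CMD check runs.

```python
def count_non_ascii(strings):
    """Count strings containing at least one non-ASCII character."""
    # str.isascii() needs Python 3.7 or later.
    return sum(1 for s in strings if not s.isascii())


print(count_non_ascii(["cold", "冷", "冷たい", "water"]))  # 2
```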

Many thanks,
Dominic Schuhmacher



