Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-17 Norbert Preining
reassign 995678 ghostscript
retitle 995678 deals incorrectly with embedded cmaps
thanks

As discussed on the TL mailing list, this is a problem in ghostscript.
Reassigning.

Best

Norbert

--
PREINING Norbert  https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Vincent Lefevre
On 2021-10-12 01:27:22 +0900, Norbert Preining wrote:
> > >   /usr/bin/ps2pdf chartest3.pdf out.pdf
> 
> Honestly, feeding PDF into "ps2pdf" is something new, and I am not
> surprised that this does not work.

I don't understand what you mean by "something new". This has been
mentioned for many years (either with gs directly, or via the ps2pdf
wrapper):

  https://stackoverflow.com/questions/5296667/pdftk-compression-option
  https://stackoverflow.com/questions/10450120/optimize-pdf-files-with-ghostscript-or-other
  https://askubuntu.com/questions/113544/how-can-i-reduce-the-file-size-of-a-scanned-pdf-file
  https://askubuntu.com/questions/207447/how-to-reduce-the-size-of-a-pdf-file
  https://askubuntu.com/questions/589862/how-to-compress-more-than-one-at-a-time-pdf-files-from-terminal

etc.

> The description clearly states
>   ps2pdf - Convert PostScript to PDF using ghostscript
> PDF is **not** PostScript.

The description has not been updated. The gs description is clearer:

  gs - Ghostscript (PostScript and PDF language interpreter and
  previewer)

> Can you explain what you are trying to do?

Compress PDF files, in particular by converting Type 1 fonts
to Type 1C fonts and including only glyphs that are used,
instead of the full fonts.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Norbert Preining
> >   /usr/bin/ps2pdf chartest3.pdf out.pdf

Honestly, feeding PDF into "ps2pdf" is something new, and I am not
surprised that this does not work.

The description clearly states
ps2pdf - Convert PostScript to PDF using ghostscript
PDF is **not** PostScript.

Can you explain what you are trying to do?

Best

Norbert

--
PREINING Norbert  https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Vincent Lefevre
On 2021-10-11 16:46:08 +0200, Vincent Lefevre wrote:
> I've attached an archive with
>   * chartest3.tex (already mentioned)
>   * chartest3.aux, chartest3.log, chartest3.pdf: the files generated
>     with "pdflatex chartest3.tex"
>   * out.pdf: the file generated with
>       /usr/bin/ps2pdf chartest3.pdf out.pdf
>     (ghostscript 9.54.0~dfsg-5 being installed) after unsetting
>     GS_OPTIONS.

BTW, since ps2pdf is just a script, I could see with strace that
the following command was executed:

  /usr/bin/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH \
    -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=out.pdf -P- -dSAFER \
    -dCompatibilityLevel=1.4 chartest3.pdf

and indeed, I can reproduce the issue with this command (on
chartest3.pdf generated by pdflatex with TL 2021 installed).
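
For anyone who wants to script the reproduction, here is a minimal sketch that builds the same gs invocation as an argument list. The function name build_gs_command and its parameters are hypothetical, and dropping the repeated -P-, -dSAFER and -dCompatibilityLevel options is an assumption that they are redundant; it is not something confirmed in this thread.

```python
import shlex


def build_gs_command(src, dst, gs="/usr/bin/gs", compat="1.4"):
    """Return an argv list for a pdfwrite run like the one seen with strace.

    Assumption: the options that the ps2pdf wrapper passes several times
    (-P-, -dSAFER, -dCompatibilityLevel) are listed only once here.
    """
    return [
        gs,
        "-P-",
        "-dSAFER",
        f"-dCompatibilityLevel={compat}",
        "-q",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-sstdout=%stderr",
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run()
    # to actually execute it.
    print(shlex.join(build_gs_command("chartest3.pdf", "out.pdf")))
```

Passing the list to subprocess.run() avoids any shell quoting problems with file names.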

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Vincent Lefevre
Control: severity 995392 grave
Control: severity 995678 normal
Control: tags 995678 - moreinfo

Please be careful with the bug numbers...

On 2021-10-11 08:27:55 +0900, Norbert Preining wrote:
> Then, this is by far not a grave bug in TL. pdflatex is **not**
> affected, since it generated pdf files without using ghostscript.

??? A PDF file is designed to be read by other tools. If pdflatex
generates an invalid PDF file (as assumed by some Ghostscript
developer, which has not been confirmed yet), then it could be a
grave bug. This is a bit like C code with undefined behavior: with
some compilers, the compiled code may work, but other compilers may
generate code with erratic behavior; the bug is in the C code.

Here, the consequence is silent data loss when using Ghostscript on
the generated PDF file.

> Vincent, thanks for the tests, but without explanation or make files
> or some hints on **what** you did run, this is not reproducible and
> testable.

I've attached an archive with
  * chartest3.tex (already mentioned)
  * chartest3.aux, chartest3.log, chartest3.pdf: the files generated
    with "pdflatex chartest3.tex"
  * out.pdf: the file generated with
      /usr/bin/ps2pdf chartest3.pdf out.pdf
    (ghostscript 9.54.0~dfsg-5 being installed) after unsetting
    GS_OPTIONS.

"pdftotext out.pdf -" gives:

Test: ń donŠt ż.

(the same happens with copy-paste from xpdf and atril).

With chartest3.pdf generated by TL 2020 and the same ps2pdf command,
I get:

Test: « don’t ».

as expected.
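
The comparison above can be automated. The following is a rough sketch, not part of the original report: extract_text and char_mismatches are hypothetical helper names, and the sketch assumes that pdftotext is on the PATH and that both extractions have the same length (zip stops at the shorter string).

```python
import subprocess


def extract_text(pdf_path):
    """Run 'pdftotext FILE -' and return the text written to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def char_mismatches(expected, actual):
    """Return (index, expected_char, actual_char) for each differing position.

    Only the common prefix length is compared, because zip() stops at the
    end of the shorter string.
    """
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


if __name__ == "__main__":
    good = "Test: « don’t »."   # text extracted with the TL 2020 input
    bad = "Test: ń donŠt ż."    # text extracted with the TL 2021 input
    for index, want, got in char_mismatches(good, bad):
        print(f"position {index}: expected {want!r}, got {got!r}")
```

With real files, char_mismatches(extract_text("chartest3.pdf"), extract_text("out.pdf")) would compare the text before and after the Ghostscript run.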

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


chartest.tar.xz
Description: application/xz