Bug#995392: ghostscript: ps2pdf trashes some characters

2022-04-13 Thread Vincent Lefevre
Control: reopen -1

On 2022-03-31 17:12:16 +0200, Vincent Lefevre wrote:
> [...] It now appears that this remaining issue was related to the
> PDF interpreter, not just the PDF writer. So I'm updating the bug
> title to be less restrictive.

The new PDF interpreter actually has more serious issues: the
math symbols do not appear correctly with pdftotext. So one needs
to use the old interpreter with -dNEWPDF=false, which makes the
bug reappear.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2022-03-31 Thread Vincent Lefevre
Control: retitle 995392 ghostscript: ToUnicode CMap has incorrect mappings

Well, here is the full story. On one PDF file, I initially
found an issue visible with pdftotext (in particular) after running
ps2pdf. In fact, there were several bugs (which I didn't know
about), and when I tried to simplify the file to produce a simple
testcase, I hit some of these bugs, later clearly identified in

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998458

and

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998461

(both in the PDF writer). But when these bugs were fixed, I could
still see an issue with my original file, still tracked in this
Debian bug. It now appears that this remaining issue was related
to the PDF interpreter, not just the PDF writer. So I'm updating
the bug title to be less restrictive.

Note: the old PDF interpreter can still be used with

  ps2pdf -dNEWPDF=false ...

which makes the bug reappear.

According to upstream, the bug may reappear later, but I don't
think that it is useful to reopen this bug in Debian, as long as
issues are no longer visible in Debian.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-04 Thread Jonas Smedegaard
Quoting Vincent Lefevre (2021-11-04 16:49:34)
> On 2021-11-03 18:06:50 +0100, Jonas Smedegaard wrote:
> > Quoting Vincent Lefevre (2021-11-03 14:29:26)
> > > This Debian bug actually covers several similar Ghostscript bugs.
> > 
> > Please track each bug separately.  Otherwise it is not possible to 
> > reliably track which bug affects which packaging releases.
> 
> Done:
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998458
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998461
> 
> I'm going to do tests against various ghostscript versions to see
> which ones are affected (I've currently only tested the experimental
> version ghostscript/9.55.0~~rc1~dfsg-1 for these bug reports).

Thanks!

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-04 Thread Vincent Lefevre
On 2021-11-03 18:06:50 +0100, Jonas Smedegaard wrote:
> Quoting Vincent Lefevre (2021-11-03 14:29:26)
> > This Debian bug actually covers several similar Ghostscript bugs.
> 
> Please track each bug separately.  Otherwise it is not possible to 
> reliably track which bug affects which packaging releases.

Done:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998458
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998461

I'm going to do tests against various ghostscript versions to see
which ones are affected (I've currently only tested the experimental
version ghostscript/9.55.0~~rc1~dfsg-1 for these bug reports).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-03 Thread Jonas Smedegaard
Quoting Vincent Lefevre (2021-11-03 14:29:26)
> This Debian bug actually covers several similar Ghostscript bugs.

Please track each bug separately.  Otherwise it is not possible to 
reliably track which bug affects which packaging releases.


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-03 Thread Vincent Lefevre
Control: forwarded -1 https://bugs.ghostscript.com/show_bug.cgi?id=704681

On 2021-11-03 14:29:26 +0100, Vincent Lefevre wrote:
> Control: retitle -1 ghostscript: pdfwrite incorrectly deals with embedded ToUnicode CMap
> Control: found -1 9.27~dfsg-2+deb10u4
> 
> This Debian bug actually covers several similar Ghostscript bugs.
> I consider the most general bug given by the testcase below,
> which is still not fixed upstream.

So I should move it to the new upstream bug URL.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-03 Thread Vincent Lefevre
Control: retitle -1 ghostscript: pdfwrite incorrectly deals with embedded ToUnicode CMap
Control: found -1 9.27~dfsg-2+deb10u4

This Debian bug actually covers several similar Ghostscript bugs.
I consider the most general bug given by the testcase below,
which is still not fixed upstream.

On 2021-11-03 05:04:36 +0100, Vincent Lefevre wrote:
> \documentclass{article}
> \usepackage[T1]{fontenc}
> \usepackage{lmodern}
> \pdfglyphtounicode{Scaron}{0160}
> \pdfgentounicode=1
> \begin{document}
> \thispagestyle{empty}
> 'ê
> \end{document}

This testcase shows that this bug is not new and could have always
been present in Ghostscript. The fact is that I had never used
\pdfglyphtounicode{Scaron}{0160} before (I don't need it), and
I noticed this bug only due to TeX Live 2021, which now uses this
mapping (among others), unless all mappings are disabled by the
user with an explicit \pdfgentounicode=0.

This remaining bug (the other ones being recently fixed upstream)
might be specific to this mapping. A partial cause may be that
the ' character is transformed to a /quoteright, which leads to
a /Differences that confuses Ghostscript when generating the
ToUnicode CMap.
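
To make the mapping discussed above concrete, here is a minimal Python sketch that reads \pdfglyphtounicode lines and shows which Unicode character each glyph name is mapped to. The helper name parse_glyph_mappings and the regular expression are illustrative assumptions; this only inspects the TeX-side declaration and says nothing about how Ghostscript processes the resulting PDF.

```python
import re
import unicodedata

# Matches \pdfglyphtounicode{<glyph name>}{<one or more hex code points>}
_MAPPING = re.compile(r"\\pdfglyphtounicode\{([^}]+)\}\{([0-9A-Fa-f ]+)\}")


def parse_glyph_mappings(tex_source):
    """Return {glyph name: Unicode string} for each declaration found.

    Hypothetical helper, not part of pdfTeX or Ghostscript.
    """
    mappings = {}
    for name, hex_points in _MAPPING.findall(tex_source):
        mappings[name] = "".join(chr(int(h, 16)) for h in hex_points.split())
    return mappings


sample = r"\pdfglyphtounicode{Scaron}{0160}"
mappings = parse_glyph_mappings(sample)

for name, text in mappings.items():
    names = ", ".join(unicodedata.name(c) for c in text)
    print(f"{name} -> {text!r} ({names})")

# The character that should have been extracted for the apostrophe:
print(unicodedata.name("\u2019"))
```

So the wrong character seen in the extracted text is exactly the target of the Scaron mapping, while the glyph actually used in the document is quoteright.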

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-02 Thread Vincent Lefevre
On 2021-11-03 03:29:43 +0100, Vincent Lefevre wrote:
> On 2021-11-02 16:25:27 +0100, Vincent Lefevre wrote:
> > With commit 8f62213019bc682eeb0ed9467d8841f3770cfda6 upstream,
> > I can no longer reproduce any issue, even when
> > /usr/share/texlive/texmf-dist/tex/generic/pdftex/glyphtounicode.tex
> > from TeX Live 2020 is included and \pdfgentounicode=1 is used.
> 
> Hmm... I didn't check carefully. On one of my files, there is
> actually one place where the quoteright (used for the apostrophe)
> is replaced by "Š" (checked with pdftotext, xpdf and atril). The
> cause may be that the paragraph in question is in a smaller font.

I have an explanation: it seems that in this smaller font,
no ligatures (ff, ffi, fl...) are used.

In a recent fix, Ghostscript no longer generates a ToUnicode CMap
when there are \pdfglyphtounicode mappings of more than 2 bytes (such as
those used for the ligatures). So this fix made the bug disappear
when ligatures are used. But the bug was still there, and visible
when ligatures are not used.
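
As a rough illustration of the "more than 2 bytes" distinction, the following Python sketch measures how many bytes a mapping target needs in big-endian UTF-16, which is the encoding used for ToUnicode values in PDF. The ligature targets below are assumptions (ligatures expanding to their component letters), and the threshold check is not Ghostscript's actual code, only a way to see why the ligatures fall into a different category from Scaron.

```python
def utf16_length(text):
    """Number of bytes needed to store `text` as big-endian UTF-16."""
    return len(text.encode("utf-16-be"))


def exceeds_two_bytes(text):
    """True if the mapping target does not fit in a single 2-byte code."""
    return utf16_length(text) > 2


# Glyph name -> assumed mapping target (the ligature entries are assumptions).
targets = {
    "Scaron": "\u0160",  # one code point, 2 bytes
    "ff": "ff",          # two code points, 4 bytes
    "ffi": "ffi",        # three code points, 6 bytes
}

long_targets = sorted(name for name, t in targets.items() if exceeds_two_bytes(t))
print(long_targets)
```

Under these assumptions, a font that uses any ligature has at least one target longer than 2 bytes, whereas a font that only uses Scaron-like mappings does not, which would match the behaviour described above.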

> So the issue is still visible in practice.
> 
> I'll try to produce a simple testcase.

Here it is:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\pdfglyphtounicode{Scaron}{0160}
\pdfgentounicode=1
\begin{document}
\thispagestyle{empty}
'ê
\end{document}

(Tested on the PDF generated by pdflatex from TeX Live 2020.)

My new upstream bug report:

  https://bugs.ghostscript.com/show_bug.cgi?id=704681

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-02 Thread Vincent Lefevre
Control: tags -1 - fixed-upstream

On 2021-11-02 16:25:27 +0100, Vincent Lefevre wrote:
> With commit 8f62213019bc682eeb0ed9467d8841f3770cfda6 upstream,
> I can no longer reproduce any issue, even when
> /usr/share/texlive/texmf-dist/tex/generic/pdftex/glyphtounicode.tex
> from TeX Live 2020 is included and \pdfgentounicode=1 is used.

Hmm... I didn't check carefully. On one of my files, there is
actually one place where the quoteright (used for the apostrophe)
is replaced by "Š" (checked with pdftotext, xpdf and atril). The
cause may be that the paragraph in question is in a smaller font.
So the issue is still visible in practice.

I'll try to produce a simple testcase.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-02 Thread Vincent Lefevre
On 2021-11-02 16:25:27 +0100, Vincent Lefevre wrote:
> With commit 8f62213019bc682eeb0ed9467d8841f3770cfda6 upstream,
> I can no longer reproduce any issue, even when
> /usr/share/texlive/texmf-dist/tex/generic/pdftex/glyphtounicode.tex
> from TeX Live 2020 is included and \pdfgentounicode=1 is used.
[...]

To be clear, both commit 8f62213019bc682eeb0ed9467d8841f3770cfda6
and the older commit b4e8434defb8e05ea05bb130b92217290efd2fba
should be necessary and, together, sufficient to solve all the issues.

Commit b4e8434defb8e05ea05bb130b92217290efd2fba fixed

  https://bugs.ghostscript.com/show_bug.cgi?id=704478

triggered by the first simple testcase in this bug (the full story
is that by reducing my testcase, I inadvertently found this other
related issue).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-11-02 Thread Vincent Lefevre
Control: found -1 9.55.0~~rc1~dfsg-1
Control: tags -1 fixed-upstream

With commit 8f62213019bc682eeb0ed9467d8841f3770cfda6 upstream,
I can no longer reproduce any issue, even when
/usr/share/texlive/texmf-dist/tex/generic/pdftex/glyphtounicode.tex
from TeX Live 2020 is included and \pdfgentounicode=1 is used.
However, note that:
  * Yesterday, I could still find an issue, though I could have made
    a mistake in my test (e.g. looking at an old file): I've just
    tested the same .tex file (with the same included files), and
    the issue no longer appears.
  * The upstream bug is still open as an enhancement. Apparently,
    Ghostscript no longer generates a (buggy) ToUnicode CMap, and
    this could yield issues, but in practice on my files, everything
    seems fine with xpdf, atril and pdftotext (not sure what happens,
    but if they are using heuristics, they are working fine).

So I'm tagging this as fixed-upstream (the upstream bug should be
sufficient for the enhancement). Note that 9.55.0~~rc1~dfsg-1 from
experimental does not have the commit mentioned above, and I've
checked that this version is still incorrect.

I haven't tested with PDF generated by TeX Live 2021 yet, but I don't
expect any issue.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-28 Thread Vincent Lefevre
Control: forwarded -1 https://bugs.ghostscript.com/show_bug.cgi?id=704674
Control: tags -1 - fixed-upstream
Control: retitle -1 ghostscript: pdfwrite no longer preserves the ToUnicode CMap of PDF files

This is actually the real upstream bug (it appears that the first
testcase I gave was affected by a similar bug).

See the testcases in the attached archive from

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995392#86

I've attached the main testcase.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


chartest5a-tl2021.pdf
Description: Adobe PDF document


Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-17 Thread Norbert Preining
reassign 995678 ghostscript
retitle 995678 deals incorrectly with embedded cmaps
thanks

As discussed on the TL mailing list, this is a problem of ghostscript.
Reassigning.

Best

Norbert

--
PREINING Norbert  https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-14 Thread Jonas Smedegaard
Control: severity -1 normal

Quoting Vincent Lefevre (2021-09-30 16:53:01)
> Package: ghostscript
> Version: 9.54.0~dfsg-5
> Severity: grave
> Justification: causes non-serious data loss
> 
> The ps2pdf trashes some characters, making text non-searchable and
> partly unreadable via pdftotext (even though the glyph appears to
> be OK). There was no such issue in the recent past.

I agree with Ken Sharp¹ that this issue is not grave for the ghostscript 
package as a whole - regardless of how important that feature of 
ghostscript is for your usecase.

 - Jonas

¹ https://bugs.ghostscript.com/show_bug.cgi?id=704478#c8

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Thread Vincent Lefevre
On 2021-10-12 01:27:22 +0900, Norbert Preining wrote:
> > >   /usr/bin/ps2pdf chartest3.pdf out.pdf
> 
> Honestly, feeding PDF into "ps2pdf" is something new, and I am not
> surprised that this does not work.

I don't understand what you mean by "something new". This has been
mentioned for many years (either with gs directly, or via the ps2pdf
wrapper):

  https://stackoverflow.com/questions/5296667/pdftk-compression-option
  https://stackoverflow.com/questions/10450120/optimize-pdf-files-with-ghostscript-or-other
  https://askubuntu.com/questions/113544/how-can-i-reduce-the-file-size-of-a-scanned-pdf-file
  https://askubuntu.com/questions/207447/how-to-reduce-the-size-of-a-pdf-file
  https://askubuntu.com/questions/589862/how-to-compress-more-than-one-at-a-time-pdf-files-from-terminal

etc.

> The description clearly states
>   ps2pdf - Convert PostScript to PDF using ghostscript
> PDF is **not** PostScript.

The description has not been updated. The gs description is clearer:

  gs - Ghostscript (PostScript and PDF language interpreter and
  previewer)

> Can you explain what you are trying to do?

Compress PDF files, in particular by converting Type 1 fonts
to Type 1C fonts and including only glyphs that are used,
instead of the full fonts.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Thread Norbert Preining
> >   /usr/bin/ps2pdf chartest3.pdf out.pdf

Honestly, feeding PDF into "ps2pdf" is something new, and I am not
surprised that this does not work.

The description clearly states
ps2pdf - Convert PostScript to PDF using ghostscript
PDF is **not** PostScript.

Can you explain what you are trying to do?

Best

Norbert

--
PREINING Norbert  https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Thread Vincent Lefevre
On 2021-10-11 16:46:08 +0200, Vincent Lefevre wrote:
> I've attached an archive with
>   * chartest3.tex (already mentioned)
>   * chartest3.aux, chartest3.log, chartest3.pdf: the files generated
>     with "pdflatex chartest3.tex"
>   * out.pdf: the file generated with
>       /usr/bin/ps2pdf chartest3.pdf out.pdf
>     (ghostscript 9.54.0~dfsg-5 being installed) after unsetting
>     GS_OPTIONS.

BTW, since ps2pdf is just a script, I could see with strace that
the following command was executed:

  /usr/bin/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH \
    -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=out.pdf -P- -dSAFER \
    -dCompatibilityLevel=1.4 chartest3.pdf

and indeed, I can reproduce the issue with this command (on
chartest3.pdf generated by pdflatex with TL 2021 installed).
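
For scripted reproduction, the traced invocation can be rebuilt as an argument list. This is only a sketch: the function name build_gs_command is hypothetical, the options are copied from the strace output above (including the repeated ones), and the /usr/bin/gs path is the one seen on this system.

```python
import shlex


def build_gs_command(src, dst, compat_level="1.4"):
    """Return an argv list mirroring the gs command traced from ps2pdf.

    Hypothetical helper; the option order follows the strace output.
    """
    common = ["-P-", "-dSAFER", f"-dCompatibilityLevel={compat_level}"]
    return (
        ["/usr/bin/gs"]
        + common
        + [
            "-q", "-P-", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=pdfwrite",
            "-sstdout=%stderr",
            f"-sOutputFile={dst}",
        ]
        + common
        + [src]
    )


command = build_gs_command("chartest3.pdf", "out.pdf")
print(shlex.join(command))
```

On a machine where Ghostscript is installed, the list could be passed to subprocess.run(command, check=True); building it as a list avoids any shell quoting issue with %stderr.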

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995678: Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-11 Thread Vincent Lefevre
Control: severity 995392 grave
Control: severity 995678 normal
Control: tags 995678 - moreinfo

Please be careful with the bug numbers...

On 2021-10-11 08:27:55 +0900, Norbert Preining wrote:
> Then, this is by far not a grave bug in TL. pdflatex is **not**
> affected, since it generated pdf files without using ghostscript.

??? A PDF file is designed to be read by other tools. If pdflatex
generates an invalid PDF file (as assumed by some Ghostscript
developer, which has not been confirmed yet), then it could be a
grave bug. This is a bit like C code with undefined behavior: with
some compilers, the compiled code may work, but other compilers may
generate code with erratic behavior; the bug is in the C code.

Here, the consequence is silent data loss when using Ghostscript on
the generated PDF file.

> Vincent, thanks for the tests, but without explanation or make files
> or some hints on **what** you did run, this is not reproducible and
> testable.

I've attached an archive with
  * chartest3.tex (already mentioned)
  * chartest3.aux, chartest3.log, chartest3.pdf: the files generated
    with "pdflatex chartest3.tex"
  * out.pdf: the file generated with
      /usr/bin/ps2pdf chartest3.pdf out.pdf
    (ghostscript 9.54.0~dfsg-5 being installed) after unsetting
    GS_OPTIONS.

"pdftotext out.pdf -" gives:

Test: ń donŠt ż.

(and copy-paste from xpdf and atril gives the same result).

With chartest3.pdf generated by TL 2020 and the same ps2pdf command,
I get:

Test: « don’t ».

as expected.
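
The difference between the two extracted strings can be listed character by character. This is a small Python sketch with hypothetical helper names; the two strings are the pdftotext outputs quoted above.

```python
import unicodedata


def substituted_chars(expected, actual):
    """Pair up the characters that differ at the same position."""
    return [(e, a) for e, a in zip(expected, actual) if e != a]


def describe(char):
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


expected = "Test: « don’t »."  # ps2pdf output from the TL 2020 PDF
actual = "Test: ń donŠt ż."    # ps2pdf output from the TL 2021 PDF

for good, bad in substituted_chars(expected, actual):
    print(f"{describe(good)} -> {describe(bad)}")
```

Only the three non-ASCII characters differ; the ASCII text is extracted identically in both cases.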

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


chartest.tar.xz
Description: application/xz


Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-10 Thread Norbert Preining
severity 995392 normal
tags 995392 + moreinfo
thanks

Hi all,

first of all, it seems this message didn't make it either to the list or
my computer, just found it by randomly checking transitioning.

Then, this is by no means a grave bug in TL. pdflatex is **not**
affected, since it generates PDF files without using ghostscript.

What might have some problems - and I haven't reproduced this till now
nor tested - is the dvi -> ps -> pdf route.

>   * chartest3.tex:  Test: « don't ».
>   * chartest4[ab].tex:  Test: « don't finite float ».
>   * chartest5[ab].tex:  Test: « don't finite float offer affine ».
> where the 4b and 5b versions contain \pdfglyphtounicode commands for
> the ligatures (from glyphtounicode.tex), though the tests below show
> that they do not have any influence here.

Vincent, thanks for the tests, but without explanation or make files or
some hints on **what** you did run, this is not reproducible and
testable.

What I want to see is
* input file
* commands run
* log files of each program run
* what is the problematic output

Thanks

> \documentclass[12pt]{article}
> \usepackage[utf8]{inputenc}
> \usepackage[T1]{fontenc}
> \usepackage{lmodern}
> \begin{document}
> \thispagestyle{empty}
> Test: « don't ».
> \end{document}

This document, and copying and pasting its content, works fine for me
with
* current sid
* latex/dvips/ps2pdf
* pdflatex

Generated pdf file can be copy/pasted.

I really don't see what is going on, thanks for any explanations.

Best

Norbert

--
PREINING Norbert  https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Vincent Lefevre
On 2021-10-01 16:52:13 +0200, Jonas Smedegaard wrote:
> Quoting Vincent Lefevre (2021-10-01 15:53:38)
> > It seems that the issue partly comes from pdflatex: On an old file for 
> > which ps2pdf was correct with ghostscript 9.53.3~dfsg-4, it is now 
> > incorrect still with ghostscript 9.53.3~dfsg-4. But if I regenerate 
> > the intermediate PDF file on an old Debian machine and transfer it to 
> > my current machine, ps2pdf is correct with ghostscript 9.53.3~dfsg-4 
> > and with ghostscript 9.53.3~dfsg-7 (stable), and also with ghostscript 
> > 9.54.0~dfsg-5.
> 
> Are you sure you mean 9.53.3~dfsg-7, not 9.53.3~dfsg-7+deb11u1?

Yes, I used "apt .../stable", and it was 9.53.3~dfsg-7 that was
fetched, not the security update.

zira:~> apt-show-versions -a ghostscript
ghostscript:amd64 9.54.0~dfsg-5 install ok installed
ghostscript:amd64 9.53.3~dfsg-7 stable  ftp.debian.org
ghostscript:amd64 9.53.3~dfsg-7+deb11u1 stable-security security.debian.org
No stable-updates version
ghostscript:amd64 9.54.0~dfsg-5 testing ftp.debian.org
ghostscript:amd64 9.54.0~dfsg-5 unstable ftp.debian.org
ghostscript:amd64 9.55.0~~rc1~dfsg-1 experimental ftp.debian.org
ghostscript:amd64/testing 9.54.0~dfsg-5 uptodate

Perhaps I should have used "/stable-security".

> Some upstream changes were backported for -7 and other changes were 
> introduced by -7+deb11u1: 
> https://tracker.debian.org/media/packages/g/ghostscript/changelog-9.53.3dfsg-7deb11u1
> 
> Possibly related to the recent changes to Ghostscript's SAFER: 
> https://www.ghostscript.com/doc/9.55.0/Use.htm#Safer
> 
> Perhaps recent pdflatex was adapted to handle the change to SAFER, and 
> in doing so became dependent on recent Ghostscript (and perhaps that was 
> then not reflected in packaging of pdflatex)?

Anyway, I doubt that this is related to the font / mapping issue.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Jonas Smedegaard
Quoting Vincent Lefevre (2021-10-01 15:53:38)
> On 2021-10-01 14:31:57 +0200, Vincent Lefevre wrote:
> > On 2021-10-01 14:26:02 +0200, Vincent Lefevre wrote:
> > > On 2021-10-01 14:17:53 +0200, Vincent Lefevre wrote:
> > > > In my archives, I can see that the issue occurred with 
> > > > ghostscript 9.26a~dfsg-0+deb9u1, but 9.27~dfsg-2+deb10u4 isn't 
> > > > affected on my second testcase.
> > > 
> > > The following PDF file (on which I got the issue with ghostscript 
> > > 9.26a~dfsg-0+deb9u1) may be a useful testcase:
> > > 
> > >   https://hal.archives-ouvertes.fr/hal-02001080v1/document
> > 
> > On this testcase, the issue is actually reproducible with 
> > ghostscript 9.27~dfsg-2+deb10u4!
> 
> It seems that the issue partly comes from pdflatex: On an old file for 
> which ps2pdf was correct with ghostscript 9.53.3~dfsg-4, it is now 
> incorrect still with ghostscript 9.53.3~dfsg-4. But if I regenerate 
> the intermediate PDF file on an old Debian machine and transfer it to 
> my current machine, ps2pdf is correct with ghostscript 9.53.3~dfsg-4 
> and with ghostscript 9.53.3~dfsg-7 (stable), and also with ghostscript 
> 9.54.0~dfsg-5.

Are you sure you mean 9.53.3~dfsg-7, not 9.53.3~dfsg-7+deb11u1?

Some upstream changes were backported for -7 and other changes were 
introduced by -7+deb11u1: 
https://tracker.debian.org/media/packages/g/ghostscript/changelog-9.53.3dfsg-7deb11u1

Possibly related to the recent changes to Ghostscript's SAFER: 
https://www.ghostscript.com/doc/9.55.0/Use.htm#Safer

Perhaps recent pdflatex was adapted to handle the change to SAFER, and 
in doing so became dependent on recent Ghostscript (and perhaps that was 
then not reflected in packaging of pdflatex)?


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Vincent Lefevre
On 2021-10-01 14:31:57 +0200, Vincent Lefevre wrote:
> On 2021-10-01 14:26:02 +0200, Vincent Lefevre wrote:
> > On 2021-10-01 14:17:53 +0200, Vincent Lefevre wrote:
> > > In my archives, I can see that the issue occurred with
> > > ghostscript 9.26a~dfsg-0+deb9u1, but 9.27~dfsg-2+deb10u4
> > > isn't affected on my second testcase.
> > 
> > The following PDF file (on which I got the issue with
> > ghostscript 9.26a~dfsg-0+deb9u1) may be a useful testcase:
> > 
> >   https://hal.archives-ouvertes.fr/hal-02001080v1/document
> 
> On this testcase, the issue is actually reproducible with
> ghostscript 9.27~dfsg-2+deb10u4!

It seems that the issue partly comes from pdflatex: On an old
file for which ps2pdf was correct with ghostscript 9.53.3~dfsg-4,
it is now incorrect, still with ghostscript 9.53.3~dfsg-4. But if
I regenerate the intermediate PDF file on an old Debian machine
and transfer it to my current machine, ps2pdf is correct with
ghostscript 9.53.3~dfsg-4 and with ghostscript 9.53.3~dfsg-7
(stable), and also with ghostscript 9.54.0~dfsg-5.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Vincent Lefevre
On 2021-10-01 14:26:02 +0200, Vincent Lefevre wrote:
> On 2021-10-01 14:17:53 +0200, Vincent Lefevre wrote:
> > In my archives, I can see that the issue occurred with
> > ghostscript 9.26a~dfsg-0+deb9u1, but 9.27~dfsg-2+deb10u4
> > isn't affected on my second testcase.
> 
> The following PDF file (on which I got the issue with
> ghostscript 9.26a~dfsg-0+deb9u1) may be a useful testcase:
> 
>   https://hal.archives-ouvertes.fr/hal-02001080v1/document

On this testcase, the issue is actually reproducible with
ghostscript 9.27~dfsg-2+deb10u4!

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Vincent Lefevre
On 2021-10-01 14:17:53 +0200, Vincent Lefevre wrote:
> In my archives, I can see that the issue occurred with
> ghostscript 9.26a~dfsg-0+deb9u1, but 9.27~dfsg-2+deb10u4
> isn't affected on my second testcase.

The following PDF file (on which I got the issue with
ghostscript 9.26a~dfsg-0+deb9u1) may be a useful testcase:

  https://hal.archives-ouvertes.fr/hal-02001080v1/document

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Vincent Lefevre
In my archives, I can see that the issue occurred with
ghostscript 9.26a~dfsg-0+deb9u1, but 9.27~dfsg-2+deb10u4
isn't affected on my second testcase.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Vincent Lefevre
On 2021-10-01 12:05:28 +0200, Vincent Lefevre wrote:
> Well, with 9.53.3~dfsg-8, I can reproduce the bug on another PDF file,
> where it is the U+2019 RIGHT SINGLE QUOTATION MARK character (used as
> an apostrophe) that is incorrectly replaced by Š. I'll have to make
> another simple testcase.

The LaTeX source generating the testcase:

\documentclass[12pt]{article}
\usepackage{lmodern}
\begin{document}
\thispagestyle{empty}
Don't ff.
\end{document}

I've attached this testcase "chartest2.pdf", and the incorrect file
"chartest2-gs.pdf" obtained with

  ps2pdf chartest2.pdf chartest2-gs.pdf

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


chartest2.pdf
Description: Adobe PDF document


chartest2-gs.pdf
Description: Adobe PDF document


Bug#995392: ghostscript: ps2pdf trashes some characters

2021-10-01 Thread Vincent Lefevre
Control: found -1 9.53.3~dfsg-8

On 2021-09-30 22:00:47 +, JustAnotherArchivist wrote:
> Apologies, I somehow missed the part about pdftotext and the glyph's normal
> appearance in your original message. I can reproduce that with both files
> produced by 9.54.0~dfsg-5 but *not* the one produced by 9.53.3~dfsg-8
> (attached for reference), using the same pdftotext version (poppler-utils
> 20.09.0-3.1) for all files.

Well, with 9.53.3~dfsg-8, I can reproduce the bug on another PDF file,
where it is the U+2019 RIGHT SINGLE QUOTATION MARK character (used as
an apostrophe) that is incorrectly replaced by Š. I'll have to make
another simple testcase.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-09-30 Thread Vincent Lefevre
Control: tags -1 upstream
Control: forwarded -1 https://bugs.ghostscript.com/show_bug.cgi?id=704478

On 2021-09-30 18:49:02 +0200, Jonas Smedegaard wrote:
> Quoting Vincent Lefevre (2021-09-30 18:28:51)
> > On 2021-09-30 17:18:46 +0200, Jonas Smedegaard wrote:
> > > This seems an upstream bug, and it would be helpful if you report it 
> > > upstream as well.  Their bugtracker is at https://bugs.ghostscript.com/
> > 
> > OK. I'll do it tonight (I could also try to find the cause).

I've identified the commit that introduced the issue (though I'm not
sure whether the bug could be already present on other kinds of text)
and reported the bug upstream with the details (see above URL).

> Also, you could test against the newer pre-release in experimental.

Not tried, but the issue is still present in master.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-09-30 Thread JustAnotherArchivist

Control: notfound -1 9.53.3~dfsg-8

Apologies, I somehow missed the part about pdftotext and the glyph's 
normal appearance in your original message. I can reproduce that with 
both files produced by 9.54.0~dfsg-5 but *not* the one produced by 
9.53.3~dfsg-8 (attached for reference), using the same pdftotext version 
(poppler-utils 20.09.0-3.1) for all files.




chartest-gs-jaa-9.53.3.pdf
Description: Adobe PDF document


Bug#995392: ghostscript: ps2pdf trashes some characters

2021-09-30 Thread JustAnotherArchivist

Hi Vincent,

For what it's worth, I do not see the corruption you're describing with 
`gv chartest-gs.pdf` nor when converting it myself from your input file 
using versions 9.53.3~dfsg-8 or 9.54.0~dfsg-5.


I noticed that your file used a different internal conversion command 
compared to when I try it with ps2pdf 9.54.0~dfsg-5:


Yours:
  %%Invocation: path/gs -dPrinted=false -P- -dSAFER -dCompatibilityLevel=1.5 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=? -sOutputFile=? -P- -dSAFER -dCompatibilityLevel=1.5 ?
Mine:
  %%Invocation: path/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=? -sOutputFile=? -P- -dSAFER -dCompatibilityLevel=1.4 ?


Invoking it like your command manually did not make a difference for me 
though, with the output file being identical except for the expected 
differences in the version string, timestamps, and UUIDs.
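
To make the difference between the two invocation lines above easier to see, here is a minimal Python sketch that compares the option tokens of two `%%Invocation:` comments. The function names are hypothetical and only for illustration; the two strings are the ones quoted above, joined onto one line each.

```python
def invocation_flags(line: str) -> set[str]:
    """Return the option tokens (those starting with '-') of an invocation line."""
    # Drop the "%%Invocation:" prefix if present, then split on whitespace.
    text = line.split("%%Invocation:", 1)[-1]
    return {tok for tok in text.split() if tok.startswith("-")}


def compare_invocations(first: str, second: str) -> tuple[list[str], list[str]]:
    """Return (options only in first, options only in second), each sorted."""
    a, b = invocation_flags(first), invocation_flags(second)
    return sorted(a - b), sorted(b - a)


yours = ("%%Invocation: path/gs -dPrinted=false -P- -dSAFER "
         "-dCompatibilityLevel=1.5 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite "
         "-sstdout=? -sOutputFile=? -P- -dSAFER -dCompatibilityLevel=1.5 ?")
mine = ("%%Invocation: path/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- "
        "-dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=? -sOutputFile=? -P- "
        "-dSAFER -dCompatibilityLevel=1.4 ?")

only_yours, only_mine = compare_invocations(yours, mine)
print("only in yours:", only_yours)
print("only in mine: ", only_mine)
```

For these two lines the only differences are the extra -dPrinted=false and the -dCompatibilityLevel value (1.5 versus 1.4).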


I have attached my `ps2pdf chartest.pdf chartest-gs-jaa.pdf` output file 
(created with 9.54.0~dfsg-5).


Cheers,
JustAnotherArchivist



chartest-gs-jaa.pdf
Description: Adobe PDF document


Bug#995392: ghostscript: ps2pdf trashes some characters

2021-09-30 Thread Jonas Smedegaard
Quoting Vincent Lefevre (2021-09-30 18:28:51)
> On 2021-09-30 17:18:46 +0200, Jonas Smedegaard wrote:
> > This seems an upstream bug, and it would be helpful if you report it 
> > upstream as well.  Their bugtracker is at https://bugs.ghostscript.com/
> 
> OK. I'll do it tonight (I could also try to find the cause).

Great. Thanks!

Also, you could test against the newer pre-release in experimental.


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#995392: ghostscript: ps2pdf trashes some characters

2021-09-30 Thread Vincent Lefevre
On 2021-09-30 17:18:46 +0200, Jonas Smedegaard wrote:
> This seems an upstream bug, and it would be helpful if you report it 
> upstream as well.  Their bugtracker is at https://bugs.ghostscript.com/

OK. I'll do it tonight (I could also try to find the cause).




Bug#995392: ghostscript: ps2pdf trashes some characters

2021-09-30 Thread Jonas Smedegaard
Hi Vincent,

Quoting Vincent Lefevre (2021-09-30 16:53:01)
> The ps2pdf utility trashes some characters, making text non-searchable
> and partly unreadable via pdftotext (even though the glyph appears to be
> OK). There was no such issue in the recent past.

Thanks for reporting this!

This seems an upstream bug, and it would be helpful if you report it 
upstream as well.  Their bugtracker is at https://bugs.ghostscript.com/


Kind regards,

 - Jonas




Bug#995392: ghostscript: ps2pdf trashes some characters

2021-09-30 Thread Vincent Lefevre
Package: ghostscript
Version: 9.54.0~dfsg-5
Severity: grave
Justification: causes non-serious data loss

The ps2pdf utility trashes some characters, making text non-searchable
and partly unreadable via pdftotext (even though the glyph appears to
be OK). There was no such issue in the recent past.

LaTeX source to generate the PDF testcase:

\documentclass[12pt]{article}
\usepackage[T1]{fontenc}
\begin{document}
\thispagestyle{empty}
Test: float.
\end{document}

to be compiled with pdflatex.

I've attached 2 files:
  * chartest.pdf (testcase generated by pdflatex).
  * chartest-gs.pdf, which is the buggy result obtained with
    "ps2pdf chartest.pdf chartest-gs.pdf".

chartest.pdf contains the text "Test: float." as expected.
But chartest-gs.pdf contains the text "Test: ŕoat.", which
is incorrect: "fl" has been replaced by "ŕ".

Removing "\usepackage[T1]{fontenc}" or the period after "float" makes
this issue disappear.
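
The text comparison described above could be automated roughly as follows. This is only a sketch: it assumes pdftotext (from poppler-utils) is on the PATH and writes UTF-8, and the helper names are hypothetical. Because a text extractor may emit the "fl" ligature either as two letters or as the single character U+FB02, the check normalizes with NFKC before comparing, so only a genuinely wrong character such as "ŕ" is reported.

```python
import subprocess
import unicodedata

EXPECTED = "Test: float."


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return what it writes to stdout ('-' means stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def normalize(text: str) -> str:
    # NFKC folds compatibility ligatures such as U+FB02 into plain "fl";
    # split() drops newlines and the form feed emitted at the end of a page.
    return " ".join(unicodedata.normalize("NFKC", text).split())


def text_is_intact(extracted: str, expected: str = EXPECTED) -> bool:
    """True if the extracted text matches the expected text after normalization."""
    return normalize(extracted) == normalize(expected)
```

Usage would be along the lines of `text_is_intact(extract_text("chartest-gs.pdf"))`, which should be false for the buggy file attached here and true for chartest.pdf.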

-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 
'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 
'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.14.0-1-amd64 (SMP w/12 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, 
TAINT_UNSIGNED_MODULE
Locale: LANG=POSIX, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages ghostscript depends on:
ii  libc6   2.32-4
ii  libgs9  9.54.0~dfsg-5

ghostscript recommends no packages.

Versions of packages ghostscript suggests:
ii  ghostscript-x  9.54.0~dfsg-5

-- no debconf information



chartest.pdf
Description: Adobe PDF document


chartest-gs.pdf
Description: Adobe PDF document