Re: problem mouse copy/past from PDF

2016-09-21 Thread David Wright
On Thu 22 Sep 2016 at 00:04:07 (-0500), David Wright wrote:
> I'm always on the lookout for a "pa→pdf" successor to pa→ps, but
> see no sign of it on the horizon.

Or perhaps I do. I need to check out
https://github.com/dov/paps
but the night has drawn in...

Cheers,
David.



Re: problem mouse copy/past from PDF

2016-09-21 Thread David Wright
On Wed 21 Sep 2016 at 20:37:28 (+0100), Brian wrote:
> On Tue 20 Sep 2016 at 15:08:58 +0100, Brian wrote:
> 
> > On Mon 19 Sep 2016 at 22:41:23 -0500, David Wright wrote:
> > 
> > > On Sun 18 Sep 2016 at 16:14:37 (-0400), Haines Brown wrote:
> > > > I've begun to experience problems using the mouse to select a passage in
> > > > a PDF displayed with xpdf 3.03-10 in order to paste it elsewhere.
> > > > 
> > > > The ends of lines are truncated to varying degrees. For example in a
> > > > PDF with this:
> > > > 
> > > >   123456789
> > > >   123456789
> > > >   1234567
> > > > 
> > > > The past might look like
> > > > 
> > > >   12345678
> > > >   1234567
> > > >   123456
> > > 
> > > Can you confirm that dragging your mouse produces a black rectangle,
> > > and that the rectangle has the last digits (the ones that get lost)
> > > highlighted thus.
> > 
> > Could be a possible cause. My mouse skills aren't brilliant and not
> > precisely positioning the rectangle has often led to my having to redo
> > the copying.
> 
> The OP appears to have totally lost interest in his own question and the
> responses to it but the ins and outs of copying from a PDF get more
> intriguing.
> 
> I own up to being quite cavalier in dragging the mouse to produce a
> black rectangle to be copied. The positioning of *all* sides of the
> rectangle in mupdf seems somewhat critical, however.

Ditto, but I don't attempt any precision with the mouse (unlike using
scrot, for example); I just grab a large chunk and then sort it all
out in an emacs buffer.

> I have a PDF which on the screen displays
> 
>   If You Hear 
> means that the command you have entered 
> has been recognised as being valid (correct), 
>   i.e. you entered # 0 *
> 
> If I position the black rectangle to just about cover what is on the
> screen (or a little bit less at top and bottom) the text copies as
> 
>   If You Hear e
>   e means that the command you have entered has been recognised as being 
> valid (correct), 
>   i.e. you entered # 0 ..
> 
> (The font for the musical note is embedded in the PDF but has no
> ToUnicode map. It comes up as "e").
> 
> If the lower boundary is a smidgeon (5 or so pixels) down it picks up
> the following line too. That, of course, doesn't explain the OP's
> observation but it does not appear we are going to progress beyond that
> initial post.
> 
>   If You Hear e
>   e means that the command you have entered has been recognised as being 
> valid (correct), 
>   i.e. you entered # 0 ..
>   If You Hear ee
> 
> ("ee" is two quavers).

For xpdf, my experience is similar. I assume you're lifting text that
happens to have musical glyphs in it. I lift text from actual music,
and the text is usable. However, any Unicode characters are displayed
as the individual bytes making them up. The music produces mainly
blanks with odd characters in it, usually Unicode bytes or control
characters.

> The line by line selection by evince appears to be less error-prone in
> terms of text copying.

Lifting the text from the same music in evince, it's almost impossible
to select more than one line without the selection expanding uncontrollably.
The result, when pasted, is short lines of fragments of text in
random order: completely unusable. In Unicode, however.

Mupdf is no better: usually a single syllable to each line, with lots
of ? lines representing the music. Unicode again. I haven't tried
being selective with the rectangle, just taking the lot.

(The music in question is LilyPond output.)

Cheers,
David.



Re: problem mouse copy/past from PDF

2016-09-21 Thread David Wright
On Wed 21 Sep 2016 at 18:13:18 (+0200), to...@tuxteam.de wrote:
> On Wed, Sep 21, 2016 at 11:38:41AM +0100, Brian wrote:
> 
> [...]
> 
> > Although it is a different topic
> > 
> >  
> > http://stackoverflow.com/questions/26066535/ps2pdf-creates-a-very-big-pdf-file-from-paps-created-ps-file
> > 
> > backs up your "pretty funny" feeling. KenS is a Ghostscript developer.
> 
> Thanks for the link.
> 
> From a cursory look at the .ps I had that impression:
> 
>   "The problem is the paps file, it doesn't actually contain any text
>    at all, in a PostScript sense.
> 
>    Each character is stored as a procedure, where a path is drawn and
>    then filled. This is NOT stored in a font, just in a dictionary."
> 
> so paps basically "paints" the text. 
> 
> Yikes. I still hoped to be wrong :-(
> 
> > Maybe this new version does not fix mouse copying from a PDF generated
> > from paps' PS but it isn't in unstable anyway. (Furthermore, paps isn't
> > in testing due to a FTBFS).
> 
> Let's hope. In the meantime use a2ps (but I don't know how well that
> handles Unicode/UTF-8). Perhaps paps's author had a strong reason to
> do it that way.

Yes, Brian, thanks for the link. One reading of
http://www.tldp.org/HOWTO/Unicode-HOWTO-5.html
suggests that paps may be doing it the only way possible,
apart from the duplication of each occurrence of a glyph.
I've just installed uniprint and I can't see that it does
anything different, except produce a much larger file.

$ paps --font="Freemono 10" --left-margin=54 --top-margin=54 --paper letter \
 UNICODE-chars.txt > unicode-chars.ps
$ uniprint -out unicode-chars-uniprint.ps -in UNICODE-chars.txt \
 -font /usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf
uniprint: printed 17 pages.

$ ps2pdf unicode-chars.ps unicode-chars.pdf
$ ps2pdf unicode-chars-uniprint.ps unicode-chars-uniprint.pdf

    45395 UNICODE-chars.txt
  7344022 unicode-chars.ps
  3457331 unicode-chars.pdf
 27775050 unicode-chars-uniprint.ps
  7993797 unicode-chars-uniprint.pdf

The comparison is as fair as I can make it: paps opens three
font files AFAICT: FreeMono.otf, DejaVuSansMono.ttf and
DejaVuSansMono-Bold.ttf, so I gave uniprint the second one.
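
To put those byte counts in proportion, here is a minimal shell sketch
that divides each output size by the size of the input text. The helper
name blowup is made up for illustration, and it uses integer division,
so the factors are rounded down.

```shell
# Hypothetical helper: integer ratio of an output size to the input size.
# Both arguments are byte counts, e.g. as printed by `wc -c`.
blowup() {
    echo $(( $1 / $2 ))
}

txt=45395   # size of UNICODE-chars.txt in the listing above

blowup 7344022  "$txt"   # paps PostScript
blowup 3457331  "$txt"   # paps PostScript after ps2pdf
blowup 27775050 "$txt"   # uniprint PostScript
blowup 7993797  "$txt"   # uniprint PostScript after ps2pdf
```

On these figures the paps PostScript is roughly 160 times the size of
the input and the uniprint PostScript roughly 610 times.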

I gave up using a2ps at least a decade ago because it can't handle
Unicode; AFAIK there's no sign of its adopting it at all.
I'm always on the lookout for a "pa→pdf" successor to pa→ps, but
see no sign of it on the horizon.

Cheers,
David.



Re: problem mouse copy/past from PDF

2016-09-21 Thread Brian
On Tue 20 Sep 2016 at 15:08:58 +0100, Brian wrote:

> On Mon 19 Sep 2016 at 22:41:23 -0500, David Wright wrote:
> 
> > On Sun 18 Sep 2016 at 16:14:37 (-0400), Haines Brown wrote:
> > > I've begun to experience problems using the mouse to select a passage in
> > > a PDF displayed with xpdf 3.03-10 in order to paste it elsewhere.
> > > 
> > > The ends of lines are truncated to varying degrees. For example in a
> > > PDF with this:
> > > 
> > >   123456789
> > >   123456789
> > >   1234567
> > > 
> > > The past might look like
> > > 
> > >   12345678
> > >   1234567
> > >   123456
> > 
> > Can you confirm that dragging your mouse produces a black rectangle,
> > and that the rectangle has the last digits (the ones that get lost)
> > highlighted thus.
> 
> Could be a possible cause. My mouse skills aren't brilliant and not
> precisely positioning the rectangle has often led to my having to redo
> the copying.

The OP appears to have totally lost interest in his own question and the
responses to it but the ins and outs of copying from a PDF get more
intriguing.

I own up to being quite cavalier in dragging the mouse to produce a
black rectangle to be copied. The positioning of *all* sides of the
rectangle in mupdf seems somewhat critical, however.

I have a PDF which on the screen displays

  If You Hear 
means that the command you have entered 
has been recognised as being valid (correct), 
  i.e. you entered # 0 *

If I position the black rectangle to just about cover what is on the
screen (or a little bit less at top and bottom) the text copies as

  If You Hear e
  e means that the command you have entered has been recognised as being valid 
(correct), 
  i.e. you entered # 0 ..

(The font for the musical note is embedded in the PDF but has no
ToUnicode map. It comes up as "e").

If the lower boundary is a smidgeon (5 or so pixels) down it picks up
the following line too. That, of course, doesn't explain the OP's
observation but it does not appear we are going to progress beyond that
initial post.

  If You Hear e
  e means that the command you have entered has been recognised as being valid 
(correct), 
  i.e. you entered # 0 ..
  If You Hear ee

("ee" is two quavers).

The line by line selection by evince appears to be less error-prone in
terms of text copying.

Probably nothing to do with the OP's issue but merely an indication of
another user's experience. All very inconsequential and probably of no
importance but it passes the time as the nights draw in.

-- 
Brian.





Re: problem mouse copy/past from PDF

2016-09-21 Thread tomas
On Wed, Sep 21, 2016 at 11:38:41AM +0100, Brian wrote:

[...]

> Although it is a different topic
> 
>  
> http://stackoverflow.com/questions/26066535/ps2pdf-creates-a-very-big-pdf-file-from-paps-created-ps-file
> 
> backs up your "pretty funny" feeling. KenS is a Ghostscript developer.

Thanks for the link.

From a cursory look at the .ps I had that impression:

  "The problem is the paps file, it doesn't actually contain any text
   at all, in a PostScript sense.

   Each character is stored as a procedure, where a path is drawn and
   then filled. This is NOT stored in a font, just in a dictionary."

so paps basically "paints" the text. 

Yikes. I still hoped to be wrong :-(

> Maybe this new version does not fix mouse copying from a PDF generated
> from paps' PS but it isn't in unstable anyway. (Furthermore, paps isn't
> in testing due to a FTBFS).

Let's hope. In the meantime use a2ps (but I don't know how well that
handles Unicode/UTF-8). Perhaps paps's author had a strong reason to
do it that way.

regards
-- t



Re: problem mouse copy/past from PDF

2016-09-21 Thread Brian
On Tue 20 Sep 2016 at 21:05:29 +0200, to...@tuxteam.de wrote:

> On Tue, Sep 20, 2016 at 06:30:10PM +0100, Brian wrote:
> > 
> > Selection of text from a pdf isn't always possible with evince. Example:
> > 
> >   paps /etc/nssitch.conf > nsswitch.ps
> ^^^ probably typo
> >   ps2pdf nsswitch.ps nsswitch.pdf
> > 
> > When it is selectable it isn't necessarily capable of being copied.
> > 
> > Isn't life confusing?
> 
> I can confirm that a pdf produced this way has no usable text to
> select (tested here with xpdf). If you generate the .ps with a2ps
> 
>   a2ps /etc/nsswitch.conf -o nsswitch.ps
> 
> then "it works". Thus, it seems to be paps that is doing something
> strange (and in fact, looking at the PostScript file yields something
> pretty funny). Paps seems to be out-smarting itself.
> 
> Note that a Postscript (or a PDF) can render something which *looks*
> like text, but for all purposes *isn't* a text (for an extreme case,
> think a bitmap image of a text).

Although it is a different topic

 
http://stackoverflow.com/questions/26066535/ps2pdf-creates-a-very-big-pdf-file-from-paps-created-ps-file

backs up your "pretty funny" feeling. KenS is a Ghostscript developer.

  > The problem is the paps file, it doesn't actually contain any
  > text at all, in a PostScript sense.

The response was

  > As the author of paps, I agree with the above description of
  > paps' inner workings. Indeed, I chose to create my own font
  > mechanism in the postscript language. That is history though
  > as I have just released a new version of paps that uses cairo
  > for its postscript, pdf, or svg rendering.

Maybe this new version does not fix mouse copying from a PDF generated
from paps' PS but it isn't in unstable anyway. (Furthermore, paps isn't
in testing due to a FTBFS).

-- 
Brian.



Re: problem mouse copy/past from PDF

2016-09-20 Thread tomas
On Tue, Sep 20, 2016 at 11:01:09AM -0500, David Wright wrote:

[...]

> Sorry, missed out a step. The paps output is filtered through ps2pdf
> so that could explain a lot. Thanks for reminding me. (The clue is in
> the name!)

See above. I can reproduce that (with xpdf as viewer). If you use a2ps
instead of paps, the text is there. It's paps that is producing strange
PostScript.

regards
-- t



Re: problem mouse copy/past from PDF

2016-09-20 Thread tomas
On Tue, Sep 20, 2016 at 06:30:10PM +0100, Brian wrote:
> On Tue 20 Sep 2016 at 16:12:35 +0100, Lisi Reisz wrote:
> 
> > On Monday 19 September 2016 14:15:03 Celejar wrote:
> > > On Sun, 18 Sep 2016 16:14:37 -0400
> > > Haines Brown  wrote:
> > >
> > > ...
> > >
> > > > Evince apparently does not support selecting text for copying. This does
> > >
> > > My evince (3.14.1, from  3.14.1-2+deb8u1) does support selecting text
> > > for copying.
> > >
> > > Celejar
> > 
> > Mine too.  But then I also have 3.14.1-2+deb8u1.
> 
> Selection of text from a pdf isn't always possible with evince. Example:
> 
>   paps /etc/nssitch.conf > nsswitch.ps
^^^ probably typo
>   ps2pdf nsswitch.ps nsswitch.pdf
> 
> When it is selectable it isn't necessarily capable of being copied.
> 
> Isn't life confusing?

I can confirm that a pdf produced this way has no usable text to
select (tested here with xpdf). If you generate the .ps with a2ps

  a2ps /etc/nsswitch.conf -o nsswitch.ps

then "it works". Thus, it seems to be paps that is doing something
strange (and in fact, looking at the PostScript file yields something
pretty funny). Paps seems to be out-smarting itself.

Note that a Postscript (or a PDF) can render something which *looks*
like text, but for all purposes *isn't* a text (for an extreme case,
think a bitmap image of a text).

regards
-- tomás



Re: problem mouse copy/past from PDF

2016-09-20 Thread Brian
On Tue 20 Sep 2016 at 16:12:35 +0100, Lisi Reisz wrote:

> On Monday 19 September 2016 14:15:03 Celejar wrote:
> > On Sun, 18 Sep 2016 16:14:37 -0400
> > Haines Brown  wrote:
> >
> > ...
> >
> > > Evince apparently does not support selecting text for copying. This does
> >
> > My evince (3.14.1, from  3.14.1-2+deb8u1) does support selecting text
> > for copying.
> >
> > Celejar
> 
> Mine too.  But then I also have 3.14.1-2+deb8u1.

Selection of text from a pdf isn't always possible with evince. Example:

  paps /etc/nssitch.conf > nsswitch.ps
  ps2pdf nsswitch.ps nsswitch.pdf

When it is selectable it isn't necessarily capable of being copied.

Isn't life confusing?

-- 
Brian.




Re: problem mouse copy/past from PDF

2016-09-20 Thread Brian
On Tue 20 Sep 2016 at 11:01:09 -0500, David Wright wrote:

> Well, I did write in
> https://lists.debian.org/debian-user/2016/09/msg00653.html
> that "This is one area where a bit of experimentation will help much
> more than trying to understand the scattered documentation."

I'm unsure whether the issue is copy/paste with a mouse or the nature of
the PDF/PS file. Scattered documentation for both cases doesn't help.
 
> On Tue 20 Sep 2016 at 15:08:58 (+0100), Brian wrote:
> > On Mon 19 Sep 2016 at 22:41:23 -0500, David Wright wrote:
> > 
> > > My own experience is all or nothing. What I get correlates with the
> > > output of pdftotext; if that can extract the text, I can copy it
> > > with the mouse, if not then I can't. PDFs I produce with paps, for
> > > example, don't work: I don't know why this is the case.
> > 
> > How do you produce a PDF using paps?
> 
> Sorry, missed out a step. The paps output is filtered through ps2pdf
> so that could explain a lot. Thanks for reminding me. (The clue is in
> the name!)

I thought you had but wanted to check. Does paps produce a searchable PS
file? My quick tests with evince and okular indicate it doesn't. If not,
ps2pdf isn't likely to produce a PDF with extractable text.

> >   https://github.com/angea/PDF101/tree/master/handcoded/textextract
> > 
> > is of interest.
> 
> Useful reference, thanks.

There is much more to it than that. But it can be saved for another day.

> > > My experience here is similar to xpdf but with a few differences: when
> > > it works (the same files do), the selection is line by line (ie like
> > > an xterm) rather than a strict rectangle; if it can't do it, it
> > > doesn't highlight (whereas xpdf "lies": it highlights but fails to
> > > copy); the highlighting may be coloured (white→blue, black→white) or
> > > black (which hides the text).
> > 
> > Evince seems to be aware if *all* the text is not copiable and will then
> > not allow it to be selected. It does not appear to be aware when only
> > portions of a document are not copiable/searchable and these portions
> > are selectable.
> 
> Well,   man xpdf   says baldly "Dragging the mouse with the left
> button held down will highlight an arbitrary rectangle." I guess I
> hadn't realised just how bald that rectangle can be.
> It's tedious ascertaining anything about xpdf in the "jessie period"
> because so much of it is broken; I have to repeat everything in
> wheezy to make sure the problem is ephemeral. (Will these problems
> go away?)

xpdf had some loving care in the past; it could do with more. I like the
program but these days tend to use mupdf.

-- 
Brian.



Re: problem mouse copy/past from PDF

2016-09-20 Thread David Wright
Well, I did write in
https://lists.debian.org/debian-user/2016/09/msg00653.html
that "This is one area where a bit of experimentation will help much
more than trying to understand the scattered documentation."

On Tue 20 Sep 2016 at 15:08:58 (+0100), Brian wrote:
> On Mon 19 Sep 2016 at 22:41:23 -0500, David Wright wrote:
> 
> > On Sun 18 Sep 2016 at 16:14:37 (-0400), Haines Brown wrote:
> > > I've begun to experience problems using the mouse to select a passage in
> > > a PDF displayed with xpdf 3.03-10 in order to paste it elsewhere.
> > > 
> > > The ends of lines are truncated to varying degrees. For example in a
> > > PDF with this:
> > > 
> > >   123456789
> > >   123456789
> > >   1234567
> > > 
> > > The past might look like
> > > 
> > >   12345678
> > >   1234567
> > >   123456
> > 
> > Can you confirm that dragging your mouse produces a black rectangle,
> > and that the rectangle has the last digits (the ones that get lost)
> > highlighted thus.
> 
> Could be a possible cause. My mouse skills aren't brilliant and not
> precisely positioning the rectangle has often led to my having to redo
> the copying.
> 
> What could also be tried is a search for '123456789'. Searching is just
> another form of text extraction. If the string cannot be found, it cannot
> be copied correctly after highlighting it.

That's a good idea, and it seems to correlate with pdftotext's
behaviour but is much quicker.

> > My own experience is all or nothing. What I get correlates with the
> > output of pdftotext; if that can extract the text, I can copy it
> > with the mouse, if not then I can't. PDFs I produce with paps, for
> > example, don't work: I don't know why this is the case.
> 
> How do you produce a PDF using paps?

Sorry, missed out a step. The paps output is filtered through ps2pdf
so that could explain a lot. Thanks for reminding me. (The clue is in
the name!)

> > Actually, there is a third case: the pasted text is garbage. I think
> > this happens if the fonts are stripped of unused glyphs and then
> > packed into the minimum number of fonts to save memory. I may be
> > wrong here, though.
> 
> One table in a PDF stores character shapes (glyphs). This table is used
> by mupdf (say) to draw the page. mupdf does this without knowing that it
> is text; it is interested only in the shapes.
> 
> A second table (the ToUnicode map) is used to work out what the text
> says. The first table says that the first shape in the word "Debian" looks
> like a "D". The second table says that that shape has a particular
> Unicode value.
> 
> A defective or missing ToUnicode map leaves mupdf with no idea what the
> shapes mean, although it will render them correctly on the screen
> or in print. So it resorts to a default mapping. The result is garbage
> for copy/paste. However, it can be logical garbage; every "D" becomes
> "X", every "b" a "P" etc. When searching, the string being looked for
> will not be found. ("Debian" is "XGP?yL", for example).
> 
>   https://github.com/angea/PDF101/tree/master/handcoded/textextract
> 
> is of interest.

Useful reference, thanks.

> > > Evince apparently does not support selecting text for copying. This does
> > > not happen on other machines.
> > 
> > My experience here is similar to xpdf but with a few differences: when
> > it works (the same files do), the selection is line by line (ie like
> > an xterm) rather than a strict rectangle; if it can't do it, it
> > doesn't highlight (whereas xpdf "lies": it highlights but fails to
> > copy); the highlighting may be coloured (white→blue, black→white) or
> > black (which hides the text).
> 
> Evince seems to be aware if *all* the text is not copiable and will then
> not allow it to be selected. It does not appear to be aware when only
> portions of a document are not copiable/searchable and these portions
> are selectable.

Well,   man xpdf   says baldly "Dragging the mouse with the left
button held down will highlight an arbitrary rectangle." I guess I
hadn't realised just how bald that rectangle can be.
It's tedious ascertaining anything about xpdf in the "jessie period"
because so much of it is broken; I have to repeat everything in
wheezy to make sure the problem is ephemeral. (Will these problems
go away?)

Cheers,
David.



Re: problem mouse copy/past from PDF

2016-09-20 Thread Lisi Reisz
On Monday 19 September 2016 14:15:03 Celejar wrote:
> On Sun, 18 Sep 2016 16:14:37 -0400
> Haines Brown  wrote:
>
> ...
>
> > Evince apparently does not support selecting text for copying. This does
>
> My evince (3.14.1, from  3.14.1-2+deb8u1) does support selecting text
> for copying.
>
> Celejar

Mine too.  But then I also have 3.14.1-2+deb8u1.

Lisi



Re: problem mouse copy/past from PDF

2016-09-20 Thread Brian
On Mon 19 Sep 2016 at 22:41:23 -0500, David Wright wrote:

> On Sun 18 Sep 2016 at 16:14:37 (-0400), Haines Brown wrote:
> > I've begun to experience problems using the mouse to select a passage in
> > a PDF displayed with xpdf 3.03-10 in order to paste it elsewhere.
> > 
> > The ends of lines are truncated to varying degrees. For example in a
> > PDF with this:
> > 
> >   123456789
> >   123456789
> >   1234567
> > 
> > The past might look like
> > 
> >   12345678
> >   1234567
> >   123456
> 
> Can you confirm that dragging your mouse produces a black rectangle,
> and that the rectangle has the last digits (the ones that get lost)
> highlighted thus.

Could be a possible cause. My mouse skills aren't brilliant and not
precisely positioning the rectangle has often led to my having to redo
the copying.

What could also be tried is a search for '123456789'. Searching is just
another form of text extraction. If the string cannot be found, it cannot
be copied correctly after highlighting it.
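
The same check can be scripted outside the viewer. A minimal sketch,
assuming pdftotext (from poppler-utils) is installed; the helper name
count_hits and the file name sample.pdf are placeholders, not anything
from this thread.

```shell
# Hypothetical helper: count the lines on stdin that contain a fixed string.
# -F treats the pattern as literal text; -c prints the number of matching
# lines (0 when there are none).
count_hits() {
    grep -cF -- "$1"
}
```

It would be used as

  pdftotext sample.pdf - | count_hits 123456789

and a result of 0 suggests that a viewer's search, and so its copy and
paste, is unlikely to find the string either.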

> My own experience is all or nothing. What I get correlates with the
> output of pdftotext; if that can extract the text, I can copy it
> with the mouse, if not then I can't. PDFs I produce with paps, for
> example, don't work: I don't know why this is the case.

How do you produce a PDF using paps?

> Actually, there is a third case: the pasted text is garbage. I think
> this happens if the fonts are stripped of unused glyphs and then
> packed into the minimum number of fonts to save memory. I may be
> wrong here, though.

One table in a PDF stores character shapes (glyphs). This table is used
by mupdf (say) to draw the page. mupdf does this without knowing that it
is text; it is interested only in the shapes.

A second table (the ToUnicode map) is used to work out what the text
says. The first table says that the first shape in the word "Debian" looks
like a "D". The second table says that that shape has a particular
Unicode value.

A defective or missing ToUnicode map leaves mupdf with no idea what the
shapes mean, although it will render them correctly on the screen
or in print. So it resorts to a default mapping. The result is garbage
for copy/paste. However, it can be logical garbage; every "D" becomes
"X", every "b" a "P" etc. When searching, the string being looked for
will not be found. ("Debian" is "XGP?yL", for example).

  https://github.com/angea/PDF101/tree/master/handcoded/textextract

is of interest.
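
A toy illustration of that "logical garbage", in shell. It is only a
sketch: tr stands in for the viewer's fallback mapping, and the
substitution table is simply the "Debian" to "XGP?yL" example above,
not what any real PDF viewer does internally. The function name garble
is made up.

```shell
# Hypothetical stand-in for a viewer that has the glyph shapes but no usable
# ToUnicode map: each character is consistently replaced by another one.
garble() {
    tr 'Debian' 'XGP?yL'
}

# What a copy/paste of the word would yield.
printf 'Debian\n' | garble    # prints: XGP?yL

# Searching the extracted text for the original word finds nothing.
printf 'Debian\n' | garble | grep -qF 'Debian' || echo 'not found'
```

Because the substitution is consistent, the pasted text has the right
length and word breaks, but neither searching nor copying gives back
the original characters.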
 
> > Evince apparently does not support selecting text for copying. This does
> > not happen on other machines.
> 
> My experience here is similar to xpdf but with a few differences: when
> it works (the same files do), the selection is line by line (ie like
> an xterm) rather than a strict rectangle; if it can't do it, it
> doesn't highlight (whereas xpdf "lies": it highlights but fails to
> copy); the highlighting may be coloured (white→blue, black→white) or
> black (which hides the text).

Evince seems to be aware if *all* the text is not copiable and will then
not allow it to be selected. It does not appear to be aware when only
portions of a document are not copiable/searchable and these portions
are selectable.

-- 
Brian.



Re: problem mouse copy/past from PDF

2016-09-19 Thread Doug

On 09/20/2016 12:58 AM, Doug wrote:

On 09/19/2016 11:41 PM, David Wright wrote:

On Sun 18 Sep 2016 at 16:14:37 (-0400), Haines Brown wrote:
I've begun to experience problems using the mouse to select a passage in
a PDF displayed with xpdf 3.03-10 in order to paste it elsewhere.

The ends of lines are truncated to varying degrees. For example in a
PDF with this:

   123456789
   123456789
   1234567

The past might look like

   12345678
   1234567
   123456

Can you confirm that dragging your mouse produces a black rectangle,
and that the rectangle has the last digits (the ones that get lost)
highlighted thus.

My own experience is all or nothing. What I get correlates with the
output of pdftotext; if that can extract the text, I can copy it
with the mouse, if not then I can't. PDFs I produce with paps, for
example, don't work: I don't know why this is the case.

Actually, there is a third case: the pasted text is garbage. I think
this happens if the fonts are stripped of unused glyphs and then
packed into the minimum number of fonts to save memory. I may be
wrong here, though.

Evince apparently does not support selecting text for copying. This does
not happen on other machines.

My experience here is similar to xpdf but with a few differences: when
it works (the same files do), the selection is line by line (ie like
an xterm) rather than a strict rectangle; if it can't do it, it
doesn't highlight (whereas xpdf "lies": it highlights but fails to
copy); the highlighting may be coloured (white→blue, black→white) or
black (which hides the text).

Cheers,
David.


I just tried a copy and paste from a PDF rendered by Master PDF Editor 3,
running on PCLOS-KDE-64. I highlighted a balance summary of bills paid by
PayPal and pasted it into LibreOffice Writer 5.2. All the words were
there, but the format was different--multi-line spaces were squeezed to
one line space, for instance. I tried pasting the copy into TextMaker
2016, a paid word processor from SoftMakerOffice, with the same result.
If Master PDF Editor is not in your repos, it is available from their
website. I can't swear that there's a .deb version, however.

If you want an exact copy, you may have to use a screen-capture program,
but then you won't be able to make any modifications to the output,
except with GIMP or something similar.


--doug
Replying to me: Yes, there is a .deb version on the Master PDF Editor
website. Also, I did not mention it, but the cut-and-pasted file, when
pasted into a word processor, is editable by that processor. Or, if you
need to edit the PDF, Master can do that.

--dm



Re: problem mouse copy/past from PDF

2016-09-19 Thread Doug

On 09/19/2016 11:41 PM, David Wright wrote:

On Sun 18 Sep 2016 at 16:14:37 (-0400), Haines Brown wrote:

I've begun to experience problems using the mouse to select a passage in
a PDF displayed with xpdf 3.03-10 in order to paste it elsewhere.

The ends of lines are truncated to varying degrees. For example in a
PDF with this:

   123456789
   123456789
   1234567

The past might look like

   12345678
   1234567
   123456

Can you confirm that dragging your mouse produces a black rectangle,
and that the rectangle has the last digits (the ones that get lost)
highlighted thus.

My own experience is all or nothing. What I get correlates with the
output of pdftotext; if that can extract the text, I can copy it
with the mouse, if not then I can't. PDFs I produce with paps, for
example, don't work: I don't know why this is the case.

Actually, there is a third case: the pasted text is garbage. I think
this happens if the fonts are stripped of unused glyphs and then
packed into the minimum number of fonts to save memory. I may be
wrong here, though.


Evince apparently does not support selecting text for copying. This does
not happen on other machines.

My experience here is similar to xpdf but with a few differences: when
it works (the same files do), the selection is line by line (ie like
an xterm) rather than a strict rectangle; if it can't do it, it
doesn't highlight (whereas xpdf "lies": it highlights but fails to
copy); the highlighting may be coloured (white→blue, black→white) or
black (which hides the text).

Cheers,
David.


I just tried a copy and paste from a PDF rendered by Master PDF Editor 3,
running on PCLOS-KDE-64. I highlighted a balance summary of bills paid by
PayPal and pasted it into LibreOffice Writer 5.2. All the words were
there, but the format was different--multi-line spaces were squeezed to
one line space, for instance. I tried pasting the copy into TextMaker
2016, a paid word processor from SoftMakerOffice, with the same result.
If Master PDF Editor is not in your repos, it is available from their
website. I can't swear that there's a .deb version, however.

If you want an exact copy, you may have to use a screen-capture program,
but then you won't be able to make any modifications to the output,
except with GIMP or something similar.


--doug



Re: problem mouse copy/past from PDF

2016-09-19 Thread David Wright
On Sun 18 Sep 2016 at 16:14:37 (-0400), Haines Brown wrote:
> I've begun to experience problems using the mouse to select a passage in
> a PDF displayed with xpdf 3.03-10 in order to paste it elsewhere.
> 
> The ends of lines are truncated to varying degrees. For example in a
> PDF with this:
> 
>   123456789
>   123456789
>   1234567
> 
> The past might look like
> 
>   12345678
>   1234567
>   123456

Can you confirm that dragging your mouse produces a black rectangle,
and that the rectangle has the last digits (the ones that get lost)
highlighted thus.

My own experience is all or nothing. What I get correlates with the
output of pdftotext; if that can extract the text, I can copy it
with the mouse, if not then I can't. PDFs I produce with paps, for
example, don't work: I don't know why this is the case.
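
That correlation can be turned into a quick test before reaching for
the mouse. A minimal sketch, assuming pdftotext from poppler-utils is
available; has_text is a made-up helper name and file.pdf a placeholder.

```shell
# Hypothetical helper: succeed if stdin contains at least one character that
# is not whitespace. The assumption is that pdftotext emits nothing but blank
# lines and form feeds for pages without extractable text.
has_text() {
    grep -q '[^[:space:]]'
}
```

Used as

  pdftotext file.pdf - | has_text && echo "text can be extracted"

it says nothing about whether the extracted text is correct, only
whether there is any.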

Actually, there is a third case: the pasted text is garbage. I think
this happens if the fonts are stripped of unused glyphs and then
packed into the minimum number of fonts to save memory. I may be
wrong here, though.

> Evince apparently does not support selecting text for copying. This does
> not happen on other machines.

My experience here is similar to xpdf but with a few differences: when
it works (the same files do), the selection is line by line (ie like
an xterm) rather than a strict rectangle; if it can't do it, it
doesn't highlight (whereas xpdf "lies": it highlights but fails to
copy); the highlighting may be coloured (white→blue, black→white) or
black (which hides the text).

Cheers,
David.



Re: problem mouse copy/past from PDF

2016-09-19 Thread Celejar
On Sun, 18 Sep 2016 16:14:37 -0400
Haines Brown  wrote:

...

> Evince apparently does not support selecting text for copying. This does

My evince (3.14.1, from  3.14.1-2+deb8u1) does support selecting text
for copying.

Celejar