Hi all.

The attached PDF displays just a single word which includes
an accented character. It was created using (La)TeX as:  f\"ur

Within the PDF it appears as a stream:

stream
BT
/F15 10.9091 Tf 108.737 686 Td [(fu)556(^?r)]TJ
ET
endstream


Attachment: testaccents.pdf
Description: Adobe PDF document



The diaeresis accent is encoded as follows:

/Encoding 256 array
0 1 255 {1 index exch /.notdef put} for
dup 127 /dieresis put
dup 102 /f put
dup 114 /r put
dup 117 /u put
readonly def

and has a corresponding CMap entry

<7F> <0308>

mapping to the "combining diaeresis" character.
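To make the intended reading explicit, here is a minimal sketch (not poppler's actual code) that decodes the two TJ string operands using only the Encoding and CMap entries quoted above. The names TO_UNICODE, tj_strings and decode_tj are made up for illustration, and the 556 kern is simply ignored.

```python
import unicodedata

# Character code -> Unicode, from the Encoding array and the CMap entry
# quoted above (code 127 = 0x7F maps to U+0308 COMBINING DIAERESIS).
TO_UNICODE = {
    0x66: "f",
    0x72: "r",
    0x75: "u",
    0x7F: "\u0308",
}

# The two string operands of the TJ operator, in content-stream order.
# "^?" in the stream dump is the byte 0x7F. The 556 adjustment between
# the strings only affects glyph positioning and is ignored here.
tj_strings = [b"fu", b"\x7fr"]


def decode_tj(strings):
    """Map every byte of every string operand to its Unicode character."""
    return "".join(TO_UNICODE[byte] for s in strings for byte in s)


text = decode_tj(tj_strings)
print([unicodedata.name(c) for c in text])
print(unicodedata.normalize("NFC", text))
```

Read in content-stream order, the combining diaeresis directly follows the "u", so the decoded string is canonically equivalent to the word that was typeset.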


The result of extracting text using  pdftotext  is interesting.

This is "correct", using the -raw  option:

rossmoor% pdftotext -layout -raw testaccents.pdf
rossmoor% more testaccents.txt
fu<CC><88>r
^L

... but it comes out wrong with default options:

rossmoor% pdftotext testaccents.pdf
rossmoor% more testaccents.txt
fur <CC><88>

^L

rossmoor% pdftotext -layout testaccents.pdf
rossmoor% more testaccents.txt
fur
 <CC><88>
^L


The man page says:

-raw   Keep the text in content stream order. This is a hack which often
       "undoes" column formatting, etc. Use of raw mode is no longer
       recommended.

Yet it is precisely the use of  -raw  which handles this situation correctly.
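The byte sequences shown by "more" can be checked with a short script. This is only a sketch: the byte strings are copied by hand from the outputs above (trailing newlines and the form feed are left out), and the names raw_output, default_output and describe are made up for illustration.

```python
import unicodedata

# Bytes as displayed by "more" above; <CC><88> is the UTF-8 encoding
# of U+0308 COMBINING DIAERESIS.
raw_output = b"fu\xcc\x88r"       # pdftotext -layout -raw
default_output = b"fur \xcc\x88"  # pdftotext with default options


def describe(data):
    """Decode UTF-8 bytes and return (text, NFC-normalised text)."""
    text = data.decode("utf-8")
    return text, unicodedata.normalize("NFC", text)


for label, data in [("raw", raw_output), ("default", default_output)]:
    text, composed = describe(data)
    print(label, [f"U+{ord(c):04X}" for c in text], repr(composed))
```

With -raw the combining mark follows the "u" and the text normalises to the precomposed form. In the default output the mark follows a space, so it can no longer combine with the "u".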

So my questions are:

   Why is the accent wrongly placed by default?
   What makes it move to after the containing word, or onto the next line?

   If  -raw  is not recommended because of bad effects in some situations,
   what is the replacement for when it *is* appropriate?


Thanks in advance for any help in getting this fixed.


Regards,

        Ross

------------------------------------------------------------------------
Ross Moore                                       [EMAIL PROTECTED]
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------



_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
