RE: Mismatch between XeLaTeX fontspec and Apache PDFBox

Flynn, Peter Mon, 29 Jan 2018 02:04:20 -0800

Sorry, forgot to edit off the stuff at the top.

P


--
Peter Flynn | Academic & Collaborative Technologies | University College Cork 
IT Services | ☎ +353 21 490 2609 | ✉ [email protected] | 🌍 www.ucc.ie



On 2018-01-29 10:02:29+00:00 Flynn, Peter wrote:


/bin/java -jar /usr/local/src/pdfbox-app-1.8.4.jar \
                    ExtractText -html -force V$pubno-crop.pdf \
                    V$pubno-crop.html



On 2018-01-28 12:30:58+00:00 Tilman Hausherr wrote:

> Hi,
> I can only answer about PDFBox... no PDF has anything bold. Both have
> something italic.

Yes, sorry about that. I picked an example that only has italic.

> The PDF without fontspec doesn't have the "é".

Correct. But it does convert with PDFBox and identifies the italics.

> The PDF with fontspec can be converted to HTML with "ExtractText -html"

I convert with the command

/bin/java -jar /usr/local/src/pdfbox-app-1.8.4.jar \
                    ExtractText -html -force filename.pdf filename.html

and the results were as given in the .zip file: no italics. What version are 
you using?
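
A quick way to compare the two conversions is to count the italic tags in each HTML file. This is only a minimal sketch: it assumes PDFBox marks italic runs with <i> elements in its HTML output, which is not confirmed anywhere in this thread, and count_italics is a hypothetical helper name, not part of PDFBox.

```shell
#!/bin/sh
# Hypothetical helper: count opening <i> tags in an HTML file.
# Assumption: italic text is emitted as <i>...</i>; adjust the pattern if not.
count_italics() {
    # grep -o prints each match on its own line; wc -l counts those lines;
    # tr strips the padding that some wc implementations add.
    grep -o '<i>' "$1" | wc -l | tr -d ' '
}
```

Running count_italics filename.html on the output of each conversion would then show whether any italics survived.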

P
