Sorry, please disregard this. I copied and pasted a message, and it was sent by mistake.
________________________________
From: Mehmet Fatih ÇİN <mfatih...@outlook.com.tr>
Sent: Thursday, August 4, 2022 1:32:18 PM
To: users@pdfbox.apache.org <users@pdfbox.apache.org>
Subject: page margins and line spacing

Hi, this project looks perfect for my needs - converting PDF pages into images 
for easy rendering elsewhere. This is very much my first try so apologies in 
advance if this is a stupid question, but in the docs at 
https://pdfbox.apache.org/2.0/commandline.html I can't see any options that 
might improve the output.

Here's a side-by-side comparison, ExtractImages output on the left, and the PDF 
opened in chrome on the right:

https://imgur.com/a/KgNAZQ2

The PDF is an example I got from: 
https://www.ets.org/Media/Tests/GRE/pdf/gre_research_validity_data.pdf

Just in case this is relevant, I ran it in a clean Debian container:

    docker run -it -v c:/Users/me:/external debian:bullseye-slim

    apt update
    apt install openjdk-17-jre -y
    apt install wget -y
    wget https://dlcdn.apache.org/pdfbox/2.0.26/pdfbox-app-2.0.26.jar

and then tested with:

    java -jar pdfbox-app-2.0.26.jar ExtractImages -prefix /external/extract-test /external/gre_research_validity_data.pdf

The screenshot is of the resulting extract-test-2.jpg file.

There's obviously some problem with the colours, and there's also a lot of 
extra stuff in the page margins that Chrome somehow knows it ought to hide. Is 
there any way to configure this extraction process so that the image looks the 
way Chrome displays it? And can this kind of accurate rendering work for the 
majority of PDFs? (This is the first one I tried.) Thanks!
