Hi Daniel,

the command you are using extracts the images contained in the PDF; it doesn't render the PDF pages into images.
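To make the difference concrete, here is a minimal sketch of a rendering run with PDFToImage, reusing the jar and PDF paths from the quoted message below. The output prefix /external/render-test and the 150 dpi resolution are illustrative assumptions, not values from this thread. The script only prints the command unless both files are present.

```shell
#!/bin/sh
# Paths taken from the quoted message; adjust them to your setup.
JAR="pdfbox-app-2.0.26.jar"
PDF="/external/gre_research_validity_data.pdf"

# Assumptions for illustration only: output prefix and resolution.
PREFIX="/external/render-test"
DPI="150"

# PDFToImage renders whole pages (text, vector graphics and images),
# whereas ExtractImages only dumps the raw embedded image streams.
CMD="java -jar $JAR PDFToImage -format png -dpi $DPI -prefix $PREFIX $PDF"
echo "$CMD"

# Run the command only when the jar and the PDF actually exist.
# $CMD is left unquoted on purpose so the shell splits it into arguments;
# this assumes the paths contain no spaces.
if [ -f "$JAR" ] && [ -f "$PDF" ]; then
  $CMD
fi
```

If it runs, expect one PNG per page written next to the chosen prefix. A higher -dpi value gives sharper but larger images.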
Use PDFToImage instead: https://pdfbox.apache.org/2.0/commandline.html#pdftoimage

BR
Maruan

On Tuesday, 2 August 2022 at 15:31 +0000, Daniel Earwicker wrote:
> Hi, this project looks perfect for my needs - converting PDF pages
> into images for easy rendering elsewhere. This is very much my first
> try, so apologies in advance if this is a stupid question, but in the
> docs at https://pdfbox.apache.org/2.0/commandline.html I can't see
> any options that might improve the output.
>
> Here's a side-by-side comparison, ExtractImages output on the left,
> and the PDF opened in Chrome on the right:
>
> https://imgur.com/a/KgNAZQ2
>
> The PDF is an example I got from:
> https://www.ets.org/Media/Tests/GRE/pdf/gre_research_validity_data.pdf
>
> Just in case this is relevant, I ran it in a clean Debian container:
>
> docker run -it -v c:/Users/me:/external debian:bullseye-slim
>
> apt update
> apt install openjdk-17-jre -y
> apt install wget -y
> wget https://dlcdn.apache.org/pdfbox/2.0.26/pdfbox-app-2.0.26.jar
>
> and then tested with:
>
> java -jar pdfbox-app-2.0.26.jar ExtractImages -prefix /external/extract-test /external/gre_research_validity_data.pdf
>
> The screenshot is of the resulting extract-test-2.jpg file.
>
> There's obviously some problem with the colours, and also there's a
> lot of extra stuff in the page margins that Chrome somehow knows it
> ought to hide. Is there any way to configure this extraction process
> so that the image looks like how Chrome displays it? And for this kind
> of accurate rendering to work for the majority of PDFs? (This being
> the first one I tried.) Thanks!

--
Maruan Sahyoun
FileAffairs GmbH