Arthur Wang has shared OneDrive files with you. To view them, click the links
below.
Herman & hiss - PPHI101201 - FV 1.pdf
<https://1drv.ms/b/s!AhA_REgBppCpgQ062sb5LoKlZkC4>
fileListPage.png
<https://1drv.ms/u/s!AhA_REgBppCpgQ7jWRqI5BtoKiMx>
downloadpage.png
<https://1drv.ms/u/s!AhA_REgBppCpgQ9zgx9cBhmI2DfH>
Tilman,
My email was rejected because of the Apache mail server's 1 MB size limit, so
I am sending it again here.
First, thank you very much for the extra information and the update.
My application is an internal web-based production system. Designers in our
graphics department upload print-ready files to the system every hour, and
other users (prepress, press, shipping, customers) log into the system to
download them. The print-ready PDF files are sometimes extremely large: 5 MB
to 1 GB is the most common range, and 2 GB to 5 GB is rare but does happen.
Please refer to the two attached screenshots (fileListPage, downloadPage).
What I am trying to do is show a thumbnail on the fileDownloadPage. We used to
show a download icon there instead of a thumbnail, but then users had to
download the file to their local computer before they could see it. The
fileListPage sometimes shows a long list of files and people get confused, so
it would be more convenient to let users peek at a file before actually
downloading it. As for PDF.js, I have never tried it. Do you think it can
load a 40 MB or 50 MB file in one or two seconds through the Apache server?
I have copied my code below for your reference (one version is for testing,
the other is for production).
Attached you will also find a PDF file named Herman..pdf. It has only two
pages, but even when converting only the first page, the best I can do is 7
seconds. That would be very slow for a web application. If adding a GPU would
improve the performance I would certainly like to try it, but I am not sure it
is going to work.
****************** below is the test code, run on the Eclipse platform **************
package com.test;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import org.apache.commons.lang3.time.StopWatch;

public class PdfToImage {
    private static final String OUTPUT_DIR = "/Users/someone/Desktop/";

    public static void main(String[] args) throws Exception {
        // Select the KCMS color management module (see the timings below)
        System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
        StopWatch stopwatch = new StopWatch();
        stopwatch.start();
        // try-with-resources closes the document, so no explicit close() is needed
        try (final PDDocument document = PDDocument.load(
                new File("/Users/someone/Desktop/Herman & hiss - PPHI101201 - FV.pdf"))) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            pdfRenderer.setSubsamplingAllowed(true);
            //for (int page = 0; page < document.getNumberOfPages(); ++page)
            for (int page = 0; page < 1; ++page) {
                BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 72,
                        ImageType.RGB); //<--this number has a performance impact
                String fileName = OUTPUT_DIR + "Herman & hiss - PPHI101201 - FV" + page + ".jpg";
                ImageIOUtil.writeImage(bim, fileName, 72); //<---this number
            }
        } catch (IOException e) {
            System.err.println("Exception while trying to create pdf document - " + e);
        }
        stopwatch.stop(); // optional
        System.out.println("Time elapsed is " + stopwatch.getTime() + " milliseconds");
    }
    //test files: Ashley NJ_HHL101125_FV.pdf, 40M, 4 pages
    //try Ashley without set property: 4 pages@70074 milliseconds
    //try Ashley with property set: 4 pages@32552 milliseconds
    //try have subSampling true set: 4 pages@9481 milliseconds
    //try Herman & hiss - PPHI101201 - FV.png: two pages@14050 milliseconds
    //try Herman & hiss - PPHI101201 - FV.jpg: two pages@13612 milliseconds
    //try Herman: 1 page@7625
    //try Ashley: 1 page@3237
    //try Ashley with 72 dpi: 1 page@2807
    //try Herman with 72 dpi: 1 page@6788
    //try Herman without subSampling true setting: 1 page@7087
}
***************** below is the production code, run as a Struts action class *********
public void processPdf(String pdfFilePath, String imageFilePath) {
    System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
    // try-with-resources closes the document, so no explicit close() is needed
    try (final PDDocument document = PDDocument.load(new File(pdfFilePath))) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        pdfRenderer.setSubsamplingAllowed(true);
        //for (int page = 0; page < document.getNumberOfPages(); ++page)
        for (int page = 0; page < 1; ++page) {
            BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 72, ImageType.RGB);
            ImageIOUtil.writeImage(bim, imageFilePath, 72);
        }
    } catch (IOException e) {
        log.info("Exception while trying to create pdf document - " + e);
    }
}
*********************
________________________________
From: Tilman Hausherr <thaush...@t-online.de>
Sent: Tuesday, April 17, 2018 10:39 AM
To: users@pdfbox.apache.org
Subject: Re: Performance issue with PDFBox 2.0.8
Hi,
I ran the Ashley file through the profiler. Most of the time is spent
decoding the JPEG images inside it and converting some of them from CMYK
to RGB. Nothing to optimize. I also found another one-time initialization
that takes 100-300ms, which I will add to the next version of PDFDebugger.
FilterFactory.INSTANCE.getFilter(COSName.FLATE_DECODE);
I also tested the UsePureJavaCMYKConversion, it made rendering much
slower. IIRC, that option only helps with files with many tiny CMYK images.
I have committed a change that adds the subsampling option to
PDFToImage. That version will be available within a few hours at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.10-SNAPSHOT/
(look for today's date).
Or get the source code here:
https://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?revision=1829374&view=markup
What type of application are you creating? If you want to show a PDF in
the browser, PDF.js works nicely, is free and included in firefox. If
you want to do thumbnails, then you should use a smaller dpi value. In
that case using subsampling would help even more.
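To make the "smaller dpi" advice concrete, here is a minimal sketch of the arithmetic, assuming the usual PDF convention of 72 user-space units per inch. The class and method names are hypothetical and not part of PDFBox; the page width in points would come from the page itself (for example its crop box), and the rendered pixel size is only approximate because the renderer does its own rounding.

```java
// Sketch: choose a rendering DPI so that a page comes out roughly at a
// target pixel width. Assumes 72 PDF units per inch: pixels = points * dpi / 72.
final class ThumbnailDpi {
    private ThumbnailDpi() {}

    /** DPI at which a page of the given width (in points) renders about targetWidthPx wide. */
    static float dpiForWidth(float pageWidthPt, int targetWidthPx) {
        if (pageWidthPt <= 0 || targetWidthPx <= 0) {
            throw new IllegalArgumentException("widths must be positive");
        }
        return targetWidthPx * 72f / pageWidthPt;
    }

    /** Approximate pixel width of a page of the given width rendered at the given DPI. */
    static int approxPixelWidth(float pageWidthPt, float dpi) {
        return Math.round(pageWidthPt * dpi / 72f);
    }
}
```

For a US Letter page (612 points wide) and a 204-pixel thumbnail this gives 24 dpi, so the value passed to renderImageWithDPI could be 24 instead of 72.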
Tilman
Am 17.04.2018 um 07:26 schrieb Tilman Hausherr:
Hi,
I have a Ryzen 1700 cpu and for tests I'm running it on max energy
settings. It is unclear if a mac has a similar setting. This url
http://www.macos.utah.edu/documentation/administration/pmset.html
shows there is a setting for "better performance" but I don't know if
that does the same as on Windows where I get a performance doubling.
Try PDFDebugger, it has a built-in benchmark feature, it shows the
rendering speed in the status line.
I'm also avoiding that one-time initializations are part of the
benchmark results with this code that is also in PDFDebugger:
    // trigger premature initializations for more accurate rendering benchmarks
    // See discussion in PDFBOX-3988
    if (PDType1Font.COURIER.isStandard14())
    {
        // Yes this is always true
        PDDeviceCMYK.INSTANCE.toRGB(new float[] { 0, 0, 0, 0 } );
        PDDeviceRGB.INSTANCE.toRGB(new float[] { 0, 0, 0 } );
        IIORegistry.getDefaultInstance();
    }
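The same idea can be applied to any home-grown timing code: run the work once untimed so that class loading and one-time initialization are paid for up front, then time the later runs. The following is a generic sketch, not the PDFDebugger code; the class name WarmBenchmark is hypothetical, and the Runnable would wrap whatever rendering call is being measured.

```java
// Sketch: exclude one-time initialization from a benchmark by doing an
// untimed warm-up run before the timed runs.
final class WarmBenchmark {
    private WarmBenchmark() {}

    /** Runs task once untimed, then returns the average wall time in ms over the timed runs. */
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        task.run(); // warm-up: class loading and one-time initialization happen here
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }
}
```

With a StopWatch around a single first run, as in the test programs in this thread, the initialization cost is included in the reported number.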
I see you're using the PDFToImage utility. That one doesn't support
subsampling yet, it has been on my "todo" list for a few days, I'll
try to do it tonight... But PDFToImage is really just a command line
utility.
Args 7, 8 and 11 don't work that way. Re arg 7 and 8, you need to call
System.setProperty(). Re arg 11, you need to have a PDFRenderer object.
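As a small illustration of the point about args 7 and 8: a -D option is read by the java launcher, not by the program, so placing it in the String[] handed to a main method only passes it along as an ordinary string. The in-process equivalent is System.setProperty, presumably called before any rendering starts. The property name below is the one quoted in this thread; the class name is hypothetical.

```java
// Sketch: set the system property in code instead of passing "-D..." as a
// program argument, which the JVM would not interpret.
final class CmykPropertyDemo {
    static final String KEY = "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    private CmykPropertyDemo() {}

    /** Equivalent of launching the JVM with -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true. */
    static void enablePureJavaCmyk() {
        System.setProperty(KEY, "true");
    }

    public static void main(String[] args) {
        enablePureJavaCmyk();
        System.out.println(KEY + "=" + System.getProperty(KEY));
    }
}
```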
Another way to convert to images is explained here:
https://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images
there call pdfRenderer.setSubsamplingAllowed(true) to activate
subsampling. PDFDebugger also supports it in the menu.
Tilman
Am 17.04.2018 um 01:20 schrieb Arthur Wang:
Tilman,
Thanks for the quick response and for testing my case. Below is my
Java code and my test results after adding the subsampling option. The
first page of the Ashley file took 3362 milliseconds.
For the Gill file, the elapsed time was 2456 milliseconds.
My tests were run on my Mac with a 2.2 GHz Core i7 processor.
How come your PC runs so fast? 1.4 seconds is fast enough for web
access. Maybe there is something wrong with my code? I would
appreciate it if you could take a look at it.
Best,
Arthur
*******************
import org.apache.pdfbox.tools.PDFToImage;
//import java.awt.image.BufferedImage;
import java.io.File;
//import java.io.IOException;
//import java.io.OutputStream;
import org.apache.commons.lang3.time.StopWatch;

public class PdfToImage2 {
    private static final String OUTPUT_DIR = "/Users/someone/Desktop/";

    public static void main(String[] args) throws Exception {
        String pdfPath = "/Users/someone/Desktop/Ashley NJ_HHL101125_FV.pdf";
        //config option 2: convert page 1 in pdf to image
        String[] args_1 = new String[13];
        args_1[0] = "-startPage";
        args_1[1] = "1";
        args_1[2] = "-endPage";
        args_1[3] = "1";
        args_1[4] = "-outputPrefix";
        args_1[5] = OUTPUT_DIR + "Ashley NJ_HHL101125_FV1";
        args_1[6] = pdfPath;
        args_1[7] = "-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion";
        args_1[8] = "true";
        args_1[9] = "-dpi";
        //@48-->3283 milliseconds, @96-->3545 milliseconds, @72-->3362 milliseconds
        args_1[10] = "72";
        args_1[11] = "-PDFRenderer.setSubsamplingAllowed";
        args_1[12] = "true";
        File f = new File(args_1[5] + "1.jpg");
        if (f.exists() && !f.isDirectory()) {
            System.out.println("file exist already");
        } else {
            StopWatch stopwatch = new StopWatch();
            stopwatch.start();
            try {
                System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
                PDFToImage.main(args_1);
                System.out.println("Done!");
            } catch (Exception e) {
                System.err.println("Exception while trying to create pdf document - " + e);
            }
            stopwatch.stop(); // optional
            System.out.println("Time elapsed is " + stopwatch.getTime() + " milliseconds");
        }//else
        //first try without setting property: 3779 milliseconds
        //second try with the property set: 3852 milliseconds
        //third try with subsamplingAllowed: 3362 milliseconds
    }
}
*******************************
________________________________
From: Tilman Hausherr <thaush...@t-online.de>
Sent: Monday, April 16, 2018 10:55 AM
To: users@pdfbox.apache.org
Subject: Re: Performance issue with PDFBox 2.0.8
The java code didn't get through, most attachments get deleted. Call
PDFRenderer.setSubsamplingAllowed(true) to activate subsampling.
I had a look at your files... These are not extremely slow renderings. 4
seconds for such a page is pretty good.
On my PC, the first page of the Ashley file is rendered in PDFDebugger
in 1.4 seconds at 72dpi. The Gill file is done in less than a second.
Tilman
Am 16.04.2018 um 19:05 schrieb Arthur Wang:
Arthur Wang has shared OneDrive files with you. To view them, click
the links below.
Ashley NJ_HHL101125_FV.pdf
<https://1drv.ms/b/s%21AhA_REgBppCpgQluAoJe28B935ru>
Gill1-1356_KM102685-INS_FV.pdf
<https://1drv.ms/b/s%21AhA_REgBppCpgQpdnBIl_hmK6Wt0>
Screen Shot 2018-04-16 at 9.23.52 AM.png
<https://1drv.ms/u/s%21AhA_REgBppCpgQvygYjm2eaJQmSH>
I just tried 2.0.9 and it performs almost the same: processing all 4
pages takes 32 seconds, and processing only the first page takes about
4 seconds.
My server is an HP DL380 with dual Xeon processors and 32 GB of RAM; the
hard drive is an Intel Optane NVMe SSD.
Once the JPG image has been produced, access to it is almost instant
regardless of the image file's size, so the time to open and close the
image file is insignificant and can be ignored.
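Since serving an existing JPG is cheap, one option is to render each thumbnail at most once and reuse the file afterwards. The following is a minimal sketch of that idea using only java.nio.file. ThumbnailCache, its Renderer callback and the ".jpg" naming scheme are hypothetical names chosen for illustration; the callback would wrap the actual PDF-to-image conversion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: render a thumbnail only if it does not exist yet, then reuse it.
final class ThumbnailCache {
    private ThumbnailCache() {}

    /** Hypothetical callback that writes a thumbnail of pdf into target. */
    interface Renderer {
        void render(Path pdf, Path target) throws IOException;
    }

    static Path getOrCreate(Path pdf, Path thumbDir, Renderer renderer) throws IOException {
        Path thumb = thumbDir.resolve(pdf.getFileName() + ".jpg");
        if (Files.isRegularFile(thumb)) {
            return thumb; // rendered earlier: no conversion needed
        }
        Files.createDirectories(thumbDir);
        // Render into a temporary file first, so the final name only appears
        // once rendering has finished.
        Path tmp = Files.createTempFile(thumbDir, "thumb", ".tmp");
        try {
            renderer.render(pdf, tmp);
            Files.move(tmp, thumb, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
        return thumb;
    }
}
```

Calling getOrCreate when a file is uploaded, rather than when the download page is first opened, would move the rendering time out of the page load entirely.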
By enabling subsampling, do you mean setting the dpi option? Do you
have sample code for PDFRenderer? The attached file
PdfToImage2.java is my test code. Ashley...pdf is about 45 MB and
Gill...pdf is about 5 MB. At roughly one tenth of the size, the
processing time drops to 2657 milliseconds, compared with 3779
milliseconds, so it seems that size does matter.
thanks,
Arthur
------------------------------------------------------------------------
*From:* Tilman Hausherr <thaush...@t-online.de>
*Sent:* Monday, April 16, 2018 8:57 AM
*To:* users@pdfbox.apache.org
*Subject:* Re: Performance issue with PDFBox 2.0.8
Please
- retry with the current version 2.0.9
- share your file for a profiler analysis
- as said by Itai (who implemented it) try enabling subsampling in
PDFRenderer (read the javadoc first). Compare the results and decide
whether the quality is OK for you.
- set the energy settings of your computer to maximum or at least to
"balanced", not to "energy save"
- don't know if adding GPU will help;
- try also the
-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true option
The speed is not related to the size but to the complexity. 32 seconds
may sound disappointing but it's not the worst I've ever seen. "Nice
illustrations" with nested patterns or large shadings may be slow.
Tilman
Am 16.04.2018 um 09:21 schrieb Arthur Wang:
Hi, everyone,
I am using PDFBox 2.0.8 and Java 8 running in Tomcat 8 in
production to convert PDFs into images for display. It works very well
for PDF files smaller than 5 MB, taking about 3800 milliseconds.
However, it slows down a lot when the file size increases to 50 MB:
it takes about 70,000 milliseconds. Setting the system property
"sun.java2d.cmm" to "sun.java2d.cmm.kcms.KcmsServiceProvider"
improves that to 32550 milliseconds, roughly doubling the speed, but
32 seconds is still too slow for loading a web page. Is there any
other way to speed this up? Would adding a GPU to the server help? Or
is there any other software or hardware solution that could help with
the processing speed? My current server has 32 GB of RAM, and it has
never used more than half of it.
thanks,
Arthur
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org