On 19.04.2018 at 00:18, Arthur Wang wrote:
Hi, Tilman,


In your last email you wrote, "I don't know if my graphics card plays any role in
this". By any chance, could you benchmark whether the GPU
really helps PDFBox rendering? I cannot plug a GPU into my Mac to do the
testing, but if you already have a GPU in your PC, I would be very interested to know if
performance drops after you unplug it (if it's unpluggable). If it proves
to be helpful, I would like to buy a good GPU and put it into our production server to
improve performance.

Sorry, no, I can't... My GPU is a $100 model (I don't do games), I didn't find any setting to turn it on/off, and I don't have a replacement adapter that I could swap in. In Java there is "-Dsun.java2d.opengl=true", but it has no effect.
https://docs.oracle.com/javase/8/docs/technotes/guides/2d/flags.html#opengl

The only things that make a difference on my system (Windows) are enabling subsampling and setting the energy settings to max performance. With both, the first page of the Ashley file is rendered in 922 ms.

Tilman


If this takes some effort or time, never mind,


thanks for all the help,


Arthur


________________________________
From: Tilman Hausherr <thaush...@t-online.de>
Sent: Tuesday, April 17, 2018 3:17 PM
To: users@pdfbox.apache.org
Subject: Re: Fw: Performance issue with PDFBox 2.0.8

Hi,

Yeah, for thumbnails / previews the subsampling option is definitely
for you.

Can you calculate the preview in the background? I.e. at the time the
PDFs are uploaded, instead of when the download page is requested?
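
A minimal sketch of the "render at upload time" idea, using only the JDK. The class `BackgroundThumbnailer`, its nested `Renderer` interface and the `.thumb.jpg` naming are hypothetical and not part of PDFBox; the `Renderer` is a stand-in for whatever rendering code is actually used (for example the PDFRenderer code posted in this thread).

```java
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: renders previews off the request thread at upload time. */
class BackgroundThumbnailer implements AutoCloseable {

    /** Stand-in for the actual rendering step (assumption, not a PDFBox type). */
    interface Renderer {
        void render(Path pdf, Path thumbnail) throws Exception;
    }

    private final ExecutorService pool;
    private final Renderer renderer;

    BackgroundThumbnailer(int threads, Renderer renderer) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.renderer = renderer;
    }

    /** Derives a thumbnail path next to the PDF, e.g. a.pdf -> a.pdf.thumb.jpg. */
    static Path thumbnailPathFor(Path pdf) {
        return pdf.resolveSibling(pdf.getFileName() + ".thumb.jpg");
    }

    /** Call from the upload handler; returns immediately while rendering runs in the pool. */
    CompletableFuture<Path> onUpload(Path pdf) {
        Path thumb = thumbnailPathFor(pdf);
        return CompletableFuture.supplyAsync(() -> {
            try {
                renderer.render(pdf, thumb);
                return thumb;
            } catch (Exception e) {
                throw new IllegalStateException("Thumbnail rendering failed for " + pdf, e);
            }
        }, pool);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

With something like this, the download page would only have to serve an image file that already exists, so the rendering time no longer affects page load.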

Re pdf.js you can test it here:
https://mozilla.github.io/pdf.js/web/viewer.html

I tried the Herman file; it seemed slower with PDF.js than with
PDFBox, which is a bit surprising because PDF.js is usually faster.

With PDFDebugger with subsampling enabled it is rendered in 4409ms on my
system. I don't know if my graphics card plays any role in this.

Tilman

On 17.04.2018 at 23:40, Arthur Wang wrote:
Arthur Wang has shared OneDrive files with you. To view them, click the links
below.

Herman & hiss - PPHI101201 - FV 1.pdf
<https://1drv.ms/b/s!AhA_REgBppCpgQ062sb5LoKlZkC4>

fileListPage.png
<https://1drv.ms/u/s!AhA_REgBppCpgQ7jWRqI5BtoKiMx>

downloadpage.png
<https://1drv.ms/u/s!AhA_REgBppCpgQ9zgx9cBhmI2DfH>





Tilman,


Since my email was rejected due to the 1 MB size limit of the Apache mail server, I
am sending it again here.


First, thank you very much for the extra information and update.


My application is an internal web-based production system. Many designers in
our graphics department upload print-ready files to the system every hour,
and other users (prepress, press, shipping, customers) log into the
system to download the files. The print-ready PDF files are sometimes extremely
large: 5 MB to 1 GB is most common; 2 GB to 5 GB is rare but
does happen. Please refer to the two attached screenshots (fileListPage,
downloadPage). What I am trying to do is show a thumbnail on the
download page. We used to show a download icon there instead
of a thumbnail, but then users have to download the file to their local computer
before actually seeing it. Sometimes the fileListPage shows a long list of
files and people get confused, so it would be more convenient for users to get a
peek at the file before actually downloading it. That is why a
thumbnail on the download page is better. As for pdf.js, I have never tried it. Do
you think it can load a 40 MB or 50 MB file in one or two seconds from the Apache
server?


I copied my code below for your reference (one version is for testing, the other is
for production).


Attached you will also find a PDF file named Herman..pdf. It has only two
pages, but even when converting only the first page, the best I can do is 7 seconds. That
would be very slow for a web application. If adding a GPU would improve
performance, I would certainly like to try it; I am just not sure it is going to
work.


******************below is the test code, run on the Eclipse platform**************


package com.test;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import org.apache.commons.lang3.time.StopWatch;

public class PdfToImage {

      private static final String OUTPUT_DIR = "/Users/someone/Desktop/";

      public static void main(String[] args) throws Exception{

          // select the KCMS color management module before any rendering happens
          System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");

          StopWatch stopwatch = new StopWatch();

          stopwatch.start();

          // try-with-resources closes the document, so no explicit close() is needed
          try (final PDDocument document = PDDocument.load(new File("/Users/someone/Desktop/Herman & hiss - PPHI101201 - FV.pdf"))){
              PDFRenderer pdfRenderer = new PDFRenderer(document);
              pdfRenderer.setSubsamplingAllowed(true);
              //for (int page = 0; page < document.getNumberOfPages(); ++page)
              for (int page = 0; page < 1; ++page)
              {
                  BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 72, ImageType.RGB); //<--this number has performance impact
                  String fileName = OUTPUT_DIR + "Herman & hiss - PPHI101201 - FV" + page + ".jpg";
                  ImageIOUtil.writeImage(bim, fileName, 72); //<---this number
              }
          } catch (IOException e){
              System.err.println("Exception while trying to create pdf document - " + e);
          }

          stopwatch.stop(); // optional
          System.out.println("Time elapsed is "+ stopwatch.getTime() + " milliseconds");

      }
      //test Files: Ashley NJ_HHL101125_FV.pdf, 40M, 4 pages
      //try Ashley without set property: 4 pages@70074 milliseconds
      //try Ashley with property set:   4 pages@32552 milliseconds
      //try have subSampling true set: 4 pages@9481 milliseconds
      //try Herman & hiss - PPHI101201 - FV.png: two pages@14050 milliseconds
      //try Herman & hiss - PPHI101201 - FV.jpg: two pages@13612 milliseconds
      //try Herman: 1 page@7625
      //try Ashley: 1 page@3237
      //try Ashley with 72 dpi: 1 page@2807
      //try Herman with 72 dpi: 1 page@6788
      //try Herman without subSampling true setting: 1 page@7087

}



*****************below is the production code, running as an action class of Struts*********


public void processPdf(String pdfFilePath, String imageFilePath){

          // select the KCMS color management module before any rendering happens
          System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");

          // try-with-resources closes the document, so no explicit close() is needed
          try (final PDDocument document = PDDocument.load(new File(pdfFilePath))){
              PDFRenderer pdfRenderer = new PDFRenderer(document);
              pdfRenderer.setSubsamplingAllowed(true);
              //for (int page = 0; page < document.getNumberOfPages(); ++page)
              for (int page = 0; page < 1; ++page)
              {
                  BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 72, ImageType.RGB);

                  ImageIOUtil.writeImage(bim, imageFilePath, 72);
              }
          } catch (IOException e){
                  log.info("Exception while trying to create pdf document - " + e);
          }


      }


*********************



________________________________
From: Tilman Hausherr <thaush...@t-online.de>
Sent: Tuesday, April 17, 2018 10:39 AM
To: users@pdfbox.apache.org
Subject: Re: Performance issue with PDFBox 2.0.8

Hi,

I ran the Ashley file through the profiler. Most of the time is spent
decoding the JPEG images within and converting some of them from CMYK to
RGB. Nothing to optimize. I also found another one-time initialization
that takes 100-300ms, which I will add to the next version of PDFDebugger.

       FilterFactory.INSTANCE.getFilter(COSName.FLATE_DECODE);

I also tested the UsePureJavaCMYKConversion, it made rendering much
slower. IIRC, that option only helps with files with many tiny CMYK images.

I have committed a change that adds the subsampling option to
PDFToImage. That version will be available within a few hours at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.10-SNAPSHOT/
Look for today's date.

Or get the source code here:
https://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?revision=1829374&view=markup

What type of application are you creating? If you want to show a PDF in
the browser, PDF.js works nicely, is free and included in firefox. If
you want to do thumbnails, then you should use a smaller dpi value. In
that case using subsampling would help even more.
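
To make the "smaller dpi value" advice concrete, here is a small sketch of the arithmetic for a thumbnail of a fixed pixel width. It relies on the fact that the default PDF user space unit is 1/72 inch, so 72 dpi renders one unit as one pixel. The class `ThumbnailDpi` and method `dpiForWidth` are hypothetical names for illustration; in PDFBox the page width in points would typically come from something like `PDPage.getMediaBox().getWidth()`, and the result would be passed to `renderImageWithDPI`.

```java
/** Hypothetical helper for picking a render resolution for a thumbnail of a given pixel width. */
class ThumbnailDpi {

    /** Default PDF user space: 72 units per inch, so 72 dpi maps one unit to one pixel. */
    static final float POINTS_PER_INCH = 72f;

    /** DPI at which a page of the given width (in points) renders targetWidthPx pixels wide. */
    static float dpiForWidth(float pageWidthPoints, int targetWidthPx) {
        if (pageWidthPoints <= 0 || targetWidthPx <= 0) {
            throw new IllegalArgumentException("width values must be positive");
        }
        return targetWidthPx * POINTS_PER_INCH / pageWidthPoints;
    }

    public static void main(String[] args) {
        // US Letter is 612 points wide; a 200 px wide thumbnail needs well under 72 dpi.
        System.out.println(dpiForWidth(612f, 200));
        // A large print sheet, e.g. 2448 points (34 inches) wide, needs even less.
        System.out.println(dpiForWidth(2448f, 200));
    }
}
```

For large print-ready pages, a fixed 72 dpi can still produce an image far bigger than a thumbnail needs, which is why deriving the dpi from the target size may help.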

Tilman



On 17.04.2018 at 07:26, Tilman Hausherr wrote:
Hi,

I have a Ryzen 1700 cpu and for tests I'm running it on max energy
settings. It is unclear if a mac has a similar setting.  This url
http://www.macos.utah.edu/documentation/administration/pmset.html
shows there is a setting for "better performance" but I don't know if
that does the same as on Windows where I get a performance doubling.
Try PDFDebugger, it has a built-in benchmark feature, it shows the
rendering speed in the status line.

I'm also avoiding that one-time initializations are part of the
benchmark results with this code that is also in PDFDebugger:

          // trigger premature initializations for more accurate rendering benchmarks
          // See discussion in PDFBOX-3988
          if (PDType1Font.COURIER.isStandard14())
          {
              // Yes this is always true
              PDDeviceCMYK.INSTANCE.toRGB(new float[] { 0, 0, 0, 0} );
              PDDeviceRGB.INSTANCE.toRGB(new float[] { 0, 0, 0 } );
              IIORegistry.getDefaultInstance();
          }
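
The same idea (keeping one-time initialization out of the measurement) can be sketched generically with untimed warm-up runs before the timed ones. This is only an illustration using the JDK; `WarmBenchmark` and `averageMillis` are hypothetical names, and the `Runnable` stands in for the actual rendering call.

```java
/** Hypothetical timing helper: warm-up runs are executed but not measured. */
class WarmBenchmark {

    /** Runs the task warmupRuns times untimed, then returns the mean wall time in ms over timedRuns. */
    static double averageMillis(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // triggers class loading, static initializers, JIT compilation
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / timedRuns;
    }

    public static void main(String[] args) {
        // Placeholder workload; in practice this would be the page rendering call.
        Runnable work = () -> Math.sqrt(System.nanoTime());
        System.out.println("avg ms: " + averageMillis(work, 3, 10));
    }
}
```

For a web application the first request after startup still pays the initialization cost, so the warm-up matters for comparing numbers, not for the first real user.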

I see you're using the PDFToImage utility. That one doesn't support
subsampling yet, it has been on my "todo" list for a few days, I'll
try to do it tonight... But PDFToImage is really just a command line
utility.

Args 7, 8 and 11 don't work that way. Re arg 7 and 8, you need to call
System.setProperty(). Re arg 11, you need to have a PDFRenderer object.
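
To illustrate the point about args 7 and 8: a `-D...` option is handled by the `java` launcher and becomes a system property, whereas the same text placed in a `String[]` passed to `main` is just a program argument. A minimal sketch follows; the class name `SystemPropertyDemo` is made up, the property name is the one mentioned in this thread, and how PDFBox reads it internally is not shown here.

```java
import java.util.Arrays;

/** Sketch: "-D" options are system properties, not program arguments. */
class SystemPropertyDemo {

    static final String KEY = "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    /** Programmatic equivalent of -D<KEY>=true; must run before the code that reads the property. */
    static void enablePureJavaCmyk() {
        System.setProperty(KEY, "true");
    }

    public static void main(String[] args) {
        // Even if args contains "-D..." text, that alone does not set any system property.
        System.out.println("program args: " + Arrays.toString(args));
        System.out.println("before setProperty: " + System.getProperty(KEY));
        enablePureJavaCmyk();
        System.out.println("after setProperty: " + System.getProperty(KEY));
    }
}
```

On the command line the equivalent would be to put the option before the class or jar name, as in the `-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true` form quoted elsewhere in this thread.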

Another way to convert to images is explained here:
https://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images


there call pdfRenderer.setSubsamplingAllowed(true) to activate
subsampling. PDFDebugger also supports it in the menu.

Tilman

On 17.04.2018 at 01:20, Arthur Wang wrote:
Tilman,


Thanks for the quick response and for testing my case. Below is my
Java code and my test result after enabling subsampling. For
the first page of the Ashley file, it took 3362 milliseconds.

For the Gill file, the elapsed time is 2456 milliseconds.

My tests were conducted on my Mac with a 2.2 GHz Core i7 processor.
How come your PC runs so fast? 1.4 seconds is fast enough for web
access. Maybe there is something wrong with my code? I would
appreciate it if you took a look at my code.


Best,


Arthur


*******************

import org.apache.pdfbox.tools.PDFToImage;
//import java.awt.image.BufferedImage;
import java.io.File;
//import java.io.IOException;
//import java.io.OutputStream;
import org.apache.commons.lang3.time.StopWatch;


public class PdfToImage2 {

       private static final String OUTPUT_DIR = "/Users/someone/Desktop/";

       public static void main(String[] args) throws Exception{

           String pdfPath = "/Users/someone/Desktop/Ashley NJ_HHL101125_FV.pdf";
           //config option 2:convert page 1 in pdf to image
           String [] args_1 =  new String[13];
           args_1[0] = "-startPage";
           args_1[1] = "1";
           args_1[2] = "-endPage";
           args_1[3] = "1";
           args_1[4] = "-outputPrefix";
           args_1[5] = OUTPUT_DIR+"Ashley NJ_HHL101125_FV1";
           args_1[6] = pdfPath;
           args_1[7] = "-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion";
           args_1[8] = "true";
           args_1[9] = "-dpi";
           args_1[10] = "72";//@48-->3283 milliseconds, @96-->3545 milliseconds, @72-->3362 milliseconds
           args_1[11] = "-PDFRenderer.setSubsamplingAllowed";
           args_1[12] = "true";

           File f = new File(args_1[5]+"1.jpg");
           if(f.exists() && !f.isDirectory()) {
               System.out.println("file exist already");
           }
           else{

               StopWatch stopwatch = new StopWatch();

               stopwatch.start();

                 try {

                   System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
                   PDFToImage.main(args_1);
                   System.out.println("Done!");
                 } catch (Exception e) {
                     System.err.println("Exception while trying to create pdf document - " + e);
                 }

                   stopwatch.stop(); // optional
                   System.out.println("Time elapsed is "+ stopwatch.getTime() + " milliseconds");


           }//else

           //first try without setting property: 3779 milliseconds
           //second try with the property set: 3852 milliseconds
           //third try with subsamplingAllowed: 3362 milliseconds

       }
}

*******************************

________________________________
From: Tilman Hausherr <thaush...@t-online.de>
Sent: Monday, April 16, 2018 10:55 AM
To: users@pdfbox.apache.org
Subject: Re: Performance issue with PDFBox 2.0.8

The java code didn't get through, most attachments get deleted. Call
PDFRenderer.setSubsamplingAllowed(true) to activate subsampling.
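
Subsampling is a different knob from the dpi value: it concerns how embedded images are decoded, not the size of the rendered page. As a rough illustration of the general technique (not of PDFBox's internal implementation), the JDK's ImageIO can decode an image while keeping only every n-th pixel, which avoids materializing the full-resolution raster. The class `SubsamplingDemo` and the method `decodeSubsampled` are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Illustration of decode-time subsampling with javax.imageio. */
class SubsamplingDemo {

    /** Decodes an encoded image keeping only every step-th pixel in both directions. */
    static BufferedImage decodeSubsampled(byte[] encoded, int step) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader for this data");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                param.setSourceSubsampling(step, step, 0, 0); // period x, period y, offsets
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a 400 x 300 test image in memory and decode it at a quarter of the resolution.
        BufferedImage big = new BufferedImage(400, 300, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(big, "png", out);
        BufferedImage small = decodeSubsampled(out.toByteArray(), 4);
        System.out.println(small.getWidth() + " x " + small.getHeight());
    }
}
```

The trade-off is the one the PDFRenderer javadoc warns about: fewer pixels to process, but possible quality loss, so the result should be compared against a non-subsampled rendering.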

I had a look at your files... These are not extremely slow renderings. 4
seconds for such a page is pretty good.

On my PC, the first page of the Ashley file is rendered in PDFDebugger
in 1.4 seconds at 72dpi. The Gill file is done in less than a second.

Tilman

On 16.04.2018 at 19:05, Arthur Wang wrote:
Arthur Wang has shared OneDrive files with you. To view them, click
the links below.

Ashley NJ_HHL101125_FV.pdf
<https://1drv.ms/b/s%21AhA_REgBppCpgQluAoJe28B935ru>

Gill1-1356_KM102685-INS_FV.pdf
<https://1drv.ms/b/s%21AhA_REgBppCpgQpdnBIl_hmK6Wt0>

Screen Shot 2018-04-16 at 9.23.52 AM.png
<https://1drv.ms/u/s%21AhA_REgBppCpgQvygYjm2eaJQmSH>

I just tried 2.0.9; it performs almost the same. Processing all 4 pages
takes 32 seconds; processing only the first page takes about 4
seconds.


My server is an HP DL380 with dual Xeon processors and 32 GB RAM; the hard
drive is an Intel Optane NVMe SSD.

Once the JPG image is produced, access to the image is almost
instant regardless of the size of the image file, so the open and close
times of the image file are insignificant and can be ignored.
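
Since serving an already rendered JPG is nearly instant, one option is to render only when no up-to-date image exists. A minimal sketch of such a check using only the JDK; `ThumbnailCache` and `needsRender` are hypothetical names, and comparing modification times is just one possible staleness rule.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical staleness check for previously rendered thumbnails. */
class ThumbnailCache {

    /** True if the thumbnail is missing or older than the PDF it was rendered from. */
    static boolean needsRender(Path pdf, Path thumbnail) throws IOException {
        if (!Files.exists(thumbnail)) {
            return true;
        }
        return Files.getLastModifiedTime(thumbnail)
                .compareTo(Files.getLastModifiedTime(pdf)) < 0;
    }
}
```

The rendering call would then be guarded by `needsRender(pdf, thumbnail)`, so a re-uploaded PDF is rendered again while repeated page views reuse the existing image.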


By enabling subsampling, do you mean setting the dpi option? Do you
have sample code for PDFRenderer? The attached file
PdfToImage2.java is my testing code. Ashley...pdf is a file of
about 45 MB, and Gill...pdf is a file of about 5 MB. At
1/10th the size of the other one, the processing time only drops to
2657 milliseconds compared to 3779 milliseconds. It seems that size
does matter.


thanks,


Arthur



________________________________
From: Tilman Hausherr <thaush...@t-online.de>
Sent: Monday, April 16, 2018 8:57 AM
To: users@pdfbox.apache.org
Subject: Re: Performance issue with PDFBox 2.0.8

Please
- retry with the current version 2.0.9
- share your file for a profiler analysis
- as said by Itai (who implemented it) try enabling subsampling in
PDFRenderer (read the javadoc first). Compare the results and decide
whether the quality is OK for you.
- set the energy settings of your computer to maximum or at least to
"balanced", not to "energy save"
- don't know if adding GPU will help;
- try also the
-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true option

The speed is not related to the size but to the complexity. 32 seconds
may sound disappointing but it's not the worst I've ever seen. "Nice
illustrations" with nested patterns or large shadings may be slow.

Tilman

On 16.04.2018 at 09:21, Arthur Wang wrote:
Hi, everyone,



I am using PDFBox 2.0.8 and Java 8 running in Tomcat 8 in
production to convert PDFs into images for display. It works very well
for PDF files smaller than 5 MB, taking about 3800 milliseconds.
However, it slows down considerably when the file size increases to 50
MB: it takes about 70,000 milliseconds. Setting the system property
"sun.java2d.cmm" to "sun.java2d.cmm.kcms.KcmsServiceProvider" does
improve that to 32,550 milliseconds, roughly doubling
the speed, but 32 seconds to load a web page is still too slow. Is
there any other way to speed things up? Would adding a GPU
to the server help? Or is there any other software or
hardware solution that could help the processing speed? My current
server comes with 32 GB RAM, and it never uses more than half
of it.
thanks,


Arthur

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org


