Re: PDFRenderer, PDDocument memory issue

2015-07-06 Thread John Hewson

 On 1 Jul 2015, at 23:29, Andreas Lehmkühler andr...@lehmi.de wrote:
 
 
 
 John Hewson j...@jahewson.com wrote on 2 July 2015 at 06:10:
 
 
 
 On 1 Jul 2015, at 07:52, Tilman Hausherr thaush...@t-online.de wrote:
 
 On 01.07.2015 at 10:16, Alex Sviridov wrote:
 In my application I have real time memory graphs and they show that memory
 is very fast filled.
 When there is no more free memory getPageThumbImage hangs - no exception,
 nothing. But the code stops.
 When I do pdfDocument=null,pdfRenderer=null I get about 400mb free memory.
 How to solve this problem?
 
 If you're building from source, try this: in PDImageXObject.java, remove the
 line cachedImage = image;. This will consume less space if you have large
 PDFs with many images.
 
 We don't retain XObjects across pages (anymore), so that shouldn't be the
 cause of his gradual memory increase?
 IMHO, it's quite simple to explain. During the initial parse all streams are
 read and all the data is stored in COSStream (see COSParser#parseCOSStream).
 That isn't a new behaviour and I'm working on a better solution (it's my last
 TODO in PDFBOX-2301)

So it's cached data in COSStream? That wouldn't be affected by "cachedImage = image;",
but it would certainly explain the increasing heap usage. Glad to hear that you have an
improvement underway!

— John

 Tilman
 
 
 
 -
 To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
 For additional commands, e-mail: users-h...@pdfbox.apache.org
 
 
 
 BR
 Andreas
 
 





Re: PDFRenderer, PDDocument memory issue

2015-07-02 Thread Andreas Lehmkühler


 John Hewson j...@jahewson.com wrote on 2 July 2015 at 06:10:
 
 
 
  On 1 Jul 2015, at 07:52, Tilman Hausherr thaush...@t-online.de wrote:
  
  On 01.07.2015 at 10:16, Alex Sviridov wrote:
  In my application I have real time memory graphs and they show that memory
  is very fast filled.
  When there is no more free memory getPageThumbImage hangs - no exception,
  nothing. But the code stops.
  When I do pdfDocument=null,pdfRenderer=null I get about 400mb free memory.
  How to solve this problem?
  
  If you're building from source, try this: in PDImageXObject.java, remove the
  line cachedImage = image;. This will consume less space if you have large
  PDFs with many images.
 
 We don't retain XObjects across pages (anymore), so that shouldn't be the
 cause of his gradual memory increase?
IMHO, it's quite simple to explain. During the initial parse all streams are
read and all the data is stored in COSStream (see COSParser#parseCOSStream).
That isn't a new behaviour and I'm working on a better solution (it's my last
TODO in PDFBOX-2301)

  Tilman
  
  
  
  
 

BR
Andreas




Re: Re[10]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread John Hewson


 On 1 Jul 2015, at 05:15, Alex Sviridov ooo_satu...@mail.ru wrote:
 
 Ok. Thank you again. I just don't understand one thing. What is the reason to 
 keep so large data if I only need to take page images and the most important 
 I DO IT BY PAGE?
 
 Is there no way not to keep data for previous pages if I need only data for 
 page N?

Try profiling PDFBox to see what that data actually is. We don't cache page 
resources anymore. It could be cached stream data, or fonts, perhaps.

-- John
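
A low-tech way to start on the profiling advice above, before attaching a full profiler, is to log heap usage around each rendering step with the JDK's MemoryMXBean. The following is only a sketch using the standard library: HeapProbe, usedHeapBytes and measureGrowth are hypothetical names, and the Runnable stands in for whatever step is being measured (for example a single getPageThumbImage call). System.gc() is just a hint to the JVM, so the numbers should be read as rough indications.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class HeapProbe {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Currently used heap in bytes, as reported by the JVM. */
    static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Runs one step and returns the approximate change in used heap. */
    static long measureGrowth(Runnable step) {
        System.gc(); // a hint only; the JVM may ignore it
        long before = usedHeapBytes();
        step.run();
        System.gc();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        // Stand-in for a rendering step: retain 32 MiB so the growth is visible.
        byte[][] retained = new byte[1][];
        long growth = measureGrowth(() -> retained[0] = new byte[32 * 1024 * 1024]);
        System.out.printf("heap grew by about %d MiB%n", growth / (1024 * 1024));
    }
}
```

If the growth per page stays roughly constant and never drops, something is retaining data per page; a heap dump would then show which classes hold it.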

 Wednesday, 1 July 2015, 14:08 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 
 
 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:59:
 
 
 Ok. Thank you very much for explanation. Could you say where this scratch
 file is located linux/windows?
 java.io.File.createTempFile is used to create that file. It uses the default
 temp directory. It's /tmp on Linux. I'm not sure about Windows, as different
 environment variables (TMP, TEMP, USERPROFILE, ...) are used to search for such
 a directory.
 
 You may define your own temp directory using the following parameter when
 starting your application
 
 -Djava.io.tmpdir=PATH-TO-YOUR-TEMP
 
 
 
 
 Wednesday, 1 July 2015, 13:54 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:38:
 
 
 The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE
 Ah, that explains a lot. The pdf is a scanned document, every page holds a
 color
 image, consuming a lot of memory when processed
 
 I tried with load (fileName,true). The result - now I don't have memory
 problems. However now I have 2 problems:
 
 1) All the thumbnail images are loaded. However, the speed is VERY SLOW.
 One thumbnail image takes about 4 seconds to load!
 When it comes to huge PDFs, you have to accept one trade-off or the other:
 either you provide enough memory to do everything in memory (fast), or you use
 a scratch file to save memory (slow).
 
 And yes, there is room for improvement in the memory handling (read on
 demand, remove after usage) in PDFBox, but that is a future feature.
 Patches are welcome.
 
 2) Besides, as you see thumbnail images are loaded in separate thread.
 While
 this thread is running and I try to
 get big image for main content using   BufferedImage
 bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB); I get the
 following exception:
 
 java.io.IOException: java.util.zip.DataFormatException: unknown 
 compression
 method
 at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
 at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
 at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
 at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
 at
 org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
 at
 org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
 at org.apache.pdfbox.pdfparser.BaseParser.init(BaseParser.java:146)
 at
 org.apache.pdfbox.pdfparser.PDFStreamParser.init(PDFStreamParser.java:78)
 at
 org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
 at
 org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
 at
 org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
 at 
 org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
 at
 org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
 at
 org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
 at
 org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
   
 at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.zip.DataFormatException: unknown compression method
 at java.util.zip.Inflater.inflateBytes(Native Method)
 at java.util.zip.Inflater.inflate(Inflater.java:259)
 at java.util.zip.Inflater.inflate(Inflater.java:280)
 at
 org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
 at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
 ... 20 more
 
 How to solve these problems?
 PDFBox isn't supposed to be thread safe.
 
 
 
 Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 
 
 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:09:
 
 
 I decided to show all the code. I also send the pdf file - some file
 from
 internet I use for testing.
 The attachment didn't make it due to some restrictions to the mailing
 list.
 Please post a link to the origin source or another place where we can
 download
 the pdf in question.
 
 
 Task<Integer> task = new Task<Integer>() {
 @Override protected Integer call() throws Exception {
 for (int i=0;i<model.getTotalPages();i++){
 

Re: Re[8]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Andreas Lehmkühler


 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:59:
 
 
  Ok. Thank you very much for explanation. Could you say where this scratch
 file is located linux/windows?
java.io.File.createTempFile is used to create that file. It uses the default
temp directory. It's /tmp on Linux. I'm not sure about Windows, as different
environment variables (TMP, TEMP, USERPROFILE, ...) are used to search for such
a directory.

You may define your own temp directory using the following parameter when
starting your application

-Djava.io.tmpdir=PATH-TO-YOUR-TEMP
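
To see which directory a given JVM actually uses, a small probe along these lines can help. It relies only on the JDK (the java.io.tmpdir system property and File.createTempFile); the class name TempDirProbe and the helper defaultTempDir are hypothetical names chosen for this sketch, and nothing here is specific to PDFBox.

```java
import java.io.File;
import java.io.IOException;

final class TempDirProbe {
    /** Creates a throwaway temp file and returns the directory the JVM placed it in. */
    static File defaultTempDir() throws IOException {
        File probe = File.createTempFile("probe", ".tmp");
        try {
            return probe.getAbsoluteFile().getParentFile();
        } finally {
            probe.delete(); // clean up the probe file again
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("temp files land in " + defaultTempDir());
    }
}
```

Running it once without and once with -Djava.io.tmpdir=... shows whether the override is picked up on a particular platform.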


 
 
 Wednesday, 1 July 2015, 13:54 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
  Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:38:
  
  
   The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE
 Ah, that explains a lot. The pdf is a scanned document, every page holds a
 color
 image, consuming a lot of memory when processed
 
  I tried with load (fileName,true). The result - now I don't have memory
  problems. However now I have 2 problems:
 
  1) All the thumbnail images are loaded. However, the speed is VERY SLOW.
  One
  thumbnail image is loaded about 4 seconds! 
 When it comes to huge PDFs, you have to accept one trade-off or the other:
 either you provide enough memory to do everything in memory (fast), or you use
 a scratch file to save memory (slow).
 
 And yes, there is room for an improvement of the memory handling (read on
 demand, remove after usage) in PDFBox, but that is some future feature.
 Patches
 are welcome.
 
  2) Besides, as you see thumbnail images are loaded in separate thread.
  While
  this thread is running and I try to
  get big image for main content using   BufferedImage
  bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB); I get the
  following exception:
  
  java.io.IOException: java.util.zip.DataFormatException: unknown compression
  method
      at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
      at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
      at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
      at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
      at
  org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
      at
  org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
      at org.apache.pdfbox.pdfparser.BaseParser.init(BaseParser.java:146)
      at
  org.apache.pdfbox.pdfparser.PDFStreamParser.init(PDFStreamParser.java:78)
      at
  org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
      at
  org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
      at
  org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
      at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
      at
  org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
      at
  org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
      at
  org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
    
      at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.lang.Thread.run(Thread.java:745)
  Caused by: java.util.zip.DataFormatException: unknown compression method
      at java.util.zip.Inflater.inflateBytes(Native Method)
      at java.util.zip.Inflater.inflate(Inflater.java:259)
      at java.util.zip.Inflater.inflate(Inflater.java:280)
      at
  org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
      at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
      ... 20 more
  
  How to solve these problems?
 PDFBox isn't supposed to be thread safe.
 
  
  
  Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
  
  
   Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:09:
   
   
I decided to show all the code. I also send the pdf file - some file
   from
   internet I use for testing.
  The attachment didn't make it due to some restrictions to the mailing
  list.
  Please post a link to the origin source or another place where we can
  download
  the pdf in question.
  
   
   Task<Integer> task = new Task<Integer>() {
       @Override protected Integer call() throws Exception {
           for (int i=0;i<model.getTotalPages();i++){
               System.out.println("Point a:"+i);
               WritableImage writableImage=model.getPageThumbImage(i);
               System.out.println("Point b:"+i);
               ImageView imageView=new ImageView(writableImage);
               System.out.println("Point c:"+i);
               Label label=new Label(Integer.toString(i+1));
               System.out.println("Point d:"+i);
               VBox vBox=new VBox(imageView,label);
               System.out.println("Point e:"+i);
       

Re: Re[6]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Andreas Lehmkühler
 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:38:
 
 
  The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE
Ah, that explains a lot. The PDF is a scanned document; every page holds a color
image, which consumes a lot of memory when processed.

 I tried with load (fileName,true). The result - now I don't have memory
 problems. However now I have 2 problems:

 1) All the thumbnail images are loaded. However, the speed is VERY SLOW. One
 thumbnail image is loaded about 4 seconds! 
When it comes to huge PDFs, you have to accept one trade-off or the other: either
you provide enough memory to do everything in memory (fast), or you use a scratch
file to save memory (slow).

And yes, there is room for an improvement of the memory handling (read on
demand, remove after usage) in PDFBox, but that is some future feature. Patches
are welcome.

 2) Besides, as you see thumbnail images are loaded in separate thread. While
 this thread is running and I try to
 get big image for main content using   BufferedImage
 bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB); I get the
 following exception:
 
 java.io.IOException: java.util.zip.DataFormatException: unknown compression method
     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
     at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
     at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
     at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
     at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
     at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
   
     at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.zip.DataFormatException: unknown compression method
     at java.util.zip.Inflater.inflateBytes(Native Method)
     at java.util.zip.Inflater.inflate(Inflater.java:259)
     at java.util.zip.Inflater.inflate(Inflater.java:280)
     at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
     ... 20 more
 
 How to solve these problems?
PDFBox isn't supposed to be thread safe.
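
Since the same renderer is used from the thumbnail thread and from the main thread, one common way to live with a non-thread-safe library is to funnel every call through a single worker thread. The sketch below shows that pattern with the JDK's ExecutorService only. SerializedRenderer and the PageRenderer interface are hypothetical stand-ins invented for this illustration (in a real application the delegate would wrap the call to pdfRenderer.renderImageWithDPI); this is not something PDFBox itself provides.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SerializedRenderer<T> implements AutoCloseable {
    /** Hypothetical stand-in for a renderer that must not be called concurrently. */
    interface PageRenderer<R> {
        R render(int pageIndex, float dpi) throws Exception;
    }

    // One worker thread: submitted render calls run strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final PageRenderer<T> delegate;

    SerializedRenderer(PageRenderer<T> delegate) {
        this.delegate = delegate;
    }

    /** Queues a render request; the Future completes when the worker has run it. */
    Future<T> submit(int pageIndex, float dpi) {
        return worker.submit(() -> delegate.render(pageIndex, dpi));
    }

    @Override
    public void close() {
        worker.shutdown(); // already queued requests still finish
    }

    public static void main(String[] args) throws Exception {
        try (SerializedRenderer<String> renderer =
                 new SerializedRenderer<>((page, dpi) -> "page " + page + " at " + dpi + " dpi")) {
            Future<String> thumbnail = renderer.submit(0, 12f);
            Future<String> fullSize = renderer.submit(0, 300f);
            System.out.println(thumbnail.get());
            System.out.println(fullSize.get());
        }
    }
}
```

The thumbnail loop and the full-size request would both call submit, so the two can never overlap inside the library; the cost is that a large render waits behind queued thumbnails.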

 
 
 Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 
 
  Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:09:
  
  
   I decided to show all the code. I also send the pdf file - some file from
  internet I use for testing.
 The attachment didn't make it due to some restrictions to the mailing list.
 Please post a link to the origin source or another place where we can
 download
 the pdf in question.
 
  
  Task<Integer> task = new Task<Integer>() {
      @Override protected Integer call() throws Exception {
          for (int i=0;i<model.getTotalPages();i++){
              System.out.println("Point a:"+i);
              WritableImage writableImage=model.getPageThumbImage(i);
              System.out.println("Point b:"+i);
              ImageView imageView=new ImageView(writableImage);
              System.out.println("Point c:"+i);
              Label label=new Label(Integer.toString(i+1));
              System.out.println("Point d:"+i);
              VBox vBox=new VBox(imageView,label);
              System.out.println("Point e:"+i);
              vBox.setAlignment(Pos.CENTER);
              vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
              System.out.println("Point f:"+i);
              Platform.runLater(new Runnable() {
                  @Override
                  public void run() {
                      thumbFlowPane.getChildren().add(vBox);
                  }
              });
          }
          return null;
      }
  };
  new Thread(task).start();
  
  And here is the tail of the output
  
  Point a:30
  Point b:30
  Point c:30
  Point d:30
  Point e:30
  Point f:30
  Point a:31
  
  What is scratch file? Sorry, I don't understand you.
 
 PDFBox holds a lot of temporary data in the memory. To reduce the memory
 footprint one can choose to use a scratch file instead, so that some/most 

Re: Re[10]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Andreas Lehmkühler


 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 14:15:
 
 
  Ok. Thank you again. I just don't understand one thing. What is the reason to
 keep so large data if I only need to take page images and the most important I
 DO IT BY PAGE?
PDFBox doesn't know that you are doing it page by page.

 
 Is there no way not to keep data for previous pages if I need only data for
 page N?
As I said, we don't have a read on demand mechanism yet. It is in our focus but
that will take a while, as the pdf format isn't that easy to work with and
therefore the code to be extended is more or less complex.

 Wednesday, 1 July 2015, 14:08 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 
 
  Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:59:
  
  
   Ok. Thank you very much for explanation. Could you say where this scratch
  file is located linux/windows?
 java.io.File.createTempFile is used to create that file. It uses the default
 temp directory. It's /tmp on Linux. I'm not sure about Windows, as different
 environment variables (TMP, TEMP, USERPROFILE, ...) are used to search for such
 a directory.
 
 You may define your own temp directory using the following parameter when
 starting your application
 
 -Djava.io.tmpdir=PATH-TO-YOUR-TEMP
 
 
  
  
  Wednesday, 1 July 2015, 13:54 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
   Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:38:
   
   
The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE
  Ah, that explains a lot. The pdf is a scanned document, every page holds a
  color
  image, consuming a lot of memory when processed
  
   I tried with load (fileName,true). The result - now I don't have memory
   problems. However now I have 2 problems:
  
   1) All the thumbnail images are loaded. However, the speed is VERY SLOW.
   One
   thumbnail image is loaded about 4 seconds! 
  When it comes to huge PDFs, you have to accept one trade-off or the other:
  either you provide enough memory to do everything in memory (fast), or you use
  a scratch file to save memory (slow).
  
  And yes, there is room for an improvement of the memory handling (read on
  demand, remove after usage) in PDFBox, but that is some future feature.
  Patches
  are welcome.
  
   2) Besides, as you see thumbnail images are loaded in separate thread.
   While
   this thread is running and I try to
   get big image for main content using   BufferedImage
   bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB); I get the
   following exception:
   
   java.io.IOException: java.util.zip.DataFormatException: unknown
   compression
   method
       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
       at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
       at
   org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
       at
   org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
       at
   org.apache.pdfbox.pdfparser.BaseParser.init(BaseParser.java:146)
       at
   org.apache.pdfbox.pdfparser.PDFStreamParser.init(PDFStreamParser.java:78)
       at
   org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
       at
   org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
       at
   org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
       at
   org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
       at
   org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
       at
   org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
       at
   org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
     
       at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.lang.Thread.run(Thread.java:745)
   Caused by: java.util.zip.DataFormatException: unknown compression method
       at java.util.zip.Inflater.inflateBytes(Native Method)
       at java.util.zip.Inflater.inflate(Inflater.java:259)
       at java.util.zip.Inflater.inflate(Inflater.java:280)
       at
   org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
       ... 20 more
   
   How to solve these problems?
  PDFBox isn't supposed to be thread safe.
  
   
   
   Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
   
   
Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:09:


 I decided to show all the code. I also send the pdf file - some file
from
internet I use for testing.
   The attachment didn't make it 

Re: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Andreas Lehmkühler


 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 10:16:
 
 
  I want to display all page thumbnails. However I came across memory size
 problem with PDFRenderer or PDDocument - I don't know which one. 
 
 I have the following code:
    
     private PDDocument pdfDocument;
     
     private PDFRenderer pdfRenderer;
 
     public WritableImage getPageThumbImage(int page){
         WritableImage result=null;
         try {
             BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 12, ImageType.RGB);
             result=SwingFXUtils.toFXImage(bi, null);
         } catch (IOException ex) {
             // note: the exception is silently ignored here
         }
         return result;
     }
  .
 The method getPageThumbImage I run in for loop for every page.I set java
 memory heap to 500mb. 
 And I can get about 30 images using getPageThumbImage (if I set more memory I
 get more). 
 In my application I have real time memory graphs and they show that memory is
 very fast filled. 
 When there is no more free memory getPageThumbImage hangs - no exception,
 nothing. But the code stops.
 When I do pdfDocument=null,pdfRenderer=null I get about 400mb free memory. How
 to solve this problem?
There are two possible issues, and both may be relevant.

1. PDFBox consumes more or less memory to load a PDF, depending on the size and
the content of the PDF.

- Are you using the latest 2.0.0-SNAPSHOT? There were some improvements
concerning the memory footprint lately.
- Try using a scratch file (there are load methods with a boolean
switch to activate that).

2. Your own implementation consumes more or less memory to process those
thumbnails.

- Check that you are releasing all resources you use during your process
(especially the images you're creating).

HTH,
Andreas
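
A possible explanation for the "no exception, nothing, but the code stops" symptom (this is an assumption; nobody in the thread confirmed it): javafx.concurrent.Task extends java.util.concurrent.FutureTask, and FutureTask.run() catches any Throwable thrown by the task body, including OutOfMemoryError, and stores it instead of printing it. Unless something calls get() or inspects the task's exception, the worker thread simply ends. The sketch below reproduces that behaviour with a plain FutureTask and a simulated error; SwallowedFailureDemo and runAndCollectFailure are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

final class SwallowedFailureDemo {
    /** Runs the task on its own thread and returns the failure it stored, or null. */
    static Throwable runAndCollectFailure(FutureTask<?> task) throws InterruptedException {
        Thread thread = new Thread(task);
        thread.start();
        thread.join(); // the thread ends quietly, nothing is printed to the console
        try {
            task.get(); // only here does the stored failure become visible
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FutureTask<Integer> task = new FutureTask<Integer>(() -> {
            // Simulated failure; a real one would come from the renderer running out of heap.
            throw new OutOfMemoryError("simulated: Java heap space");
        });
        Throwable failure = runAndCollectFailure(task);
        System.out.println("task failed with: " + failure);
    }
}
```

In a JavaFX application the equivalent check would be to look at the task's exception once it has failed, which is a quick way to tell a real hang from a swallowed error.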




Re[2]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Alex Sviridov
 Thank you for the answer. I tried pdfbox-app-2.0.0-20150630.220424-1464.jar; the 
result is the same.

When I create images I add them to a JavaFX FlowPane. However, the problem is not 
in the images because, I repeat, I get 400mb back when I do 
pdfDocument=null,pdfRenderer=null.

Besides, when I do pdfDocument = PDDocument.load(new File(fileName)) I don't 
have any problems with memory. 

I get the memory problem when I run getPageThumbImage in a for loop.

I am sure that the problem is in PDFBox. Please help me.


Wednesday, 1 July 2015, 12:48 +02:00, from Andreas Lehmkühler andr...@lehmi.de:


 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 10:16:
 
 
  I want to display all page thumbnails. However I came across memory size
 problem with PDFRenderer or PDDocument - I don't know which one. 
 
 I have the following code:
    
     private PDDocument pdfDocument;
     
     private PDFRenderer pdfRenderer;
 
     public WritableImage getPageThumbImage(int page){
     WritableImage result=null;
     try {
     BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 12,
 ImageType.RGB);
     result=SwingFXUtils.toFXImage(bi, null);
     } catch (IOException ex) {
  
     }
     return result;
     }
  .
 The method getPageThumbImage I run in for loop for every page.I set java
 memory heap to 500mb. 
 And I can get about 30 images using getPageThumbImage (if I set more memory I
 get more). 
 In my application I have real time memory graphs and they show that memory is
 very fast filled. 
 When there is no more free memory getPageThumbImage hangs - no exception,
 nothing. But the code stops.
 When I do pdfDocument=null,pdfRenderer=null I get about 400mb free memory. 
 How
 to solve this problem?
There are two possible issues, and both may be relevant.

1. PDFBox consumes more or less memory to load a PDF, depending on the size and
the content of the PDF.

- Are you using the latest 2.0.0-SNAPSHOT? There were some improvements
concerning the memory footprint lately.
- Try using a scratch file (there are load methods with a boolean
switch to activate that).

2. Your own implementation consumes more or less memory to process those
thumbnails.

- Check that you are releasing all resources you use during your process
(especially the images you're creating).

HTH,
Andreas




-- 
Alex Sviridov


Re: Re[4]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Andreas Lehmkühler


 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 13:09:
 
 
  I decided to show all the code. I also send the pdf file - some file from
 internet I use for testing.
The attachment didn't make it through due to restrictions on the mailing list.
Please post a link to the original source or another place where we can download
the PDF in question.

 
 Task<Integer> task = new Task<Integer>() {
     @Override protected Integer call() throws Exception {
         for (int i=0;i<model.getTotalPages();i++){
             System.out.println("Point a:"+i);
             WritableImage writableImage=model.getPageThumbImage(i);
             System.out.println("Point b:"+i);
             ImageView imageView=new ImageView(writableImage);
             System.out.println("Point c:"+i);
             Label label=new Label(Integer.toString(i+1));
             System.out.println("Point d:"+i);
             VBox vBox=new VBox(imageView,label);
             System.out.println("Point e:"+i);
             vBox.setAlignment(Pos.CENTER);
             vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
             System.out.println("Point f:"+i);
             Platform.runLater(new Runnable() {
                 @Override
                 public void run() {
                     thumbFlowPane.getChildren().add(vBox);
                 }
             });
         }
         return null;
     }
 };
 new Thread(task).start();
 
 And here is the tail of the output
 
 Point a:30
 Point b:30
 Point c:30
 Point d:30
 Point e:30
 Point f:30
 Point a:31
 
 What is scratch file? Sorry, I don't understand you.

PDFBox holds a lot of temporary data in memory. To reduce the memory
footprint, one can choose to use a scratch file instead, so that some or most of
that data will be held in a file.

To do so, simply use another load method, e.g. 

load(File file, boolean useScratchFiles)
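
To make the idea of a scratch file concrete, here is a toy illustration of the trade-off using only the JDK. It is not how PDFBox implements its scratch file; ScratchStore and its write/read methods are hypothetical names for this sketch. Data is appended to a temporary file and read back by offset on demand, so the heap only holds what is currently being used, at the price of disk I/O on every access.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class ScratchStore implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;

    ScratchStore() throws IOException {
        file = File.createTempFile("scratch", ".bin"); // lands in java.io.tmpdir
        file.deleteOnExit();
        raf = new RandomAccessFile(file, "rw");
    }

    /** Appends data to the scratch file and returns the offset needed to read it back. */
    long write(byte[] data) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.write(data);
        return offset;
    }

    /** Reads length bytes starting at offset back into memory. */
    byte[] read(long offset, int length) throws IOException {
        byte[] out = new byte[length];
        raf.seek(offset);
        raf.readFully(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    public static void main(String[] args) throws IOException {
        try (ScratchStore store = new ScratchStore()) {
            store.write(new byte[] {1, 2, 3});
            long second = store.write(new byte[] {4, 5});
            System.out.println(store.read(second, 2)[0]); // prints 4
        }
    }
}
```

The extra seek and read on every access is the kind of cost that makes a file-backed mode slower than keeping everything on the heap.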
 
 
 
 
 
 
 Wednesday, 1 July 2015, 13:04 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 
 
  Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 12:58:
  
  
   Thank you for answer. I tried pdfbox-app-2.0.0-20150630.220424-1464.jar
  the
  result is the same.
  
  When I create images I add them to javafx FlowPane. However, the problem is
  not in images because I repeat - I get 400mb when I do
  pdfDocument=null,pdfRenderer=null.
  
  Besides, when I do pdfDocument = PDDocument.load(new File(fileName)) I
  don't
  have any problems with memory. 
  
  I'm getting problem with memory when I run in for loop getPageThumbImage.
  
  I am sure that the problem is in PdfBox. Please, help me.
 Maybe, but I'm not sure at all.
 
 Try to use the scratch file.
 
  Wednesday, 1 July 2015, 12:48 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
  
  
   Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 10:16:
   
   
I want to display all page thumbnails. However I came across memory
   size
   problem with PDFRenderer or PDDocument - I don't know which one. 
   
   I have the following code:
      
       private PDDocument pdfDocument;
       
       private PDFRenderer pdfRenderer;
   
       public WritableImage getPageThumbImage(int page){
       WritableImage result=null;
       try {
       BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 12,
   ImageType.RGB);
       result=SwingFXUtils.toFXImage(bi, null);
       } catch (IOException ex) {
    
       }
       return result;
       }
    .
   The method getPageThumbImage I run in for loop for every page.I set java
   memory heap to 500mb. 
   And I can get about 30 images using getPageThumbImage (if I set more
   memory
   I
   get more). 
   In my application I have real time memory graphs and they show that
   memory
   is
   very fast filled. 
   When there is no more free memory getPageThumbImage hangs - no
   exception,
   nothing. But the code stops.
   When I do pdfDocument=null,pdfRenderer=null I get about 400mb free
   memory.
   How
   to solve this problem?
  There are 2 possible issues and maybe both are relevant.
  
  1. PDFBox consumes more or less memory to load a pdf depending on the size
  and
  the content of the pdf.
  
  - Are you using the latest 2.0.0-SNAPSHOT? There were some improvements
  concerning the memory footprint lately
  - Try to use of a scratch file (there are load methods including a boolean
  switcht ot activate that)
  
  2. Your own implementation consumes more or less memory to process those
  thumbnails
  
  - check if you are releasing all resources (ecspecially those images
  you're
  creating) you are using during your process
  
  HTH,
  Andreas
  
  -
  To unsubscribe, e-mail:  users-unsubscr...@pdfbox.apache.org
  For additional commands, e-mail:  users-h...@pdfbox.apache.org
  
  
  
  -- 
  Alex Sviridov
 
 BR
 Andreas
 
 

Re[8]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Alex Sviridov
 OK, thank you very much for the explanation. Could you tell me where this
scratch file is located on Linux/Windows?


Wednesday, 1 July 2015, 13:54 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 Alex Sviridov  ooo_satu...@mail.ru  wrote on 1 July 2015 at 13:38:
 
 
  The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE
Ah, that explains a lot. The PDF is a scanned document: every page holds a
color image, which consumes a lot of memory when processed.
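
A rough way to see why scanned pages are so heavy: an uncompressed raster needs about width x height x bytes-per-pixel. The sketch below is only an estimate. The page size (A4 at 300 DPI, taken here as 2481 x 3507 pixels) and the 3 bytes per RGB pixel are assumptions, not values measured from the file in this thread, and the class and method names are made up for illustration.

```java
public class ImageMemoryEstimate {

    /** Bytes needed to hold one uncompressed raster of the given size. */
    public static long decodedBytes(int widthPx, int heightPx, int bytesPerPixel) {
        // Use long arithmetic so large pages do not overflow an int.
        return (long) widthPx * heightPx * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumed page: A4 scanned at 300 DPI, stored as 8-bit RGB.
        long perPage = decodedBytes(2481, 3507, 3);
        System.out.println("bytes per decoded page: " + perPage);
        // Hypothetical worst case: 30 such pages held in memory at once.
        System.out.println("bytes for 30 pages:     " + 30 * perPage);
    }
}
```

With these assumed numbers a single decoded page is about 26 MB, so a few dozen pages kept in memory at the same time would already exceed a 500 MB heap.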

 I tried load(fileName, true). As a result, I no longer have memory
 problems. However, now I have 2 other problems:

 1) All the thumbnail images are loaded, but it is VERY SLOW. Each
 thumbnail takes about 4 seconds to load!
When it comes to huge PDFs, you have to pick your poison: either you provide
enough memory to do everything in memory (fast), or you use a scratch file to
save memory (slow).

And yes, there is room for improvement in PDFBox's memory handling (read on
demand, remove after use), but that is a future feature. Patches are welcome.

 2) Besides, as you can see, the thumbnail images are loaded in a separate
 thread. While this thread is running, I try to get a big image for the main
 content using BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 300,
 ImageType.RGB); and I get the following exception:
 
 java.io.IOException: java.util.zip.DataFormatException: unknown compression method
     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
     at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
     at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
     at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
     at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
     at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
   
     at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.zip.DataFormatException: unknown compression method
     at java.util.zip.Inflater.inflateBytes(Native Method)
     at java.util.zip.Inflater.inflate(Inflater.java:259)
     at java.util.zip.Inflater.inflate(Inflater.java:280)
     at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
     ... 20 more
 
 How to solve these problems?
PDFBox isn't supposed to be thread safe.
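
Given that PDFBox isn't thread safe, one common way to keep two threads from touching the same document at once is to funnel every render request through a single worker thread. Below is a minimal sketch using only JDK classes. PageRenderer is a hypothetical stand-in for the real renderer (it is not a PDFBox type), and whether this removes the exception above for this particular file is an assumption that has not been tested here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerializedRenderer {

    /** Hypothetical stand-in for a renderer that is not thread safe. */
    public interface PageRenderer {
        String render(int page, int dpi) throws Exception;
    }

    // One worker thread: submitted jobs run one after another, never in parallel.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final PageRenderer renderer;

    public SerializedRenderer(PageRenderer renderer) {
        this.renderer = renderer;
    }

    /** Queues a render job; callers on any thread get a Future for the result. */
    public Future<String> submit(final int page, final int dpi) {
        Callable<String> job = () -> renderer.render(page, dpi);
        return worker.submit(job);
    }

    public void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SerializedRenderer r = new SerializedRenderer(
                (page, dpi) -> "page " + page + " at " + dpi + " DPI");
        Future<String> thumb = r.submit(0, 12);  // thumbnail request
        Future<String> full = r.submit(0, 300);  // main view request
        System.out.println(thumb.get());
        System.out.println(full.get());
        r.shutdown();
    }
}
```

The trade-off is that a large 300 DPI request has to wait behind any thumbnail jobs already queued; a priority queue or a second document instance per thread would be other options to consider.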

 
 
 Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler  andr...@lehmi.de :
 
 
  Alex Sviridov   ooo_satu...@mail.ru  wrote on 1 July 2015 at 13:09:
  
  
   I've decided to show all the code. I'm also sending the PDF file, a file
  from the internet that I use for testing.
 The attachment didn't make it due to some restrictions on the mailing list.
 Please post a link to the original source or another place where we can
 download the PDF in question.
 
  

Re: Re[2]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Andreas Lehmkühler


 Alex Sviridov ooo_satu...@mail.ru wrote on 1 July 2015 at 12:58:
 
 
  Thank you for the answer. I tried pdfbox-app-2.0.0-20150630.220424-1464.jar;
 the result is the same.
 
 When I create the images, I add them to a JavaFX FlowPane. However, the
 problem is not in the images because, I repeat, I get 400 MB back when I
 set pdfDocument=null and pdfRenderer=null.
 
 Besides, when I call pdfDocument = PDDocument.load(new File(fileName)) I
 don't have any problems with memory.
 
 I get the memory problem when I call getPageThumbImage in a for loop.
 
 I am sure that the problem is in PDFBox. Please help me.
Maybe, but I'm not sure at all.

Try to use the scratch file.
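
For anyone unsure what a scratch file buys you: the idea is to park bulky data in a temporary file on disk and keep only a small handle on the heap, reading the bytes back when they are needed. The sketch below shows that idea with plain JDK classes only. ScratchBuffer and its methods are hypothetical names for illustration; this is not how PDFBox implements it.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ScratchBuffer implements Closeable {

    private final File file;

    /** Writes the data to a temp file so the caller can drop its byte[]. */
    public ScratchBuffer(byte[] data) throws IOException {
        file = File.createTempFile("scratch", ".bin");
        file.deleteOnExit(); // safety net if close() is never called
        Files.write(file.toPath(), data);
    }

    /** Reads the parked bytes back from disk (slower than a heap access). */
    public byte[] read() throws IOException {
        return Files.readAllBytes(file.toPath());
    }

    public File location() {
        return file;
    }

    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        byte[] big = new byte[10 * 1024 * 1024];
        try (ScratchBuffer scratch = new ScratchBuffer(big)) {
            big = null; // the heap copy can now be garbage collected
            System.out.println("parked at " + scratch.location());
            System.out.println("read back " + scratch.read().length + " bytes");
        }
    }
}
```

Every read goes to disk, so heap usage drops while access gets slower.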

 Wednesday, 1 July 2015, 12:48 +02:00, from Andreas Lehmkühler andr...@lehmi.de:
 
 
  Alex Sviridov  ooo_satu...@mail.ru  wrote on 1 July 2015 at 10:16:
  
  
   I want to display all page thumbnails. However, I ran into a memory
  problem with PDFRenderer or PDDocument; I don't know which one.
  
  I have the following code:
  
      private PDDocument pdfDocument;
  
      private PDFRenderer pdfRenderer;
  
      public WritableImage getPageThumbImage(int page) {
          WritableImage result = null;
          try {
              BufferedImage bi = pdfRenderer.renderImageWithDPI(page, 12, ImageType.RGB);
              result = SwingFXUtils.toFXImage(bi, null);
          } catch (IOException ex) {
              // the exception is silently ignored here
          }
          return result;
      }
  
  I run the getPageThumbImage method in a for loop for every page. I set the
  Java heap to 500 MB, and I can get about 30 images using getPageThumbImage
  (if I allow more memory, I get more).
  My application has real-time memory graphs, and they show that memory
  fills up very quickly.
  When there is no more free memory, getPageThumbImage hangs: no exception,
  nothing. The code just stops.
  When I set pdfDocument=null and pdfRenderer=null, I get about 400 MB of
  memory back. How can I solve this problem?
 There are 2 possible issues, and maybe both are relevant.
 
 1. PDFBox consumes more or less memory to load a PDF depending on the size
 and the content of the PDF.
 
 - Are you using the latest 2.0.0-SNAPSHOT? There have been some improvements
 to the memory footprint lately.
 - Try using a scratch file (there are load methods with a boolean switch to
 activate that).
 
 2. Your own implementation consumes more or less memory to process those
 thumbnails.
 
 - Check that you are releasing all the resources you use during your process
 (especially the images you're creating).
 
 HTH,
 Andreas
 
 
 
 
 -- 
 Alex Sviridov

BR
Andreas




Re[4]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Alex Sviridov
 I've decided to show all the code. I'm also sending the PDF file, a file from
the internet that I use for testing.

Task<Integer> task = new Task<Integer>() {
    @Override
    protected Integer call() throws Exception {
        for (int i = 0; i < model.getTotalPages(); i++) {
            System.out.println("Point a:" + i);
            WritableImage writableImage = model.getPageThumbImage(i);
            System.out.println("Point b:" + i);
            ImageView imageView = new ImageView(writableImage);
            System.out.println("Point c:" + i);
            Label label = new Label(Integer.toString(i + 1));
            System.out.println("Point d:" + i);
            VBox vBox = new VBox(imageView, label);
            System.out.println("Point e:" + i);
            vBox.setAlignment(Pos.CENTER);
            vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
            System.out.println("Point f:" + i);
            // UI changes must run on the JavaFX application thread
            Platform.runLater(new Runnable() {
                @Override
                public void run() {
                    thumbFlowPane.getChildren().add(vBox);
                }
            });
        }
        return null;
    }
};
new Thread(task).start();

And here is the tail of the output

Point a:30
Point b:30
Point c:30
Point d:30
Point e:30
Point f:30
Point a:31
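
One possible explanation for the loop stopping silently after "Point a:31" (an assumption, not something confirmed in this thread): javafx.concurrent.Task extends java.util.concurrent.FutureTask, and FutureTask catches anything thrown by the task body, including an OutOfMemoryError, and stores it instead of printing it. The worker thread then simply ends. The JDK-only sketch below reproduces that behaviour with a simulated error; SilentFailureDemo and runAndCapture are illustrative names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class SilentFailureDemo {

    /** Runs the task on its own thread and returns what it threw, or null. */
    public static Throwable runAndCapture(FutureTask<?> task) throws InterruptedException {
        Thread t = new Thread(task);
        t.start();
        t.join(); // the thread ends normally; nothing is printed to stderr
        try {
            task.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original Throwable stored by FutureTask
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated failure; a real OutOfMemoryError would be stored the same way.
        Callable<Integer> failing = () -> {
            throw new OutOfMemoryError("simulated");
        };
        Throwable cause = runAndCapture(new FutureTask<Integer>(failing));
        System.out.println("task failed with: " + cause);
    }
}
```

With a JavaFX Task, the analogous check would be to register a handler with task.setOnFailed(...) and inspect task.getException() there, so a failure inside call() becomes visible.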

What is a scratch file? Sorry, I don't understand.






Wednesday, 1 July 2015, 13:04 +02:00, from Andreas Lehmkühler andr...@lehmi.de:


 Alex Sviridov  ooo_satu...@mail.ru  wrote on 1 July 2015 at 12:58:
 
 
  Thank you for the answer. I tried pdfbox-app-2.0.0-20150630.220424-1464.jar;
 the result is the same.
 
 When I create the images, I add them to a JavaFX FlowPane. However, the
 problem is not in the images because, I repeat, I get 400 MB back when I
 set pdfDocument=null and pdfRenderer=null.
 
 Besides, when I call pdfDocument = PDDocument.load(new File(fileName)) I
 don't have any problems with memory.
 
 I get the memory problem when I call getPageThumbImage in a for loop.
 
 I am sure that the problem is in PDFBox. Please help me.
Maybe, but I'm not sure at all.

Try to use the scratch file.


BR
Andreas




-- 
Alex Sviridov


Re[10]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Alex Sviridov
 OK, thank you again. There is just one thing I don't understand. What is the
reason for keeping so much data if I only need to render page images and, most
importantly, I DO IT PAGE BY PAGE?

Is there no way to drop the data for previous pages if I only need the data
for page N?


Wednesday, 1 July 2015, 14:08 +02:00, from Andreas Lehmkühler andr...@lehmi.de:


 Alex Sviridov  ooo_satu...@mail.ru  wrote on 1 July 2015 at 13:59:
 
 
  OK, thank you very much for the explanation. Could you tell me where this
 scratch file is located on Linux/Windows?
java.io.File.createTempFile is used to create that file. It uses the default
temp directory. That's /tmp on Linux. I'm not sure about Windows, as different
environment variables (TMP, TEMP, USERPROFILE, ) are used to search for such
a directory.

You may define your own temp directory using the following parameter when
starting your application:

-Djava.io.tmpdir=PATH-TO-YOUR-TEMP
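
To check which directory is actually in effect on a given machine, the java.io.tmpdir system property can be printed and a probe file created with the same java.io.File.createTempFile call mentioned above. A small sketch follows; TempDirCheck and its method names are made up for illustration.

```java
import java.io.File;
import java.io.IOException;

public class TempDirCheck {

    /** The directory the JVM uses for temp files by default. */
    public static String tempDir() {
        return System.getProperty("java.io.tmpdir");
    }

    /** Creates a throwaway temp file in the default temp directory. */
    public static File createProbe() throws IOException {
        File probe = File.createTempFile("probe", ".tmp");
        probe.deleteOnExit(); // clean up when the JVM exits
        return probe;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir = " + tempDir());
        System.out.println("probe file     = " + createProbe().getAbsolutePath());
    }
}
```

Running it once without and once with -Djava.io.tmpdir=... shows whether the override took effect; the target directory has to exist already.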


 
 
    

Re: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Tilman Hausherr

On 01.07.2015 at 10:16, Alex Sviridov wrote:

My application has real-time memory graphs, and they show that memory fills
up very quickly.
When there is no more free memory, getPageThumbImage hangs: no exception,
nothing. The code just stops.
When I set pdfDocument=null and pdfRenderer=null, I get about 400 MB of
memory back. How can I solve this problem?


If you're building from source, try this: in PDImageXObject.java, remove
the line "cachedImage = image;". This will consume less memory if you
have large PDFs with many images.


Tilman






Re: PDFRenderer, PDDocument memory issue

2015-07-01 Thread John Hewson

 On 1 Jul 2015, at 07:52, Tilman Hausherr thaush...@t-online.de wrote:
 
 On 01.07.2015 at 10:16, Alex Sviridov wrote:
 My application has real-time memory graphs, and they show that memory fills
 up very quickly.
 When there is no more free memory, getPageThumbImage hangs: no exception,
 nothing. The code just stops.
 When I set pdfDocument=null and pdfRenderer=null, I get about 400 MB of
 memory back. How can I solve this problem?
 
 If you're building from source, try this: in PDImageXObject.java, remove the
 line "cachedImage = image;". This will consume less memory if you have large
 PDFs with many images.

We don't retain XObjects across pages (anymore), so that shouldn't be the cause 
of his gradual memory increase?

 Tilman
 
 
 
 
