Re: Re[6]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Andreas Lehmkühler
> Alex Sviridov wrote on 1 July 2015 at 13:38:
> 
> 
>  The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE
Ah, that explains a lot. The PDF is a scanned document: every page holds a color
image, which consumes a lot of memory when processed.

> I tried load(fileName,true). As a result, I no longer have memory
> problems. However, now I have 2 other problems:
>
> 1) All the thumbnail images are loaded, but it is VERY SLOW: one
> thumbnail image takes about 4 seconds to load!
With huge PDFs you have to pick your poison: either you provide enough
memory to do everything in memory (fast), or you use a scratch file to save
memory (slow).

And yes, there is room to improve the memory handling in PDFBox (read on
demand, release after use), but that is a feature for the future. Patches
are welcome.
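
To make the trade-off concrete, here is a toy sketch of the two storage strategies. It is not how PDFBox implements its scratch file; the class `ScratchFileSketch` and its methods are made up for illustration only. Data is either kept on the heap (fast, memory hungry) or appended to a temporary file and read back on demand (slower, small heap footprint).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of PDFBox.
class ScratchFileSketch implements AutoCloseable {
    private final boolean useScratchFile;
    private final List<byte[]> inMemory = new ArrayList<>();  // heap storage
    private final List<long[]> index = new ArrayList<>();     // {offset, length} per block
    private File file;
    private RandomAccessFile scratch;

    ScratchFileSketch(boolean useScratchFile) throws IOException {
        this.useScratchFile = useScratchFile;
        if (useScratchFile) {
            file = File.createTempFile("scratch", ".tmp");
            scratch = new RandomAccessFile(file, "rw");
        }
    }

    /** Stores a block and returns an id that can be used to load it again. */
    int store(byte[] data) throws IOException {
        if (useScratchFile) {
            long offset = scratch.length();
            scratch.seek(offset);
            scratch.write(data);                       // bytes leave the heap
            index.add(new long[] {offset, data.length});
            return index.size() - 1;
        }
        inMemory.add(data.clone());                    // bytes stay on the heap
        return inMemory.size() - 1;
    }

    /** Loads a block; the scratch-file variant pays for disk I/O on every call. */
    byte[] load(int id) throws IOException {
        if (useScratchFile) {
            long[] entry = index.get(id);
            byte[] data = new byte[(int) entry[1]];
            scratch.seek(entry[0]);
            scratch.readFully(data);
            return data;
        }
        return inMemory.get(id).clone();
    }

    @Override
    public void close() throws IOException {
        if (scratch != null) {
            scratch.close();
            file.delete();
        }
    }

    /** Stores and reloads a small block, returning true if the bytes survive. */
    static boolean roundTrip(boolean useScratchFile) {
        byte[] page = {1, 2, 3, 4};
        try (ScratchFileSketch store = new ScratchFileSketch(useScratchFile)) {
            int id = store.store(page);
            return Arrays.equals(page, store.load(id));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("in memory:    " + roundTrip(false));
        System.out.println("scratch file: " + roundTrip(true));
    }
}
```

Both variants return the same bytes; what differs is where the data lives between `store` and `load`, which is why the scratch-file mode trades speed for memory.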

> 2) Besides, as you can see, the thumbnail images are loaded in a separate
> thread. While this thread is running, I try to get a big image for the main
> content using
> BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
> and I get the following exception:
> 
> java.io.IOException: java.util.zip.DataFormatException: unknown compression method
>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
>     at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
>     at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
>     at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
>     at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
>     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
>   
>     at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.zip.DataFormatException: unknown compression method
>     at java.util.zip.Inflater.inflateBytes(Native Method)
>     at java.util.zip.Inflater.inflate(Inflater.java:259)
>     at java.util.zip.Inflater.inflate(Inflater.java:280)
>     at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
>     ... 20 more
> 
> How can I solve these problems?
PDFBox isn't designed to be thread safe, so a PDDocument must not be used from
two threads at the same time.
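
One common way to work around this is to funnel every rendering call, thumbnails and full-size pages alike, through a single worker thread. The following is only a minimal sketch of that idea, not PDFBox code: `NotThreadSafeRenderer` is a hypothetical stand-in for the real renderer, and in an application the jobs would call `pdfRenderer.renderImageWithDPI(...)` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SerializedRendering {

    /** Hypothetical stand-in for a renderer that must not be used by two threads at once. */
    static class NotThreadSafeRenderer {
        String render(int page, int dpi) {
            return "page " + page + " at " + dpi + " dpi";
        }
    }

    /** Renders the thumbnails and one full-size page, all on the same worker thread. */
    static List<String> renderAllSerialized(int thumbnailCount) {
        NotThreadSafeRenderer renderer = new NotThreadSafeRenderer();
        // Exactly one thread ever touches the renderer, so jobs cannot overlap.
        ExecutorService pdfThread = Executors.newSingleThreadExecutor();
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < thumbnailCount; i++) {
                final int page = i;
                Callable<String> thumbnailJob = () -> renderer.render(page, 36);
                pending.add(pdfThread.submit(thumbnailJob));
            }
            // The full-size render waits in the same queue instead of running concurrently.
            Callable<String> fullSizeJob = () -> renderer.render(0, 300);
            pending.add(pdfThread.submit(fullSizeJob));

            List<String> results = new ArrayList<>();
            for (Future<String> job : pending) {
                results.add(job.get());   // blocks until that job has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("rendering failed", e);
        } finally {
            pdfThread.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String line : renderAllSerialized(3)) {
            System.out.println(line);
        }
    }
}
```

The price of this approach is that the full-size render has to wait for the thumbnails already queued ahead of it. An alternative is to open a separate PDDocument per thread, at the cost of more memory.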

> 
> 
> Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler:
> >
> >
> >> Alex Sviridov < ooo_satu...@mail.ru > wrote on 1 July 2015 at 13:09:
> >> 
> >> 
> >>  I decided to show all the code. I am also sending the PDF file, a file from
> >> the internet that I use for testing.
> >The attachment didn't make it due to some restrictions on the mailing list.
> >Please post a link to the original source or another place where we can
> >download the PDF in question.
> >
> >> 
> >> Task<Integer> task = new Task<Integer>() {
> >>     @Override protected Integer call() throws Exception {
> >>         for (int i=0;i<...;i++) {
> >>             System.out.println("Point a:"+i);
> >>             // get the thumbnail for page i (runs on this background thread)
> >>             WritableImage writableImage=model.getPageThumbImage(i);
> >>             System.out.println("Point b:"+i);
> >>             ImageView imageView=new ImageView(writableImage);
> >>             System.out.println("Point c:"+i);
> >>             Label label=new Label(Integer.toString(i+1));
> >>             System.out.println("Point d:"+i);
> >>             VBox vBox=new VBox(imageView,label);
> >>             System.out.println("Point e:"+i);
> >>             vBox.setAlignment(Pos.CENTER);
> >>             vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
> >>             System.out.println("Point f:"+i);
> >>             // scene graph changes must happen on the JavaFX application thread
> >>             Platform.runLater(new Runnable() {
> >>                 @Override
> >>                 public void run() {
> >>                     thumbFlowPane.getChildren().add(vBox);
> >>                 }
> >>             });
> >>         }
> >>         return null;
> >>     }
> >> };
> >> new Thread(task).start();
> >> 
> >> And here is the tail of the output
> >> 
> >> Point a:30
> >> Point b:30
> >> Point c:30
> >> Point d:30
> >> Point e:30
> >> Point f:30
> >> Point a:31
> >> 
> >> What is a scratch file? Sorry, I don't understand you.

Re[6]: PDFRenderer, PDDocument memory issue

2015-07-01 Thread Alex Sviridov
 The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE

I tried load(fileName,true). As a result, I no longer have memory
problems. However, now I have 2 other problems:

1) All the thumbnail images are loaded, but it is VERY SLOW: one
thumbnail image takes about 4 seconds to load!

2) Besides, as you can see, the thumbnail images are loaded in a separate
thread. While this thread is running, I try to get a big image for the main
content using
BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
and I get the following exception:

java.io.IOException: java.util.zip.DataFormatException: unknown compression method
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
    at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
    at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
    at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
  
    at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.zip.DataFormatException: unknown compression method
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
    ... 20 more

How can I solve these problems?


Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler:
>
>
>> Alex Sviridov < ooo_satu...@mail.ru > wrote on 1 July 2015 at 13:09:
>> 
>> 
>>  I decided to show all the code. I am also sending the PDF file, a file from
>> the internet that I use for testing.
>The attachment didn't make it due to some restrictions on the mailing list.
>Please post a link to the original source or another place where we can download
>the PDF in question.
>
>> 
>> Task<Integer> task = new Task<Integer>() {
>>     @Override protected Integer call() throws Exception {
>>         for (int i=0;i<...;i++) {
>>             System.out.println("Point a:"+i);
>>             // get the thumbnail for page i (runs on this background thread)
>>             WritableImage writableImage=model.getPageThumbImage(i);
>>             System.out.println("Point b:"+i);
>>             ImageView imageView=new ImageView(writableImage);
>>             System.out.println("Point c:"+i);
>>             Label label=new Label(Integer.toString(i+1));
>>             System.out.println("Point d:"+i);
>>             VBox vBox=new VBox(imageView,label);
>>             System.out.println("Point e:"+i);
>>             vBox.setAlignment(Pos.CENTER);
>>             vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
>>             System.out.println("Point f:"+i);
>>             // scene graph changes must happen on the JavaFX application thread
>>             Platform.runLater(new Runnable() {
>>                 @Override
>>                 public void run() {
>>                     thumbFlowPane.getChildren().add(vBox);
>>                 }
>>             });
>>         }
>>         return null;
>>     }
>> };
>> new Thread(task).start();
>> 
>> And here is the tail of the output
>> 
>> Point a:30
>> Point b:30
>> Point c:30
>> Point d:30
>> Point e:30
>> Point f:30
>> Point a:31
>> 
>> What is a scratch file? Sorry, I don't understand you.
>
>PDFBox holds a lot of temporary data in memory. To reduce the memory
>footprint, you can choose to use a scratch file instead, so that some or most
>of that data is held in a file.
>
>To do so, simply use another load method, e.g.
>
>load(File file, boolean useScratchFiles)
>> 
>> 
>> 
>> 
>> 
>> 
>> Wednesday, 1 July 2015, 13:04 +02:00, from Andreas Lehmkühler < andr...@lehmi.de >:
>> >
>> >
>> >> Alex Sviridov <  ooo_satu...@mail.ru > wrote on 1 July 2015 at 12:58:
>> >> 
>> >> 
>> >>  Thank you for the answer. I tried pdfbox-app-2.0.0-20150630.220424-1464.jar;
>> >> the result is the same.
>> >> 
>> >> When I create images I add them to javafx FlowPane. However, the problem 
>> >> is
>> >> not in images because I repeat - I get 400mb when I