[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402217#comment-16402217
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

Your revised patch now works fine, thanks. I tested by changing the {{-Xmx}} 
values. With the trunk, it worked with {{-Xmx420m}} but not with {{-Xmx410m}}. 
With your change, it worked with {{-Xmx380m}} but not with {{-Xmx370m}}. So 
we're saving around 40 MB.
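
A minimal probe of the kind that can reproduce this measurement (a sketch, not 
the actual test code; the class name and file are illustrative) - run it once 
per {{-Xmx}} value and note where it first fails with an OutOfMemoryError:
{code:java}
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

// e.g. java -Xmx420m MinHeapProbe gs-bugzilla690022.pdf
public class MinHeapProbe
{
    public static void main(String[] args) throws Exception
    {
        System.out.println("max heap: "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        try (PDDocument doc = PDDocument.load(new File(args[0])))
        {
            // throws OutOfMemoryError when -Xmx is below the working-set minimum
            new PDFRenderer(doc).renderImage(0);
        }
        System.out.println("rendered OK");
    }
}
{code}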

The next thing I'll do is review this... If I commit this change, I'll wait 
until after the release of 2.0.9 (currently planned for next week); I'm 
getting more and more scared of making last-minute changes...

[~marekpribula] wrote:
{quote}The subsampling helps with our problem file - the memory usage drops to 
half of the previous values. But I am also confused as to why, because the 
changes in the Filters should play no role.
{quote}
It helps because the resulting images get smaller.

> FlateFilter, LZWFilter causes double memory usage
> -------------------------------------------------
>
> Key: PDFBOX-4151
> URL: https://issues.apache.org/jira/browse/PDFBOX-4151
> Project: PDFBox
> Issue Type: Bug
> Reporter: Marek Pribula
> Priority: Major
> Attachments: ModifiedFilters.png, OriginalFilters.png, 
> PDFBOX-2554-cmykrasterobjecttypes.pdf, TEST.pdf, gs-bugzilla690022.pdf, 
> pop-bugzilla93476.pdf, predictor_stream.patch, predictor_stream_rev2.patch
>
>
> The problem occurred in our production while processing a 400 kB file. The 
> file was generated by a scanner at a resolution of 5960 x 8430 pixels with 
> 8 bits per pixel (unfortunately we have no control over the files we have 
> to process). Our analysis showed that the problem is in FlateFilter.decode, 
> where the uncompressed data is written into a ByteArrayOutputStream. Since 
> the final size is unknown to the OutputStream, its buffer grows through 
> internal calls to Arrays.copyOf. By the end of processing, this leads to 
> memory usage of about twice the size of the decompressed data.
> What we tried, and what helped in our case, was a slight modification of 
> the FlateFilter and LZWFilter decode method implementations. Here is the 
> original method body:
> {code:java}
> @Override
> public DecodeResult decode(InputStream encoded, OutputStream decoded,
>                            COSDictionary parameters, int index) throws IOException
> {
>     int predictor = -1;
>     final COSDictionary decodeParams = getDecodeParams(parameters, index);
>     if (decodeParams != null)
>     {
>         predictor = decodeParams.getInt(COSName.PREDICTOR);
>     }
>     try
>     {
>         if (predictor > 1)
>         {
>             int colors = Math.min(decodeParams.getInt(COSName.COLORS, 1), 32);
>             int bitsPerPixel = decodeParams.getInt(COSName.BITS_PER_COMPONENT, 8);
>             int columns = decodeParams.getInt(COSName.COLUMNS, 1);
>             ByteArrayOutputStream baos = new ByteArrayOutputStream();
>             decompress(encoded, baos);
>             ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>             Predictor.decodePredictor(predictor, colors, bitsPerPixel, columns, bais, decoded);
>             decoded.flush();
>             baos.reset();
>             bais.reset();
>         }
>         else
>         {
>             decompress(encoded, decoded);
>         }
>     }
>     catch (DataFormatException e)
>     {
>         // if the stream is corrupt a DataFormatException may occur
>         LOG.error("FlateFilter: stop reading corrupt stream due to a DataFormatException");
>         // re-throw the exception
>         throw new IOException(e);
>     }
>     return new DecodeResult(parameters);
> }
> {code}
> and here is our implementation:
> {code:java}
> @Override
> public DecodeResult decode(InputStream encoded, OutputStream decoded,
>                            COSDictionary parameters, int index) throws IOException
> {
>     final COSDictionary decodeParams = getDecodeParams(parameters, index);
>     int predictor = decodeParams.getInt(COSName.PREDICTOR);
>     try
>     {
>         if (predictor > 1)
>         {
>             File tempFile = null;
>             FileOutputStream fos = null;
>             FileInputStream fis = null;
>             try
>             {
>                 int colors = Math.min(decodeParams.getInt(COSName.COLORS, 1), 32);
>                 int bitsPerPixel = decodeParams.getInt(COSName.BITS_PER_COMPONENT, 8);
>                 int columns = decodeParams.getInt(COSName.COLUMNS, 1);
>                 tempFile = File.createTempFile("tmpPdf", null);
>                 fos = new FileOutputStream(tempFile);
>                 decompress(encoded, fos);
>                 fos.close();
>                 fis = new FileInputStream(tempFile);
>                 Predictor.decodePredictor(predictor, colors, bitsPerPixel, columns, fis, decoded);
>                 decoded.flush();
>             }
>             finally
>             {
>                 IOUtils.closeQuietly(fos);
>                 IOUtils.closeQuietly(fis);
>                 try
>                 {
>                     // try to delete but don't care if it fails
>                     tempFile.delete();
>                 }
>                 catch (Exception e)
>                 {
>                     LOG.error("Could not delete temp data file", e);
>                 }
>             }
>         }
>         else
>         {
>             decompress(encoded, decoded);
>         }
>     }
>     catch (DataFormatException e)
>     {
>         // if the stream is corrupt a DataFormatException may occur
>         LOG.error("FlateFilter: stop reading corrupt stream due to a DataFormatException");
>         // re-throw the exception
>         throw new IOException(e);
>     }
>     return new DecodeResult(parameters);
> }
> {code}
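
For illustration, here is a minimal standalone sketch (not PDFBox code) of why 
the quoted original peaks at roughly twice the decoded size: {{toByteArray()}} 
returns a copy of the internal buffer of the {{ByteArrayOutputStream}}, so both 
copies stay reachable while the predictor runs, on top of the intermediate 
{{Arrays.copyOf}} growth:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class DoubleBufferDemo
{
    public static void main(String[] args)
    {
        int size = 5960 * 8430; // ~50 MB of decoded 8-bit pixel data, as in the report
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(new byte[size], 0, size); // internal buffer grows via Arrays.copyOf
        // toByteArray() returns a copy, so ~2 * size bytes are reachable here
        ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
        long used = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        System.out.println("available to read: " + bais.available() + " bytes");
        System.out.println("heap used: " + used / (1024 * 1024) + " MB");
    }
}
{code}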

[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-16 Thread Itai Shaked (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401886#comment-16401886
 ] 

Itai Shaked commented on PDFBOX-4151:
-

Follow-up on the rendering issue - it was most probably this Java bug: 
[https://bugs.openjdk.java.net/browse/JDK-8048782], so the solution is to 
either use the Oracle JRE or Java >= 9.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Itai Shaked (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401144#comment-16401144
 ] 

Itai Shaked commented on PDFBOX-4151:
-

The issue with the file not rendering apparently has something to do with the 
JRE I'm using - it fails to render (with or without twelvemonkeys) with OpenJDK 
1.8.0_151, but renders fine with Oracle's 9.0.4. 

I'm attaching a revised patch. The differences in rendering probably resulted 
from two bugs:
 # I did not account for the last row being allowed to be shorter, and needing 
to be padded with zeroes.
 # The file pop-bugzilla93476.pdf is almost certainly corrupted, as at some 
point the predictor values it produces are 45 and then 9. In the current 
implementation this would be ignored, but in my patch it resulted in resetting 
the "different predictor per row" option. I have changed the code so it should 
match the current implementation.

As for any improvements - rendering time seems virtually unchanged, while 
memory can be slightly improved (possibly depending on when the JVM decides to 
GC?). The best results I've seen were when making sure there was enough heap 
space for the entire render operation to complete without any GC runs (which is 
possibly when saving memory matters least?). Here is the code I used to test 
this (using the suggested test file "gs-bugzilla690022.pdf"):
{code:java}
public static void main(String[] args) throws Exception {
    int ntest = 20;
    long t = 0;
    long m = 0;
    for (int i = 0; i <= ntest; i++) {
        try (PDDocument doc = PDDocument.load(new File("gs-bugzilla690022.pdf"))) {
            PDFRenderer renderer = new PDFRenderer(doc);
            renderer.setSubsamplingAllowed(false);
            long s = System.currentTimeMillis();
            renderer.renderImage(0);
            long e = System.currentTimeMillis();
            long memused = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();

            // discard first run (warmup?)
            if (i > 0) {
                m += memused;
                t += (e - s);
            }
        }
        Runtime.getRuntime().gc();
    }
    System.err.printf("Time: %dms, Mem: %dMB%n", t / ntest, m / (ntest * 1024 * 1024));
    System.exit(0);
}
{code}
For the original version, the results were "Time: 3565ms, Mem: 1616MB", while 
with the patched version they were "Time: 3556ms, Mem: 1222MB". This is when 
running with {{-Xms2g}} or more. With smaller values the results were much less 
conclusive.

I understand if this is not conclusive enough to merit a change; I mainly did 
it out of personal interest.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400992#comment-16400992
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

Oops, I see what you meant - you used the link. Please use the files I 
attached; I don't know which of the files from the original 7z file I used. 
Sorry for the confusion. It should work without the twelvemonkeys plugin, 
although the plugin helps elsewhere.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400986#comment-16400986
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

Sorry, I forgot to attach these files. Re the error message, try the 
twelvemonkeys JPEG plugin:
{code:xml}
<dependency>
    <groupId>com.twelvemonkeys.imageio</groupId>
    <artifactId>imageio-jpeg</artifactId>
    <version>3.3.2</version>
</dependency>
{code}
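
(The twelvemonkeys reader registers itself through ImageIO's service-provider 
mechanism, so having the jar on the classpath should be sufficient; no code 
changes are needed.)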



[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Itai Shaked (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400974#comment-16400974
 ] 

Itai Shaked commented on PDFBOX-4151:
-

Can you clarify which file you mean by "pop-bugzilla93476.pdf"? The attachment 
in the linked issue has a few PDF files in it (although they may all be the 
same?), but all of them completely fail to render for me, on both the patched 
trunk and 2.0.8, both with the same error: "java.util.concurrent.ExecutionException: 
java.awt.image.RasterFormatException: (y + height) is outside raster".


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400826#comment-16400826
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

I've changed the constant in PDFBOX-4071.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400811#comment-16400811
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

Thank you [~itai] for the patch, which sounds interesting, but which I would 
only use if there's an improvement in the memory footprint.

I get a rendering difference with two files:
 - pop-bugzilla93476.pdf
 - PDFBOX-2554-cmykrasterobjecttypes.pdf

In the first one, I suspect that the object is 
{{Root/Pages/Kids/[0]/Resources/XObject/Fm3/Resources/XObject/Fm2/Resources/XObject/Im1/SMask}}.
 From the way it looks, it is probably an incorrect stream; the original issue 
( [https://bugs.freedesktop.org/show_bug.cgi?id=93476] ) mentions fuzzing. For 
the second file I can't tell where it is.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400744#comment-16400744
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

bugzilla886049.pdf is a file that should have a large stream with a predictor.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Marek Pribula (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400568#comment-16400568
 ] 

Marek Pribula commented on PDFBOX-4151:
---

The subsampling helps with our problem file - the memory usage drops to half 
of the previous values. But I am also confused as to why, because the changes 
in the Filters should play no role.

I have also tested the supplied patch with this file. My results are the same 
as mentioned above: memory usage and time consumption are the same (or at 
least without notable changes) compared to using only subsampling.

Thank you for the information about the new version release and also for the 
snapshot. We are going to test it within our production application and 
hopefully it will help, as my previous test has shown.



[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Itai Shaked (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400205#comment-16400205
 ] 

Itai Shaked commented on PDFBOX-4151:
-

I'm attaching a patch that implements {{Predictor}} as a stream, so no extra 
byte-array streams are created. I have tested it on a few files, but I saw no 
notable differences in either speed or memory footprint, as I couldn't find 
PDF files with really huge Flate- or LZW-encoded images that use a predictor 
(the biggest I could find was ~1800x600 pixels, or just over 3 MB, which I'm 
assuming is hardly noticeable).

It would be nice to test it on some really big images, but I don't know where 
I could find such examples.
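
To make the idea concrete, here is a much-reduced sketch of the approach (my 
illustration, not the attached patch): wrap predictor decoding in an 
{{InputStream}} that decodes one row at a time, so the whole decompressed 
stream never has to be buffered. The class below handles only the PNG "Up" 
filter (type 2); the real patch supports all predictor types and per-row 
filter bytes:
{code:java}
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class UpPredictorInputStream extends FilterInputStream
{
    private final byte[] prevRow; // previous decoded row (all zeroes initially)
    private final byte[] row;     // current decoded row
    private int pos;              // read position within 'row'
    private int len;              // number of valid bytes in 'row'

    public UpPredictorInputStream(InputStream in, int rowLength)
    {
        super(in);
        prevRow = new byte[rowLength];
        row = new byte[rowLength];
    }

    @Override
    public int read() throws IOException
    {
        if (pos >= len && !fillRow())
        {
            return -1;
        }
        return row[pos++] & 0xff;
    }

    // must be overridden: FilterInputStream delegates this directly to the
    // underlying stream, which would bypass the decoding
    @Override
    public int read(byte[] b, int off, int count) throws IOException
    {
        if (pos >= len && !fillRow())
        {
            return -1;
        }
        int n = Math.min(count, len - pos);
        System.arraycopy(row, pos, b, off, n);
        pos += n;
        return n;
    }

    // reads one filter-type byte plus one encoded row and decodes it in place
    private boolean fillRow() throws IOException
    {
        int filterType = in.read(); // assumed to be 2 ("Up") in this sketch
        if (filterType < 0)
        {
            return false; // clean EOF between rows
        }
        int n = 0;
        while (n < row.length)
        {
            int r = in.read(row, n, row.length - n);
            if (r < 0)
            {
                break; // short last row; the rev2 patch pads it with zeroes
            }
            n += r;
        }
        for (int i = 0; i < n; i++)
        {
            row[i] += prevRow[i]; // Up: add the byte directly above
        }
        System.arraycopy(row, 0, prevRow, 0, n);
        pos = 0;
        len = n;
        return n > 0;
    }
}
{code}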

P.S.: While working on it, I noticed that {{FlateFilter}} has the constant 
{{int BUFFER_SIZE = 16348}} - I'm assuming it's a typo and should be 
16384 = 2^14^?


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398903#comment-16398903
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

It's like [~itai] wrote: you'll still have the memory usage you complain about, 
but you'll save memory elsewhere.

A snapshot can be found here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.9-SNAPSHOT/

We're currently doing pre-release tests and fixing one last bug, so the new 
version should come within a few weeks. Please also try 2.0.8; each version is 
better than the one before.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-14 Thread Itai Shaked (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398382#comment-16398382
 ] 

Itai Shaked commented on PDFBOX-4151:
-

Note that the subsampling in the filters is implemented only for JPEG (DCT), 
JBIG2 and JPX, as it relies on ImageIO subsampling. Since Flate and LZW can 
be used for non-image data, and since at the filter level they have no real 
image-decoding mechanism, they ignore the subsampling options. I believe I 
pointed this out in my original mail about the subsampling feature, but I may 
have neglected to mention it in the actual issue (PDFBOX-4137).

{{SampledImageReader}} would still allocate a smaller {{BufferedImage}} if 
subsampling is enabled and used (and will effectively perform the subsampling 
itself), but the memory and time savings won't be as dramatic as in the case of 
JPEG, JBIG2 and JPX streams.
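Roughly speaking, the renderer-side subsampling amounts to something like the 
following (this is just an illustrative sketch, not the actual 
{{SampledImageReader}} code):

{code:java}
import java.awt.image.BufferedImage;

public final class SubsampleSketch
{
    // Illustrative only: copy every n-th pixel of the fully decoded
    // raster into a much smaller output image. The filter still has to
    // produce the full raster, which is why Flate/LZW streams don't see
    // the same savings as JPEG/JBIG2/JPX, where the codec itself can
    // skip pixels while decoding.
    public static BufferedImage subsample(BufferedImage full, int n)
    {
        int w = Math.max(1, full.getWidth() / n);
        int h = Math.max(1, full.getHeight() / n);
        // the retained image is roughly 1/n^2 the size of the original
        BufferedImage small = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                small.setRGB(x, y, full.getRGB(x * n, y * n));
            }
        }
        return small;
    }
}
{code}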


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-14 Thread Marek Pribula (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398371#comment-16398371
 ] 

Marek Pribula commented on PDFBOX-4151:
---

1) Currently, we are using 2.0.7.

2) We are using scratch files in our production, but I am not sure it would 
help with the filters, since they allocate double the size of the original file 
in memory.

3) Subsampling sounds very good; that could work for us. Can you please tell me 
when the release of 2.0.9 is planned? Is there a beta version with this 
feature, so we can start testing it and see the real impact (for example on the 
file we had the problem with)? Thank you very much.


[jira] [Commented] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-13 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397352#comment-16397352
 ] 

Tilman Hausherr commented on PDFBOX-4151:
-

1) Please tell us which version you're using.

2) We can't use your patch: it writes into a temp directory, and that would 
make things slower for everybody. You can get the same effect without any code 
change by using a scratch file when opening your document.

3) In the upcoming version 2.0.9 you can turn on subsampling in the 
{{PDFRenderer}} class, which will reduce memory usage. A sketch of both 
suggestions follows below.
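
Here is a minimal sketch of 2) and 3) together. The file name and DPI value are 
placeholders, and {{setSubsamplingAllowed()}} only exists from 2.0.9 on:

{code:java}
import java.awt.image.BufferedImage;
import java.io.File;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public final class RenderLargeScan
{
    public static void main(String[] args) throws Exception
    {
        // 2) buffer the document in a scratch file instead of main memory
        try (PDDocument document = PDDocument.load(
                new File("scanned.pdf"), MemoryUsageSetting.setupTempFileOnly()))
        {
            PDFRenderer renderer = new PDFRenderer(document);
            // 3) allow the renderer to subsample large images (2.0.9+)
            renderer.setSubsamplingAllowed(true);
            // render the first page at a modest resolution
            BufferedImage image = renderer.renderImageWithDPI(0, 72);
            System.out.println(image.getWidth() + " x " + image.getHeight());
        }
    }
}
{code}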
