[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-19 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4151:

Attachment: PDFJS-9581-hugeimage.pdf

> FlateFilter, LZWFilter causes double memory usage
> --------------------------------------------------
>
> Key: PDFBOX-4151
> URL: https://issues.apache.org/jira/browse/PDFBOX-4151
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.8
>Reporter: Marek Pribula
>Priority: Major
>  Labels: Predictor
> Attachments: ModifiedFilters.png, OriginalFilters.png, 
> PDFBOX-2554-cmykrasterobjecttypes.pdf, PDFJS-9581-hugeimage.pdf, TEST.pdf, 
> gs-bugzilla690022.pdf, pop-bugzilla93476.pdf, predictor_stream.patch, 
> predictor_stream_rev2.patch
>
>
> The problem occurred in our production environment while processing a file of 
> about 400 kB. The file was generated by a scanner at a resolution of 5960 x 8430 
> pixels with 8 bits per pixel (unfortunately we have no control over the files 
> we have to process). Our analysis showed that the problem is in 
> FlateFilter.decode, where the uncompressed data is written into a 
> ByteArrayOutputStream. Since the final size of the data is unknown to the 
> OutputStream, its internal buffer keeps growing through internal calls to 
> Arrays.copyOf. By the end of processing, this leads to memory usage of roughly 
> twice the uncompressed data size, as the sketch below illustrates.
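> A minimal standalone sketch of this growth behaviour (plain JDK, no PDFBox 
> types; the sizes are illustrative, not taken from our measurements):
> {code:java}
> import java.io.ByteArrayOutputStream;
>
> public class GrowthDemo
> {
>     public static void main(String[] args)
>     {
>         // Write ~50 MB in 8 KB chunks, much like a filter decoding a stream.
>         byte[] chunk = new byte[8192];
>         ByteArrayOutputStream baos = new ByteArrayOutputStream();
>         for (long written = 0; written < 50L * 1024 * 1024; written += chunk.length)
>         {
>             // Whenever the internal buffer is full, ByteArrayOutputStream
>             // allocates a bigger array and copies the old one into it via
>             // Arrays.copyOf, so both arrays are live during the copy.
>             baos.write(chunk, 0, chunk.length);
>         }
>         // toByteArray() copies the whole buffer once more, so at this point
>         // the heap holds the internal buffer plus the returned copy:
>         // roughly twice the decoded size.
>         byte[] decodedData = baos.toByteArray();
>         System.out.println("decoded bytes: " + decodedData.length);
>     }
> }
> {code}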
> What we tried, and what helped in our case, was a slight modification of the 
> decode method implementations in FlateFilter and LZWFilter. Here is a snippet 
> of the original method body:
> {code:java}
> @Override
> public DecodeResult decode(InputStream encoded, OutputStream decoded,
>                            COSDictionary parameters, int index) throws IOException
> {
>     int predictor = -1;
>     final COSDictionary decodeParams = getDecodeParams(parameters, index);
>     if (decodeParams != null)
>     {
>         predictor = decodeParams.getInt(COSName.PREDICTOR);
>     }
>     try
>     {
>         if (predictor > 1)
>         {
>             int colors = Math.min(decodeParams.getInt(COSName.COLORS, 1), 32);
>             int bitsPerPixel = decodeParams.getInt(COSName.BITS_PER_COMPONENT, 8);
>             int columns = decodeParams.getInt(COSName.COLUMNS, 1);
>             ByteArrayOutputStream baos = new ByteArrayOutputStream();
>             decompress(encoded, baos);
>             ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>             Predictor.decodePredictor(predictor, colors, bitsPerPixel, columns,
>                                       bais, decoded);
>             decoded.flush();
>             baos.reset();
>             bais.reset();
>         }
>         else
>         {
>             decompress(encoded, decoded);
>         }
>     }
>     catch (DataFormatException e)
>     {
>         // if the stream is corrupt a DataFormatException may occur
>         LOG.error("FlateFilter: stop reading corrupt stream due to a DataFormatException");
>         // re-throw the exception
>         throw new IOException(e);
>     }
>     return new DecodeResult(parameters);
> }
> {code}
> And here is our modified implementation:
> {code:java}
> @Override
> public DecodeResult decode(InputStream encoded, OutputStream decoded,
>                            COSDictionary parameters, int index) throws IOException
> {
>     final COSDictionary decodeParams = getDecodeParams(parameters, index);
>     int predictor = decodeParams.getInt(COSName.PREDICTOR);
>     try
>     {
>         if (predictor > 1)
>         {
>             File tempFile = null;
>             FileOutputStream fos = null;
>             FileInputStream fis = null;
>             try
>             {
>                 int colors = Math.min(decodeParams.getInt(COSName.COLORS, 1), 32);
>                 int bitsPerPixel = decodeParams.getInt(COSName.BITS_PER_COMPONENT, 8);
>                 int columns = decodeParams.getInt(COSName.COLUMNS, 1);
>                 // decompress to a temp file instead of an in-memory buffer
>                 tempFile = File.createTempFile("tmpPdf", null);
>                 fos = new FileOutputStream(tempFile);
>                 decompress(encoded, fos);
>                 fos.close();
>                 fis = new FileInputStream(tempFile);
>                 Predictor.decodePredictor(predictor, colors, bitsPerPixel, columns,
>                                           fis, decoded);
>                 decoded.flush();
>             }
>             finally
>             {
>                 IOUtils.closeQuietly(fos);
>                 IOUtils.closeQuietly(fis);
>                 try
>                 {
>                     // try to delete but don't care if it fails
>                     if (tempFile != null)
>                     {
>                         tempFile.delete();
>                     }
>                 }
>                 catch (Exception e)
>                 {
>                     LOG.error("Could not delete temp data file", e);
>                 }
>             }
>         }
>         else
>         {
>             decompress(encoded, decoded);
>         }
>     }
>     catch (DataFormatException e)
>     {
>         // if the stream is corrupt a DataFormatException may occur
>         LOG.error("FlateFilter: stop reading corrupt stream due to a DataFormatException");
>         // re-throw the exception
>         throw new IOException(e);
>     }
>     return new DecodeResult(parameters);
> }
> {code}
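> On Java 7 and later the same temp-file idea can be written more compactly with 
> try-with-resources; a sketch of the predictor branch only, assuming the same 
> decompress and Predictor helpers as above:
> {code:java}
> // predictor > 1 branch only; colors, bitsPerPixel, columns as above
> File tempFile = File.createTempFile("tmpPdf", null);
> try
> {
>     try (FileOutputStream fos = new FileOutputStream(tempFile))
>     {
>         // spill the decompressed data to disk instead of a heap buffer
>         decompress(encoded, fos);
>     }
>     try (FileInputStream fis = new FileInputStream(tempFile))
>     {
>         Predictor.decodePredictor(predictor, colors, bitsPerPixel, columns,
>                                   fis, decoded);
>         decoded.flush();
>     }
> }
> finally
> {
>     // best-effort cleanup; a leftover temp file is not fatal
>     if (!tempFile.delete())
>     {
>         tempFile.deleteOnExit();
>     }
> }
> {code}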

[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Itai Shaked (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itai Shaked updated PDFBOX-4151:

Attachment: predictor_stream_rev2.patch

[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4151:

Attachment: PDFBOX-2554-cmykrasterobjecttypes.pdf


[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4151:

Attachment: pop-bugzilla93476.pdf


[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4151:

Attachment: gs-bugzilla690022.pdf


[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4151:

Attachment: (was: bugzilla886049.pdf)


[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4151:

Attachment: bugzilla886049.pdf


[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-15 Thread Itai Shaked (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itai Shaked updated PDFBOX-4151:

Attachment: predictor_stream.patch


[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-13 Thread Marek Pribula (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marek Pribula updated PDFBOX-4151:
--
Attachment: OriginalFilters.png
            ModifiedFilters.png


[jira] [Updated] (PDFBOX-4151) FlateFilter, LZWFilter causes double memory usage

2018-03-13 Thread Marek Pribula (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marek Pribula updated PDFBOX-4151:
--
Attachment: TEST.pdf
