Re: [PDFBOX-2.0] PDF Size after Signature

2015-02-26 Thread Andreas Lehmkühler
Hi,

 Tilman Hausherr thaush...@t-online.de wrote on 27 February 2015 at 07:35:
 
 
 Did you just start with signing or is this a recent phenomenon, i.e. 
 didn't happen a month ago?
 
 I looked at both files - in the 1.8 one, only the changed objects appear 
 after EOF. In the 2.0 one, all objects are there ?!
Correct, something went wrong when appending only the changed objects. It worked
for me when I fixed the encryption stuff. It seems as if some recent change
introduced this regression.
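Tilman's observation can be checked without PDFBox: an incremental update (the mode used for signing) should leave every original byte untouched and only append a new section that ends in another `%%EOF`. The following is a minimal JDK-only sketch under that assumption; `IncrementalUpdateCheck` and its methods are hypothetical helpers, not PDFBox API. An appended section roughly as large as the original file, as in the 2.0 sample, points to all objects being written again instead of only the changed ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: inspects how a PDF was changed by an incremental save. */
final class IncrementalUpdateCheck {

    private IncrementalUpdateCheck() {
    }

    /** True if the signed file starts with every byte of the original, as an incremental update should. */
    static boolean preservesOriginalBytes(byte[] original, byte[] signed) {
        if (signed.length < original.length) {
            return false;
        }
        return Arrays.equals(original, Arrays.copyOf(signed, original.length));
    }

    /** Size of the appended update section relative to the original file, e.g. 0.05 = 5%. */
    static double appendedRatio(byte[] original, byte[] signed) {
        return (double) (signed.length - original.length) / original.length;
    }

    /** Counts "%%EOF" markers; each incremental update normally adds one. */
    static int countEofMarkers(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string offsets equal byte offsets
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int count = 0;
        int from = 0;
        while ((from = text.indexOf("%%EOF", from)) != -1) {
            count++;
            from += "%%EOF".length();
        }
        return count;
    }
}
```

With local copies of the sample files, the bytes can be loaded with `java.nio.file.Files.readAllBytes` and compared pairwise; the 1.8 and 2.0 outputs should both preserve the original bytes, but differ sharply in `appendedRatio`.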

@Isaias
Which exact version/revision of the trunk are you using?

BR
Andreas Lehmkühler
 
 Tilman
 
 On 27.02.2015 at 05:44, Isaias Barroso wrote:
  Hi all,
 
  I'm using PDFBox 2.0 to sign some documents and I found that the signed
  file is much larger than with version 1.8; sometimes the file size
  increases by 100% or more. When the same file is signed using 1.8, the
  size increases by the expected amount.
 
  Original File: https://www.dropbox.com/s/s8p40ukorhchtcu/sign_me.pdf?dl=0
 
  Signed With 1.8:
  https://www.dropbox.com/s/ty8axylq8ol6204/sign_me_signed_1.8.pdf?dl=0
 
  Signed With 2.0:
  https://www.dropbox.com/s/ge1x3mdpqlalnvq/sign_me_signed_2.0.pdf?dl=0
 
  Is there an option to reduce the signed file size in 2.0?
 
  Best regards
 
 
 -
 To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
 For additional commands, e-mail: users-h...@pdfbox.apache.org





Re: [PDFBOX-2.0] PDF Size after Signature

2015-02-26 Thread Tilman Hausherr
Did you just start with signing or is this a recent phenomenon, i.e. 
didn't happen a month ago?


I looked at both files - in the 1.8 one, only the changed objects appear 
after EOF. In the 2.0 one, all objects are there ?!


Tilman

On 27.02.2015 at 05:44, Isaias Barroso wrote:

Hi all,

I'm using PDFBox 2.0 to sign some documents and I found that the signed
file is much larger than with version 1.8; sometimes the file size
increases by 100% or more. When the same file is signed using 1.8, the
size increases by the expected amount.

Original File: https://www.dropbox.com/s/s8p40ukorhchtcu/sign_me.pdf?dl=0

Signed With 1.8:
https://www.dropbox.com/s/ty8axylq8ol6204/sign_me_signed_1.8.pdf?dl=0

Signed With 2.0:
https://www.dropbox.com/s/ge1x3mdpqlalnvq/sign_me_signed_2.0.pdf?dl=0

Is there an option to reduce the signed file size in 2.0?

Best regards






Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present (or variation of it still present)

2015-02-26 Thread Andreas Lehmkühler
Hi,

 Steve Antoch sant...@yuzu.com wrote on 25 February 2015 at 00:04:
 
 
 Hi Andreas-
 
 Thanks again.
 
 I downloaded and built the latest from trunk.
 There was no change for the book I was testing. I first tried it after taking
 out my if (streamOffset > 0) test, but the null reference exception still
 occurred.
OK, thanks again for testing. I've fixed the issue based on your analysis.

 We are planning on running a large breadth test on approximately 108,000 pdfs
 starting tonight.  I will let you know how this test goes.  It will take about
 4 days to complete.
Cool, I'm looking forward to seeing the results.

 With respect to the small change I made in my fork:
 https://github.com/santoch/pdfbox/commit/75cc32ab8307062709c30f1cfea5e2fdb8c00ddd
 
 The issue was a separate but fairly rare failure that we found in a small
 number (about 10) of our PDFs.
 Adobe and Pdfium (Chrome) were both able to open them, but PDFBox was not,
 because it disallows nesting. I figured that if Pdfium allows 64 levels of
 nesting, we might be able to relax this test from allowing 0 levels to
 allowing 1 level and see if it worked. Since it did, I wanted to run those
 changes by you for your comments.
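Steve's proposal replaces an outright ban on nesting with a small fixed limit. The commit itself is not reproduced here; the following is only a generic sketch of that kind of bounded resolution, with a hypothetical map of id references standing in for whatever PDF structure the parser follows. With `maxDepth = 0` any nesting is rejected, with `maxDepth = 1` one level is accepted, and a larger limit such as the 64 levels mentioned for Pdfium still guarantees termination when references form a cycle.

```java
import java.util.Map;

/** Hypothetical sketch: follow a chain of references, allowing a bounded number of nested hops. */
final class BoundedReferenceResolver {

    private BoundedReferenceResolver() {
    }

    /**
     * Follows start -> refs.get(start) -> ... until an id with no outgoing reference is reached.
     * Throws if more than maxDepth hops would be needed, which also stops reference cycles.
     */
    static int resolve(Map<Integer, Integer> refs, int start, int maxDepth) {
        int current = start;
        for (int depth = 0; refs.containsKey(current); depth++) {
            if (depth >= maxDepth) {
                throw new IllegalStateException(
                        "nesting deeper than " + maxDepth + " levels starting at id " + start);
            }
            current = refs.get(current);
        }
        return current;
    }
}
```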
Is there any chance to get hold of a sample PDF? It would be good enough to
send it to me via private mail.

BR
Andreas Lehmkühler

 
 Thanks-
 Steve
 
 
 From: Andreas Lehmkühler andr...@lehmi.de
 Sent: Tuesday, February 24, 2015 3:30 AM
 To: users@pdfbox.apache.org
 Subject: Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present
 (or variation of it still present)
 
 Hi Steve,
 
  Steve Antoch sant...@yuzu.com wrote on 23 February 2015 at 19:42:
 
 
  @Andreas-
 
  I have downloaded the latest trunk and came close (it got much further)
  before failing.
  However, I think I may have a fix for that failure:
 Thanks for the test
 
  The code is returning 0 when the xrefstm fixedOffset is not found.  However,
  the code still tries to load and parse from xref 0, resulting in a null
  reference exception later in parser.parse().
 Your analysis is correct, but I hope that my latest improvements eliminate
 such cases, see PDFBOX-2572 for details. Could you give the latest trunk
 (r1661747) a try?
 
  However, thinking about this, I came up with this:
 
  // check for a XRef stream, it may contain some object ids of compressed objects
  if (trailer.containsKey(COSName.XREF_STM))
  {
      int streamOffset = trailer.getInt(COSName.XREF_STM);
      // check the xref stream reference
      fixedOffset = checkXRefStreamOffset(streamOffset, false);
      // <== fixedOffset comes back as 0 = not found
      if (fixedOffset > -1 && fixedOffset != streamOffset)
      {
          streamOffset = (int) fixedOffset;
          // <== streamOffset gets set to 0 here
          trailer.setInt(COSName.XREF_STM, streamOffset);
      }

      // I added this test because an xref stream starting at
      // offset 0 can never happen, so we should simply skip it
      if (streamOffset > 0)
      {
          pdfSource.seek(streamOffset);
          skipSpaces();
          // <== this call ultimately throws a null ref exception if streamOffset == 0 on entry
          parseXrefObjStream(prev, false);
      }
  }
 
  Adding that, the file successfully parses.
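The decision in Steve's patch can be isolated from the parser. The following is a minimal JDK-only sketch, not the COSParser code: `effectiveOffset` mirrors the quoted logic (a found offset that differs from the declared one replaces it, and anything at or below 0 means the xref stream is skipped), while `looksLikeObjectHeader` is a hypothetical stand-in for the kind of validation an offset check needs, namely that an `N G obj` header actually starts at that position. Offset 0 is rejected outright because that is where a PDF's `%PDF-` header lives.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical sketch of validating a trailer's /XRefStm offset before seeking to it. */
final class XRefStreamOffsetGuard {

    /** Offset value meaning "do not parse an xref stream". */
    static final long SKIP = -1;

    // An indirect object header such as "12 0 obj", optionally preceded by whitespace
    private static final Pattern OBJ_HEADER = Pattern.compile("^\\s*\\d+\\s+\\d+\\s+obj\\b");

    private XRefStreamOffsetGuard() {
    }

    /** True if an indirect object header starts at the given byte offset. */
    static boolean looksLikeObjectHeader(byte[] pdf, int offset) {
        if (offset <= 0 || offset >= pdf.length) {
            return false; // offset 0 holds the "%PDF-x.y" header, never an object
        }
        int end = Math.min(pdf.length, offset + 64);
        String window = new String(pdf, offset, end - offset, StandardCharsets.ISO_8859_1);
        return OBJ_HEADER.matcher(window).find();
    }

    /**
     * Mirrors the quoted logic: a found offset (fixedOffset > -1) that differs from the declared
     * one replaces it; a resulting offset of 0 or less means the xref stream is skipped.
     */
    static long effectiveOffset(long declaredOffset, long fixedOffset) {
        long offset = declaredOffset;
        if (fixedOffset > -1 && fixedOffset != declaredOffset) {
            offset = fixedOffset;
        }
        return offset > 0 ? offset : SKIP;
    }
}
```

In Steve's case the offset check returned 0, so `effectiveOffset(declared, 0)` yields `SKIP` and the parser never seeks to the start of the file.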
 
  Also, there was this proposal that I put up on GitHub in a repo that I
  forked directly from pdfbox (it is the only change).
  It relaxes the looping a bit to allow limited recursion. I would appreciate
  your thoughts on it.
 Is this change related to the discussed issue above?
 
  https://github.com/santoch/pdfbox/commit/75cc32ab8307062709c30f1cfea5e2fdb8c00ddd
 
  Thank you so much!  You have been tremendously helpful.  I wish I could have
  given you the files, but unfortunately, they are proprietary and we cannot
  release them.  :-(
 No need to worry, you are not the only one who is not allowed to share a
 specific pdf.
 
  Best regards-
  Steve
 
 BR
 Andreas Lehmkühler
 
 
  
  From: Andreas Lehmkühler andr...@lehmi.de
  Sent: Monday, February 23, 2015 3:43 AM
  To: users@pdfbox.apache.org
  Subject: Re: https://issues.apache.org/jira/browse/PDFBOX-2523 still present
  (or variation of it still present)
 
  Hi,
 
  I've improved the self-repair mechanism of the trunk based on Steve's report.
 
  @Steve Please give the newest trunk version/SNAPSHOT a try. Does the issue
  still persist?
 
  BR
  Andreas Lehmkühler
 
   Steve Antoch sant...@yuzu.com wrote on 17 February 2015 at 00:05:
  
  
 

Re: Error on PDFRenderer.renderImage (PDFBox 2.0)

2015-02-26 Thread Tilman Hausherr

Hi Kevin,

A new snapshot should be available within a few hours. It is slightly 
faster and your file will be processed if you use the JPEG reader of 
https://github.com/haraldk/TwelveMonkeys . Either add this to your pom 
file (if you use Maven):

<dependency>
    <groupId>com.twelvemonkeys.imageio</groupId>
    <artifactId>imageio-jpeg</artifactId>
    <version>3.0.2</version>
</dependency>

or add six(!) files to your classpath as described at that URL.
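To confirm that the extra reader is actually picked up, the JPEG readers registered with ImageIO can be listed. A minimal sketch using only the standard `javax.imageio` API (the class name is hypothetical): with TwelveMonkeys on the classpath, a class from the `com.twelvemonkeys` packages should appear next to the JDK's own `com.sun.imageio.plugins.jpeg.JPEGImageReader`; the exact class name and ordering are assumptions to verify against the TwelveMonkeys documentation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

/** Lists the ImageIO readers registered for the "jpeg" format name. */
final class JpegReaderCheck {

    private JpegReaderCheck() {
    }

    static List<String> jpegReaderClassNames() {
        // Re-scan so plugins visible to the context class loader (e.g. in web containers) are included
        ImageIO.scanForPlugins();
        List<String> names = new ArrayList<>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("jpeg");
        while (readers.hasNext()) {
            ImageReader reader = readers.next();
            names.add(reader.getClass().getName());
            reader.dispose();
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : jpegReaderClassNames()) {
            System.out.println(name);
        }
    }
}
```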

My understanding is that twelvemonkeys tries to process every broken 
JPEG file, similar to us trying to process every broken PDF file.


Tilman


On 25.02.2015 at 15:04, Kevin Morin wrote:

Hi Tilman,

great news! When do you think this will be available in the snapshot?

BR

Kevin

On 24/02/2015 00:36, Tilman Hausherr wrote:

Some good news: your file can be rendered with TwelveMonkeys. See the
bottom of
https://issues.apache.org/jira/browse/PDFBOX-2128

Tilman

On 03.02.2015 at 11:34, Kevin Morin wrote:

Hi Tilman,

I tried with JPedal and it works... and it also works with Java 8. 
Do you have a clue on how to solve this?


BR

Kevin

On 01/02/2015 16:06, Tilman Hausherr wrote:
No good news - I extracted the JPEG file with Notepad++ and get the 
same error with ImageIO.

What does work is
JPEGImageDecoder imageDecoder = JPEGCodec.createJPEGDecoder(...);
raster = imageDecoder.decodeAsRaster();

but then the colors are black and violet :-(

Tilman


On 30.01.2015 at 12:00, Kevin Morin wrote:

Hi,

I have the following error when I try to render a PDF file (I cannot
send it to a public list, but I can send it privately if needed). It
happens with PDFBox 2.0 under Linux and Windows with Java 7, but not
with Java 8.


Invalid image format
sun.java2d.cmm.kcms.CMM.checkStatus(CMM.java:180)
sun.java2d.cmm.kcms.CMM.createTransform(CMM.java:134)
java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:540)
com.sun.imageio.plugins.jpeg.JPEGImageReader.acceptPixels(JPEGImageReader.java:1263)
com.sun.imageio.plugins.jpeg.JPEGImageReader.readImage(Native Method)
com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1231)
com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1034)
javax.imageio.ImageReader.read(ImageReader.java:940)
org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:72)
org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:463)
org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:417)
org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:363)
org.apache.pdfbox.cos.COSStream.getDecodeResult(COSStream.java:303)
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:115)
org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:65)
org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:193)
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:42)
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:803)
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:465)
org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:439)
org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:163)
org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:204)
org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:137)
org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:70)

Thanks for your help
Best

Kevin
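The trace shows the JDK's own `JPEGImageReader` failing in its colour conversion step (`ColorConvertOp`), which is why switching to another registered JPEG reader can help. A minimal sketch, assuming only the standard `javax.imageio` API: `FallbackJpegDecoder` is a hypothetical helper, not part of PDFBox or TwelveMonkeys, that tries each registered JPEG reader in turn and returns the first successful result. Which reader is tried first depends on ImageIO's plugin ordering.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper: decodes JPEG data with the first registered reader that succeeds. */
final class FallbackJpegDecoder {

    private FallbackJpegDecoder() {
    }

    static BufferedImage decode(byte[] jpeg) throws IOException {
        Exception lastFailure = null;
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("jpeg");
        while (readers.hasNext()) {
            ImageReader reader = readers.next();
            try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(jpeg))) {
                reader.setInput(in);
                return reader.read(0);
            } catch (IOException | RuntimeException e) {
                // RuntimeException covers e.g. java.awt.color.CMMException from colour conversion;
                // remember the failure and try the next reader
                lastFailure = e;
            } finally {
                reader.dispose();
            }
        }
        throw new IOException("No registered JPEG reader could decode the data", lastFailure);
    }
}
```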





Re: font errors when reading PDF (not writing)

2015-02-26 Thread Steve Antoch

John-

(sorry to hijack - I think this is related enough that it warrants asking here)


If we run pdfbox on a headless server, will the Encrypt() class still function 
properly?  We do not render anything, just encrypt the document.

My suspicion is that this is not an issue, though it would be nice to be sure.

Thanks-
Steve


From: John Hewson j...@jahewson.com
Sent: Wednesday, February 25, 2015 1:27 PM
To: users@pdfbox.apache.org
Subject: Re: font errors when reading PDF (not writing)

Are you running on a headless system, such as a server? If so, you probably 
don’t have any fonts installed. Even though you’re just doing text extraction, 
this matters because the dimensions of the characters need to be taken into 
account and many PDFs do not embed the fonts which they depend on.

At a bare minimum I’d recommend installing the liberation fonts and whichever 
Microsoft fonts are available in your distribution’s package manager.

— John
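To check John's suggestion on a server, the font files can be looked up directly on disk. A minimal JDK-only sketch with a hypothetical helper name: it walks a font directory and lists TrueType/OpenType files whose name contains a given fragment. `/usr/share/fonts` is the usual system font directory on Ubuntu, but that path is an assumption to adjust per distribution.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds installed font files whose file name contains a fragment. */
final class FontInstallCheck {

    private FontInstallCheck() {
    }

    static List<Path> findFontFiles(Path root, String nameFragment) throws IOException {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList(); // no such font directory on this system
        }
        String needle = nameFragment.toLowerCase(Locale.ROOT);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        boolean isFont = name.endsWith(".ttf") || name.endsWith(".otf") || name.endsWith(".ttc");
                        return isFont && name.contains(needle);
                    })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of system fonts on Ubuntu
        for (Path p : findFontFiles(Paths.get("/usr/share/fonts"), "Liberation")) {
            System.out.println(p);
        }
    }
}
```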

 On 25 Feb 2015, at 06:12, Juan M Uys opy...@gmail.com wrote:

 Hello,

 I'm extracting text from PDFs using PDFTextStripperByArea and get a lot of
 these in the log:

 Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.ExternalFonts
 getTrueTypeFallbackFont
 SEVERE: No TTF fallback font for 'Helvetica'
 Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont init
 WARNING: Using fallback font 'LiberationSans' for 'ArialMT'

 I've searched the documentation for font-related advice, which seems to
 pertain to WRITING PDFs, whereas I'm merely extracting text.

 Please let me know how to get around this problem.

 Do I need to install extra font packages?
 If so, how? Where from?

 At the very least, I'd like to know how to remove these statements from my
 log. (I've tried throwing logback.xml and log4j.properties into my
 resources folder, setting package org.apache.pdfbox to INFO, to no avail)

 The system running my extractor code is stock Ubuntu 14.04 with Azul
 openjdk 7 (see
 https://registry.hub.docker.com/u/azul/zulu-openjdk/dockerfile/)

 Thanks,
 Juan
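The logging part of Juan's question is not answered in the thread. The layout of the quoted messages (date line with class and method, then `SEVERE:` or `WARNING:`) matches the default `java.util.logging` format, which suggests that Commons Logging, used by PDFBox, fell back to it because no log4j jar was found; that would also explain why `logback.xml` and `log4j.properties` had no effect, and setting INFO would not hide SEVERE or WARNING anyway. Under that assumption only, a minimal sketch that turns the font messages off programmatically (the class name is hypothetical). The same effect should be reachable with the line `org.apache.pdfbox.pdmodel.font.level = OFF` in a `logging.properties` file. Silencing only hides the symptom; installing fonts as John suggests addresses the cause.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Silences PDFBox font-fallback messages, assuming they are routed to java.util.logging. */
final class QuietPdfBoxFontLogs {

    // java.util.logging holds loggers weakly; keep a strong reference so the level is not lost
    private static final Logger FONT_LOGGER = Logger.getLogger("org.apache.pdfbox.pdmodel.font");

    private QuietPdfBoxFontLogs() {
    }

    /** Child loggers such as ...font.ExternalFonts inherit this level unless they set their own. */
    static void silenceFontLogs() {
        FONT_LOGGER.setLevel(Level.OFF);
    }
}
```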





Re: font errors when reading PDF (not writing)

2015-02-26 Thread John Hewson
Yep, it’ll work fine. If you’re using AES 256, you’ll need the Java “unlimited
strength” policy files installed with your JVM.

— John
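John's caveat can be checked at runtime. A minimal sketch using the standard `javax.crypto.Cipher.getMaxAllowedKeyLength` method (the class name is hypothetical): it reports the largest AES key size the installed policy allows, so a value below 256 means the unlimited-strength policy files are missing. Later JDK releases enable the unlimited policy by default, in which case the check simply passes.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

/** Checks whether this JVM's crypto policy permits 256-bit AES keys. */
final class AesKeyLengthCheck {

    private AesKeyLengthCheck() {
    }

    static boolean supportsAes256() throws NoSuchAlgorithmException {
        // Integer.MAX_VALUE under the unlimited policy; 128 under the old restricted default
        return Cipher.getMaxAllowedKeyLength("AES") >= 256;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("AES-256 allowed: " + supportsAes256());
    }
}
```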

 On 26 Feb 2015, at 12:35, Steve Antoch sant...@yuzu.com wrote:
 
 
 John-
 
 (sorry to hijack - I think this is related enough that it warrants asking 
 here)
 
 
 If we run pdfbox on a headless server, will the Encrypt() class still 
 function properly?  We do not render anything, just encrypt the document.
 
 My suspicion is that this is not an issue, though it would be nice to be sure.
 
 Thanks-
 Steve
 
 
 From: John Hewson j...@jahewson.com
 Sent: Wednesday, February 25, 2015 1:27 PM
 To: users@pdfbox.apache.org
 Subject: Re: font errors when reading PDF (not writing)
 
 Are you running on a headless system, such as a server? If so, you probably 
 don’t have any fonts installed. Even though you’re just doing text 
 extraction, this matters because the dimensions of the characters need to be 
 taken into account and many PDFs do not embed the fonts which they depend on.
 
 At a bare minimum I’d recommend installing the liberation fonts and whichever 
 Microsoft fonts are available in your distribution’s package manager.
 
 — John
 
 On 25 Feb 2015, at 06:12, Juan M Uys opy...@gmail.com wrote:
 
 Hello,
 
 I'm extracting text from PDFs using PDFTextStripperByArea and get a lot of
 these in the log:
 
 Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.ExternalFonts
 getTrueTypeFallbackFont
 SEVERE: No TTF fallback font for 'Helvetica'
 Feb 25, 2015 2:01:44 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont init
 WARNING: Using fallback font 'LiberationSans' for 'ArialMT'
 
 I've searched the documentation for font-related advice, which seems to
 pertain to WRITING PDFs, whereas I'm merely extracting text.
 
 Please let me know how to get around this problem.
 
 Do I need to install extra font packages?
 If so, how? Where from?
 
 At the very least, I'd like to know how to remove these statements from my
 log. (I've tried throwing logback.xml and log4j.properties into my
 resources folder, setting package org.apache.pdfbox to INFO, to no avail)
 
 The system running my extractor code is stock Ubuntu 14.04 with Azul
 openjdk 7 (see
 https://registry.hub.docker.com/u/azul/zulu-openjdk/dockerfile/)
 
 Thanks,
 Juan
 
 
 



[PDFBOX-2.0] PDF Size after Signature

2015-02-26 Thread Isaias Barroso
Hi all,

I'm using PDFBox 2.0 to sign some documents and I found that the signed
file is much larger than with version 1.8; sometimes the file size
increases by 100% or more. When the same file is signed using 1.8, the
size increases by the expected amount.

Original File: https://www.dropbox.com/s/s8p40ukorhchtcu/sign_me.pdf?dl=0

Signed With 1.8:
https://www.dropbox.com/s/ty8axylq8ol6204/sign_me_signed_1.8.pdf?dl=0

Signed With 2.0:
https://www.dropbox.com/s/ge1x3mdpqlalnvq/sign_me_signed_2.0.pdf?dl=0

Is there an option to reduce the signed file size in 2.0?

Best regards
-- 
Isaías Barroso
Belo Horizonte - MG