Another PDF Merging Error

2015-01-21 Thread Marc Davis
Trying to merge this file with itself:

https://www.dropbox.com/s/6dd0wk5ri80sn2v/claire%20dempster.pdf?dl=0

I issue this command and receive the error below:

~: java -jar /Users/marlon/Desktop/pdfbox-app-1.8.8.jar PDFMerger 
/Users/marlon/Dropbox/Taras-Marlon/bpm/problem\ PDF\ files/claire\ dempster.pdf 
/Users/marlon/Dropbox/Taras-Marlon/bpm/problem\ PDF\ files/claire\ dempster.pdf 
/Users/marlon/Desktop/untitled\ folder/test.pdf 
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
WARNING: expected='%%EOF' actual=''
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 0
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
WARNING: expected='%%EOF' actual=''
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 0

Thanks,
Marc






Re: Not merging all pages

2015-01-21 Thread Marc Davis
Thanks Tilman…awesome!  

Thanks,
Marc




 On Jan 21, 2015, at 5:14 PM, Tilman Hausherr thaush...@t-online.de wrote:
 
 Use the nonSeq parser option, then you get 53 :-)
 
 Tilman
 
 On 21.01.2015 at 23:00, Marc Davis wrote:
 Yet another merging problem.  After merging this 53 page PDF file with 
 itself using PDFBox 1.8.8, I only get 16 pages total (8 pages recognized by 
 PDFBox):
 
 https://www.dropbox.com/s/s6pybruhrm3bvki/helen%20shih.pdf?dl=0
 
 Thanks,
 Marc
 
 
 
 
 
 



Not merging all pages

2015-01-21 Thread Marc Davis
Yet another merging problem.  After merging this 53 page PDF file with itself 
using PDFBox 1.8.8, I only get 16 pages total (8 pages recognized by PDFBox):

https://www.dropbox.com/s/s6pybruhrm3bvki/helen%20shih.pdf?dl=0 

Thanks,
Marc






Re: PDFBox-generated PDF/A-1b not validated using PDFBox validation

2015-01-21 Thread Tilman Hausherr
I didn't test the file from your code, only the one from the example, 
both with preflight and with

http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx

If you want to be really sure, when you continue with development, check 
your files with the 2.0 version of the preflight app command line:


https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/preflight-app/2.0.0-SNAPSHOT/

because some of the improvements in preflight haven't been made to 1.8.

Tilman

On 21.01.2015 at 21:39, Tilman Hausherr wrote:

On 21.01.2015 at 12:31, Julien Béti wrote:

serializer.serialize(xmp, baos, false);


change false to true, it should work then.

Tilman






Re: Another PDF Merging Error

2015-01-21 Thread Tilman Hausherr
The file is broken... pdf.js doesn't display it at all. The end of the 
file has this:



trailer
/Size 426/Root 1 0 R
xref
0 0
trailer
/Size 426/Prev 374525/XRefStm 4291/Root 1 0 R/Info 3 0 R/ID[2871C4EA28E5E14CA4CC8054ABC9A97364AEDCA53786314891752768DFE1ADC1]

star


And yes, it has all the problems reported. And PDFBox does merge the files.
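
The two warnings in the log correspond to what is missing at the end of this file: there is no usable startxref offset and no %%EOF marker. A minimal sketch of a tail check follows, using only the JDK. The class PdfTailCheck and its methods are hypothetical helpers for illustration; this is not how PDFBox itself parses a file, and a real parser does considerably more.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: inspects the tail of a PDF byte array.
final class PdfTailCheck {

    // Returns the number following the last "startxref" keyword,
    // or -1 when the keyword or its offset is missing.
    static long startXrefOffset(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        String rest = text.substring(idx + "startxref".length()).trim();
        int end = 0;
        while (end < rest.length() && rest.charAt(end) >= '0' && rest.charAt(end) <= '9') {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(rest.substring(0, end));
    }

    // True when the last non-whitespace bytes are the %%EOF marker.
    static boolean endsWithEof(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1).trim();
        return text.endsWith("%%EOF");
    }

    public static void main(String[] args) {
        // A truncated tail similar in spirit to the one quoted above.
        byte[] truncated = "trailer\n/Size 426/Root 1 0 R\nstar"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("startxref offset: " + startXrefOffset(truncated));
        System.out.println("ends with %%EOF: " + endsWithEof(truncated));
    }
}
```

A file that fails both checks can still be readable if the parser falls back to rebuilding the cross-reference information, which would be consistent with the merge succeeding despite the warnings.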

Tilman


On 21.01.2015 at 22:49, Marc Davis wrote:

Trying to merge this file with itself:

https://www.dropbox.com/s/6dd0wk5ri80sn2v/claire%20dempster.pdf?dl=0

I issue this command and receive the error below:

~: java -jar /Users/marlon/Desktop/pdfbox-app-1.8.8.jar PDFMerger 
/Users/marlon/Dropbox/Taras-Marlon/bpm/problem\ PDF\ files/claire\ dempster.pdf 
/Users/marlon/Dropbox/Taras-Marlon/bpm/problem\ PDF\ files/claire\ dempster.pdf 
/Users/marlon/Desktop/untitled\ folder/test.pdf
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
WARNING: expected='%%EOF' actual=''
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 0
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
WARNING: expected='%%EOF' actual=''
Jan 21, 2015 4:38:49 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 0

Thanks,
Marc









Re: Not merging all pages

2015-01-21 Thread Tilman Hausherr

Use the nonSeq parser option, then you get 53 :-)

Tilman

On 21.01.2015 at 23:00, Marc Davis wrote:

Yet another merging problem.  After merging this 53 page PDF file with itself 
using PDFBox 1.8.8, I only get 16 pages total (8 pages recognized by PDFBox):

https://www.dropbox.com/s/s6pybruhrm3bvki/helen%20shih.pdf?dl=0

Thanks,
Marc









Re: merging docs with form fields and tags

2015-01-21 Thread Ray Morris

I really want to be dropped from the pdfbox email list.

Ray Morris
ray.morris.brisb...@bigpond.com


-Original Message- 
From: David Hill

Sent: Thursday, January 22, 2015 5:32 AM
To: users@pdfbox.apache.org
Subject: merging docs with form fields and tags

We are trying to merge multiple PDFs together and we are having issues with 
form fields and tags vanishing. Tags and editable form fields will not exist 
in the output.


This seems related to issues 1031 and 930.

The problem seems to happen if the first document does not have fields or 
tags. In the case where the first document does not have form fields the 
form fields for the second document will be missing but all subsequent form 
fields seem to exist properly in the resulting document. In the case where 
the first document does not have tags, all tags on all pages are missing in 
the resulting document.


This is a high priority for our current Section 508 compliance project, and we 
may attempt to correct the issue ourselves. Does anyone have any idea what the 
problem might be and where we might start looking for a solution?


Thanks!

Dave Hill
Lead Developer
Iowa Student Loan Liquidity Corp



This e-mail and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. 
If you have received this e-mail in error please notify the originator of 
the message. This footer also confirms that this e-mail message has been 
scanned for the presence of computer viruses. Any views expressed in this 
message are those of the individual sender, except where the sender 
specifies and with authority, states them to be the views of Iowa Student 
Loan.





Re: PDFBox-generated PDF/A-1b not validated using PDFBox validation

2015-01-21 Thread Tilman Hausherr

On 21.01.2015 at 12:31, Julien Béti wrote:

serializer.serialize(xmp, baos, false);


change false to true, it should work then.
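
For context, the third argument of XmpSerializer.serialize controls whether the serialized XMP is wrapped in xpacket processing instructions, which is what the "xmp should start with a processing instruction" error refers to. The sketch below uses only the JDK to show what that wrapper looks like and how its presence can be tested. The class XpacketCheck and its methods are hypothetical and are not the check Preflight actually performs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not xmpbox or Preflight code.
final class XpacketCheck {

    // Standard XMP packet header and trailer processing instructions.
    static final String HEADER =
            "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>";
    static final String TRAILER = "<?xpacket end=\"w\"?>";

    // Wraps an already serialized XMP body in the packet header and trailer.
    static byte[] wrap(String xmpBody) {
        return (HEADER + "\n" + xmpBody + "\n" + TRAILER)
                .getBytes(StandardCharsets.UTF_8);
    }

    // True when the metadata stream begins with an xpacket processing instruction.
    static boolean startsWithXpacket(byte[] xmp) {
        return new String(xmp, StandardCharsets.UTF_8).startsWith("<?xpacket");
    }

    public static void main(String[] args) {
        String body = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>";
        System.out.println(startsWithXpacket(body.getBytes(StandardCharsets.UTF_8)));
        System.out.println(startsWithXpacket(wrap(body)));
    }
}
```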

Tilman




Re: Runtime exception in PDFParser.parse()

2015-01-21 Thread John Hewson
You don’t have to do anything to log runtime exceptions in Java: they are passed
up the stack until the process terminates and prints a stack trace. Catching
runtime exceptions yourself isn’t necessary or advised.

Remove your exception handling code and have main() throw IOException.
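
A minimal sketch of that advice, using only the JDK. The class PropagateDemo and its parse method are hypothetical stand-ins for the real loading code, not PDFBox API. The second method shows one option for cases where wrapping really cannot be avoided: pass the original throwable as the cause, so that its stack trace is not reduced to a message string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the code that loads a PDF; not a PDFBox class.
final class PropagateDemo {

    // Simulates a parser that fails at runtime on unexpected input.
    static int parse(InputStream in) throws IOException {
        int first = in.read();
        if (first != '%') {
            throw new IllegalStateException("not a PDF header byte: " + first);
        }
        return first;
    }

    // If context must be added, keep the original exception as the cause
    // instead of copying only its message.
    static int parseWithContext(String name, InputStream in) throws IOException {
        try {
            return parse(in);
        } catch (RuntimeException e) {
            throw new IOException("Failure when parsing PDF " + name, e);
        }
    }

    // No try/catch here: anything thrown reaches the JVM,
    // which prints the full stack trace before exiting.
    public static void main(String[] args) throws IOException {
        parse(new ByteArrayInputStream("%PDF-1.4".getBytes(StandardCharsets.US_ASCII)));
        System.out.println("parsed");
    }
}
```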

-- John

 On 21 Jan 2015, at 06:24, Chris Bamford cbamf...@mimecast.com wrote:
 
 
 Hi
 
 Recently I started getting runtime exceptions in PDFParser.parse(), but I 
 can't seem to get any information out as to why.  The weird thing is that it 
 only happens on Linux - on my laptop it is fine!
 I have tried to get a stack trace (my code uses log4j), but all I get is:
 
 org.slf4j.impl.Log4jLoggerAdapter.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
 
 This is produced by my code:
 
 try {
     log.info("Parsing PDF document ...");
     document = PDDocument.load(is, true); // skip corrupt pdf objects
 
 } catch (Throwable t) {
     message = "Failure when parsing PDF " + filename + ": " + t.getMessage();
     if (t.getCause() != null) {
         message += " | CAUSE: " + t.getCause().getMessage();
     }
 
     log.warn(t);
     log.debug(message);
     throw new Exception(message);
 }
 I have also tried to turn logging up in my log4j.properties:
 
 log4j.logger.org.apache.pdfbox=TRACE
 
 But I see nothing except the slf4j line above.
 
 Can anyone tell me what I am doing wrong?
 
 Thanks
 
 - Chris
 
 
  
 Chris Bamford, Senior Developer
 m: +44 7860 405292  p: +44 207 847 8700  w: www.mimecast.com
 Disclaimer
 The information contained in this communication from cbamf...@mimecast.com 
 sent at 2015-01-21 14:24:10 is confidential and may be legally privileged. It 
 is intended solely for use by users@pdfbox.apache.org and others authorized 
 to receive it. If you are not users@pdfbox.apache.org you are hereby notified 
 that any disclosure, copying, distribution or taking action in reliance of 
 the contents of this information is strictly prohibited and may be unlawful.
 
 Mimecast Ltd. is a company registered in England and Wales with the company 
 number 4698693 VAT No. GB 123 4197 34 Registered Office: CityPoint, One 
 Ropemaker Street, Moorgate, London, EC2Y 9AW
 
 This email message has been scanned for viruses by Mimecast. Mimecast 
 delivers a complete managed email solution from a single web based platform. 
 For more information please visit http://www.mimecast.com/



Re: merging docs with form fields and tags

2015-01-21 Thread John Hewson
You can unsubscribe at http://pdfbox.apache.org/mailinglists.html

-- John

 On 21 Jan 2015, at 12:15, Ray Morris ray.morris.brisb...@bigpond.com wrote:
 
 I really want to be dropped from the pdfbox email list.
 
 Ray Morris
 ray.morris.brisb...@bigpond.com
 
 
 -Original Message- From: David Hill
 Sent: Thursday, January 22, 2015 5:32 AM
 To: users@pdfbox.apache.org
 Subject: merging docs with form fields and tags
 
 We are trying to merge multiple PDFs together and we are having issues with 
 form fields and tags vanishing. Tags and editable form fields will not exist 
 in the output.
 
 This seems related to issues 1031 and 930.
 
 The problem seems to happen if the first document does not have fields or 
 tags. In the case where the first document does not have form fields the form 
 fields for the second document will be missing but all subsequent form fields 
 seem to exist properly in the resulting document. In the case where the first 
 document does not have tags, all tags on all pages are missing in the 
 resulting document.
 
 This is a high priority for our current 508 compliance project, we may 
 attempt to correct the issue ourselves. Does anyone have any ideas what the 
 problem might be and where we might start looking for a solution?
 
 Thanks!
 
 Dave Hill
 Lead Developer
 Iowa Student Loan Liquidity Corp
 
 
 
 
 



Re: Error on PDDocument.load

2015-01-21 Thread Maruan Sahyoun
Hi Kevin

works for me - what's your Java Version?

BR
Maruan

On 21.01.2015 at 11:24, Kevin Morin mo...@codelutin.com wrote:

 Hi,
 
 it does not work with PDFToImage either, I still get a blank image. Plus, I 
 did not set the nonSeq option however it seems to be using the non sequential 
 parser. And I have the following traces:
 janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
 GRAVE: Can't find the object 7 0 (origin offset 359138)
 janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
 GRAVE: Missing XObject: Im1
 
 BR
 
 Kevin
 
 On 21/01/2015 11:11, Maruan Sahyoun wrote:
 Hi Kevin,
 
 you can test with the PDFToImage command [1] available in from the 
 pdfbox-app [2] if the issue happens there. The source for PDFToImage is 
 available in the tools section of the SVN repo or online viewable [3].
 
 BR
 Maruan
 
 [1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
 [2] 
 https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
 [3] 
 http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup
 
 On 21.01.2015 at 11:00, Kevin Morin mo...@codelutin.com wrote:
 
 Hi Andreas,
 
 I am using the latest snapshot available on the maven repository. And I am 
 running my app on Windows Server 2008 R2 Standard and it does not work 
 (white page). Could send me the code or a jar to test on this server to 
 check if it does not come from my code?
 
 BR
 
 Kevin
 
 On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
 Hi,
 
 On 19.01.2015 at 12:45, Kevin Morin wrote:
 Actually, the issue is not only these traces. The real issue is that I
 have a
 blank image when I try to render the document.
 I've checked your PDF and everything renders fine. I've tried
 SNAPSHOT-891 on linux (running java 1.8, 1.7  and 1.6) and the latest
 SNAPSHOT-947 on win7 running java 1.7
 
 Maybe your SNAPSHOT is outdated?
 
 BR
 Andreas Lehmkühler
 
 On 19/01/2015 12:39, Kevin Morin wrote:
 Hi,
 
 I am using the 2.0 snapshot version to images of pdfs, but on some
 documents, I have the following error when I call PDDocument.load(file):
 2015/01/19 12:32:48 ERROR
 (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find
 the object 7 0 (origin offset 359138)
 2015/01/19 12:32:48 ERROR
 (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject:
 Im1
 
 I first had it a few days ago (I did not report it, shame on me) but the
 error did not occur when I called the loadLegacy method on PDDocument.
 But the loadLegacy method is not available anymore...
 
 The issue happens on Windows (works fine on Debian).
 
 Thanks for your help
 
 Kevin
 
 
 
 
 
 



Re: Error on PDDocument.load

2015-01-21 Thread Maruan Sahyoun
Hi Kevin,

you can test with the PDFToImage command [1], available from the pdfbox-app 
[2], to see whether the issue happens there. The source for PDFToImage is 
available in the tools section of the SVN repo or viewable online [3].

BR
Maruan

[1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
[2] 
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
[3] 
http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup

On 21.01.2015 at 11:00, Kevin Morin mo...@codelutin.com wrote:

 Hi Andreas,
 
 I am using the latest snapshot available on the maven repository. And I am 
 running my app on Windows Server 2008 R2 Standard and it does not work (white 
 page). Could send me the code or a jar to test on this server to check if it 
 does not come from my code?
 
 BR
 
 Kevin
 
 On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
 Hi,
 
 On 19.01.2015 at 12:45, Kevin Morin wrote:
 Actually, the issue is not only these traces. The real issue is that I
 have a
 blank image when I try to render the document.
 I've checked your PDF and everything renders fine. I've tried
 SNAPSHOT-891 on linux (running java 1.8, 1.7  and 1.6) and the latest
 SNAPSHOT-947 on win7 running java 1.7
 
 Maybe your SNAPSHOT is outdated?
 
 BR
 Andreas Lehmkühler
 
 On 19/01/2015 12:39, Kevin Morin wrote:
 Hi,
 
 I am using the 2.0 snapshot version to images of pdfs, but on some
 documents, I have the following error when I call PDDocument.load(file):
 2015/01/19 12:32:48 ERROR
 (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find
 the object 7 0 (origin offset 359138)
 2015/01/19 12:32:48 ERROR
 (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject:
 Im1
 
 I first had it a few days ago (I did not report it, shame on me) but the
 error did not occur when I called the loadLegacy method on PDDocument.
 But the loadLegacy method is not available anymore...
 
 The issue happens on Windows (works fine on Debian).
 
 Thanks for your help
 
 Kevin
 
 
 



Re: Error on PDDocument.load

2015-01-21 Thread Kevin Morin

Hi,

it does not work with PDFToImage either, I still get a blank image. 
Plus, I did not set the nonSeq option however it seems to be using the 
non sequential parser. And I have the following traces:
janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
GRAVE: Can't find the object 7 0 (origin offset 359138)
janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
GRAVE: Missing XObject: Im1

BR

Kevin

On 21/01/2015 11:11, Maruan Sahyoun wrote:

Hi Kevin,

you can test with the PDFToImage command [1] available in from the pdfbox-app 
[2] if the issue happens there. The source for PDFToImage is available in the 
tools section of the SVN repo or online viewable [3].

BR
Maruan

[1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
[2] 
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
[3] 
http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup

On 21.01.2015 at 11:00, Kevin Morin mo...@codelutin.com wrote:


Hi Andreas,

I am using the latest snapshot available on the maven repository. And I am 
running my app on Windows Server 2008 R2 Standard and it does not work (white 
page). Could send me the code or a jar to test on this server to check if it 
does not come from my code?

BR

Kevin

On 19/01/2015 19:13, Andreas Lehmkuehler wrote:

Hi,

On 19.01.2015 at 12:45, Kevin Morin wrote:

Actually, the issue is not only these traces. The real issue is that I
have a
blank image when I try to render the document.

I've checked your PDF and everything renders fine. I've tried
SNAPSHOT-891 on linux (running java 1.8, 1.7  and 1.6) and the latest
SNAPSHOT-947 on win7 running java 1.7

Maybe your SNAPSHOT is outdated?

BR
Andreas Lehmkühler


On 19/01/2015 12:39, Kevin Morin wrote:

Hi,

I am using the 2.0 snapshot version to images of pdfs, but on some
documents, I have the following error when I call PDDocument.load(file):
2015/01/19 12:32:48 ERROR
(org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find
the object 7 0 (origin offset 359138)
2015/01/19 12:32:48 ERROR
(org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject:
Im1

I first had it a few days ago (I did not report it, shame on me) but the
error did not occur when I called the loadLegacy method on PDDocument.
But the loadLegacy method is not available anymore...

The issue happens on Windows (works fine on Debian).

Thanks for your help

Kevin













Re: Error on PDDocument.load

2015-01-21 Thread Kevin Morin
I thought I was running java 7 but it's java 8... I tried with java 7 
and it works. I do not need it to work with java 8, java 7 is ok for me.


Thanks for your help and for all your work.

Kevin

On 21/01/2015 11:54, Maruan Sahyoun wrote:

Hi Kevin

works for me - what's your Java Version?

BR
Maruan

On 21.01.2015 at 11:24, Kevin Morin mo...@codelutin.com wrote:


Hi,

it does not work with PDFToImage either, I still get a blank image. Plus, I did 
not set the nonSeq option however it seems to be using the non sequential 
parser. And I have the following traces:
janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
GRAVE: Can't find the object 7 0 (origin offset 359138)
janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
GRAVE: Missing XObject: Im1

BR

Kevin

On 21/01/2015 11:11, Maruan Sahyoun wrote:

Hi Kevin,

you can test with the PDFToImage command [1] available in from the pdfbox-app 
[2] if the issue happens there. The source for PDFToImage is available in the 
tools section of the SVN repo or online viewable [3].

BR
Maruan

[1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
[2] 
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
[3] 
http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup

On 21.01.2015 at 11:00, Kevin Morin mo...@codelutin.com wrote:


Hi Andreas,

I am using the latest snapshot available on the maven repository. And I am 
running my app on Windows Server 2008 R2 Standard and it does not work (white 
page). Could send me the code or a jar to test on this server to check if it 
does not come from my code?

BR

Kevin

On 19/01/2015 19:13, Andreas Lehmkuehler wrote:

Hi,

On 19.01.2015 at 12:45, Kevin Morin wrote:

Actually, the issue is not only these traces. The real issue is that I
have a
blank image when I try to render the document.

I've checked your PDF and everything renders fine. I've tried
SNAPSHOT-891 on linux (running java 1.8, 1.7  and 1.6) and the latest
SNAPSHOT-947 on win7 running java 1.7

Maybe your SNAPSHOT is outdated?

BR
Andreas Lehmkühler


On 19/01/2015 12:39, Kevin Morin wrote:

Hi,

I am using the 2.0 snapshot version to images of pdfs, but on some
documents, I have the following error when I call PDDocument.load(file):
2015/01/19 12:32:48 ERROR
(org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find
the object 7 0 (origin offset 359138)
2015/01/19 12:32:48 ERROR
(org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject:
Im1

I first had it a few days ago (I did not report it, shame on me) but the
error did not occur when I called the loadLegacy method on PDDocument.
But the loadLegacy method is not available anymore...

The issue happens on Windows (works fine on Debian).

Thanks for your help

Kevin

















Re: Error on PDDocument.load

2015-01-21 Thread Andreas Lehmkühler
Hi,

 On 21 January 2015 at 12:14, Kevin Morin mo...@codelutin.com wrote:
 
 
 I thought I was running java 7 but it's java 8... I tried with java 7 
 and it works. I do not need it to work with java 8, java 7 is ok for me.
It works for me using Java 8 on Win7 and Linux as well, so I guess the issue
must be something else.


BR
Andreas Lehmkühler

 Thanks for your help and for all your work.
 
 Kevin
 
 On 21/01/2015 11:54, Maruan Sahyoun wrote:
  Hi Kevin
 
  works for me - what's your Java Version?
 
  BR
  Maruan
 
  On 21.01.2015 at 11:24, Kevin Morin mo...@codelutin.com wrote:
 
  Hi,
 
  it does not work with PDFToImage either, I still get a blank image. Plus, I
  did not set the nonSeq option however it seems to be using the non
  sequential parser. And I have the following traces:
  janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
  GRAVE: Can't find the object 7 0 (origin offset 359138)
  janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
  GRAVE: Missing XObject: Im1
 
  BR
 
  Kevin
 
  On 21/01/2015 11:11, Maruan Sahyoun wrote:
  Hi Kevin,
 
  you can test with the PDFToImage command [1] available in from the
  pdfbox-app [2] if the issue happens there. The source for PDFToImage is
  available in the tools section of the SVN repo or online viewable [3].
 
  BR
  Maruan
 
  [1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
  [2]
  https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
  [3]
  http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup
 
  On 21.01.2015 at 11:00, Kevin Morin mo...@codelutin.com wrote:
 
  Hi Andreas,
 
  I am using the latest snapshot available on the maven repository. And I
  am running my app on Windows Server 2008 R2 Standard and it does not work
  (white page). Could send me the code or a jar to test on this server to
  check if it does not come from my code?
 
  BR
 
  Kevin
 
  On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
  Hi,
 
  On 19.01.2015 at 12:45, Kevin Morin wrote:
  Actually, the issue is not only these traces. The real issue is that I
  have a
  blank image when I try to render the document.
  I've checked your PDF and everything renders fine. I've tried
  SNAPSHOT-891 on linux (running java 1.8, 1.7  and 1.6) and the latest
  SNAPSHOT-947 on win7 running java 1.7
 
  Maybe your SNAPSHOT is outdated?
 
  BR
  Andreas Lehmkühler
 
  On 19/01/2015 12:39, Kevin Morin wrote:
  Hi,
 
  I am using the 2.0 snapshot version to images of pdfs, but on some
  documents, I have the following error when I call
  PDDocument.load(file):
  2015/01/19 12:32:48 ERROR
  (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find
  the object 7 0 (origin offset 359138)
  2015/01/19 12:32:48 ERROR
  (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing
  XObject:
  Im1
 
  I first had it a few days ago (I did not report it, shame on me) but
  the
  error did not occur when I called the loadLegacy method on PDDocument.
  But the loadLegacy method is not available anymore...
 
  The issue happens on Windows (works fine on Debian).
 
  Thanks for your help
 
  Kevin
 
 
 
 
 
 
 



PDFBox-generated PDF/A-1b not validated using PDFBox validation

2015-01-21 Thread Julien Béti

Hello,

Attached is sample Java code that creates a PDF/A-1b file using the 
CreatePDFA sample [1] and then immediately tries to validate it using 
Preflight, as described in the Cookbook [2].


The validation fails with the following error:
--8
The file/tmp/test.pdf is not valid, error(s) :
7.1 : Error on MetaData, xmp should start with a processing instruction
--8

I'm using PDFBox to generate PDF/A files, which are then validated by an 
application that evidently also uses PDFBox for validation. For the moment 
my PDF files are rejected with the same error.


Could you please tell me what's missing in CreatePDFA sample to make  
it pass the validation?


Kind Regards,

Julien.

[1]  
http://svn.apache.org/viewvc/pdfbox/branches/1.8/examples/src/main/java/org/apache/pdfbox/examples/pdfa/CreatePDFA.java?revision=1620380&view=markup

[2] http://pdfbox.apache.org/1.8/cookbook/pdfavalidation.html
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import javax.activation.FileDataSource;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.exception.SyntaxValidationException;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.type.BadFieldValueException;
import org.apache.xmpbox.xml.XmpSerializationException;
import org.apache.xmpbox.xml.XmpSerializer;

/**
 * This is an example that creates a simple PDF/A document.
 *
 */
public class CreateAndCheckPDFA {

	/**
	 * Constructor.
	 */
	public CreateAndCheckPDFA() {
		super();
	}

	/**
	 * Create a simple PDF/A document.
	 *
	 * This example is based on HelloWorld example.
	 *
	 * As it is a simple case, to conform the PDF/A norm, are added : - the font used in the document - a light xmp
	 * block with only PDF identification schema (the only mandatory) - an output intent
	 *
	 * @param file The file to write the PDF to.
	 * @param message The message to write in the file.
	 *
	 * @throws Exception If something bad occurs
	 */
	public void doIt(String file, String message) throws Exception {
		// the document
		PDDocument doc = null;
		try {
			doc = new PDDocument();

			PDPage page = new PDPage();
			doc.addPage(page);

			// load the font from pdfbox.jar
			InputStream fontStream = CreateAndCheckPDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/ArialMT.ttf");
			PDFont font = PDTrueTypeFont.loadTTF(doc, fontStream);

			// create a page with the message where needed
			PDPageContentStream contentStream = new PDPageContentStream(doc, page);
			contentStream.beginText();
			contentStream.setFont(font, 12);
			contentStream.moveTextPositionByAmount(100, 700);
			contentStream.drawString(message);
			contentStream.endText();
			contentStream.saveGraphicsState();
			contentStream.close();

			PDDocumentCatalog cat = doc.getDocumentCatalog();
			PDMetadata metadata = new PDMetadata(doc);
			cat.setMetadata(metadata);

			XMPMetadata xmp = XMPMetadata.createXMPMetadata();
			try {
				PDFAIdentificationSchema pdfaid = xmp.createAndAddPFAIdentificationSchema();
				pdfaid.setConformance("B");
				pdfaid.setPart(1);
				pdfaid.setAboutAsSimple("PDFBox PDFA sample");
				XmpSerializer serializer = new XmpSerializer();
				ByteArrayOutputStream baos = new ByteArrayOutputStream();
				serializer.serialize(xmp, baos, false);
				metadata.importXMPMetadata(baos.toByteArray());
			} catch (BadFieldValueException badFieldexception) {
				// can't happen here, as the provided value is valid
			} catch (XmpSerializationException xmpException) {
				System.err.println(xmpException.getMessage());
			}

			

Re: Error on PDDocument.load

2015-01-21 Thread Kevin Morin

Hi Andreas,

I am using the latest snapshot available on the Maven repository, and I 
am running my app on Windows Server 2008 R2 Standard, where it does not 
work (white page). Could you send me the code or a jar to test on this 
server, so I can check whether the problem comes from my code?


BR

Kevin

On 19/01/2015 19:13, Andreas Lehmkuehler wrote:

Hi,

On 19.01.2015 at 12:45, Kevin Morin wrote:

Actually, the issue is not only these traces. The real issue is that I
have a
blank image when I try to render the document.

I've checked your PDF and everything renders fine. I've tried
SNAPSHOT-891 on linux (running java 1.8, 1.7  and 1.6) and the latest
SNAPSHOT-947 on win7 running java 1.7

Maybe your SNAPSHOT is outdated?

BR
Andreas Lehmkühler


On 19/01/2015 12:39, Kevin Morin wrote:

Hi,

I am using the 2.0 snapshot version to images of pdfs, but on some
documents, I have the following error when I call PDDocument.load(file):
2015/01/19 12:32:48 ERROR
(org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find
the object 7 0 (origin offset 359138)
2015/01/19 12:32:48 ERROR
(org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject:
Im1

I first had it a few days ago (I did not report it, shame on me) but the
error did not occur when I called the loadLegacy method on PDDocument.
But the loadLegacy method is not available anymore...

The issue happens on Windows (works fine on Debian).

Thanks for your help

Kevin








Runtime exception in PDFParser.parse()

2015-01-21 Thread Chris Bamford
Hi

Recently I started getting runtime exceptions in PDFParser.parse(), but I can't 
seem to get any information out as to why.  The weird thing is that it only 
happens on Linux - on my laptop it is fine!
I have tried to get a stack trace (my code uses log4j), but all I get is:


org.slf4j.impl.Log4jLoggerAdapter.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V

This is produced by my code:


try {
    log.info("Parsing PDF document ...");
    document = PDDocument.load(is, true); // skip corrupt pdf objects

} catch (Throwable t) {
    message = "Failure when parsing PDF " + filename + ": " + t.getMessage();
    if (t.getCause() != null) {
        message += " | CAUSE: " + t.getCause().getMessage();
    }

    log.warn(t);
    log.debug(message);
    throw new Exception(message);
}

I have also tried to turn logging up in my log4j.properties:


log4j.logger.org.apache.pdfbox=TRACE

But I see nothing except the slf4j line above.

Can anyone tell me what I am doing wrong?

Thanks

- Chris
   
Chris Bamford
m: +44 7860 405292
www.mimecast.com
 
Mimecast
CityPoint
One Ropemaker Street, London, EC2Y 9AW
+44 (0) 207 847 8700
 

 
