[jira] [Commented] (PDFBOX-1975) Improve TestImageIOUtils unit tests to check image resolution and compression

2014-03-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937518#comment-13937518
 ] 

Tilman Hausherr commented on PDFBOX-1975:
-

I added an error log output if writeImage() returns false in rev 1578259.

 Improve TestImageIOUtils unit tests to check image resolution and compression
 -

 Key: PDFBOX-1975
 URL: https://issues.apache.org/jira/browse/PDFBOX-1975
 Project: PDFBox
  Issue Type: Task
  Components: Utilities
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Priority: Minor
  Labels: imageio, test, tiff
 Fix For: 2.0.0


 Because of the problems with recent changes (see PDFBOX-1963), I will improve 
 the unit tests so that image resolution and compression are checked.
 I found out that JPEGs don't have a resolution, BMP had the wrong resolution. 
 The fault wasn't in the java TIFF writer as I thought before, it is in the 
 java PNG writer, which uses the PixelSize values wrongly, i.e. it interprets 
 them as pixels per mm instead of mm per pixel as per specification. The 
 JPEG writer throws an exception "JFIF APP0 must be first marker after SOI". 
 The BMP writer can set the resolution, but the BMP reader doesn't read it.
 (Some of this might be different depending on the version)
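
 The unit mix-up described above is easy to see with numbers. What follows is
 only a minimal sketch: the class and method names are hypothetical and not
 part of PDFBox, and it assumes nothing beyond the statement above that the
 PixelSize values are specified as mm per pixel.

```java
// Hypothetical helper, not part of PDFBox: shows the two interpretations
// of a "pixel size" value for a given resolution in dots per inch.
class PixelSizeSketch
{
    static final double MM_PER_INCH = 25.4;

    // Interpretation per the specification: millimetres per pixel.
    static double mmPerPixel(double dpi)
    {
        return MM_PER_INCH / dpi;
    }

    // The swapped interpretation: pixels per millimetre.
    static double pixelsPerMm(double dpi)
    {
        return dpi / MM_PER_INCH;
    }

    // Resolution a reader recovers when it treats the stored value as mm per pixel.
    static double dpiFromMmPerPixel(double mmPerPixel)
    {
        return MM_PER_INCH / mmPerPixel;
    }

    public static void main(String[] args)
    {
        double dpi = 300;
        System.out.println("mm per pixel : " + mmPerPixel(dpi));
        System.out.println("pixels per mm: " + pixelsPerMm(dpi));
        // Storing pixels-per-mm where mm-per-pixel is expected gives a wrong resolution.
        System.out.println("dpi read back from swapped value: "
                + dpiFromMmPerPixel(pixelsPerMm(dpi)));
    }
}
```

 A test that writes an image at a known resolution and reads the value back
 would catch such a swap, because the two interpretations only agree when the
 value is exactly 1.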



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Visible signature image

2014-03-17 Thread Vakhtang koroghlishvili
I have just updated pdfbox and tested this feature. Everything works well.







On Sat, Mar 15, 2014 at 10:35 AM, Tilman Hausherr thaush...@t-online.de wrote:

 I believe that somebody mentioned somewhere that creating the signature
 image didn't work properly, but I just can't find out who it was. While
 working on a test for JPEGFactory (PDFBOX-1969) I noticed that
 JPEGFactory.createFromImage() was temporarily broken (now hopefully no
 more), and this method is only used by PDVisibleSigBuilder.
 createSignatureImage().

 I see now that this was created in PDFBOX-1766 by Thomas and Vakhtang -
 please test whether it still works.

 Tilman



[jira] [Commented] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-17 Thread Craig Strong (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937894#comment-13937894
 ] 

Craig Strong commented on PDFBOX-1988:
--

Thank you John and Tilman.  That was very quick and effective work.

 PDFBox ExtractText issue of PDF with no embedded fonts
 --

 Key: PDFBOX-1988
 URL: https://issues.apache.org/jira/browse/PDFBOX-1988
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering, Text extraction
Affects Versions: 1.8.4
 Environment: Windows 7
 Also, PASE on IBM i
Reporter: Craig Strong
  Labels: patch
 Fix For: 1.8.5, 2.0.0

 Attachments: Test1.pdf

   Original Estimate: 120h
  Remaining Estimate: 120h

 I have been using PDFBox 1.8.4 to extract text from several different PDF 
 files fine.  I use the latest PDFBox app with ExtractText command line.  
 There is one PDF that PDFBox (and iText) fails to extract any text even 
 though I can extract the text with Adobe Reader and also pdftotext.exe part 
 of XPdf.  java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt.  I 
 don't want to have to rely on using pdftotext.exe from a PC since this is 
 part of an automated application.  I think the error relates to an unknown 
 font type and having to use the few fonts installed in the jar file.  I tried 
 running the API classes and trying to force a font from a certain location 
 but I still got errors.  I thought I loaded the font with the loadTTF method 
 but I don't know if that did anything with the font.  I would really like to 
 have this working straight from the ExtractText class anyway.
 Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
 our IBM i in the PASE environment but I get the same errors.  The section 
 starting processEncodedText and on repeats a few times so I just included the 
 first entries.
  
 Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
 WARNING: Substituting TrueType for unknown font subtype=
 Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
 WARNING: java.lang.NullPointerException
 Throwable occurred: java.lang.NullPointerException
  at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
  at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
  at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
  at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
  at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
  at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
  at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
  at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
  at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
  at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
  at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
  at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
  at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
  at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
  at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
  at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
  at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
 Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
 WARNING: java.lang.NullPointerException
 Throwable occurred: java.lang.NullPointerException
  at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
  at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
  at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
  at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
   
  

[jira] [Created] (PDFBOX-1989) Save LZW and other encoded PDImageXObject resources

2014-03-17 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-1989:
---

 Summary: Save LZW and other encoded PDImageXObject resources
 Key: PDFBOX-1989
 URL: https://issues.apache.org/jira/browse/PDFBOX-1989
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Priority: Minor
 Fix For: 2.0.0


The logo image of the file from PDFBOX-1147.png isn't extracted because 
PDImageXObject.getSuffix() returns null. Changing getSuffix() so that it 
returns "png" gives us a correct file.

With some other images, e.g. the raw_image_demo.pdf file, getSuffix() throws an 
NPE when getPDStream().getFilters() returns null. This happens with images that 
are uncompressed. Returning "png" for this case also gives us a nice image.
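
The null handling described above could look roughly like the following. This
is a hedged sketch, not the change that was committed: the class and method
names are hypothetical, the filter list is modelled as plain strings, and the
"jpg" and "tiff" mappings are assumptions added only to make the example
complete.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the PDFBox implementation: chooses a file suffix
// from a list of PDF stream filter names and tolerates a missing list.
class SuffixSketch
{
    static String suffixFor(List<String> filters)
    {
        // Uncompressed images have no filter list at all, so check for null
        // before calling any method on it.
        if (filters == null || filters.isEmpty())
        {
            return "png";
        }
        if (filters.contains("DCTDecode"))
        {
            return "jpg";  // assumption: JPEG data can be written out as-is
        }
        if (filters.contains("CCITTFaxDecode"))
        {
            return "tiff"; // assumption: fax-encoded data maps to TIFF
        }
        // LZWDecode, FlateDecode and everything else: save losslessly as PNG
        // instead of returning null.
        return "png";
    }

    public static void main(String[] args)
    {
        System.out.println(suffixFor(null));
        System.out.println(suffixFor(Arrays.asList("LZWDecode")));
        System.out.println(suffixFor(Arrays.asList("DCTDecode")));
    }
}
```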





[jira] [Resolved] (PDFBOX-1989) Save LZW and other encoded PDImageXObject resources

2014-03-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-1989.
-

Resolution: Fixed

Done in rev 1578481.

 Save LZW and other encoded PDImageXObject resources
 ---

 Key: PDFBOX-1989
 URL: https://issues.apache.org/jira/browse/PDFBOX-1989
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Priority: Minor
 Fix For: 2.0.0


 The logo image of the file from PDFBOX-1147.png isn't extracted because 
 PDImageXObject.getSuffix() returns null. Changing getSuffix() so that it 
 returns "png" gives us a correct file.
 With some other images, e.g. the raw_image_demo.pdf file, getSuffix() throws 
 an NPE when getPDStream().getFilters() returns null. This happens with images 
 that are uncompressed. Returning "png" for this case also gives us a nice 
 image.





[jira] [Created] (PDFBOX-1990) Support creating PDF from lossless encoded images

2014-03-17 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-1990:
---

 Summary: Support creating PDF from lossless encoded images
 Key: PDFBOX-1990
 URL: https://issues.apache.org/jira/browse/PDFBOX-1990
 Project: PDFBox
  Issue Type: Improvement
Reporter: Tilman Hausherr
Priority: Minor


Currently we support the insertion of TIFF and JPEG into a PDF, but not PNG. We 
can pass a BufferedImage, but this one will be JPEG compressed which is not a 
good thing for graphics with sharp edges. I suggest that we support PNG as 
well. It is possible because the Flate Filter supports both directions.

My implementation (coming in a few minutes) is just an RGB based start that 
begs for improvement.
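
To illustrate why a Flate-based path is lossless, here is a minimal sketch
using only the JDK. It is not the implementation referred to above: the class
and method names are hypothetical, it handles 8-bit RGB only, and it uses
java.util.zip directly rather than any PDFBox filter class.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: packs a BufferedImage as raw RGB bytes and runs them
// through deflate and inflate to show that the round trip is exact.
class FlateRgbSketch
{
    // Packs the pixels as 8-bit R, G, B triples, row by row.
    static byte[] toRgbBytes(BufferedImage image)
    {
        int w = image.getWidth();
        int h = image.getHeight();
        byte[] rgb = new byte[w * h * 3];
        int i = 0;
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                int p = image.getRGB(x, y);
                rgb[i++] = (byte) ((p >> 16) & 0xFF);
                rgb[i++] = (byte) ((p >> 8) & 0xFF);
                rgb[i++] = (byte) (p & 0xFF);
            }
        }
        return rgb;
    }

    // Compresses the data with the JDK's zlib/deflate implementation.
    static byte[] deflate(byte[] data)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished())
        {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses into a buffer of the expected (uncompressed) length.
    static byte[] inflate(byte[] data, int expectedLength)
    {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        byte[] result = new byte[expectedLength];
        int offset = 0;
        try
        {
            while (offset < expectedLength && !inflater.finished())
            {
                int n = inflater.inflate(result, offset, expectedLength - offset);
                if (n == 0)
                {
                    break; // no progress possible (e.g. truncated input)
                }
                offset += n;
            }
        }
        catch (DataFormatException e)
        {
            throw new IllegalStateException("corrupt deflate data", e);
        }
        finally
        {
            inflater.end();
        }
        return result;
    }

    public static void main(String[] args)
    {
        BufferedImage image = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        image.setRGB(3, 4, 0xFF0000); // one sharp red pixel on black
        byte[] raw = toRgbBytes(image);
        byte[] restored = inflate(deflate(raw), raw.length);
        System.out.println("lossless round trip: " + Arrays.equals(raw, restored));
    }
}
```

Unlike JPEG compression, every byte comes back unchanged, which is why this
kind of encoding suits graphics with sharp edges.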





[jira] [Commented] (PDFBOX-1990) Support creating PDF from lossless encoded images

2014-03-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938188#comment-13938188
 ] 

Tilman Hausherr commented on PDFBOX-1990:
-

Done in rev 1578489 and 1578492 and 1578503. I also added a NullOutputStream.

 Support creating PDF from lossless encoded images
 -

 Key: PDFBOX-1990
 URL: https://issues.apache.org/jira/browse/PDFBOX-1990
 Project: PDFBox
  Issue Type: Improvement
Reporter: Tilman Hausherr
Priority: Minor

 Currently we support the insertion of TIFF and JPEG into a PDF, but not PNG. 
 We can pass a BufferedImage, but this one will be JPEG compressed which is 
 not a good thing for graphics with sharp edges. I suggest that we support PNG 
 as well. It is possible because the Flate Filter supports both directions.
 My implementation (coming in a few minutes) is just an RGB based start that 
 begs for improvement.





[jira] [Comment Edited] (PDFBOX-1990) Support creating PDF from lossless encoded images

2014-03-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938188#comment-13938188
 ] 

Tilman Hausherr edited comment on PDFBOX-1990 at 3/17/14 6:43 PM:
--

Done in rev 1578489 and 1578492 and 1578503 and 1578505. I also added a 
NullOutputStream.


was (Author: tilman):
Done in rev 1578489 and 1578492 and 1578503. I also added a NullOutputStream.

 Support creating PDF from lossless encoded images
 -

 Key: PDFBOX-1990
 URL: https://issues.apache.org/jira/browse/PDFBOX-1990
 Project: PDFBox
  Issue Type: Improvement
Reporter: Tilman Hausherr
Priority: Minor

 Currently we support the insertion of TIFF and JPEG into a PDF, but not PNG. 
 We can pass a BufferedImage, but this one will be JPEG compressed which is 
 not a good thing for graphics with sharp edges. I suggest that we support PNG 
 as well. It is possible because the Flate Filter supports both directions.
 My implementation (coming in a few minutes) is just an RGB based start that 
 begs for improvement.





[jira] [Commented] (PDFBOX-1975) Improve TestImageIOUtils unit tests to check image resolution and compression

2014-03-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938322#comment-13938322
 ] 

Tilman Hausherr commented on PDFBOX-1975:
-

I added a test to save PDImageXObject objects from PDF within TestImageIOUtils 
in rev 1578544.

 Improve TestImageIOUtils unit tests to check image resolution and compression
 -

 Key: PDFBOX-1975
 URL: https://issues.apache.org/jira/browse/PDFBOX-1975
 Project: PDFBox
  Issue Type: Task
  Components: Utilities
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Priority: Minor
  Labels: imageio, test, tiff
 Fix For: 2.0.0


 Because of the problems with recent changes (see PDFBOX-1963), I will improve 
 the unit tests so that image resolution and compression are checked.
 I found out that JPEGs don't have a resolution, BMP had the wrong resolution. 
 The fault wasn't in the java TIFF writer as I thought before, it is in the 
 java PNG writer, which uses the PixelSize values wrongly, i.e. it interprets 
 them as pixels per mm instead of mm per pixel as per specification. The 
 JPEG writer throws an exception "JFIF APP0 must be first marker after SOI". 
 The BMP writer can set the resolution, but the BMP reader doesn't read it.
 (Some of this might be different depending on the version)





PDFTextStripper.pageSeparator has no effect

2014-03-17 Thread Musall Maik
Hi,

I tried to use the parameter pageSeparator on PDFTextStripper and noticed that 
it has no effect. I checked the sources and discovered that in all versions up 
to the current trunk, the setting is simply not used anywhere.

The only method using a set separator is writePageSeperator(), which also 
includes a typo worth fixing, but this method isn’t called anywhere. It should 
probably be called in processPages(). However, and this is why I didn’t go 
ahead and submit a patch myself, what does happen is that the pageEnd marker is 
written, which is initialized to the value of pageSeparator. So if both get 
used, this will probably end up in the same marker emitted twice on each page 
break.

As a result, I’m unsure what to do about this and thought I’d leave it to the 
core team maintaining this, so I’m just reporting it here.

Regards
Maik
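
The behaviour reported above can be modelled in a few lines. This is a
hypothetical, simplified model and not the PDFTextStripper source: the class
name and the render() method are invented for illustration, and the only
things taken from the report are that pageEnd is initialised from
pageSeparator and that only pageEnd is actually written.

```java
// Hypothetical, simplified model of the reported behaviour.
// It is not the PDFTextStripper source; names only mirror the discussion.
class PageMarkerSketch
{
    private String pageSeparator = "\n";
    // pageEnd starts as a copy of pageSeparator and is not updated afterwards.
    private String pageEnd = pageSeparator;

    void setPageSeparator(String separator)
    {
        pageSeparator = separator;
    }

    // Writes each page followed by pageEnd. When alsoWriteSeparator is true,
    // pageSeparator is written as well, which is what additionally calling a
    // page-separator method in the page loop would do.
    String render(String[] pages, boolean alsoWriteSeparator)
    {
        StringBuilder sb = new StringBuilder();
        for (String page : pages)
        {
            sb.append(page);
            sb.append(pageEnd);
            if (alsoWriteSeparator)
            {
                sb.append(pageSeparator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        PageMarkerSketch custom = new PageMarkerSketch();
        custom.setPageSeparator("#");
        // Only pageEnd is written, so the custom separator never appears.
        System.out.println(custom.render(new String[] {"a", "b"}, false).replace("\n", "\\n"));

        PageMarkerSketch defaults = new PageMarkerSketch();
        // Both markers hold the same default value, so it is emitted twice per page.
        System.out.println(defaults.render(new String[] {"a", "b"}, true).replace("\n", "\\n"));
    }
}
```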



[jira] [Commented] (PDFBOX-1847) TSA Time Signature

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938516#comment-13938516
 ] 

John Hewson commented on PDFBOX-1847:
-

[~v.koroghlishvili] Ok, I applied the changes discussed in revision 1578650. I 
made some significant changes to the patch so that the signing functionality 
can be moved into pdfbox proper, rather than being part of the examples. 
Currently the code remains part of the examples until we're sure it works. Can 
you test out the new code and see if signing is working as you expected?

*Technical Notes*
Revision 1578650 includes changes to various other files, 
COSStandardOutputStream assumed that the OutputStream was always a 
FileOutputStream, which is obviously an unsafe assumption, in fact, output 
streams do not generally have a position at all, so I removed all code which 
broke that contract. COSWriter was treating its incremental update streams in a 
strange manner, it wanted the InputStream and OutputStream to be backed by the 
same underlying data, which is not generally possible, so I had to write new 
code to perform incremental writing in order not to break the Input/Output 
stream contract. This allows the incremental file to be written to a different 
stream from the one which was read. I also added some new loading and saving 
methods to PDDocument to make incremental updating easier, and to automatically 
keep track of File objects, when relevant.
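
As an illustration of the point about stream positions, one way to know how
many bytes have been written without assuming a FileOutputStream is to count
them in a wrapper. This is a sketch only: the class name is hypothetical and
it is not a claim about how revision 1578650 is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: tracks the write position itself, so it works with
// any OutputStream rather than only with file-backed ones.
class PositionTrackingOutputStream extends FilterOutputStream
{
    private long position = 0;

    PositionTrackingOutputStream(OutputStream out)
    {
        super(out);
    }

    @Override
    public void write(int b) throws IOException
    {
        out.write(b);
        position++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException
    {
        // Write directly to the wrapped stream so bytes are counted once.
        out.write(b, off, len);
        position += len;
    }

    // Number of bytes written through this wrapper so far.
    long getPosition()
    {
        return position;
    }

    public static void main(String[] args) throws IOException
    {
        PositionTrackingOutputStream stream =
                new PositionTrackingOutputStream(new ByteArrayOutputStream());
        stream.write("%PDF-1.4\n".getBytes("US-ASCII"));
        System.out.println("bytes written so far: " + stream.getPosition());
        stream.close();
    }
}
```

The inherited write(byte[]) delegates to the three-argument overload, so all
three write methods are counted consistently.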

 TSA Time Signature
 --

 Key: PDFBOX-1847
 URL: https://issues.apache.org/jira/browse/PDFBOX-1847
 Project: PDFBox
  Issue Type: Improvement
  Components: Signing
Affects Versions: 2.0.0
Reporter: vakhtang koroghlishvili
Assignee: John Hewson
 Fix For: 2.0.0

 Attachments: CreateSignature-updated.java.patch, 
 TSATimeSignature.patch, resultOfSigning.jpg


 When we were signing a document, we were using our own local time. For more 
 security we can use a Time Stamp server. 
 Trusted timestamping is the process of securely keeping track of the 
 creation and modification time of a document. Security here means that no one 
 — not even the owner of the document — should be able to change it once it 
 has been recorded provided that the timestamper's integrity is never 
 compromised.(wiki)





[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938516#comment-13938516
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/17/14 10:55 PM:
---

[~v.koroghlishvili] Ok, I applied the changes discussed in revision 1578650. I 
made some significant changes to the patch so that the signing functionality 
can be moved into pdfbox proper, rather than being part of the examples. 
Currently the code remains part of the examples until we're sure it works. Can 
you test out the new code and see if signing is working as you expected?

I've added a command line flag to CreateSignature to allow passing a TSA server 
URL:

{code}
usage: java org.apache.pdfbox.examples.signature.CreateSignature 
<pkcs12_keystore> <password> <pdf_to_sign>
options:
  -tsa <url>    sign timestamp using the given TSA server
{code}

*Technical Notes*
Revision 1578650 includes changes to various other files, 
COSStandardOutputStream assumed that the OutputStream was always a 
FileOutputStream, which is obviously an unsafe assumption, in fact, output 
streams do not generally have a position at all, so I removed all code which 
broke that contract. COSWriter was treating its incremental update streams in a 
strange manner, it wanted the InputStream and OutputStream to be backed by the 
same underlying data, which is not generally possible, so I had to write new 
code to perform incremental writing in order not to break the Input/Output 
stream contract. This allows the incremental file to be written to a different 
stream from the one which was read. I also added some new loading and saving 
methods to PDDocument to make incremental updating easier, and to automatically 
keep track of File objects, when relevant.


was (Author: jahewson):
[~v.koroghlishvili] Ok, I applied the changes discussed in revision 1578650. I 
made some significant changes to the patch so that the signing functionality 
can be moved into pdfbox proper, rather than being part of the examples. 
Currently the code remains part of the examples until we're sure it works. Can 
you test out the new code and see if signing is working as you expected?

*Technical Notes*
Revision 1578650 includes changes to various other files, 
COSStandardOutputStream assumed that the OutputStream was always a 
FileOutputStream, which is obviously an unsafe assumption, in fact, output 
streams do not generally have a position at all, so I removed all code which 
broke that contract. COSWriter was treating its incremental update streams in a 
strange manner, it wanted the InputStream and OutputStream to be backed by the 
same underlying data, which is not generally possible, so I had to write new 
code to perform incremental writing in order not to break the Input/Output 
stream contract. This allows the incremental file to be written to a different 
stream from the one which was read. I also added some new loading and saving 
methods to PDDocument to make incremental updating easier, and to automatically 
keep track of File objects, when relevant.

 TSA Time Signature
 --

 Key: PDFBOX-1847
 URL: https://issues.apache.org/jira/browse/PDFBOX-1847
 Project: PDFBox
  Issue Type: Improvement
  Components: Signing
Affects Versions: 2.0.0
Reporter: vakhtang koroghlishvili
Assignee: John Hewson
 Fix For: 2.0.0

 Attachments: CreateSignature-updated.java.patch, 
 TSATimeSignature.patch, resultOfSigning.jpg


 When we were signing a document, we were using our own local time. For more 
 security we can use a Time Stamp server. 
 Trusted timestamping is the process of securely keeping track of the 
 creation and modification time of a document. Security here means that no one 
 — not even the owner of the document — should be able to change it once it 
 has been recorded provided that the timestamper's integrity is never 
 compromised.(wiki)





[jira] [Commented] (PDFBOX-1983) Unable to add TIF images, CCITTFactory not working

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938547#comment-13938547
 ] 

John Hewson commented on PDFBOX-1983:
-

Cool, it looks like PDMemoryStream is the weak link, it's not really doing what 
it says it is.

 Unable to add TIF images, CCITTFactory not working
 --

 Key: PDFBOX-1983
 URL: https://issues.apache.org/jira/browse/PDFBOX-1983
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Joel Kääpä
Assignee: Tilman Hausherr
 Fix For: 2.0.0

 Attachments: G4.tif, huhu.pdf


 As used in the AddImageToPDF example, the following line generates an error 
 with a TIF image:
 PDImageXObject ximage =  CCITTFactory.createFromRandomAccess(document, new 
 RandomAccessFile(new File(imagePath), "r"));
 java.io.IOException: Stream was not read
  at org.apache.pdfbox.cos.COSStream.getDecodeResult(COSStream.java:235)
  at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:80)
  at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:70)
  at org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory.createFromRandomAccess(CCITTFactory.java:50)





[jira] [Commented] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938560#comment-13938560
 ] 

John Hewson commented on PDFBOX-1987:
-

{quote}
An area which I kept out is how to handle malformed tokens such as strings which 
have an unbalanced number of parentheses. 
{quote}

Do you have any sample PDF files with this problem?

 Provide a PDF Lexer as a base for PDF parsing
 -

 Key: PDFBOX-1987
 URL: https://issues.apache.org/jira/browse/PDFBOX-1987
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Reporter: Maruan Sahyoun
Priority: Minor
 Fix For: 2.0.0

 Attachments: src.zip


 In order to enhance the parsing process and as a foundation for a combination 
 of the different parsers a PDF lexer should be provided.





[jira] [Commented] (PDFBOX-1969) JPEGFactory bug

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938571#comment-13938571
 ] 

John Hewson commented on PDFBOX-1969:
-

Ok, well if someone really wants support for JPEGs which use ARGB we can follow 
up on this, given that it has probably never worked (quite a bit of the 1.8 
image parsing code was like that).

 JPEGFactory bug
 ---

 Key: PDFBOX-1969
 URL: https://issues.apache.org/jira/browse/PDFBOX-1969
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Steven Burg
 Fix For: 2.0.0


 Attempted to run the RubberStampWithImage sample and received the following 
 errors:
 Exception in thread "main" java.lang.NullPointerException
  at org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:72)
  at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.doIt(RubberStampWithImage.java:93)
  at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.main(RubberStampWithImage.java:185)
 This happens with any jpg I tested with.





[jira] [Commented] (PDFBOX-1969) JPEGFactory bug

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938572#comment-13938572
 ] 

John Hewson commented on PDFBOX-1969:
-

Shall we close this issue?

 JPEGFactory bug
 ---

 Key: PDFBOX-1969
 URL: https://issues.apache.org/jira/browse/PDFBOX-1969
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Steven Burg
 Fix For: 2.0.0


 Attempted to run the RubberStampWithImage sample and received the following 
 errors:
 Exception in thread "main" java.lang.NullPointerException
  at org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:72)
  at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.doIt(RubberStampWithImage.java:93)
  at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.main(RubberStampWithImage.java:185)
 This happens with any jpg I tested with.





[jira] [Commented] (PDFBOX-1594) Add support for AES256 Encryption

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938576#comment-13938576
 ] 

John Hewson commented on PDFBOX-1594:
-

The problem is that this patch has been made against 1.8.4 rather than the 
trunk, and there are differences between the two. [~neon1] is it possible for 
you to make a new patch against the trunk?

 Add support for AES256 Encryption 
 --

 Key: PDFBOX-1594
 URL: https://issues.apache.org/jira/browse/PDFBOX-1594
 Project: PDFBox
  Issue Type: Improvement
Reporter: Maruan Sahyoun
 Fix For: 2.0.0

 Attachments: pdfbox-1.8.4-aes256.diff


 Adobe 9 added support for AES 256 encryption. Further information is 
 available at  
 http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf
  (especially 3.5.1) or ISO 32000-2.





[jira] [Commented] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938585#comment-13938585
 ] 

John Hewson commented on PDFBOX-1512:
-

Perhaps we should migrate away from using Collections.sort altogether and use 
some other sorting algorithm?

 TextPositionComparator is not compatible with Java 7
 

 Key: PDFBOX-1512
 URL: https://issues.apache.org/jira/browse/PDFBOX-1512
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.7.1
 Environment: Java 7
Reporter: Benjamin Papez
Assignee: Andreas Lehmkühler
 Attachments: FOP-2252.pdf, TextPositionComparator.java, 
 WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf


 The TextPositionComparator causes the following exception running on Java 7: 
 Unexpected RuntimeException from 
 org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison 
 method violates its general contract!
 I think the problem is with this check:
 if ( yDifference < .1 ||
 (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
 (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
 as it violates the contract requirement:
 The implementor must also ensure that the relation is transitive: 
 ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
 Finally, the implementor must ensure that compare(x, y)==0 implies that 
 sgn(compare(x, z))==sgn(compare(y, z)) for all z.
 Java 7 now is strict and throws exceptions when the contract is violated.
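
 The kind of violation described above can be reproduced with a much smaller
 comparator. The following is a sketch under stated assumptions: it is not the
 TextPositionComparator code, the class and method names are hypothetical, and
 it only uses a plain 0.1 tolerance on single float values.

```java
import java.util.Comparator;

// Hypothetical sketch: a tolerance-based comparator on plain floats and a
// check for one of the Comparator contract rules quoted above.
class ToleranceComparatorSketch
{
    // Treats two values as equal when they differ by less than 0.1.
    static final Comparator<Float> WITH_TOLERANCE = new Comparator<Float>()
    {
        @Override
        public int compare(Float a, Float b)
        {
            if (Math.abs(a - b) < 0.1f)
            {
                return 0;
            }
            return a < b ? -1 : 1;
        }
    };

    // True when compare(x, y) == 0 although x and y order differently
    // against z, i.e. sgn(compare(x, z)) != sgn(compare(y, z)).
    static boolean violatesContract(Comparator<Float> c, float x, float y, float z)
    {
        return c.compare(x, y) == 0
                && Integer.signum(c.compare(x, z)) != Integer.signum(c.compare(y, z));
    }

    public static void main(String[] args)
    {
        // 0.0 "equals" 0.08 and 0.08 "equals" 0.16, but 0.0 is less than 0.16.
        System.out.println(violatesContract(WITH_TOLERANCE, 0.0f, 0.08f, 0.16f));
    }
}
```

 Because "equal within a tolerance" is not transitive, any sort that relies on
 the contract can be given inconsistent answers for such triples.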





Re: Visible signature image

2014-03-17 Thread John Hewson
I just made a unit test for CreateSignature and I’ll add one for visible 
signatures soon.

-- John

On 17 Mar 2014, at 07:31, Vakhtang koroghlishvili 
vakhtang.koroghlishv...@gmail.com wrote:

 I have just updated pdfbox and tested this feature. Everything works well.
 
 
 
 
 
 
 
 On Sat, Mar 15, 2014 at 10:35 AM, Tilman Hausherr 
 thaush...@t-online.de wrote:
 
 I believe that somebody mentioned somewhere that creating the signature
 image didn't work properly, but I just can't find out who it was. While
 working on a test for JPEGFactory (PDFBOX-1969) I noticed that
 JPEGFactory.createFromImage() was temporarily broken (now hopefully no
 more), and this method is only used by PDVisibleSigBuilder.
 createSignatureImage().
 
 I see now that this was created in PDFBOX-1766 by Thomas and Vakhtang -
 please test whether it still works.
 
 Tilman
 



[jira] [Commented] (PDFBOX-1989) Save LZW and other encoded PDImageXObject resources

2014-03-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938596#comment-13938596
 ] 

John Hewson commented on PDFBOX-1989:
-

+1

 Save LZW and other encoded PDImageXObject resources
 ---

 Key: PDFBOX-1989
 URL: https://issues.apache.org/jira/browse/PDFBOX-1989
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Priority: Minor
 Fix For: 2.0.0


 The logo image of the file from PDFBOX-1147.png isn't extracted because 
 PDImageXObject.getSuffix() returns null. Changing getSuffix() so that it 
 returns "png" gives us a correct file.
 With some other images, e.g. the raw_image_demo.pdf file, getSuffix() throws 
 an NPE when getPDStream().getFilters() returns null. This happens with images 
 that are uncompressed. Returning "png" for this case also gives us a nice 
 image.





[jira] [Commented] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-17 Thread Craig Strong (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938606#comment-13938606
 ] 

Craig Strong commented on PDFBOX-1988:
--

I tested the fix on the 2.0.0 build and it worked.  Thanks again.

 PDFBox ExtractText issue of PDF with no embedded fonts
 --

 Key: PDFBOX-1988
 URL: https://issues.apache.org/jira/browse/PDFBOX-1988
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering, Text extraction
Affects Versions: 1.8.4
 Environment: Windows 7
 Also, PASE on IBM i
Reporter: Craig Strong
  Labels: patch
 Fix For: 1.8.5, 2.0.0

 Attachments: Test1.pdf

   Original Estimate: 120h
  Remaining Estimate: 120h

 I have been using PDFBox 1.8.4 to extract text from several different PDF 
 files without problems, using the latest PDFBox app with the ExtractText 
 command line tool. There is one PDF from which PDFBox (and iText) fails to 
 extract any text, even though I can extract the text with Adobe Reader and 
 with pdftotext.exe, which is part of XPdf. The command I run is:
 java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt
 I don't want to rely on pdftotext.exe on a PC, since this is part of an 
 automated application. I think the error relates to an unknown font type and 
 having to fall back on the few fonts bundled in the jar file. I tried calling 
 the API classes directly and forcing a font from a specific location, but I 
 still got errors. I thought I had loaded the font with the loadTTF method, 
 but I don't know whether that had any effect. I would really like to have 
 this working straight from the ExtractText class anyway.
 Here are the errors I am getting. I tried this on both a Windows 7 PC and 
 our IBM i in the PASE environment and got the same errors. The section 
 starting at processEncodedText repeats a few times, so I included only the 
 first entries.
  
 Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
 WARNING: Substituting TrueType for unknown font subtype=
 Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
 WARNING: java.lang.NullPointerException
 Throwable occurred: java.lang.NullPointerException
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.init(PDTrueTypeFont.java:119)
     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
     at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
     at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
     at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
     at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
     at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
 Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
 WARNING: java.lang.NullPointerException
 Throwable occurred: java.lang.NullPointerException
     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)

[jira] [Closed] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-17 Thread Craig Strong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Strong closed PDFBOX-1988.



Closing the issue.

 PDFBox ExtractText issue of PDF with no embedded fonts
 --

 Key: PDFBOX-1988
 URL: https://issues.apache.org/jira/browse/PDFBOX-1988
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering, Text extraction
Affects Versions: 1.8.4
 Environment: Windows 7
 Also, PASE on IBM i
Reporter: Craig Strong
  Labels: patch
 Fix For: 1.8.5, 2.0.0

 Attachments: Test1.pdf

   Original Estimate: 120h
  Remaining Estimate: 120h

 I have been using PDFBox 1.8.4 to extract text from several different PDF 
 files without problems, using the latest PDFBox app with the ExtractText 
 command line tool. There is one PDF from which PDFBox (and iText) fails to 
 extract any text, even though I can extract the text with Adobe Reader and 
 with pdftotext.exe, which is part of XPdf. The command I run is:
 java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt
 I don't want to rely on pdftotext.exe on a PC, since this is part of an 
 automated application. I think the error relates to an unknown font type and 
 having to fall back on the few fonts bundled in the jar file. I tried calling 
 the API classes directly and forcing a font from a specific location, but I 
 still got errors. I thought I had loaded the font with the loadTTF method, 
 but I don't know whether that had any effect. I would really like to have 
 this working straight from the ExtractText class anyway.
 Here are the errors I am getting. I tried this on both a Windows 7 PC and 
 our IBM i in the PASE environment and got the same errors. The section 
 starting at processEncodedText repeats a few times, so I included only the 
 first entries.
  
 Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
 WARNING: Substituting TrueType for unknown font subtype=
 Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
 WARNING: java.lang.NullPointerException
 Throwable occurred: java.lang.NullPointerException
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.init(PDTrueTypeFont.java:119)
     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
     at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
     at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
     at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
     at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
     at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
 Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
 WARNING: java.lang.NullPointerException
 Throwable occurred: java.lang.NullPointerException
     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)