[jira] [Comment Edited] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255310#comment-17255310
 ] 

Tilman Hausherr edited comment on PDFBOX-2602 at 12/27/20, 7:17 PM:


How about making the commands case insensitive? This way old habits don't have 
to be changed.


was (Author: tilman):
How about making the commands case insensitive? This way old habits don't have 
to be changed.
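
A minimal sketch of case-insensitive command matching in plain Java, so that both the old mixed-case spelling and a lower-case spelling resolve to the same command. The class name `CommandRegistry` and the command name used in the usage note are illustrative assumptions; this is not how PDFBox or picocli implements it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical registry, for illustration only; not PDFBox's actual dispatch code.
final class CommandRegistry {
    // A TreeMap ordered by String.CASE_INSENSITIVE_ORDER treats names that
    // differ only in case as the same key.
    private final Map<String, Runnable> commands =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Registers an action under a command name.
    void register(String name, Runnable action) {
        commands.put(name, action);
    }

    // Looks up a command, ignoring the case of the given name.
    Optional<Runnable> lookup(String name) {
        return Optional.ofNullable(commands.get(name));
    }
}
```

With a command registered as "extracttext" (an example name), lookups for "ExtractText" and "EXTRACTTEXT" would both find it, while an unknown name yields an empty Optional.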

 

 

> Enhance command line tools
> --
>
> Key: PDFBOX-2602
> URL: https://issues.apache.org/jira/browse/PDFBOX-2602
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.8.8, 2.0.0
> Reporter: Maruan Sahyoun
> Assignee: Maruan Sahyoun
> Priority: Minor
> Fix For: 3.0.0 PDFBox
>
>
> The command line tools shall be enhanced to have the same behavior across all 
> tools.
> From the discussion on the dev mailing list:
> - add an -h option to print the usage
> - print the usage to System.err and use an exit code of 1 if there was an 
> invalid command line parameter
> - print messages on exceptions to System.err
> - rethrow the exception so Java can handle it if it will terminate afterwards 
> anyway
> - use an exit code of 1 if rethrowing doesn't make sense
> Additional input:
> https://clig.dev/
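
The conventions listed in the issue description can be sketched roughly as follows. This is a hedged illustration, not the PDFBox implementation: the class `ExampleTool`, its usage text and the `process` placeholder are hypothetical, and sending the explicitly requested `-h` output to System.out with exit code 0 is an assumption the issue text does not settle.

```java
import java.io.PrintStream;

// Hypothetical tool skeleton; names are illustrative, not taken from PDFBox.
final class ExampleTool {
    static final String USAGE = "Usage: exampletool [-h] <input-file>";

    // Returns the exit code instead of calling System.exit directly,
    // which keeps the logic testable.
    static int run(String[] args, PrintStream out, PrintStream err) {
        if (args.length == 1 && args[0].equals("-h")) {
            out.println(USAGE);      // explicitly requested help (assumed: stdout, code 0)
            return 0;
        }
        if (args.length != 1 || args[0].startsWith("-")) {
            err.println(USAGE);      // invalid parameter: usage goes to the error stream
            return 1;                // and the exit code is 1
        }
        try {
            process(args[0]);
            return 0;
        } catch (RuntimeException e) {
            err.println("Error: " + e.getMessage()); // exception message to the error stream
            return 1;                // exit code 1 where rethrowing makes no sense
        }
    }

    // Placeholder for the real work of a tool.
    static void process(String inputFile) {
        if (inputFile.isEmpty()) {
            throw new IllegalArgumentException("empty file name");
        }
    }

    public static void main(String[] args) {
        System.exit(run(args, System.out, System.err));
    }
}
```

Invoked without arguments, this sketch prints the usage line to System.err and exits with code 1.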



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255312#comment-17255312
 ] 

Tilman Hausherr commented on PDFBOX-2602:
-

PDFDebugger also has command line options... I don't understand what you mean 
by "interactive shell".







[jira] [Commented] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255310#comment-17255310
 ] 

Tilman Hausherr commented on PDFBOX-2602:
-

How about making the commands case insensitive? This way old habits don't have 
to be changed.

 

 







[jira] [Commented] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255290#comment-17255290
 ] 

Maruan Sahyoun commented on PDFBOX-2602:


[~tilman] I've (re-)added PDFDebugger following your comments, but I'm not 
totally happy with that: all tools/commands are terminal based except 
PDFDebugger, which is the only GUI, and we already have that as an independent 
standalone app. This will limit us in adding new functionality, or at least 
cause inconsistency when adding intended functionality such as tab 
completion, input/output piping or an interactive shell, which are easier now 
with picocli (and some add-ons).







[jira] [Commented] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255287#comment-17255287
 ] 

Maruan Sahyoun commented on PDFBOX-2602:


I'd like to change all commands to be in lower case as this is common for shell 
commands - thoughts?







[jira] [Updated] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread Maruan Sahyoun (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-2602:
---
Description: 
The command line tools shall be enhanced to have the same behavior across all 
tools.

From the discussion on the dev mailing list:
- add an -h option to print the usage
- print the usage to System.err and use an exit code of 1 if there was an 
invalid command line parameter
- print messages on exceptions to System.err
- rethrow the exception so Java can handle it if it will terminate afterwards 
anyway
- use an exit code of 1 if rethrowing doesn't make sense

Additional input:
https://clig.dev/

  was:
The command line tools shall be enhanced to have the same behavior across all 
tools.

From the discussion on the dev mailing list:
- add an -h option to print the usage
- print the usage to System.err and use an exit code of 1 if there was an 
invalid command line parameter
- print messages on exceptions to System.err
- rethrow the exception so Java can handle it if it will terminate afterwards 
anyway
- use an exit code of 1 if rethrowing doesn't make sense








[jira] [Commented] (PDFBOX-4952) PDF compression - object stream creation

2020-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255265#comment-17255265
 ] 

Tilman Hausherr commented on PDFBOX-4952:
-

Javadoc errors in COSWriterObjectStream.java at lines 51 and 79. I think the 
parameters don't match.

> PDF compression - object stream creation
> 
>
> Key: PDFBOX-4952
> URL: https://issues.apache.org/jira/browse/PDFBOX-4952
> Project: PDFBox
> Issue Type: New Feature
> Components: PDModel
> Affects Versions: 2.0.21
> Reporter: Christian Appl
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: image-2020-09-07-09-47-30-172.png, 
> image-2020-09-07-10-05-15-631.png
>
>
> I implemented a basic starting point for PDF compression, based on 
> PDFBox 2.0.22-SNAPSHOT.
> I want to use this ticket to ask whether you would be interested in such a 
> feature and in merging it into PDFBox.
> This is a sort of POC that implements only very basic functionality, which 
> surely must and could be extended further, and it has only some very basic 
> and simplistic unit tests.
> However, it is able to reduce the size of the resulting documents, and it 
> creates object streams as defined in the PDF reference manual.
> *What it currently does:*
>  It provides the bundling and compression of objects to objectstreams -and 
> further applies simple content compression to a small selection of contents-.
> -To realize content compression, it provides a simple interface and abstract 
> class for "ContentCompressor"s which search a document for specific content, 
> that could be compressed and do compress that contents.-
> -Currently two content compressors exist:-
>  -_ImageCompressor_-
>  -Searches for simple images, that could be compressed using DCT.-
> -_UnencodedStreamCompressor_-
>  -Searches the document for yet unencoded streams and applies a Flate 
> compression where necessary.-
> -Both compressors can be parameterized using a centralized 
> "CompressParameters" instance which is passed to a new "saveCompressed" 
> method of PDDocument.-
> The compression is based on, modifies and is realized by a set of extensions 
> for the "COSWriter" class. Basically it organizes objects, that are passed to 
> the COSWriter in objectStreams -and applies content optimization where 
> necessary and possible-.
> Currently this does support encryption, but does not support linearization of 
> the compressed documents.
> *Caveat:*
>  If this feature is interesting to you, then I would not expect you to simply 
> merge this fork into 2.0.22. I am expecting that you would like to have some 
> details and concepts changed and am ready to implement changes that would be 
> required for this to work to your liking.
> *POC:*
>  4 resulting documents can be found in "target/test-output/compression" when 
> "COSDocumentCompressionTest" is run.
> *The Pull request can be found on Github at:*
>  [https://github.com/apache/pdfbox/pull/86]
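
To make the idea of bundling objects into an object stream more concrete, here is a simplified sketch of how the payload of such a stream is laid out according to the PDF specification: an index of (object number, byte offset) pairs, followed by the serialized objects, with the whole payload then Flate-compressed. The class `ObjectStreamSketch` and its fields are hypothetical; this is not the code from the pull request, and it ignores the stream dictionary, the cross-reference stream and encryption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.Deflater;

// Illustrative only: lays out the payload of a PDF object stream from objects
// that are already serialized (without the "obj"/"endobj" keywords).
final class ObjectStreamSketch {
    final int count;      // value for the /N entry
    final int first;      // value for the /First entry (offset of the first object)
    final byte[] payload; // uncompressed stream data

    private ObjectStreamSketch(int count, int first, byte[] payload) {
        this.count = count;
        this.first = first;
        this.payload = payload;
    }

    // The map is expected to iterate in the order the objects should be stored.
    static ObjectStreamSketch build(Map<Integer, String> objects) {
        StringBuilder index = new StringBuilder();
        ByteArrayOutputStream bodies = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : objects.entrySet()) {
            // Index entry: object number, then offset relative to the first object.
            index.append(e.getKey()).append(' ').append(bodies.size()).append(' ');
            byte[] body = (e.getValue() + "\n").getBytes(StandardCharsets.ISO_8859_1);
            bodies.write(body, 0, body.length);
        }
        byte[] head = index.toString().getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        data.write(head, 0, head.length);
        data.write(bodies.toByteArray(), 0, bodies.size());
        return new ObjectStreamSketch(objects.size(), head.length, data.toByteArray());
    }

    // Flate-compresses the payload, as a /FlateDecode filter would require.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

The size reduction comes from compressing many small objects together instead of writing each one uncompressed into the file body.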






[jira] [Commented] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255254#comment-17255254
 ] 

ASF subversion and git services commented on PDFBOX-2602:
-

Commit 1884847 from Maruan Sahyoun in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1884847 ]

PDFBOX-2602: additional usage information







[jira] [Commented] (PDFBOX-4952) PDF compression - object stream creation

2020-12-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255245#comment-17255245
 ] 

ASF subversion and git services commented on PDFBOX-4952:
-

Commit 1884846 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1884846 ]

PDFBOX-4952: take all object keys into account when calculating the highest 
object number of a pdf, refactor the usage of the highest object number within 
COSWriter







[jira] [Commented] (PDFBOX-5059) java.io.IOException: expected number, actual=COSFloat{18446744073521659909} at offset 4932600

2020-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255244#comment-17255244
 ] 

Tilman Hausherr commented on PDFBOX-5059:
-

Please retry with the current version, which is 2.0.22. Your version is four 
years old. Might be related to PDFBOX-4495.

> java.io.IOException: expected number, actual=COSFloat{18446744073521659909} 
> at offset 4932600
> -
>
> Key: PDFBOX-5059
> URL: https://issues.apache.org/jira/browse/PDFBOX-5059
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.3
> Environment: linux
>Reporter: Ling Hock Hin, Daniel
>Priority: Major
>
> Encountered this error while trying to upload pdf. Seems to apply only for 
> certain pdfs. Can't share more due to confidentiality.
>  
> java.io.IOException: expected number, actual=COSFloat{18446744073521659909} at offset 4932600
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:162)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:274)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:207)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:854)
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:757)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:726)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:657)
>     at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:617)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:215)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1093)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)






[jira] [Created] (PDFBOX-5059) java.io.IOException: expected number, actual=COSFloat{18446744073521659909} at offset 4932600

2020-12-27 Thread Ling Hock Hin, Daniel (Jira)
Ling Hock Hin, Daniel created PDFBOX-5059:
-

 Summary: java.io.IOException: expected number, 
actual=COSFloat{18446744073521659909} at offset 4932600
 Key: PDFBOX-5059
 URL: https://issues.apache.org/jira/browse/PDFBOX-5059
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.3
 Environment: linux
Reporter: Ling Hock Hin, Daniel


Encountered this error while trying to upload pdf. Seems to apply only for 
certain pdfs. Can't share more due to confidentiality.

 

java.io.IOException: expected number, actual=COSFloat{18446744073521659909} at offset 4932600
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:162)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:274)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:207)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:854)
    at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:757)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:726)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:657)
    at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:617)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:215)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1093)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
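
One thing that can be checked without the confidential file is the number in the message itself. The standalone sketch below, which does not call into PDFBox, shows that 18446744073521659909 does not fit into a signed 64-bit long (Long.MAX_VALUE is 9223372036854775807) and that it lies 187891707 below 2^64. That would be consistent with a negative 64-bit value written out as unsigned, but this interpretation is only a guess; the class `NumberRangeCheck` and its methods are hypothetical helpers.

```java
import java.math.BigInteger;

// Standalone arithmetic check; it says nothing certain about the cause of the error.
final class NumberRangeCheck {
    static final String LITERAL = "18446744073521659909"; // value from the error message

    // True if the decimal literal fits into a signed 64-bit long.
    static boolean fitsInLong(String literal) {
        try {
            Long.parseLong(literal);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // How far the literal lies below 2^64.
    static BigInteger distanceBelowTwoPow64(String literal) {
        return BigInteger.ONE.shiftLeft(64).subtract(new BigInteger(literal));
    }

    // The same bit pattern read as a signed 64-bit value.
    static long asSigned(String literal) {
        return Long.parseUnsignedLong(literal);
    }
}
```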






[jira] [Commented] (PDFBOX-4952) PDF compression - object stream creation

2020-12-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255238#comment-17255238
 ] 

ASF subversion and git services commented on PDFBOX-4952:
-

Commit 1884845 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1884845 ]

PDFBOX-4952: refactor COSWriterObjectStream to avoid creating a new ScratchFile 
for every object stream







[jira] [Commented] (PDFBOX-4548) Update PDType1Font to make PDFBox GraalVM native mode ready

2020-12-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255235#comment-17255235
 ] 

Andreas Lehmkühler commented on PDFBOX-4548:


IMHO there is a lot of room for optimizations and we should keep things 
like GraalVM in mind when refactoring some of the affected code. However, 
those refactorings will most likely be complex, and as we are planning to get 
3.0.0 out of the door as soon as possible, we have to postpone this one to 
4.0.0.

> Update PDType1Font to make PDFBox GraalVM native mode ready 
> 
>
> Key: PDFBOX-4548
> URL: https://issues.apache.org/jira/browse/PDFBOX-4548
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 2.0.16
>Reporter: Sergey Beryozkin
>Priority: Major
> Fix For: 4.0.0
>
>
> {{org.apache.pdfbox.pdmodel.font.PDType1Font}} has statically initialized 
> PDType1Font instances with the private constructor having a code path to 
> {{org.apache.fontbox.ttf.RAFDataStream}} which opens a File.
> This prevents [GraalVM|https://github.com/oracle/graal] from building a 
> native image of PDFBox or libraries which depend on it.
> Please see [TIKA-2862|https://issues.apache.org/jira/browse/TIKA-2862] for 
> more information. 
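
The problem described in the issue is work done in static initializers. One common way to defer such work is the initialization-on-demand holder idiom, sketched below with a hypothetical `FontSketch` class standing in for a font class with statically initialized instances. Whether this approach suits PDFBox, or is enough for a GraalVM native image build, is not established here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a font class; not the real PDType1Font.
final class FontSketch {
    // Counts how often the expensive constructor path runs.
    static final AtomicInteger LOADS = new AtomicInteger();

    final String name;

    private FontSketch(String name) {
        this.name = name;
        LOADS.incrementAndGet(); // stands in for work such as opening a file
    }

    // Initialization-on-demand holder: the JVM initializes Holder only when
    // helvetica() is first called, not when FontSketch itself is initialized.
    private static final class Holder {
        static final FontSketch HELVETICA = new FontSketch("Helvetica");
    }

    static FontSketch helvetica() {
        return Holder.HELVETICA;
    }
}
```

In this sketch, merely touching the class does not run the constructor; the instance is created on the first call to `helvetica()` and reused afterwards.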






[jira] [Updated] (PDFBOX-4548) Update PDType1Font to make PDFBox GraalVM native mode ready

2020-12-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-4548:
---
Fix Version/s: (was: 3.0.0 PDFBox)
   4.0.0







[jira] [Commented] (PDFBOX-5058) Remove check for external glyphlist

2020-12-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255234#comment-17255234
 ] 

ASF subversion and git services commented on PDFBOX-5058:
-

Commit 1884844 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1884844 ]

PDFBOX-5058: remove exception for a no longer supported feature which was 
removed in 2.0

> Remove check for external glyphlist
> ---
>
> Key: PDFBOX-5058
> URL: https://issues.apache.org/jira/browse/PDFBOX-5058
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> PDFBOX-2379 removes the usage of the system property "glyphlist_ext" and an 
> exception is thrown if it is used nonetheless.
> We should remove the check completely.
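
For readers unfamiliar with this kind of guard, a check of the sort the issue proposes to remove might look roughly like the sketch below. Only the property name "glyphlist_ext" comes from the issue text; the class `GlyphListCheckSketch`, the exception type and the message are assumptions, and the real PDFBox code may differ.

```java
// Hypothetical reconstruction for illustration; the actual check in PDFBox may
// differ in exception type, message and placement.
final class GlyphListCheckSketch {
    static final String PROPERTY = "glyphlist_ext"; // property name from the issue text

    // Throws if the no-longer-supported system property is set.
    static void rejectExternalGlyphList() {
        if (System.getProperty(PROPERTY) != null) {
            throw new UnsupportedOperationException(PROPERTY + " is no longer supported");
        }
    }
}
```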






[jira] [Resolved] (PDFBOX-5058) Remove check for external glyphlist

2020-12-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5058.

Resolution: Fixed







[jira] [Created] (PDFBOX-5058) Remove check for external glyphlist

2020-12-27 Thread Jira
Andreas Lehmkühler created PDFBOX-5058:
--

 Summary: Remove check for external glyphlist
 Key: PDFBOX-5058
 URL: https://issues.apache.org/jira/browse/PDFBOX-5058
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 3.0.0 PDFBox
Reporter: Andreas Lehmkühler
Assignee: Andreas Lehmkühler
 Fix For: 3.0.0 PDFBox


PDFBOX-2379 removes the usage of the system property "glyphlist_ext" and an 
exception is thrown if it is used nonetheless.

We should remove the check completely.






[jira] [Commented] (PDFBOX-5056) Double checked locking done wrong in several places

2020-12-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255230#comment-17255230
 ] 

Andreas Lehmkühler commented on PDFBOX-5056:


W.r.t. refactorings, we should wait until 3.0.0 is released; otherwise that 
release will become a never-ending story ;-)

> Double checked locking done wrong in several places
> ---
>
> Key: PDFBOX-5056
> URL: https://issues.apache.org/jira/browse/PDFBOX-5056
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.22
>Reporter: Mike Kaplinskiy
>Priority: Major
>
> There are several places inside pdfbox where double-checked locking is done 
> incorrectly. Specifically, double-checked locking is the following pattern:
> {code:java}
> private volatile boolean doneInit = false;
>
> if (!doneInit) {
>     synchronized (this) {
>         if (!doneInit) {
>             // ... do init
>             doneInit = true;
>         }
>     }
> }
> {code}
> Common issues are a missing {{volatile}} or the volatile write not being the 
> last statement. Here are the cases I found so far:
>  * 
> [https://github.com/apache/pdfbox/blob/e409f25009702be8889ce586b8f6dd7274201f0a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDDeviceCMYK.java#L60]
>  - {{volatile}} set isn't the last statement in the block.
>  * 
> [https://github.com/apache/pdfbox/blob/e409f25009702be8889ce586b8f6dd7274201f0a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDDeviceRGB.java#L48]
>  - {{volatile}} set isn't the last statement in the block.
>  * 
> [https://github.com/apache/pdfbox/blob/e409f25009702be8889ce586b8f6dd7274201f0a/fontbox/src/main/java/org/apache/fontbox/ttf/TrueTypeFont.java#L162-L167]
>  - {{initialized}} isn't {{volatile}}
>  * 
> [https://github.com/apache/pdfbox/blob/947966ea830ff91c61a6740ca1787eb795b5ca95/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/encoding/Encoding.java#L132-L149]
>  - {{names}} isn't {{volatile}} and the second check is missing (which might 
> be harmless)
>  * 
> [https://github.com/apache/pdfbox/blob/6c8526bab8b7ca399721e067d065c1f272f97644/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/Standard14Fonts.java#L172-L190]
>  - {{fonts}} isn't even locked and it's a vanilla hashmap that can be 
> modified by other threads.
>  * 
> [https://github.com/apache/pdfbox/blob/7984a52ad4d475886568593614ce566a88bf6bdd/pdfbox/src/main/java/org/apache/pdfbox/cos/COSName.java#L632-L637]
>  - (tricky to see, but the constructor adds itself to the map) - the map 
> isn't locked before the blind {{put}}, so it is unclear which invocation "wins" 
> (not sure if this is an issue, logic-wise).
>  * 
> [https://github.com/apache/pdfbox/blob/61d6a53eacdee6a40d352509105e1c8d51cfb5dc/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FontMapperImpl.java#L415-L419]
>  - {{fontProvider}} isn't volatile, though it is used as a double-checked lock 
> marker. (Not sure if this is an issue, concurrency-wise).
> Fixing these one by one is possible, and is what I started doing in 
> [https://github.com/apache/pdfbox/pull/90] - but would it make sense to make 
> a class that does this properly? Guava has {{Suppliers.memoize}}, which does 
> the right thing - it's trivial to make an equivalent without the dependency 
> in pdfbox as well if necessary.
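
For illustration, the kind of dependency-free helper suggested above might look 
roughly like the following. This is only a sketch: the class name 
MemoizingSupplier is hypothetical, it is not part of PDFBox or of the linked 
pull request, and it is not a copy of Guava's implementation. It shows the two 
points raised in the report: the marker field is volatile, and the volatile 
write is the last statement of the initialization.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not PDFBox API): caches the result of a delegate
 * supplier using double-checked locking.
 */
final class MemoizingSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    // volatile marker: a reader that sees 'true' also sees the write to 'value'
    private volatile boolean initialized;
    // plain field, published safely through the volatile write below
    private T value;

    MemoizingSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        if (!initialized) {                 // first check, without the lock
            synchronized (this) {
                if (!initialized) {         // second check, under the lock
                    value = delegate.get(); // complete the initialization first
                    initialized = true;     // volatile write is the last statement
                }
            }
        }
        return value;
    }
}
```

A caller would wrap the expensive initialization once, for example 
new MemoizingSupplier<>(() -> loadSomething()), where loadSomething is a 
placeholder, and call get() wherever the value is needed. If the delegate 
throws, the marker stays false and the next call retries.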






[jira] [Commented] (PDFBOX-2602) Enhance command line tools

2020-12-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255213#comment-17255213
 ] 

ASF subversion and git services commented on PDFBOX-2602:
-

Commit 1884840 from Maruan Sahyoun in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1884840 ]

PDFBOX-2602: add footer; custom synopsis

> Enhance command line tools
> --
>
> Key: PDFBOX-2602
> URL: https://issues.apache.org/jira/browse/PDFBOX-2602
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 1.8.8, 2.0.0
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Minor
> Fix For: 3.0.0 PDFBox
>
>
> The command line tools shall be enhanced to have the same behavior across all 
> tools.
> From the discussion on the dev mailing list
> - add an -h option to print the usage
> - print the usage to System.err and use an exit code of 1 if there was an 
> invalid command line parameter
> - print messages on exceptions to System.err
> - rethrow the exception so java can handle it if it will terminate afterwards 
> anyway
> - use an exit code of 1 if rethrowing doesn't make sense
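
As a rough illustration of the conventions listed above, a tool could be 
structured along these lines. This is a sketch only: ExampleTool, run and 
process are hypothetical names and not the actual PDFBox tools or the code from 
the commit. Printing the -h output to stdout with exit code 0 is an assumption, 
since the list does not specify it, and the sketch shows the "exit code of 1" 
variant rather than rethrowing.

```java
import java.io.PrintStream;

/** Hypothetical tool skeleton; all names are illustrative, not PDFBox API. */
final class ExampleTool {
    static final String USAGE = "Usage: ExampleTool [-h] <inputfile>";

    /** Returns the exit code instead of calling System.exit, so it stays testable. */
    static int run(String[] args, PrintStream out, PrintStream err) {
        if (args.length == 1 && "-h".equals(args[0])) {
            out.println(USAGE);              // help was requested (assumed: stdout, code 0)
            return 0;
        }
        if (args.length != 1 || args[0].startsWith("-")) {
            err.println(USAGE);              // invalid parameter: usage goes to stderr
            return 1;
        }
        try {
            process(args[0]);
            return 0;
        } catch (RuntimeException e) {
            err.println("Error: " + e.getMessage()); // exception message to stderr
            return 1;                        // exit code 1 instead of rethrowing
        }
    }

    /** Placeholder for the real work; rejects an empty file name. */
    static void process(String file) {
        if (file.isEmpty()) {
            throw new IllegalArgumentException("empty file name");
        }
    }
}
```

A main method would then only need to call 
System.exit(ExampleTool.run(args, System.out, System.err)).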


