PDFBox 2.0.29 release?

2023-05-23 Thread Andreas Lehmkuehler

Hi,

I'd like to release 2.0.29 soon because of the regression that was fixed with
PDFBOX-5606.


WDYT?

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[ANNOUNCE] Apache PDFBox 2.0.28 released

2023-04-13 Thread Andreas Lehmkuehler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.28. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 2.0.28

Introduction


The Apache PDFBox library is an open source Java tool for working with PDF 
documents.


This is an incremental bugfix release based on the earlier 2.0.27 release. It 
contains a couple of fixes and small improvements.


For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-4531] - Extraction of Arabic PDF has incorrect ordering of normalized ligatures
[PDFBOX-5178] - Parsing differences between 2.0.23 and 2.0.24/3.0
[PDFBOX-5521] - Signing tries to set byteRange of old signature
[PDFBOX-5523] - Bug in org/apache/pdfbox/multipdf/Overlay#overlay(specificPageOverlayFile)
[PDFBOX-5524] - Inactive OCGs shown when not top level
[PDFBOX-5525] - Null pointer exception in PDFASchemaType.getNamespaceURI()
[PDFBOX-5540] - export:text creates jibberish / malformed output
[PDFBOX-5552] - ArrayIndexOutOfBounds in SampledImageReader.fromAny()
[PDFBOX-5553] - PDFRenderer resulting image has black background
[PDFBOX-] - NPE due to a malformed rectangle
[PDFBOX-5557] - Fix meta markup in HTML generation
[PDFBOX-5562] - ArrayIndexOutOfBoundsException in CFFCIDFont class
[PDFBOX-5563] - Can't open PDF with PDFBox: java.awt.color.CMMException: LCMS error 13: Couldn't link the profiles
[PDFBOX-5566] - ClassCastException in ShadingFill.process()
[PDFBOX-5567] - Font gets smaller for each rendered page
[PDFBOX-5572] - fix some logging inconsistencies
[PDFBOX-5577] - NPE in PDFMergerUtility.acroFormLegacyMode()
[PDFBOX-5582] - Avoid OOME when parsing a malformed pdf with a corrupted object stream


Improvement

[PDFBOX-5526] - Apply subsampling and region to masks
[PDFBOX-5534] - Remove finalize from ScratchFileBuffer
[PDFBOX-5549] - Invisible signature field is not referenced from /Annots dictionary of a Page
[PDFBOX-5554] - Support charset parameter in TextToPDF
[PDFBOX-5560] - Add a method to get the components of a composite glyph
[PDFBOX-5564] - PDResource font cache improvement
[PDFBOX-5565] - RFE: Comb flag warning
[PDFBOX-5573] - fix unnecessary boxing/unboxing
[PDFBOX-5575] - optimize LZWFilter
[PDFBOX-5581] - renderer.setSubsamplingAllowed(true) causing the picture to blur
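PDFBOX-5534 removed finalize from ScratchFileBuffer. The notes don't say what, if anything, replaced it; a common JDK 9+ pattern is an explicit close() backed by java.lang.ref.Cleaner as a safety net. The class below is a hypothetical sketch of that pattern, not PDFBox code:

```java
import java.lang.ref.Cleaner;

// Hypothetical example class; not part of PDFBox.
public class CleanerBufferSketch implements AutoCloseable {
    // One shared Cleaner; its background thread runs registered cleanup actions.
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not reference the outer buffer object,
    // otherwise the buffer could never become unreachable.
    private static final class State implements Runnable {
        private volatile boolean released;

        @Override
        public void run() {
            // A real buffer would free pages or delete temp files here.
            released = true;
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable;

    public CleanerBufferSketch() {
        this.cleanable = CLEANER.register(this, state);
    }

    public boolean isReleased() {
        return state.released;
    }

    @Override
    public void close() {
        // clean() runs State.run() at most once, so repeated close() calls are safe.
        cleanable.clean();
    }

    public static void main(String[] args) {
        CleanerBufferSketch buffer = new CleanerBufferSketch();
        try (buffer) {
            System.out.println("released inside try: " + buffer.isReleased()); // false
        }
        System.out.println("released after close: " + buffer.isReleased()); // true
    }
}
```

If a caller forgets close(), the Cleaner may eventually run the action after the object becomes unreachable; unlike finalize, it cannot resurrect the object.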

Task

[PDFBOX-5535] - Remove Travis build

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.
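The checksum part of that verification can be scripted with the JDK alone. A minimal Java sketch (the class name and command-line arguments are illustrative); it covers only the SHA-512 comparison, not the PGP signature check against the KEYS file:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Check {
    // Streams the file through SHA-512 and returns the digest as lowercase hex.
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compares with the published digest, ignoring case and surrounding whitespace.
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha512Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Sha512Check <archive> <expected-sha512-hex>
        Path archive = Paths.get(args[0]);
        System.out.println(matches(archive, args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

With Java 11+ this can be run directly as "java Sha512Check.java <archive> <expected-hex>"; pass only the hex digest, not any file name that may follow it in the published checksum file.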

About Apache PDFBox

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/




[RESULT][VOTE] Release Apache PDFBox 2.0.28

2023-04-13 Thread Andreas Lehmkuehler

On 10.04.23 at 12:15, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox 2.0.28.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Tim Allison
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas






Re: Apache PDFBox Board Report April 2023 due

2023-04-11 Thread Andreas Lehmkuehler

Hi,

thanks for your reviews, I've submitted the report as proposed

Andreas








Apache PDFBox Board Report April 2023 due

2023-04-10 Thread Andreas Lehmkuehler

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to
a Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

2.0.27 was released on 2022-09-29.
1.8.17 was released on 2022-09-15.
2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are still planning to cut the first beta release of our next major
  version 3.0.0
- we've started a vote for the 2.0.28 release
- the new release consists of bug fixes and small improvements. One of the
  more significant changes is the improved support for Arabic PDFs
- we received three reports through security@a.o. All of them concern well-known
  issues and didn't qualify for a CVE due to their low severity




Re: Fwd: 2.0.28 release?

2023-04-10 Thread Andreas Lehmkuehler

On 10.04.23 at 16:44, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-rc1.tgz

Thanks Tim!

I had a first look. There are no new exceptions compared to the last test run, but
more than 60 fixed exceptions. It looks like the last fixes addressed some other
issues as well.

It seems the test run doesn't reveal any reason to stop the release.
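The comparison in the report boils down to set differences between two runs over the same corpus. A hypothetical Java sketch of that bookkeeping (it is not the actual Tika tooling; the file names and exception types are made up) that sorts files into new, fixed and changed exceptions, the last case being an exception that merely replaces another one:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ExceptionDiff {
    // Files that throw in the candidate run but not in the baseline run.
    static Set<String> newExceptions(Map<String, String> baseline, Map<String, String> candidate) {
        Set<String> result = new TreeSet<>(candidate.keySet());
        result.removeAll(baseline.keySet());
        return result;
    }

    // Files that threw in the baseline run but no longer throw.
    static Set<String> fixedExceptions(Map<String, String> baseline, Map<String, String> candidate) {
        Set<String> result = new TreeSet<>(baseline.keySet());
        result.removeAll(candidate.keySet());
        return result;
    }

    // Files that throw in both runs, but with a different exception type.
    static Set<String> changedExceptions(Map<String, String> baseline, Map<String, String> candidate) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : candidate.entrySet()) {
            String before = baseline.get(e.getKey());
            if (before != null && !before.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: file name -> exception class thrown while parsing.
        Map<String, String> baseline = new TreeMap<>();
        baseline.put("a.pdf", "java.io.IOException");
        baseline.put("b.pdf", "java.lang.NullPointerException");
        Map<String, String> candidate = new TreeMap<>();
        candidate.put("b.pdf", "java.io.IOException");
        candidate.put("c.pdf", "java.lang.ArrayIndexOutOfBoundsException");

        System.out.println("new:     " + newExceptions(baseline, candidate));     // [c.pdf]
        System.out.println("fixed:   " + fixedExceptions(baseline, candidate));   // [a.pdf]
        System.out.println("changed: " + changedExceptions(baseline, candidate)); // [b.pdf]
    }
}
```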

WDYT?

Andreas




Re: Fwd: 2.0.28 release?

2023-04-10 Thread Andreas Lehmkuehler
Sounds like one of those expected issues. I guess PDFBox now swallows the former
exception and is able to process the pdf in question. At least the exception is
gone; maybe there is some more content or just an empty page.

However, IMHO that isn't a regression, but a (small) improvement.

@Tim Thanks for running the tests

Andreas


On 10.04.23 at 12:54, Tim Allison wrote:

We're getting a build failure on this test now.  I'll turn it off for
the build and run the full process.

https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java#L1031

org.opentest4j.AssertionFailedError: Should have thrown exception ==> expected: <true> but was: <false>
 at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
 at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
 at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
 at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
 at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:211)
 at org.apache.tika.parser.pdf.PDFParserTest.testSkipBadPage(PDFParserTest.java:1044)

On Mon, Apr 10, 2023 at 6:41 AM Tim Allison  wrote:


Y. Will start process now. Thank you!


Re: Fwd: 2.0.28 release?

2023-04-10 Thread Andreas Lehmkuehler

Hi,

I've finished the release process and provided a release candidate for 2.0.28.

@Tim Is there any chance to re-run the tests in the next 3 days, so that we
could stop the release if there is any major regression?

I don't expect any new issues, as the last changes should produce fewer exceptions
than before, but you know.


Thanks in advance

Andreas


On 10.04.23 at 11:42, Andreas Lehmkuehler wrote:


On 10.04.23 at 04:32, Tilman Hausherr wrote:

On 09.04.2023 22:36, Andreas Lehmkuehler wrote:
OK, so there is one more question left: do we need to re-run the tests before 
starting the release process?


Yes, I'd prefer to have another comparison, but it can be done in parallel.

Good idea, I'm going to cut the release ...

Andreas



Tilman





Andreas

On 09.04.23 at 20:56, Tilman Hausherr wrote:

On 09.04.2023 17:35, Andreas Lehmkuehler wrote:

Hi,

I've fixed the issue with 2 of the 3 pdfs.

GHOSTSCRIPT-702891-0.pdf is left as the only problematic pdf. I didn't
find a solution that fixes the regressions and still fixes the original
issue from PDFBOX-5178. The parser from the trunk is able to handle that
pdf well.

IMHO we should leave it alone, as it is malformed and doesn't contain any
useful content. More importantly, it is one pdf out of hundreds of
thousands, just a corner case.


WDYT?


I agree!

Tilman




Andreas

On 05.04.23 at 08:10, Andreas Lehmkuehler wrote:

On 04.04.23 at 07:40, Andreas Lehmkuehler wrote:

On 03.04.23 at 19:50, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-20230403-reports.tgz

Haven't had a chance to take a look yet. :(

Thanks Tim!

There are still 5 new exceptions listed. All of them are related to the
very same change coming from PDFBOX-5178, which I fixed the other day.
But these cases are different and the trunk is affected as well. My bad
for not taking a deeper look in the first place.

I'm going to investigate those issues.

All pdfs are more or less broken. Two of them are totally useless, and the
new exception is just another one. The other three contain some more or
less readable content, and we are hitting the well-known dilemma: should we
stop reading once we hit something bad, or should we try to read as much as
possible and maybe run into much bigger issues than before?

I guess these are all special corner cases. I'm still thinking about
a solution to support both strategies.
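One way to support both strategies is to make the failure policy an explicit option instead of hard-coding it. A toy Java sketch, assuming a hypothetical strict flag and using whitespace-separated integers as a stand-in for PDF objects (this is not PDFBox's parser or API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy parser illustrating a strict/lenient switch; not PDFBox's API.
public class LenientParserSketch {
    static final class Result {
        final List<Integer> objects = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();
    }

    // Parses whitespace-separated integers as a stand-in for PDF objects.
    // strict == true: throw on the first malformed token.
    // strict == false: skip the token, record a warning, and keep going.
    static Result parse(String input, boolean strict) {
        Result result = new Result();
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            try {
                result.objects.add(Integer.parseInt(tokens[i]));
            } catch (NumberFormatException e) {
                if (strict) {
                    throw new IllegalArgumentException(
                            "malformed token at index " + i + ": " + tokens[i], e);
                }
                result.warnings.add("skipped token " + i + ": " + tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result lenient = parse("1 2 x 4", false);
        System.out.println(lenient.objects + " " + lenient.warnings); // [1, 2, 4] [skipped token 2: x]
        try {
            parse("1 2 x 4", true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage()); // strict: malformed token at index 2: x
        }
    }
}
```

In lenient mode the caller gets whatever could be recovered plus a list of warnings instead of an exception; this mirrors the situation in the thread where a test expecting an exception started failing once the exception was swallowed.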


Andreas



Andreas




On Mon, Apr 3, 2023 at 6:53 AM Tilman Hausherr wrote:


Don't wait please
Thanks
Tilman



--- Original message ---
From: Tim Allison
Subject: Re: Fwd: 2.0.28 release?
Date: 03 April 2023, 12:47
To: dev@pdfbox.apache.org




Y. I can kick that off now. Or should I wait?

On Sat, Apr 1, 2023 at 2:06 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:


@Tim
Is there any chance to re-run the tests?

Andreas

On 01.04.23 at 17:08, Andreas Lehmkuehler wrote:

On 01.04.23 at 17:05, Andreas Lehmkuehler wrote:


I accidentally sent this to Tim only :-|

-------- Forwarded Message --------
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler <andr...@lehmi.de>
To: Tim Allison <talli...@apache.org>

On 30.03.23 at 16:27, Tim Allison wrote:

Reports are here:


<https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz>

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of
them just replace another exception, and it is unclear whether the result is
better or worse. But at least one of the pdfs works in 2.0.27 and doesn't in
2.0.28:


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look

The regression was related to PDFBOX-5178. I've fixed it so that the
exceptions should be gone.

Andreas




Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr <thaush...@t-online.de> wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else
text-related that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr <thaush...@t-online.de> wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

There is a bunch of solved tickets, and the last release dates back 6
months.

Andreas






[VOTE] Release Apache PDFBox 2.0.28

2023-04-10 Thread Andreas Lehmkuehler

Hi,

a candidate for the PDFBox 2.0.28 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/2.0.28/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/2.0.28/

The SHA-512 checksum of the archive is 
cae8ee30903dae6ccf9821be2ec193498de5232f71fb0ad0f8ce1b53a2aa9c64cbd01ca7a81b6f9eef1da4aaf5146c2b54ed3ee36c5574527b751886fdbc351e.


Please vote on releasing this package as Apache PDFBox 2.0.28.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.28
[ ] -1 Do not release this package because...
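The passing rule above can be written as a small check. A minimal Java sketch, assuming the usual ASF majority-approval semantics for releases: only binding (PMC) votes count, at least three binding +1 votes are needed, binding +1 votes must outnumber binding -1 votes, and a -1 is not a veto. The voter names are placeholders:

```java
import java.util.List;

// Illustrative tally of an ASF-style release vote; voter names are placeholders.
public class ReleaseVoteSketch {
    static final class Vote {
        final String voter;
        final int value;       // +1, 0 or -1
        final boolean binding; // true if cast by a PMC member

        Vote(String voter, int value, boolean binding) {
            if (value < -1 || value > 1) {
                throw new IllegalArgumentException("vote must be -1, 0 or +1");
            }
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    // Majority approval: at least three binding +1 votes and more binding +1 than -1.
    static boolean passes(List<Vote> votes) {
        long plus = votes.stream().filter(v -> v.binding && v.value == 1).count();
        long minus = votes.stream().filter(v -> v.binding && v.value == -1).count();
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(
                new Vote("pmc-a", 1, true),
                new Vote("pmc-b", 1, true),
                new Vote("pmc-c", 1, true),
                new Vote("pmc-d", 1, true),
                new Vote("contributor-e", 1, false));
        System.out.println("vote passes: " + passes(votes)); // vote passes: true
    }
}
```

Non-binding votes are still welcome feedback; they simply don't enter the count in this sketch.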

Here is my +1

Andreas




Re: Fwd: 2.0.28 release?

2023-04-10 Thread Andreas Lehmkuehler



On 10.04.23 at 04:32, Tilman Hausherr wrote:

On 09.04.2023 22:36, Andreas Lehmkuehler wrote:
OK, so there is one more question left: do we need to re-run the tests before 
starting the release process?


Yes, I'd prefer to have another comparison, but it can be done in parallel.

Good idea, I'm going to cut the release ...

Andreas



Tilman





Andreas


Re: Fwd: 2.0.28 release?

2023-04-09 Thread Andreas Lehmkuehler
OK, so there is one more question left: do we need to re-run the tests before 
starting the release process?


Andreas

Am 09.04.23 um 20:56 schrieb Tilman Hausherr:

On 09.04.2023 17:35, Andreas Lehmkuehler wrote:

Hi,

I've fixed the issue with 2 of the 3 pdfs.

GHOSTSCRIPT-702891-0.pdf is left as the only problematic pdf. I didn't found a 
solution which fixes the regressions and still fixes the origin issue from 
PDFBOX-5178. The parser from the trunk is able to handle that pdf well.


IMHO we should leave it alone, as it is malformed anmd doesn't contain any 
useful content. More important, it is one pdf out of hundreds of thoudsands, 
just a corner cases.


WDYT?


I agree!

Tilman




Andreas

Am 05.04.23 um 08:10 schrieb Andreas Lehmkuehler:

Am 04.04.23 um 07:40 schrieb Andreas Lehmkuehler:

Am 03.04.23 um 19:50 schrieb Tim Allison:

https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-20230403-reports.tgz

Haven't had a chance to take a look yet. :(

Thanks Tim!

There are still 5 new exceptions listed. All of them are related to the very 
same change coming from PDFBOX-5178 which I've fixed the other day. But 
these cases are different and the trunk is affected as well. My bad to not 
have a deeper look in the first place.


I'm going to investigate those issues
All pdfs are more or less broken. Two of them are totally useless and the new 
exception is just another one. The other three contain some more or less 
readable content and we are hitting the well know dilemma: should we stop 
reading once we hit something bad or should we try to read as much as 
possible and maybe run into much bigger issues than before.


I guess these are all some special corner cases. I'm still thinking about a 
solution to support both strategies.


Andreas



Andreas




On Mon, Apr 3, 2023 at 6:53 AM Tilman Hausherr  wrote:


Don't wait please
Thanks
Tilman



--- Original-Nachricht ---
Von: Tim Allison
Betreff: Re: Fwd: 2.0.28 release?
Datum: 03. April 2023, 12:47
An: dev@pdfbox.apache.org




Y. I can kick that off now. Or should I wait?

On Sat, Apr 1, 2023 at 2:06 PM Andreas Lehmkuehler mailto:andr...@lehmi.de> > wrote:


@Tim <mailto:@Tim>
Is there any chance to re-run the tests?

Andreas

Am 01.04.23 um 17:08 schrieb Andreas Lehmkuehler:

Am 01.04.23 um 17:05 schrieb Andreas Lehmkuehler:


I've accidentally send this to Tim only :-|

 Weitergeleitete Nachricht 
Betreff: Re: 2.0.28 release?
Datum: Fri, 31 Mar 2023 07:50:10 +0200
Von: Andreas Lehmkuehler mailto:andr...@lehmi.de>



An: Tim Allison mailto:talli...@apache.org> >

Am 30.03.23 um 16:27 schrieb Tim Allison:

Reports are here:


<https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz>

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of
them just replace another exception and it is unclear if the result is better
or worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look

The regression was related to PDFBOX-5178. I've fixed it so that the exceptions
should be gone.

Andreas




Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr <thaush...@t-online.de> wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr <thaush...@t-online.de> wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas








Re: Fwd: 2.0.28 release?

2023-04-09 Thread Andreas Lehmkuehler

Hi,

I've fixed the issue with 2 of the 3 pdfs.

GHOSTSCRIPT-702891-0.pdf is left as the only problematic pdf. I didn't find a
solution which fixes the regressions and still fixes the original issue from
PDFBOX-5178. The parser from the trunk is able to handle that pdf well.


IMHO we should leave it alone, as it is malformed and doesn't contain any
useful content. More importantly, it is one pdf out of hundreds of thousands,
just a corner case.


WDYT?

Andreas

On 05.04.23 at 08:10, Andreas Lehmkuehler wrote:

On 04.04.23 at 07:40, Andreas Lehmkuehler wrote:

On 03.04.23 at 19:50, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-20230403-reports.tgz

Haven't had a chance to take a look yet. :(

Thanks Tim!

There are still 5 new exceptions listed. All of them are related to the very
same change coming from PDFBOX-5178, which I fixed the other day. But these
cases are different and the trunk is affected as well. My bad for not taking a
deeper look in the first place.


I'm going to investigate those issues.

All pdfs are more or less broken. Two of them are totally useless and the new
exception is just another one. The other three contain some more or less
readable content and we are hitting the well-known dilemma: should we stop
reading once we hit something bad, or should we try to read as much as
possible and maybe run into much bigger issues than before?


I guess these are all special corner cases. I'm still thinking about a
solution to support both strategies.


Andreas



Andreas




On Mon, Apr 3, 2023 at 6:53 AM Tilman Hausherr  wrote:


Don't wait please
Thanks
Tilman



--- Original Message ---
From: Tim Allison
Subject: Re: Fwd: 2.0.28 release?
Date: April 3, 2023, 12:47
To: dev@pdfbox.apache.org




Y. I can kick that off now. Or should I wait?

On Sat, Apr 1, 2023 at 2:06 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:


@Tim
Is there any chance to re-run the tests?

Andreas

On 01.04.23 at 17:08, Andreas Lehmkuehler wrote:

On 01.04.23 at 17:05, Andreas Lehmkuehler wrote:


I've accidentally sent this to Tim only :-|

Forwarded Message
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler <andr...@lehmi.de>
To: Tim Allison <talli...@apache.org>

On 30.03.23 at 16:27, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of
them just replace another exception and it is unclear if the result is better
or worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look

The regression was related to PDFBOX-5178. I've fixed it so that the exceptions
should be gone.

Andreas




Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr <thaush...@t-online.de> wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr <thaush...@t-online.de> wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas








Re: Fwd: 2.0.28 release?

2023-04-05 Thread Andreas Lehmkuehler

On 04.04.23 at 07:40, Andreas Lehmkuehler wrote:

On 03.04.23 at 19:50, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-20230403-reports.tgz

Haven't had a chance to take a look yet. :(

Thanks Tim!

There are still 5 new exceptions listed. All of them are related to the very
same change coming from PDFBOX-5178, which I fixed the other day. But these
cases are different and the trunk is affected as well. My bad for not taking a
deeper look in the first place.


I'm going to investigate those issues.

All pdfs are more or less broken. Two of them are totally useless and the new
exception is just another one. The other three contain some more or less
readable content and we are hitting the well-known dilemma: should we stop
reading once we hit something bad, or should we try to read as much as
possible and maybe run into much bigger issues than before?


I guess these are all special corner cases. I'm still thinking about a
solution to support both strategies.


Andreas



Andreas




On Mon, Apr 3, 2023 at 6:53 AM Tilman Hausherr  wrote:


Don't wait please
Thanks
Tilman



--- Original Message ---
From: Tim Allison
Subject: Re: Fwd: 2.0.28 release?
Date: April 3, 2023, 12:47
To: dev@pdfbox.apache.org




Y. I can kick that off now. Or should I wait?

On Sat, Apr 1, 2023 at 2:06 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:


@Tim
Is there any chance to re-run the tests?

Andreas

On 01.04.23 at 17:08, Andreas Lehmkuehler wrote:

On 01.04.23 at 17:05, Andreas Lehmkuehler wrote:


I've accidentally sent this to Tim only :-|

Forwarded Message
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler <andr...@lehmi.de>
To: Tim Allison <talli...@apache.org>

On 30.03.23 at 16:27, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of
them just replace another exception and it is unclear if the result is better
or worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look

The regression was related to PDFBOX-5178. I've fixed it so that the exceptions
should be gone.

Andreas




Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr <thaush...@t-online.de> wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr <thaush...@t-online.de> wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas








Re: Fwd: 2.0.28 release?

2023-04-03 Thread Andreas Lehmkuehler

On 03.04.23 at 19:50, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-20230403-reports.tgz

Haven't had a chance to take a look yet. :(

Thanks Tim!

There are still 5 new exceptions listed. All of them are related to the very
same change coming from PDFBOX-5178, which I fixed the other day. But these
cases are different and the trunk is affected as well. My bad for not taking a
deeper look in the first place.


I'm going to investigate those issues

Andreas




On Mon, Apr 3, 2023 at 6:53 AM Tilman Hausherr  wrote:


Don't wait please
Thanks
Tilman



--- Original Message ---
From: Tim Allison
Subject: Re: Fwd: 2.0.28 release?
Date: April 3, 2023, 12:47
To: dev@pdfbox.apache.org




Y. I can kick that off now. Or should I wait?

On Sat, Apr 1, 2023 at 2:06 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:


@Tim
Is there any chance to re-run the tests?

Andreas

On 01.04.23 at 17:08, Andreas Lehmkuehler wrote:

On 01.04.23 at 17:05, Andreas Lehmkuehler wrote:


I've accidentally sent this to Tim only :-|

Forwarded Message
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler <andr...@lehmi.de>
To: Tim Allison <talli...@apache.org>

On 30.03.23 at 16:27, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of
them just replace another exception and it is unclear if the result is better
or worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look

The regression was related to PDFBOX-5178. I've fixed it so that the exceptions
should be gone.

Andreas




Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr <thaush...@t-online.de> wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr <thaush...@t-online.de> wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas








Re: Fwd: 2.0.28 release?

2023-04-01 Thread Andreas Lehmkuehler

@Tim
Is there any chance to re-run the tests?

Andreas

On 01.04.23 at 17:08, Andreas Lehmkuehler wrote:

On 01.04.23 at 17:05, Andreas Lehmkuehler wrote:


I've accidentally sent this to Tim only :-|

Forwarded Message
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler
To: Tim Allison

On 30.03.23 at 16:27, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of 
them just replace another exception and it is unclear if the result is better 
or worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look

The regression was related to PDFBOX-5178. I've fixed it so that the exceptions
should be gone.


Andreas




Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr  wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr  wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas




Re: Fwd: 2.0.28 release?

2023-04-01 Thread Andreas Lehmkuehler

On 01.04.23 at 17:05, Andreas Lehmkuehler wrote:


I've accidentally sent this to Tim only :-|

Forwarded Message
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler
To: Tim Allison

On 30.03.23 at 16:27, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of 
them just replace another exception and it is unclear if the result is better or 
worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look

The regression was related to PDFBOX-5178. I've fixed it so that the exceptions
should be gone.


Andreas




Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr  wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr  wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas




Fwd: 2.0.28 release?

2023-04-01 Thread Andreas Lehmkuehler



I've accidentally sent this to Tim only :-|

Forwarded Message
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler
To: Tim Allison

On 30.03.23 at 16:27, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz

Thanks Tim.

Looks like we have a regression. There is a handful of new exceptions. Some of 
them just replace another exception and it is unclear if the result is better or 
worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28


bug_trackers/PDFBOX/PDFBOX-4424-1.pdf
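The reports appear to compare how each file in the test corpus fares under 2.0.27 and under 2.0.28-SNAPSHOT. The triage described above (new exceptions, exceptions that merely replace another one, files that worked before and fail now) can be sketched in plain Java 8. This is a hypothetical illustration, not the tooling behind the corpora.tika.apache.org reports: the class name `RegressionDiffSketch`, the `OK` sentinel and the sample file names are made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual Tika/PDFBox regression tooling.
public class RegressionDiffSketch {

    /** Sentinel outcome meaning "processed without an exception". */
    static final String OK = "OK";

    enum Change { UNCHANGED, FIXED, NEW_EXCEPTION, REPLACED_EXCEPTION }

    /** Each outcome is either OK or the name of the exception that was thrown. */
    static Change classify(String before, String after) {
        boolean okBefore = OK.equals(before);
        boolean okAfter = OK.equals(after);
        if (okBefore && okAfter) {
            return Change.UNCHANGED;
        }
        if (okBefore) {
            return Change.NEW_EXCEPTION;   // worked before, fails now: a regression
        }
        if (okAfter) {
            return Change.FIXED;           // failed before, works now
        }
        // Both versions fail: same exception, or one exception replaced by another
        return before.equals(after) ? Change.UNCHANGED : Change.REPLACED_EXCEPTION;
    }

    /** Classifies every file present in both runs; the report is sorted by file name. */
    static Map<String, Change> diff(Map<String, String> oldRun, Map<String, String> newRun) {
        Map<String, Change> report = new TreeMap<>();
        for (Map.Entry<String, String> entry : newRun.entrySet()) {
            String before = oldRun.get(entry.getKey());
            if (before != null) {
                report.put(entry.getKey(), classify(before, entry.getValue()));
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, String> oldRun = new TreeMap<>();
        oldRun.put("a.pdf", OK);
        oldRun.put("b.pdf", "java.io.IOException");
        oldRun.put("c.pdf", "java.io.IOException");
        Map<String, String> newRun = new TreeMap<>();
        newRun.put("a.pdf", "java.lang.NullPointerException");
        newRun.put("b.pdf", OK);
        newRun.put("c.pdf", "java.lang.ArrayIndexOutOfBoundsException");
        // prints {a.pdf=NEW_EXCEPTION, b.pdf=FIXED, c.pdf=REPLACED_EXCEPTION}
        System.out.println(diff(oldRun, newRun));
    }
}
```

Only NEW_EXCEPTION is an unambiguous regression; REPLACED_EXCEPTION is kept separate because, as noted above, it is unclear whether such a change is better or worse.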

I'll have a look

Andreas



On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr  wrote:


Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr  wrote:

+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas




Re: 2.0.28 release?

2023-03-28 Thread Andreas Lehmkuehler

On 28.03.23 at 19:22, Tim Allison wrote:

+1

Should I run the regression tests now, or is there anything else text-related
that is still being worked on?

I don't have any text-related TODO on my list; please run the tests if nobody
else objects.


Andreas


On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr  wrote:


+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6
months

Andreas




2.0.28 release?

2023-03-28 Thread Andreas Lehmkuehler

Hi,

how about cutting a 2.0.28 release next week on Monday?

there is a bunch of solved tickets and the last release dates back 6 months

Andreas




Re: Minimum Java version for PDFBox 3.x

2023-03-19 Thread Andreas Lehmkuehler
I've created https://issues.apache.org/jira/browse/PDFBOX-5576 to not forget 
about this ;-)


Andreas

On 18.03.23 at 18:07, Maruan Sahyoun wrote:

Fine, so let's target that for 4.x.


On 18.03.2023 at 16:51, Andreas Lehmkuehler wrote:

On 18.03.23 at 10:49, Maruan Sahyoun wrote:

I'd second a move to 11 for 3.x, as over the lifetime of 3.x this will enable us
to use newer functions without another major release.

I'd like to do so for the next major version 4.0.x. Hopefully it won't take us
as much time to release that version as it took us to release 3.0.x.

BTW 3.0.x will be the last version supporting preflight, and maybe it is a good
idea to stick with Java 8 compatibility.

Andreas

BR
Maruan

On 18.03.2023 at 10:13, Tilman Hausherr wrote:


You may have a point with some of your arguments, but not this one:

Public updates for Java 8 stopped in March 2022, now one year ago


My latest jdk8 is from January 17th of this year. (Amazon Corretto)

About the difficulty to find contributors - this has always been difficult. That's 
because PDF isn't "sexy" at all.

Coincidentally, our CI builds are now failing because Jenkins/Hudson has
moved to JDK 11.

Tilman






Re: Minimum Java version for PDFBox 3.x

2023-03-18 Thread Andreas Lehmkuehler

On 18.03.23 at 10:13, Tilman Hausherr wrote:

You may have a point with some of your arguments, but not this one:

Public updates for Java 8 stopped in March 2022, now one year ago


My latest jdk8 is from January 17th of this year. (Amazon Corretto)

About the difficulty to find contributors - this has always been difficult. 
That's because PDF isn't "sexy" at all.


Coincidentally, our CI builds are now failing because Jenkins/Hudson has
moved to JDK 11.

I've switched the builds (trunk and 2.0) to Java 11.

Andreas



Tilman






Re: Minimum Java version for PDFBox 3.x

2023-03-18 Thread Andreas Lehmkuehler

On 18.03.23 at 10:49, Maruan Sahyoun wrote:

I'd second a move to 11 for 3.x, as over the lifetime of 3.x this will enable us
to use newer functions without another major release.

I'd like to do so for the next major version 4.0.x. Hopefully it won't take us
as much time to release that version as it took us to release 3.0.x.


BTW 3.0.x will be the last version supporting preflight, and maybe it is a good
idea to stick with Java 8 compatibility.


Andreas


BR
Maruan


On 18.03.2023 at 10:13, Tilman Hausherr wrote:

You may have a point with some of your arguments, but not this one:

Public updates for Java 8 stopped in March 2022, now one year ago


My latest jdk8 is from January 17th of this year. (Amazon Corretto)

About the difficulty to find contributors - this has always been difficult. That's 
because PDF isn't "sexy" at all.

Coincidentally, our CI builds are now failing because Jenkins/Hudson has
moved to JDK 11.

Tilman






Re: Minimum Java version for PDFBox 3.x

2023-03-18 Thread Andreas Lehmkuehler

On 17.03.23 at 23:40, axh wrote:

Hi Andreas,

you are right about the typo, it’s Tomcat 9.1 that drops Java 8 support and 
requires at least Java 11.

What I tried to point out with that list is that IMHO there’s no benefit in 
maintaining compatibility with an outdated Java version.



These are all valid points, but


I think reasons to update are:
  - code can be shorter and more concise in many places

We have far more complicated issues than code which could be written more
concisely using a more recent Java version.



  - it will be easier to find contributors

Interesting, but I doubt that. The biggest issue is the complexity of the PDF
format.


  - functionality available in newer versions of the JDK does not have to be 
reproduced and later maintained

Do you have a specific example in your mind?


  - functionality provided by the JDK implementations will in most cases be
better tested and often more performant

Sorry, but IMHO that is just theory. We have seen a lot of different
behaviours: better/worse performance, issues introduced by an improvement, ...



  - functionality deprecated after Java 8 cannot be used because it would
impact compatibility with newer Java versions while new functionality cannot be
used because it impacts compatibility with Java 8

We already removed most of that stuff; otherwise we wouldn't be able to run
PDFBox using Java 19 or 20.



Public updates for Java 8 stopped in March 2022, now one year ago, and
keeping PDFBox compatible with that version comes at a cost (see the
points above), so what’s the point of still supporting it?

There are still some Java 8 based JDKs/JREs which are updated on a regular
basis.

We wouldn't solve any real issue with that version bump, but we would cut off a
lot of people from the long-awaited 3.0 version. PDFBox is an "under the hood"
lib and there are a lot of people using it who don't run the most recent Java
version.


Andreas


And yes, there are more things to brush up. For example Apache commons logging 
has not seen an update in 9 years, is missing functionality, and IMHO should be 
replaced by SLF4J 2 or log4j. But that’s another point (and yes, I’d volunteer 
to do the transition provided there’s a chance to get it in).

Cheers,
Axel



On 17.03.2023 at 20:06, Andreas Lehmkuehler wrote:

On 17.03.23 at 10:09, axh wrote:

Hi,
I am developing software that relies heavily on Apache PDFBox. It uses the
current PDFBox 3.0.0 from trunk, with some patches.
I wanted to know what your thoughts are about raising the minimum Java version for 
PDFBox 3.x to Java 11. I know some might say "we are still on JDK 8 and will be 
for the foreseeable future“. But then you could probably stay on the 2.x line of 
PDFBox, since you won’t be able to update most of your technology stack to recent 
versions anyway:
  - Spring Boot 3 requires Java 17
  - WildFly 27 dropped support for Java 8, and while I don’t have access to the 
Redhat docs, I think so will JBOSS 8
  - Hibernate 6 requires at least Java 11
  - Apache Tomcat 6 requires at least Java 11, 6.1 even requires 17

This has to be a typo; Tomcat 9 or 10 are the recent versions to be used. ;-)


  - Apache Lucene 9.5 requires Java 11

IMHO, the next major version a.k.a. PDFBox 3.0.0 would be the right moment to
increase the required Java version.

Yes, but I don't see any reason to do so. The trunk version works fine with
Java 19 and 20. As long as we don't really need any Java 9 or later feature, I
tend to stick with Java 8. We don't have to switch just because others are doing
so ;-)

Andreas


- PDFBox uses classes and methods that are deprecated in newer Java versions.
- IMHO it will also be harder to get contributors.
- Some things have to be done in a cumbersome and less performant way to 
maintain Java 8 compatibility because functionality introduced in newer JDKs 
cannot be used to keep Java 8 compatibility:
 Java 8 (current implementation):

 public static byte[] toByteArray(InputStream in) throws IOException
 {
     ByteArrayOutputStream baout = new ByteArrayOutputStream();
     copy(in, baout);
     return baout.toByteArray();
 }

 public static long copy(InputStream input, OutputStream output) throws IOException
 {
     byte[] buffer = new byte[4096];
     long count = 0;
     int n = 0;
     while (-1 != (n = input.read(buffer)))
     {
         output.write(buffer, 0, n);
         count += n;
     }
     return count;
 }

 Java 11:

 public static byte[] toByteArray(InputStream in) throws IOException {
     return in.readAllBytes();
 }

 public static long copy(InputStream input, OutputStream output) throws IOException {
     return input.transferTo(output);
 }
I would like to contribute to PDFBox, but would really suggest to bump the 
required Java version to Java 11. I personally would be OK with going directly 
to 17 like Spring frame

Re: Minimum Java version for PDFBox 3.x

2023-03-17 Thread Andreas Lehmkuehler

On 17.03.23 at 10:09, axh wrote:

Hi,

I am developing software that relies heavily on Apache PDFBox. It uses the
current PDFBox 3.0.0 from trunk, with some patches.

I wanted to know what your thoughts are about raising the minimum Java version for 
PDFBox 3.x to Java 11. I know some might say "we are still on JDK 8 and will be 
for the foreseeable future“. But then you could probably stay on the 2.x line of 
PDFBox, since you won’t be able to update most of your technology stack to recent 
versions anyway:

  - Spring Boot 3 requires Java 17
  - WildFly 27 dropped support for Java 8, and while I don’t have access to the 
Redhat docs, I think so will JBOSS 8
  - Hibernate 6 requires at least Java 11
  - Apache Tomcat 6 requires at least Java 11, 6.1 even requires 17

This has to be a typo; Tomcat 9 or 10 are the recent versions to be used. ;-)


  - Apache Lucene 9.5 requires Java 11

IMHO, the next major version a.k.a. PDFBox 3.0.0 would be the right moment to
increase the required Java version.

Yes, but I don't see any reason to do so. The trunk version works fine with Java
19 and 20. As long as we don't really need any Java 9 or later feature, I tend to
stick with Java 8. We don't have to switch just because others are doing so ;-)


Andreas



- PDFBox uses classes and methods that are deprecated in newer Java versions.
- IMHO it will also be harder to get contributors.
- Some things have to be done in a cumbersome and less performant way to 
maintain Java 8 compatibility because functionality introduced in newer JDKs 
cannot be used to keep Java 8 compatibility:
 Java 8 (current implementation):

 public static byte[] toByteArray(InputStream in) throws IOException
 {
     ByteArrayOutputStream baout = new ByteArrayOutputStream();
     copy(in, baout);
     return baout.toByteArray();
 }

 public static long copy(InputStream input, OutputStream output) throws IOException
 {
     byte[] buffer = new byte[4096];
     long count = 0;
     int n = 0;
     while (-1 != (n = input.read(buffer)))
     {
         output.write(buffer, 0, n);
         count += n;
     }
     return count;
 }

 Java 11:

 public static byte[] toByteArray(InputStream in) throws IOException {
     return in.readAllBytes();
 }

 public static long copy(InputStream input, OutputStream output) throws IOException {
     return input.transferTo(output);
 }

I would like to contribute to PDFBox, but would really suggest bumping the
required Java version to Java 11. I personally would be OK with going directly
to 17 like Spring Framework did, but I can see that requiring Java 11 might
be a serious issue for some.

What are your thoughts on this matter?

Axel








Re: PDFBox 3.0.0-beta1 release

2023-01-18 Thread Andreas Lehmkuehler

Hi,

I'm not sure that I understand your proposal. Even if we provide a Loader class
in 2.0, it won't be compatible on a binary level, as there are many other changes.
The idea of using PDFBox 3.0.0 in environments where other dependencies are
compiled against 2.0.x won't work. Or am I missing something?


Andreas

On 11.01.23 at 12:43, Emmeran Seehuber wrote:

Hi,

I’ve one point which is not per se related to the 3.0 beta release, but which 
should be considered:

The 3.0 release changed some APIs in a source- and binary-incompatible way. E.g.
instead of using the PDDocument.load() method, you now use the Loader class, etc.

It would be extremely helpful if PDFBox 2.0 could provide a kind of "shim" 
that provides the 3.0 API, e.g. a Loader class in PDFBox 2.0 that internally 
just uses PDDocument.load(). At the same time you could deprecate the load() 
method in PDDocument and document that the Loader should be used instead.

This would allow projects using PDFBox 2.0 to "migrate" to the 3.0 API while 
staying on PDFBox 2.0. As PDFBox 3.0 uses the same namespace as PDFBox 2.0, I 
think this is needed to make the migration path as smooth as possible.

Rationale: More and more projects use PDFBox directly or indirectly (e.g. by 
using my PDFBox-Graphics2D wrapper, as Apache POI does). When you have a 
complex application, you might have many dependencies that pull in PDFBox 
2.0. If those dependencies can work with PDFBox 3.0, because they migrated their 
API usage to the new API using the shim in PDFBox 2.0, you can upgrade to PDFBox 
3.0 in your project. I.e. the library targets PDFBox 2.0 but would also work with 3.0.

Otherwise you can only consider an upgrade to PDFBox 3.0 in your application 
after the last dependency has migrated to PDFBox 3.0, which might take a long 
time…

At the moment, as PDFBox 3.0 has the same namespace as PDFBox 2.0, simply 
upgrading to PDFBox 3.0 will break many things in dependencies. So you have to 
stay on 2.0 until all dependencies have migrated...

It would be very nice if we could avoid a version mess here.

Greetings

Emmeran


On 11.01.2023 at 08:24, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut our first beta release of 3.0.0. Be aware that the API is 
supposed to be stable after the release.

Are there any objections? Are there any tickets which should be solved before?

Andreas




Kind regards from Augsburg

Emmeran Seehuber
Dipl. Inf. (FH)
Schrannenstraße 8
86150 Augsburg
VAT ID: DE266070804








Re: Apache PDFBox Board Report January 2023 due

2023-01-12 Thread Andreas Lehmkuehler

Hi,

thanks for the feedback. I've submitted the report as proposed.

Andreas


On 11.01.23 at 20:09, Andreas Lehmkuehler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to
a Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     2.0.27 was released on 2022-09-29.
     1.8.17 was released on 2022-09-15.
     2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
   mailing lists
- due to the holiday season the last quarter was a little bit quieter than usual
- we are going to cut the first beta release of our next major
   version 3.0.0 this quarter
- we are working on the 3.0 migration guide








Apache PDFBox Board Report January 2023 due

2023-01-11 Thread Andreas Lehmkuehler

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to
a Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

2.0.27 was released on 2022-09-29.
1.8.17 was released on 2022-09-15.
2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- due to the holiday season the last quarter was a little bit quieter than usual
- we are going to cut the first beta release of our next major
  version 3.0.0 this quarter
- we are working on the 3.0 migration guide




PDFBox 3.0.0-beta1 release

2023-01-10 Thread Andreas Lehmkuehler

Hi,

I'm planning to cut our first beta release of 3.0.0. Be aware that the API is 
supposed to be stable after the release.


Are there any objections? Are there any tickets which should be solved before?

Andreas




PDFBOX-5538: introduction of a new interface/functional interface to handle a stream cache

2022-11-01 Thread Andreas Lehmkuehler

Hi,

the new on-demand parser doesn't use the ScratchFileBuffer anymore, and my idea 
was to overhaul/remove the usage of the ScratchFileBuffer for the creation of 
new COSStreams as well. My plan was to wait for the 4.0 release. A couple of 
days ago I realized it might be a good idea to introduce an interface to make the 
usage of that stream cache a little bit more flexible, and I'd like to do it in 3.0.


In the end it wasn't that hard to do; see PDFBOX-5538 for details. But I had to 
adjust some of the public method signatures, such as the load methods taking a 
MemoryUsageSetting as a parameter.


The main benefit is that the usage of the stream cache within the parser is 
decoupled from the actual implementation. Furthermore, users have the opportunity 
to implement their own cache. Most likely we won't have to change the signatures 
of the loader methods in a possible 4.0 version.
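A minimal sketch of the idea in plain Java, with no PDFBox dependency: the parser only sees a functional interface that creates a cache, so callers can plug in their own implementation with a lambda or constructor reference. All names here (`StreamCache`, `CacheFactory`, `MemoryStreamCache`, `DemoParser`) are hypothetical and chosen for illustration; the actual types introduced with PDFBOX-5538 may look different.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StreamCacheSketch
{
    /** Hypothetical cache abstraction: hands out buffers for new stream data. */
    interface StreamCache extends Closeable
    {
        OutputStream createBuffer() throws IOException;
    }

    /** Hypothetical functional interface: how the parser obtains a cache. */
    @FunctionalInterface
    interface CacheFactory
    {
        StreamCache create() throws IOException;
    }

    /** Simplest possible implementation: keep every buffer on the heap. */
    static final class MemoryStreamCache implements StreamCache
    {
        final List<ByteArrayOutputStream> buffers = new ArrayList<>();
        boolean closed;

        @Override
        public OutputStream createBuffer()
        {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            buffers.add(buffer);
            return buffer;
        }

        @Override
        public void close()
        {
            buffers.clear();
            closed = true;
        }
    }

    /** Hypothetical parser: depends only on the factory, not on any implementation. */
    static final class DemoParser
    {
        private final CacheFactory cacheFactory;

        DemoParser(CacheFactory cacheFactory)
        {
            this.cacheFactory = cacheFactory;
        }

        /** Writes each stream into the cache and returns the total number of bytes cached. */
        int parse(List<String> streams) throws IOException
        {
            try (StreamCache cache = cacheFactory.create())
            {
                int total = 0;
                for (String content : streams)
                {
                    byte[] bytes = content.getBytes(StandardCharsets.US_ASCII);
                    try (OutputStream out = cache.createBuffer())
                    {
                        out.write(bytes);
                    }
                    total += bytes.length;
                }
                return total;
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        // A constructor reference is enough to plug in an implementation
        DemoParser parser = new DemoParser(MemoryStreamCache::new);
        System.out.println(parser.parse(List.of("BT /F1 12 Tf ET", "q Q"))); // prints 18
    }
}
```

Passing a factory rather than a ready-made cache lets the parser decide when a cache is created and closed, while the caller still decides which implementation (heap, temporary file, or something custom) is used.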



Andreas




Re: Fwd: Migrating away from Travis-CI

2022-10-28 Thread Andreas Lehmkuehler

Hi,

I've removed all Travis-CI builds, see PDFBOX-5535.

Andreas


On 25.10.22 at 21:50, Andreas Lehmkuehler wrote:

Hi,

what do you think? Should we convert our Travis builds to GitHub Actions? Is 
there any benefit in having those additional builds? Or is it safe to rely on 
our Jenkins builds only?


Andreas


 Forwarded Message 
Subject: Migrating away from Travis-CI
Date: Mon, 24 Oct 2022 15:46:16 -0500
From: Drew Foulks 
Reply-To: us...@infra.apache.org
To: annou...@infra.apache.org



Greetings PMC Members!

Infrastructure is moving away from Travis-CI at the end of 2022.

If your project is not currently using Travis-CI, this email is not actionable.

Projects using Travis-CI:

Please take a look at 
https://cwiki.apache.org/confluence/display/INFRA/Travis+Migrations 


and join the builds meetings and mailing list for project updates.








Fwd: Migrating away from Travis-CI

2022-10-25 Thread Andreas Lehmkuehler

Hi,

what do you think? Should we convert our Travis builds to GitHub Actions? Is 
there any benefit in having those additional builds? Or is it safe to rely on 
our Jenkins builds only?


Andreas


 Forwarded Message 
Subject: Migrating away from Travis-CI
Date: Mon, 24 Oct 2022 15:46:16 -0500
From: Drew Foulks 
Reply-To: us...@infra.apache.org
To: annou...@infra.apache.org



Greetings PMC Members!

Infrastructure is moving away from Travis-CI at the end of 2022.

If your project is not currently using Travis-CI, this email is not actionable.

Projects using Travis-CI:

Please take a look at 
https://cwiki.apache.org/confluence/display/INFRA/Travis+Migrations 



and join the builds meetings and mailing list for project updates.


--
Cheers,

Drew Foulks
ASF Infra






Re: Apache PDFBox Board Report October 2022 due

2022-10-11 Thread Andreas Lehmkuehler

Hi,

thanks for the feedback. I've submitted the report as proposed.

Andreas


On 09.10.22 at 14:06, Andreas Lehmkuehler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to
a Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     2.0.27 was released on 2022-09-29.
     1.8.17 was released on 2022-09-15.
     2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
   mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are still planning to cut the first beta release of our next major
   version 3.0.0
- to do so, we have started to identify the last tickets with breaking changes
   to be included in 3.0.0
- due to the releases last month, the preparations for the beta release were
   slowed down a little
- there was an article about maintaining interoperability in open source
   software. For this article, the authors studied the activities within Apache
   PDFBox for two years without the knowledge of the community. We don't see any
   surprises, see https://s.apache.org/aljtz for further details



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox









[REPORT] PDFBox - October 2022

2022-10-11 Thread Andreas Lehmkuehler

## Description:
The mission of PDFBox is the creation and maintenance of software related to a 
Java library for working with PDF documents


## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

2.0.27 was released on 2022-09-29.
1.8.17 was released on 2022-09-15.
2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are still planning to cut the first beta release of our next major
  version 3.0.0
- to do so, we have started to identify the last tickets with breaking changes
  to be included in 3.0.0
- due to the releases last month, the preparations for the beta release were
  slowed down a little
- there was an article about maintaining interoperability in open source
  software. For this article, the authors studied the activities within Apache
  PDFBox for two years without the knowledge of the community. We don't see any
  surprises, see https://s.apache.org/aljtz for further details




Apache PDFBox Board Report October 2022 due

2022-10-09 Thread Andreas Lehmkuehler

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to
a Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

2.0.27 was released on 2022-09-29.
1.8.17 was released on 2022-09-15.
2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are still planning to cut the first beta release of our next major
  version 3.0.0
- to do so, we have started to identify the last tickets with breaking changes
  to be included in 3.0.0
- due to the releases last month, the preparations for the beta release were
  slowed down a little
- there was an article about maintaining interoperability in open source
  software. For this article, the authors studied the activities within Apache
  PDFBox for two years without the knowledge of the community. We don't see any
  surprises, see https://s.apache.org/aljtz for further details



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox





[ANNOUNCE] Apache PDFBox 2.0.27 released

2022-09-29 Thread Andreas Lehmkuehler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.27. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 2.0.27

Introduction


The Apache PDFBox library is an open source Java tool for working with PDF 
documents.


This is an incremental bugfix release based on the earlier 2.0.26 release. It 
contains a couple of fixes and small improvements.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-4925] - Invalid stream Length validation in StreamValidationProcess
[PDFBOX-5389] - To set compressed on buffered image while creating a PDF
[PDFBOX-5403] - Blurry / distorted rendering
[PDFBOX-5424] - java.lang.IndexOutOfBoundsException (2)
[PDFBOX-5427] - PDFDebugger does not remove listeners for PagePane when opening new File
[PDFBOX-5428] - PDFRenderer.renderImageWithDPI thows EOFException in PDF
[PDFBOX-5429] - PDFCloneUtility.checkForRecursion breaks support for some existing PDFs
[PDFBOX-5430] - PDFStreamEngine.showTextStrings with font switch
[PDFBOX-5453] - ClassCastException (PDColor.java:66)
[PDFBOX-5459] - NullPointerException in PDFunctionType3.eval()
[PDFBOX-5460] - Deadlock in TrueTypeFont and RAFDataStream
[PDFBOX-5463] - illegalArgumentException for rendering PDF (image extraction)
[PDFBOX-5465] - NullPointerException in CmapSubtable.getCharCode
[PDFBOX-5470] - PDActionEmbeddedGoTo does not accept a Destination with a page number or string
[PDFBOX-5471] - NPE when Transparency Group is missing the BBox
[PDFBOX-5484] - PDFRenderer does not render letters when converting page to image
[PDFBOX-5488] - JPEG image rendered with wrong colors when using TwelveMonkeys
[PDFBOX-5499] - Performance issue since 2.0.18
[PDFBOX-5500] - NullPointerException in PDType0Font.readCode() if cMap is null
[PDFBOX-5504] - NullPointerException in CFFParser.parseFont()
[PDFBOX-5505] - IndexOutOfBoundsException in PDCIDFont.readWidths()
[PDFBOX-5506] - IndexOutOfBoundsException in Type1Parser.java
[PDFBOX-5507] - ClassCastException in CMapParser.parseBeginbfchar
[PDFBOX-5508] - ClassCastException in PDXObject.createXObject()
[PDFBOX-5509] - ClassCastException in PDAcroForm.getFields()
[PDFBOX-5510] - ClassCastException in PDDocumentCatalog.getAcroForm()
[PDFBOX-5511] - ClassCastException in PDResources.getIndirect()
[PDFBOX-5513] - getPageLayout throws IllegalArgumentException for empty mode
[PDFBOX-5514] - Font not found because of case issues

Improvement

[PDFBOX-5448] - Clarify access permission for high / low print quality
[PDFBOX-5450] - Create Opaque PDFRenderer example
[PDFBOX-5494] - "invalid object number value" is too scary

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.
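For the checksum part, a minimal sketch in plain Java (the class name `Sha512Check` is made up): it streams the downloaded archive through `MessageDigest` with the standard "SHA-512" algorithm and compares the hex digest with the published value, ignoring case and surrounding whitespace. It only checks integrity; authenticity still requires verifying the PGP signature against the KEYS file, which this sketch does not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Check
{
    // Stream the data through the digest so large archives are not loaded into memory
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException
    {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1)
        {
            digest.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest())
        {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compare against the published hex digest (only the hex value, not the file name)
    static boolean matches(Path file, String publishedHex) throws IOException, NoSuchAlgorithmException
    {
        try (InputStream in = Files.newInputStream(file))
        {
            return sha512Hex(in).equalsIgnoreCase(publishedHex.trim());
        }
    }

    public static void main(String[] args) throws Exception
    {
        if (args.length != 2)
        {
            System.err.println("usage: Sha512Check <downloaded-archive> <published-sha512-hex>");
            System.exit(2);
        }
        boolean ok = matches(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    }
}
```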

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/




[RESULT][VOTE] Release Apache PDFBox 2.0.27

2022-09-29 Thread Andreas Lehmkuehler

On 26.09.22 at 17:28, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox 2.0.27.


   +1 Tim Allison
   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Timo Boehme
   +1 Daniel Persson (non-binding)
   +1 Andreas Lehmkühler
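The [VOTE] mails state that a release "passes if a majority of at least three +1 PDFBox PMC votes are cast", which corresponds to the ASF's "majority approval": at least three binding +1 votes and more +1 than -1 votes, with votes from non-PMC members marked non-binding. A small sketch of that counting rule applied to the tally above; the class and method names are made up for illustration, and the 72-hour voting window is not modelled.

```java
import java.util.List;

public class ReleaseVoteTally
{
    /** One reply to a [VOTE] thread; binding is true for PDFBox PMC members. */
    static final class Vote
    {
        final String voter;
        final int value;        // +1, 0 or -1
        final boolean binding;

        Vote(String voter, int value, boolean binding)
        {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    /** Majority approval: at least three binding +1 votes and more binding +1 than -1. */
    static boolean passes(List<Vote> votes)
    {
        int plus = 0;
        int minus = 0;
        for (Vote vote : votes)
        {
            if (!vote.binding)
            {
                continue; // non-binding votes are welcome, but not counted
            }
            if (vote.value > 0)
            {
                plus++;
            }
            else if (vote.value < 0)
            {
                minus++;
            }
        }
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args)
    {
        // The 2.0.27 result as posted to the list
        List<Vote> votes = List.of(
                new Vote("Tim Allison", 1, true),
                new Vote("Tilman Hausherr", 1, true),
                new Vote("Maruan Sahyoun", 1, true),
                new Vote("Timo Boehme", 1, true),
                new Vote("Daniel Persson", 1, false),
                new Vote("Andreas Lehmkuehler", 1, true));
        System.out.println(passes(votes)); // prints true (5 binding +1, no -1)
    }
}
```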

Thanks for your support and help!! I'm going to push the release out.

Andreas





[VOTE] Release Apache PDFBox 2.0.27

2022-09-26 Thread Andreas Lehmkuehler

a candidate for the PDFBox 2.0.27 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/2.0.27/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/2.0.27/

The SHA-512 checksum of the archive is 
59a5675f5d1d34f092adc019679f7d10e7e93c0f554a002ac29d48cbffcaa600d930309fa94a92191c01ead8da905cbb37ce5e233dcc9b8732a881d4abf75def.


Please vote on releasing this package as Apache PDFBox 2.0.27.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.27
[ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: jdk20 build

2022-09-24 Thread Andreas Lehmkuehler

+1 thanks

Andreas

On 24.09.22 at 13:43, Tilman Hausherr wrote:

I've reconfigured the jdk18 build to be a jdk20 build due to the release of 
jdk19

Tilman









Re: Release 2.0.27

2022-09-20 Thread Andreas Lehmkuehler

On 20.09.22 at 20:23, Tim Allison wrote:

PS thanks for doing the test!


Thank _you_ for the confirmation!  Onwards!

Thanks to both of you for running the test and checking the results.

I'm planning to cut the release next Monday if no one objects.

Andreas










Re: Release 2.0.27

2022-09-19 Thread Andreas Lehmkuehler

Thanks in advance, whoever will be faster ;-)

We are not in a hurry, I'll wait for the results.

Andreas

On 19.09.22 at 21:09, Tim Allison wrote:

I should have time tomorrow/Wednesday.  Thank you!

On Mon, Sep 19, 2022 at 2:30 PM Tilman Hausherr  wrote:


On 19.09.2022 08:22, Andreas Lehmkuehler wrote:

1.8.17 is out the door and I guess it is time for the 2.0.27 release.

@Tim or @Tilman
Is there any chance to run the regression tests any time soon?

Andreas


I could try on Friday or Saturday.

Tilman













Re: Release 2.0.27

2022-09-19 Thread Andreas Lehmkuehler

On 10.09.22 at 17:30, Andreas Lehmkuehler wrote:

On 08.09.22 at 16:12, Eloisa Costa wrote:

Hello!

We're having an issue converting some PDF pages to JPG and opened a ticket
with you. The fix was made in May in a version (2.0.27-Snapshot) that has not
been released yet.
We would like to know if you have a release date for the new version, as
we use Maven to update our dependencies.

We don't do releases on a regular basis and therefore we don't have any plans 
yet.

The last version was released in April and IMHO a new release is due. I'm going 
to cut a new 2.0.27 release once the 1.8.17 release is done.

1.8.17 is out the door and I guess it is time for the 2.0.27 release.

@Tim or @Tilman
Is there any chance to run the regression tests any time soon?

Andreas



Cheers Andreas



Best regards,










[ANNOUNCE] Apache PDFBox 1.8.17 released

2022-09-15 Thread Andreas Lehmkuehler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 1.8.17. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 1.8.17

Introduction


The Apache PDFBox library is an open source Java tool for working with PDF 
documents.


This is an incremental bugfix release based on the earlier 1.8.16 release. It
contains a couple of fixes and small improvements.

For more details on all fixes included in this release, please refer to the 
following issues on the PDFBox issue tracker at 
https://issues.apache.org/jira/browse/PDFBOX.


Bug

[PDFBOX-1752] - Rendering PDF containing Jpeg2000 fails
[PDFBOX-4312] - Signature is not getting inserted into 0 area
[PDFBOX-4330] - NumberFormatException in CFFParser.readRealNumber()
[PDFBOX-4332] - XMP dates contain time zone, while document info dates do not, and this isn't detected by preflight (2)
[PDFBOX-4360] - ArrayIndexOutOfBoundsException in ASCIIHexFilter
[PDFBOX-4372] - Stack overflow around PDFStreamEngine.processStream
[PDFBOX-4404] - Input streams passed to Font.createFont() are not always closed
[PDFBOX-4453] - Encrypted string not decrypted
[PDFBOX-4461] - PDFunctionType0.eval() damages its input
[PDFBOX-4466] - Signature without /Type /Sig can't be read
[PDFBOX-4494] - Problem with google noto bold font and hungarian characters
[PDFBOX-4497] - dash phase start should be float
[PDFBOX-4551] - Prevent printing from CL applications when not authorized
[PDFBOX-4582] - PDJpeg should throw IOException if the image isn't a JPEG
[PDFBOX-4586] - Annotation widgets without AP not detected by preflight
[PDFBOX-4622] - Various exceptions in TTFParser.parse
[PDFBOX-4654] - PDFToImage shows reader image formats in usage
[PDFBOX-4683] - Could not find referenced cmap stream Adobe-Japan1-7
[PDFBOX-4722] - TestTextStripper doesn't detect when less output
[PDFBOX-4727] - ExtractEmbeddedFiles.java example uses name tree keys as file names
[PDFBOX-4822] - Off-by-one error in PDSignature.getConvertedContents()
[PDFBOX-4839] - Iphone IOS able to open password PDF file without password
[PDFBOX-4849] - FlateFilter Inflater leaks
[PDFBOX-4902] - PDF/A validation fails when system time zone has minutes
[PDFBOX-4907] - Signature not detected by Acrobat Reader
[PDFBOX-4910] - Build test failure on OpenJDK "Invalid argument to native writeImage"
[PDFBOX-4911] - isartor-6-2-2-t02-fail-a.pdf fails on OpenJDK with ArrayIndexOutOfBoundsException
[PDFBOX-4913] - ArrayIndexOutOfBoundsException in ShadingContext.convertToRGB()
[PDFBOX-4969] - java.lang.IndexOutOfBoundsException
[PDFBOX-5028] - Partial field names must not contain period characters
[PDFBOX-5033] - CFF FontParser exits with illegal offset in font
[PDFBOX-5127] - Multithreading issue in JempBox's DateConverter
[PDFBOX-5129] - 1.8 build test fails in com.ibm.icu.util.VersionInfo.getInstance()
[PDFBOX-5240] - preflight SMask entry check incorrect
[PDFBOX-5393] - NegativeArraySizeException in pfb parser with 0 byte pfb font file
[PDFBOX-5459] - NullPointerException in PDFunctionType3.eval()

Improvement

[PDFBOX-3192] - Animal sniffer maven plugin doesn't detect non java 5 api usage within the 1.8 branch
[PDFBOX-4420] - Correct javadoc comment
[PDFBOX-4641] - Keywords created using PDFBox are not visible in Acrobat

Task

[PDFBOX-4933] - Correct PDFBOX-1777 to PDFBOX-1977 in tests
[PDFBOX-5165] - Exceedingly slow processing of XMPSchemaMediaManagement's getHistory in JempBox


Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download. The public
key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

[RESULT][VOTE] Release Apache PDFBox 1.8.17

2022-09-15 Thread Andreas Lehmkuehler

On 12.09.22 at 18:50, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox 1.8.17.


   +1 Tilman Hausherr
   +1 Timo Boehme
   +1 Maruan Sahyoun
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas





Re: [VOTE] Release Apache PDFBox 1.8.17

2022-09-13 Thread Andreas Lehmkuehler

On 13.09.22 at 10:13, Timo Boehme wrote:

+1

Really interesting how much faster (often 10-20 times, and more correct) 2.x is 
for parsing+rendering compared to the 1.x version.

Looks like all the hard work and effort of our community pays off :-)))




Timo


On 12.09.22 at 18:50, Andreas Lehmkuehler wrote:

a candidate for the PDFBox 1.8.17 release is available at:

 https://dist.apache.org/repos/dist/dev/pdfbox/1.8.17/

The release candidate is a zip archive of the sources in:

 https://svn.apache.org/repos/asf/pdfbox/tags/1.8.17/

The SHA-512 checksum of the archive is 
e808b3b159b61b5928b0ad983b3bdadfc694ee80ca8a209669d591f90335165a45de684ea04b23d0a149bfc7ce5d890a287cb4e79300f3a08bb954884024c909.


Please vote on releasing this package as Apache PDFBox 1.8.17.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

 [ ] +1 Release this package as Apache PDFBox 1.8.17
 [ ] -1 Do not release this package because...


Here is my +1

Andreas











[VOTE] Release Apache PDFBox 1.8.17

2022-09-12 Thread Andreas Lehmkuehler

a candidate for the PDFBox 1.8.17 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/1.8.17/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/1.8.17/

The SHA-512 checksum of the archive is 
e808b3b159b61b5928b0ad983b3bdadfc694ee80ca8a209669d591f90335165a45de684ea04b23d0a149bfc7ce5d890a287cb4e79300f3a08bb954884024c909.


Please vote on releasing this package as Apache PDFBox 1.8.17.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 1.8.17
[ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: Release 2.0.27

2022-09-10 Thread Andreas Lehmkuehler

On 08.09.22 at 16:12, Eloisa Costa wrote:

Hello!

We're having an issue converting some PDF pages to JPG and opened a ticket
with you. The fix was made in May in a version (2.0.27-Snapshot) that has not
been released yet.
We would like to know if you have a release date for the new version, as
we use Maven to update our dependencies.

We don't do releases on a regular basis and therefore we don't have any plans 
yet.

The last version was released in April and IMHO a new release is due. I'm going 
to cut a new 2.0.27 release once the 1.8.17 release is done.


Cheers Andreas



Best regards,






New PDFBox 1.8.17

2022-09-10 Thread Andreas Lehmkuehler

Hi,

Tilman asked me to cut a 1.8.17 release to support our friends from Apache Tika [1].


I'm going to do so next Monday, 2 days from now, if nobody objects.

Cheers Andreas


[1] https://issues.apache.org/jira/browse/PDFBOX-5501




Re: Replace methods using an InputStream from Loader.loadPDF

2022-08-01 Thread Andreas Lehmkuehler

On 01.08.22 at 20:20, Tilman Hausherr wrote:

+1 but
- the explanation below (when to use which class) should be in the javadoc
- the removal should be in the migration guide

It is already on my TODO list

Andreas



Tilman

On 31.07.2022 at 15:18, Andreas Lehmkuehler wrote:

Hi fellow devs,


there was a discussion on JIRA [1] about the changed behaviour of the parser 
due to the removal of the ScratchFileBuffer when reading a pdf.


Additionally there was the post "High memory usage with pdfbox 3" on 
users@pdfbox targeting the very same topic


After explaining myself and my changes twice, I came to the conclusion that I'll 
have to do so again and again in the future if we don't change the API of 
Loader.loadPDF.


People simply notice that all methods used for loading a PDF were moved from 
PDDocument to Loader. They expect the very same behaviour when using a similar 
API, and that is understandable from a user's point of view.


We have to remove the loadPDF variants using InputStream and replace them with 
RandomAccessRead.


When it comes to InputStreams, users have to decide how to proceed:
* copy the InputStream to memory by using RandomAccessReadBuffer
* copy the InputStream to a file and use RandomAccessReadBufferedFile or 
RandomAccessReadMemoryMappedFile


This would make it more transparent what happens under the hood when using the 
different kinds of loadPDF methods:


* a byte array as source is already in memory and the obvious choice is to use 
RandomAccessReadBuffer as a wrapper
* a file as source targets a local file and the most obvious choice is to use 
RandomAccessReadBufferedFile as a wrapper. We should document that 
RandomAccessReadMemoryMappedFile is offered as an alternative in this case
* RandomAccessRead as source is the most obvious one and the user decides how 
to create it. Additionally it is possible to implement a custom loading and/or 
caching mechanism


I know this will lead to some changes in the codebase of our users, but they 
have to make changes in any case as the method was moved, so why not change the 
data type as well?



WDYT? Am I missing something?

Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5462




Replace methods using an InputStream from Loader.loadPDF

2022-07-31 Thread Andreas Lehmkuehler

Hi fellow devs,


there was a discussion on JIRA [1] about the changed behaviour of the parser due 
to the removal of the ScratchFileBuffer when reading a pdf.


Additionally there was the post "High memory usage with pdfbox 3" on 
users@pdfbox targeting the very same topic


After explaining myself and my changes twice, I came to the conclusion that I'll 
have to do so again and again in the future if we don't change the API of 
Loader.loadPDF.


People simply notice that all methods used for loading a PDF were moved from 
PDDocument to Loader. They expect the very same behaviour when using a similar 
API, and that is understandable from a user's point of view.


We have to remove the loadPDF variants using InputStream and replace them with 
RandomAccessRead.


When it comes to InputStreams, users have to decide how to proceed:
* copy the InputStream to memory by using RandomAccessReadBuffer
* copy the InputStream to a file and use RandomAccessReadBufferedFile or 
RandomAccessReadMemoryMappedFile
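The two options for InputStreams above come down to a choice between spooling into memory and spooling to a file that is then read with random access. A PDF parser needs random access because it starts at the end of the file, at `startxref` and the trailer. The JDK-only sketch below illustrates that idea. `RandomSource`, `MemorySource` and `FileSource` are hypothetical names standing in for the roles of PDFBox's `RandomAccessRead`, `RandomAccessReadBuffer` and `RandomAccessReadBufferedFile`. The sketch does not reproduce their actual implementations.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical minimal random-access abstraction, playing the role RandomAccessRead plays in PDFBox 3.
interface RandomSource extends AutoCloseable {
    long length() throws IOException;

    /** Reads up to len bytes at absolute position pos into dst; returns the count, or -1 at end of data. */
    int read(long pos, byte[] dst, int off, int len) throws IOException;

    @Override
    void close() throws IOException;
}

// Option 1: copy the whole InputStream into memory (compare RandomAccessReadBuffer).
final class MemorySource implements RandomSource {
    private final byte[] data;

    MemorySource(byte[] data) {
        this.data = data;
    }

    static MemorySource fromStream(InputStream in) throws IOException {
        return new MemorySource(in.readAllBytes()); // Java 9+
    }

    @Override
    public long length() {
        return data.length;
    }

    @Override
    public int read(long pos, byte[] dst, int off, int len) {
        if (pos >= data.length) {
            return -1;
        }
        int n = (int) Math.min(len, data.length - pos);
        System.arraycopy(data, (int) pos, dst, off, n);
        return n;
    }

    @Override
    public void close() {
        // nothing to release; the array is garbage-collected
    }
}

// Option 2: spool the InputStream to a temp file and read it back with seeks
// (compare RandomAccessReadBufferedFile).
final class FileSource implements RandomSource {
    private final Path file;
    private final RandomAccessFile raf;

    private FileSource(Path file) throws IOException {
        this.file = file;
        this.raf = new RandomAccessFile(file.toFile(), "r");
    }

    static FileSource fromStream(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spooled-", ".pdf");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return new FileSource(tmp);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leak the temp file on failure
            throw e;
        }
    }

    @Override
    public long length() throws IOException {
        return raf.length();
    }

    @Override
    public int read(long pos, byte[] dst, int off, int len) throws IOException {
        raf.seek(pos);
        return raf.read(dst, off, len);
    }

    @Override
    public void close() throws IOException {
        raf.close();
        Files.deleteIfExists(file); // the spooled copy is only needed while the document is open
    }
}
```

The in-memory copy is the simplest option. The temp file keeps large documents off the heap at the cost of disk I/O, and the proposal wants users to make that trade-off explicitly. With PDFBox 3.0 itself, the in-memory variant is expected to look roughly like `Loader.loadPDF(new RandomAccessReadBuffer(inputStream))`; check the 3.0 javadoc for the exact signatures.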


This would make it more transparent what happens under the hood when using the 
different kinds of loadPDF methods:


* a byte array as source is already in memory and the obvious choice is to use 
RandomAccessReadBuffer as a wrapper
* a file as source targets a local file and the most obvious choice is to use 
RandomAccessReadBufferedFile as a wrapper. We should document that 
RandomAccessReadMemoryMappedFile is offered as an alternative in this case
* RandomAccessRead as source is the most obvious one and the user decides how to 
create it. Additionally it is possible to implement a custom loading and/or 
caching mechanism


I know this will lead to some changes in the codebase of our users, but they 
have to make changes in any case as the method was moved, so why not change the 
data type as well?



WDYT? Am I missing something?

Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5462




Apache PDFBox Board Report July 2022 due

2022-07-11 Thread Andreas Lehmkuehler

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...

P.S.: I'm going to add a private comment about the issue we've discussed on 
private@ recently before posting the report.



## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

3.0.0-alpha3 was released on 2022-05-05.
2.0.26 was released on 2022-04-21.
3.0.4 JBIG2 was released on 2022-03-01.
2.0.25 was released on 2021-12-16.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are planning to cut the first beta release of our next major version
  3.0.0
- to do so we have started to identify the last tickets with breaking changes
  to be included in 3.0.0


Andreas

[1] https://reporter.apache.org/wizard/?pdfbox




Re: wrong commit msg in jbig2

2022-07-11 Thread Andreas Lehmkuehler

On 11.07.22 at 19:21, Tilman Hausherr wrote:

There's a wrong commit message and I can't use

     git commit --amend -m "PDFBOX-4892: update rat"
     git push --force-with-lease

because I get "Rewinding refs/heads/master is forbidden" and "[remote rejected] 
master -> master (pre-receive hook declined)", I assume they have disabled such 
dangerous stuff.


I could probably create a new branch from the previous position, redo the 
commit properly, push, and then make this the new master branch, but I'm 
reluctant to take this risk.

I'm not a git pro, so I have no idea what to do in such cases.

Andreas


Tilman






Re: New PDFBox 3.0.0 release

2022-06-18 Thread Andreas Lehmkuehler

On 14.06.22 at 10:20, Emmeran Seehuber wrote:

Hi Andreas,


On 14.06.2022 at 08:19, Andreas Lehmkuehler wrote:

Hi,

looks like it is time for another 3.0.0 release of PDFBox. Depending on the 
outcome of the next regression test I'd like to cut the next 3.0.0 release.

Should we target another alpha or maybe the first beta?

Or is it time for a stable 3.0.0 PDFBox release already?


No, I would suggest at least releasing a beta first. The beta should already 
have some kind of API stability guarantee.

I'm ok with that.


I would then port my commercial projects to this beta, and see what breaks.

That will hopefully give us some valuable feedback, thanks in advance

Andreas



Best regards

Emmeran



WDYT?

Do you have some TODOs on your lists which have to be solved first?

I'm going to resolve my remaining 3.0.0 tickets soon

Andreas




Kind regards from Augsburg

Emmeran Seehuber
Dipl. Inf. (FH)
Schrannenstraße 8
86150 Augsburg
USt-IdNr.: DE266070804





Re: text extraction regression tests for 3.x?

2022-06-16 Thread Andreas Lehmkuehler

On 15.06.22 at 13:07, Tim Allison wrote:

In "parse_time_millis_details.xlsx", there are some files that took much longer
in 3.x during the multithreaded run but do not show much of a difference
single-threaded... likely accidents of the resources available at parse time.

Overall, the sum of processing times across all files is very similar.
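Separating real slowdowns from the scheduling noise described above amounts to comparing per-file times from two runs and flagging only files that are slower both relatively and absolutely. Here is a minimal sketch, assuming the per-file times have already been loaded into maps keyed by file name. `ParseTimeComparison`, the 2.0 ratio and the 1000 ms threshold are illustrative assumptions and are not part of Tika's reporting tools. The numbers in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative comparison of per-file parse times from two runs (hypothetical helper).
class ParseTimeComparison {

    /**
     * Files that are slower in run b than in run a by at least the given ratio
     * AND by at least minDeltaMillis, so tiny files with noisy timings are ignored.
     */
    static List<String> slowerInB(Map<String, Long> a, Map<String, Long> b,
                                  double ratio, long minDeltaMillis) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(b).entrySet()) { // sorted for stable output
            Long before = a.get(e.getKey());
            if (before == null) {
                continue; // only compare files present in both runs
            }
            long after = e.getValue();
            if (after - before >= minDeltaMillis && after >= ratio * before) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    /** Sum over all files; two runs can have similar totals and still contain outliers. */
    static long total(Map<String, Long> times) {
        return times.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Made-up single-threaded timings in milliseconds
        Map<String, Long> v2 = Map.of("a.pdf", 3000L, "b.pdf", 800L, "c.pdf", 15L);
        Map<String, Long> v3 = Map.of("a.pdf", 1900L, "b.pdf", 1900L, "c.pdf", 45L);
        System.out.println("totals: " + total(v2) + " vs " + total(v3));
        System.out.println("slower in 3.x: " + slowerInB(v2, v3, 2.0, 1000));
    }
}
```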

However, I did find two files that really do take up far more time
single-threaded in 3.x vs. 2.x.  Again, I'm not sure these need to be dealt with
immediately, and the time required may be a fault of Tika, not PDFBox.
I did some rendering tests and I can't see any significant difference, but I 
didn't do a scientific test with real figures ;-)



commoncrawl3_refetched/SO/SONYLMWCHDDEOC3D5OHEXDTOJ7NGVODV
The file looks like a PDF containing Arabic text, but most of the text isn't 
text at all: the PDF uses line graphics for the content. So the question is 
what Tika does in such cases, and why 3.x seems to be slower than 2.x.



commoncrawl3_refetched/OL/OLZ5TAS53B4BDC673OFMWZE5DDZ7ZGIN
This file is similar to the other one. It contains a lot of graphics and not 
that much text.


Maybe something with the rendering code and/or default settings is different and 
leads to slower results in 3.x?


Andreas



On Wed, Jun 15, 2022 at 6:49 AM Tim Allison  wrote:


I had a chance to look at new_catastrophic_exceptions_in_b, and the three
files in there take roughly the same amount of time and resources.  I think
they failed on trunk only because of the whims of multithreading and
available resources at the time.

This file is admittedly quite large, but it was able to take up an
unhealthy amount of resources (both RAM and CPU):
bug_trackers/evince/evince-LINK-1250-0.pdf in both 2.x and 3.x (source:
https://gitlab.gnome.org/GNOME/evince/-/issues/1250).  I don't think
there's anything to be done for that one immediately.


On Wed, Jun 15, 2022 at 6:19 AM Tim Allison  wrote:


Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-3-20220614.tgz

On Mon, Jun 13, 2022 at 4:54 PM Tim Allison  wrote:


Just seeing this now.  Y.  I'll kick off the tests tomorrow morning (ET).

On Sat, Jun 11, 2022 at 8:09 AM Andreas Lehmkuehler 
wrote:


I've fixed PDFBOX-5452 and found/fixed another one, see PDFBOX-5456

@Tim is there any chance to rerun the regression tests?

Thanks in advance
Andreas

On 07.06.22 at 08:06, Andreas Lehmkuehler wrote:

I've found another regression, see PDFBOX-5452

Andreas

On 29.05.22 at 18:37, Andreas Lehmkuehler wrote:

Thanks Tim,

looks like there are some regressions, see PDFBOX-5444 and PDFBOX-5447.


Maybe there are more to come 

Andreas


On 26.05.22 at 15:04, Tim Allison wrote:

Apologies for my delay.  I ran trunk/3.x on May 12 against 2.0.26.  The
reports are here:


https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz


Happy to rerun with a more recent version of trunk.

Cheers,

Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:



On 06.05.22 at 14:30, Tim Allison wrote:

All,
 Let me know when it makes sense to run the text extraction regression

Yes, it'd be useful to have some updated results.

How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
3.0.0-alpha3?



tests for 3.x.  I regret I haven't been following our mailing list as
closely as I should be.

No need to worry, everything is fine.

Andreas



  Cheers,

  Tim






Re: text extraction regression tests for 3.x?

2022-06-16 Thread Andreas Lehmkuehler

On 15.06.22 at 12:19, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-3-20220614.tgz

@Tim thanks again

Looks like there aren't any new exceptions in 3.0.0 at all, ergo we are good to 
target a new release  :-)


Andreas



On Mon, Jun 13, 2022 at 4:54 PM Tim Allison  wrote:


Just seeing this now.  Y.  I'll kick off the tests tomorrow morning (ET).

On Sat, Jun 11, 2022 at 8:09 AM Andreas Lehmkuehler 
wrote:


I've fixed PDFBOX-5452 and found/fixed another one, see PDFBOX-5456

@Tim is there any chance to rerun the regression tests?

Thanks in advance
Andreas

On 07.06.22 at 08:06, Andreas Lehmkuehler wrote:

I've found another regression, see PDFBOX-5452

Andreas

On 29.05.22 at 18:37, Andreas Lehmkuehler wrote:

Thanks Tim,

looks like there are some regressions, see PDFBOX-5444 and PDFBOX-5447.

Maybe there are more to come 

Andreas


On 26.05.22 at 15:04, Tim Allison wrote:

Apologies for my delay.  I ran trunk/3.x on May 12 against 2.0.26.  The
reports are here:


https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz


Happy to rerun with a more recent version of trunk.

Cheers,

Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler wrote:



On 06.05.22 at 14:30, Tim Allison wrote:

All,
 Let me know when it makes sense to run the text extraction regression

Yes, it'd be useful to have some updated results.

How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
3.0.0-alpha3?



tests for 3.x.  I regret I haven't been following our mailing list as
closely as I should be.

No need to worry, everything is fine.

Andreas



  Cheers,

  Tim






New PDFBox 3.0.0 release

2022-06-14 Thread Andreas Lehmkuehler

Hi,

looks like it is time for another 3.0.0 release of PDFBox. Depending on the 
outcome of the next regression test I'd like to cut the next 3.0.0 release.


Should we target another alpha or maybe the first beta?

Or is it time for a stable 3.0.0 PDFBox release already?

WDYT?

Do you have some TODOs on your lists which have to be solved first?

I'm going to resolve my remaining 3.0.0 tickets soon

Andreas




Re: text extraction regression tests for 3.x?

2022-06-11 Thread Andreas Lehmkuehler

I've fixed PDFBOX-5452 and found/fixed another one, see PDFBOX-5456

@Tim is there any chance to rerun the regression tests?

Thanks in advance
Andreas

On 07.06.22 at 08:06, Andreas Lehmkuehler wrote:

I've found another regression, see PDFBOX-5452

Andreas

On 29.05.22 at 18:37, Andreas Lehmkuehler wrote:

Thanks Tim,

looks like there are some regressions, see PDFBOX-5444 and PDFBOX-5447.

Maybe there are more to come 

Andreas


On 26.05.22 at 15:04, Tim Allison wrote:

Apologies for my delay.  I ran trunk/3.x on May 12 against 2.0.26.  The
reports are here:
https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz

Happy to rerun with a more recent version of trunk.

Cheers,

   Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler  wrote:


On 06.05.22 at 14:30, Tim Allison wrote:

All,
    Let me know when it makes sense to run the text extraction regression

Yes, it'd be useful to have some updated results.

How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
3.0.0-alpha3?



tests for 3.x.  I regret I haven't been following our mailing list as
closely as I should be.

No need to worry, everything is fine.

Andreas



 Cheers,

 Tim




Re: text extraction regression tests for 3.x?

2022-06-07 Thread Andreas Lehmkuehler

I've found another regression, see PDFBOX-5452

Andreas

On 29.05.22 at 18:37, Andreas Lehmkuehler wrote:

Thanks Tim,

looks like there are some regressions, see PDFBOX-5444 and PDFBOX-5447.

Maybe there are more to come 

Andreas


On 26.05.22 at 15:04, Tim Allison wrote:

Apologies for my delay.  I ran trunk/3.x on May 12 against 2.0.26.  The
reports are here:
https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz

Happy to rerun with a more recent version of trunk.

Cheers,

   Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler  wrote:


On 06.05.22 at 14:30, Tim Allison wrote:

All,
    Let me know when it makes sense to run the text extraction regression

Yes, it'd be useful to have some updated results.

How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
3.0.0-alpha3?



tests for 3.x.  I regret I haven't been following our mailing list as
closely as I should be.

No need to worry, everything is fine.

Andreas



 Cheers,

 Tim




Re: FoxitDingbats

2022-06-01 Thread Andreas Lehmkuehler

On 01.06.22 at 16:09, Tilman Hausherr wrote:

There's a Zapf Dingbats replacement FoxitDingbats with a nice license:
https://github.com/mozilla/pdf.js/blob/master/external/standard_fonts/FoxitDingbats.pfb 


https://github.com/mozilla/pdf.js/blob/master/external/standard_fonts/LICENSE_FOXIT

Sadly I'm unable to open this file. It should start with 0x80 but doesn't. I 
tried adding it with a SequenceInputStream but got another error (which 
resulted in PDFBOX-5424).
I was able to open it using FontForge on Linux. I'm afraid the file extension is 
misleading. It isn't a PS-type1 font. I was able to parse the font using the 
CFFParser.
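Checking the first bytes makes the mismatch visible. A PFB file starts with a segment header whose first byte is 0x80. A bare CFF font instead starts with its own header: major version 1, minor version, header size and offset size. The sketch below is a hypothetical helper, `FontSniffer`, based on those well-known signatures. It is not PDFBox code, and PDFBox's own parsers (for example fontbox's `CFFParser`) perform their own, far more thorough validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guess a font file's format from its leading bytes (magic numbers).
class FontSniffer {

    enum Kind { PFB, PFA, CFF, TRUETYPE, OPENTYPE_CFF, UNKNOWN }

    static Kind sniff(byte[] h) {
        // PFB: segment marker 0x80, then segment type 1 (ASCII) for the first segment
        if (h.length >= 2 && (h[0] & 0xFF) == 0x80 && h[1] == 0x01) {
            return Kind.PFB;
        }
        // PFA: plain-text PostScript, e.g. "%!PS-AdobeFont-1.0" or "%!FontType1"
        if (startsWith(h, "%!")) {
            return Kind.PFA;
        }
        // TrueType: sfnt version 0x00010000, or the legacy Apple tag "true"
        if ((h.length >= 4 && h[0] == 0x00 && h[1] == 0x01 && h[2] == 0x00 && h[3] == 0x00)
                || startsWith(h, "true")) {
            return Kind.TRUETYPE;
        }
        // OpenType with CFF outlines
        if (startsWith(h, "OTTO")) {
            return Kind.OPENTYPE_CFF;
        }
        // Bare CFF: major version 1, header size >= 4, offset size 1..4
        if (h.length >= 4 && h[0] == 0x01 && (h[2] & 0xFF) >= 4 && h[3] >= 1 && h[3] <= 4) {
            return Kind.CFF;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] h, String asciiPrefix) {
        byte[] p = asciiPrefix.getBytes(StandardCharsets.US_ASCII);
        if (h.length < p.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (h[i] != p[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FontSniffer <font-file>");
            System.exit(2);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[16];
            int n = in.readNBytes(head, 0, head.length); // Java 9+
            System.out.println(args[0] + ": " + sniff(Arrays.copyOf(head, n)));
        }
    }
}
```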



In the distant future, my wish is that we distribute the standard 14 fonts and 
use these as fallback (currently we use only one font), to increase the 
likelihood that PDFBox "just works" when used on smaller systems without lots of 
fonts.

A good idea


Andreas


Tilman






Re: text extraction regression tests for 3.x?

2022-05-29 Thread Andreas Lehmkuehler

Thanks Tim,

looks like there are some regressions, see PDFBOX-5444 and PDFBOX-5447.

Maybe there are more to come 

Andreas


On 26.05.22 at 15:04, Tim Allison wrote:

Apologies for my delay.  I ran trunk/3.x on May 12 against 2.0.26.  The
reports are here:
https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz

Happy to rerun with a more recent version of trunk.

Cheers,

   Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler  wrote:


On 06.05.22 at 14:30, Tim Allison wrote:

All,
Let me know when it makes sense to run the text extraction regression

Yes, it'd be useful to have some updated results.

How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
3.0.0-alpha3?



tests for 3.x.  I regret I haven't been following our mailing list as
closely as I should be.

No need to worry, everything is fine.

Andreas



 Cheers,

 Tim




Re: [ANNOUNCE] Apache PDFBox 3.0.0-alpha2 released

2022-05-08 Thread Andreas Lehmkuehler

Hi Emmeran,

thanks for the prompt feedback. I'm glad to hear that everything works fine for 
you.

Andreas


On 06.05.22 at 10:09, Emmeran Seehuber wrote:

Hi,

I’ve just updated the 3.0 branch of my PDFBox-Graphics2D adapter to PDFBox 
3.0.0-alpha3 and did a matching 3.0.0-alpha3 release.

Everything seems to work fine.

Best regards

Emmeran


On 05.05.2022 at 19:46, Andreas Lehmkuehler wrote:

The Apache PDFBox community is pleased to announce the release of the third 
alpha release for Apache PDFBox 3.0.0. It is available for download at:

https://pdfbox.apache.org/download.html

The Apache PDFBox library is an open source Java tool for working with PDF 
documents.

This is the third alpha release for the upcoming major release 3.0.0 of PDFBox. 
It contains a lot of improvements, fixes and refactorings. The API is supposed 
to be stable, but we can't guarantee that there won't be any further changes 
to it before the final release candidate.

A migration guide is available at https://pdfbox.apache.org/3.0/migration.html. 
It is still a work in progress and we are happy to include any valuable 
feedback from our community.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.


The full release notes are available at:

https://www.apache.org/dist/pdfbox/3.0.0-alpha3/RELEASE-NOTES.txt


The Apache PDFBox website can be found at:

https://pdfbox.apache.org/






Kind regards from Augsburg

Emmeran Seehuber
Dipl. Inf. (FH)
Schrannenstraße 8
86150 Augsburg
USt-IdNr.: DE266070804





Re: text extraction regression tests for 3.x?

2022-05-08 Thread Andreas Lehmkuehler

On 06.05.22 at 14:30, Tim Allison wrote:

All,
   Let me know when it makes sense to run the text extraction regression

Yes, it'd be useful to have some updated results.

How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs. 
3.0.0-alpha3?



tests for 3.x.  I regret I haven't been following our mailing list as
closely as I should be.

No need to worry, everything is fine.

Andreas



Cheers,

Tim




Re: [ANNOUNCE] Apache PDFBox 3.0.0-alpha2 released

2022-05-05 Thread Andreas Lehmkuehler

Sorry for the confusion in the subject. It is the new alpha3 release.

Andreas

On 05.05.22 at 19:46, Andreas Lehmkuehler wrote:
The Apache PDFBox community is pleased to announce the release of the third 
alpha release for Apache PDFBox 3.0.0. It is available for download at:


https://pdfbox.apache.org/download.html

The Apache PDFBox library is an open source Java tool for working with PDF 
documents.


This is the third alpha release for the upcoming major release 3.0.0 of PDFBox. 
It contains a lot of improvements, fixes and refactorings. The API is supposed 
to be stable, but we can't guarantee that there won't be any further changes 
to it before the final release candidate.


A migration guide is available at https://pdfbox.apache.org/3.0/migration.html. 
It is still a work in progress and we are happy to include any valuable feedback 
from our community.


For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.


The full release notes are available at:

https://www.apache.org/dist/pdfbox/3.0.0-alpha3/RELEASE-NOTES.txt


The Apache PDFBox website can be found at:

https://pdfbox.apache.org/






[ANNOUNCE] Apache PDFBox 3.0.0-alpha2 released

2022-05-05 Thread Andreas Lehmkuehler
The Apache PDFBox community is pleased to announce the release of the third 
alpha release for Apache PDFBox 3.0.0. It is available for download at:


https://pdfbox.apache.org/download.html

The Apache PDFBox library is an open source Java tool for working with PDF 
documents.


This is the third alpha release for the upcoming major release 3.0.0 of PDFBox. 
It contains a lot of improvements, fixes and refactorings. The API is supposed 
to be stable, but we can't guarantee that there won't be any further changes 
to it before the final release candidate.


A migration guide is available at https://pdfbox.apache.org/3.0/migration.html. 
It is still a work in progress and we are happy to include any valuable feedback 
from our community.


For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.


The full release notes are available at:

https://www.apache.org/dist/pdfbox/3.0.0-alpha3/RELEASE-NOTES.txt


The Apache PDFBox website can be found at:

https://pdfbox.apache.org/






[RESULT][VOTE] Release Apache PDFBox 3.0.0-alpha3

2022-05-05 Thread Andreas Lehmkuehler

On 02.05.22 at 19:27, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox 3.0.0-alpha3.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Andreas Lehmkühler
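The vote mail states the rule: a majority of at least three +1 PMC votes. The sketch below shows that tally, reading "majority" as at least three binding +1 votes and more binding +1 than binding -1 votes. That is the usual Apache reading of majority approval; consult the ASF voting policy for the authoritative wording. `ReleaseVote` and `Vote` are illustrative names, and the example treats all three voters above as binding, as the result mail implies.

```java
import java.util.List;

// Illustrative tally of an Apache-style release vote (names are hypothetical).
class ReleaseVote {

    static final class Vote {
        final String voter;
        final int value;        // +1, 0 or -1
        final boolean binding;  // true for PMC members

        Vote(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    /** Passes with at least three binding +1 votes and more binding +1 than binding -1 votes. */
    static boolean passes(List<Vote> votes) {
        long plus = votes.stream().filter(v -> v.binding && v.value > 0).count();
        long minus = votes.stream().filter(v -> v.binding && v.value < 0).count();
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        // The tally from the 3.0.0-alpha3 result mail
        List<Vote> alpha3 = List.of(
                new Vote("Tilman Hausherr", 1, true),
                new Vote("Maruan Sahyoun", 1, true),
                new Vote("Andreas Lehmkuehler", 1, true));
        System.out.println("3.0.0-alpha3 passes: " + passes(alpha3));
    }
}
```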

Thanks for your support and help!! I'm going to push the release out.

Andreas




Re: Jenkins build became unstable: PDFBox » PDFBox-trunk #1287

2022-05-05 Thread Andreas Lehmkuehler

I somehow missed that issue when running the tests locally :-(

I'll have a look or revert the changes

Andreas


On 05.05.22 at 09:12, Apache Jenkins Server wrote:

See 






[VOTE] Release Apache PDFBox 3.0.0-alpha3

2022-05-02 Thread Andreas Lehmkuehler

Hi,

a candidate for the PDFBox 3.0.0-alpha3 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0-alpha3/

The release candidate is a zip archive of the sources in:

http://svn.apache.org/repos/asf/pdfbox/tags/3.0.0-alpha3/

The SHA-512 checksum of the archive is 
1cc2f84335745e0282cda192418d62aff0e85f3f1db8567f4484d086d02f1609beba1afaa348d167c07025b6fad2426a58d53e66b7c5e68e19029c1510c2966f.


Please vote on releasing this package as Apache PDFBox 3.0.0-alpha3.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.0-alpha3
[ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: New PDFBox 3.0.0 alpha release

2022-05-01 Thread Andreas Lehmkuehler

On 25.04.22 at 07:53, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut another alpha release for PDFBox 3.0.0 a week from now, 
next Monday.

I'm going to cut the next alpha release tomorrow, approx. 24 hours from now.

Andreas




Any objections?

Andreas




New PDFBox 3.0.0 alpha release

2022-04-24 Thread Andreas Lehmkuehler

Hi,

I'm planning to cut another alpha release for PDFBox 3.0.0 a week from now, 
next Monday.


Any objections?

Andreas




Sonar build and failing tests

2022-04-24 Thread Andreas Lehmkuehler

Hi,

Does anyone know what is happening with the Sonar build? 60 of the tests are
failing with an NPE. All tests using
org.apache.pdfbox.rendering.PDFRenderer.PDFRenderer(PDDocument) are affected. It
looks like the NPE is thrown when creating an instance of PDFRenderer.


Could this be related to the KCMS handling in the constructor?


Andreas




[ANNOUNCE] Apache PDFBox 2.0.26 released

2022-04-21 Thread Andreas Lehmkuehler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.26. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 2.0.26

Introduction


The Apache PDFBox library is an open source Java tool for working with PDF 
documents.


This is an incremental bugfix release based on the earlier 2.0.25 release. It 
contains a couple of fixes and small improvements.


For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-4623] - COSParser: Infinite recursion
[PDFBOX-5203] - TestCreateSignature.testCreateSignedTimeStamp checkLTV build test fail
[PDFBOX-5283] - No Content - xRef / Obj Parsing
[PDFBOX-5305] - Pdf-A/1b Validation
[PDFBOX-5339] - A list of bugs found (70 bugs in total)
[PDFBOX-5342] - Text size option for PDFBox Debugger
[PDFBOX-5345] - IllegalArgumentException: Input buffer too short in StandardSecurityHandler.computeRC4key
[PDFBOX-5352] - ArrayIndexOutOfBoundsException in PDSeparation.tintTransform()
[PDFBOX-5360] - EOFException: Can't read 20 bytes
[PDFBOX-5361] - Wrong datatype for OPM in PDExtendedGraphicsState
[PDFBOX-5366] - Unhandled IOException thrown from BaseParser creates issue in PDFStreamEngine.processStreamOperators
[PDFBOX-5372] - *LOADS of* "WARNING: key node000x already exists in destination IDTree"
[PDFBOX-5373] - NullPointerException in PDRange.getMin()
[PDFBOX-5376] - Image interpolation when there shouldn't be
[PDFBOX-5377] - pDAcroForm.flatten() does not remove /SigFlags in /Catalog object
[PDFBOX-5380] - Could not read embedded TTF for font
[PDFBOX-5387] - ToUnicodeWriter.writeTo allows byte overflow in bfrange operator
[PDFBOX-5390] - TextToPDF appends space to each line
[PDFBOX-5393] - NegativeArraySizeException in pfb parser with 0 byte pfb font file
[PDFBOX-5395] - Hangup in COSFilterInputStream.nextRange
[PDFBOX-5397] - Certain PDF cannot be processed
[PDFBOX-5398] - Parsing fails in 2.0.26 that worked in 2.0.25
[PDFBOX-5399] - Object must be defined and must not be compressed object
[PDFBOX-5400] - Page tree root must be a dictionary
[PDFBOX-5401] - A carefully crafted pdf can trigger an infinite loop while parsing
[PDFBOX-5402] - POCIDFontType2 (Wingdings) encode throws a NullPointerException
[PDFBOX-5410] - Possible loop detection is triggered in 2.0.26 but file works in 2.0.25
[PDFBOX-5412] - IOException: object reference 112 0 R at offset 18355 in content stream
[PDFBOX-5413] - Field text missing
[PDFBOX-5418] - NPE during page render
[PDFBOX-5419] - Parsing shows 1 empty page with 2.0.26 and 7 with 2.0.25

Improvement

[PDFBOX-5347] - Create push button example
[PDFBOX-5348] - FontMapper should also take into account the user's font directory on Windows operating systems
[PDFBOX-5363] - Don't log warnings if there are not fonts to cache
[PDFBOX-5379] - support multiple widgets in PDTerminalField.importFDF()
[PDFBOX-5385] - Improve AddValidationInformation to handle exceptional situations better
[PDFBOX-5388] - Avoid duplicate certificates in AddValidation example
[PDFBOX-5394] - Render symbol for file attachment annotations

Task

[PDFBOX-5356] - Add test of PFB font
[PDFBOX-5396] - Add maven enforcer rule to ensure that JAVA_HOME is set

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.
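Below is a minimal sketch of the checksum half of that verification in plain Java. It uses only the JDK's `MessageDigest` and `DigestInputStream`, and it assumes the published SHA-512 value has been copied out as a bare hex string. The class name and the usage line are hypothetical. This check only detects a corrupted or altered download. Authenticity still comes from verifying the PGP signature against the KEYS file, which this sketch does not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for checking a downloaded archive against a published SHA-512. */
final class VerifySha512 {

    private static MessageDigest newSha512() {
        try {
            return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // Standard JDKs ship SHA-512, so this should not happen in practice.
            throw new IllegalStateException(e);
        }
    }

    /** Lowercase hex encoding of a digest. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** SHA-512 of an in-memory byte array, as lowercase hex. */
    static String sha512Hex(byte[] data) {
        return toHex(newSha512().digest(data));
    }

    /** SHA-512 of a file, streamed so a large archive is never fully loaded. */
    static String sha512Hex(Path file) throws IOException {
        MessageDigest md = newSha512();
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // each read through the DigestInputStream updates md
            }
        }
        return toHex(md.digest());
    }

    /** Compares with a published checksum, ignoring case and surrounding whitespace. */
    static boolean matches(String actualHex, String publishedHex) {
        return actualHex.equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        // usage: java VerifySha512 <archive.zip> <published-sha512-hex>
        if (args.length != 2) {
            System.err.println("usage: java VerifySha512 <archive> <expected-sha512-hex>");
            System.exit(2);
        }
        String actual = sha512Hex(Paths.get(args[0]));
        System.out.println(matches(actual, args[1]) ? "OK" : "MISMATCH: " + actual);
    }
}
```

If the published checksum file also contains the file name after the hash, pass only the hex part as the second argument.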

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/


[RESULT][VOTE] Release Apache PDFBox 2.0.26

2022-04-21 Thread Andreas Lehmkuehler

On 18.04.22 at 13:14, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox 2.0.26.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Tim Allison
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas




[VOTE] Release Apache PDFBox 2.0.26

2022-04-18 Thread Andreas Lehmkuehler

Hi,

a candidate for the PDFBox 2.0.26 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/2.0.26/

The release candidate is a zip archive of the sources in:

http://svn.apache.org/repos/asf/pdfbox/tags/2.0.26/

The SHA-512 checksum of the archive is 
e14c57e28d10324dbcb6ad239bad5751a2dab0035bbd80427afd03f65467ec1376ddd7d08e7cefd4d950b149f85d8f505f6f50cc3093fd65bb8a2cbb2b8c7c1e.


Please vote on releasing this package as Apache PDFBox 2.0.26.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.26
[ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: 2.0.26 release

2022-04-17 Thread Andreas Lehmkuehler

On 17.04.22 at 20:25, Tilman Hausherr wrote:

New regression test results at

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.25_vs_2.0.26.tar.xz

IMHO we're fine now!

Thanks for the fast re-test!

I'm going to cut the 2.0.26 release tomorrow.

Andreas



Tilman








Re: 2.0.26 release

2022-04-14 Thread Andreas Lehmkuehler

Cool, thanks for the feedback. I've set the ticket to resolved.

Do we need to re-run the tests?

BTW, what about PDFBOX-5394? Is there anything left to do? Do we have to wait
for the user's feedback?


Andreas

On 13.04.22 at 08:29, Tilman Hausherr wrote:

Yeah, PDFBOX-5413 fixes that one as well. 

Tilman

On 12.04.2022 at 19:26, Tilman Hausherr wrote:

Only one left: 7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf.

There is some sort of problem with an incremental save: a part of the
multi-content stream is missing / has a new object number. Let's wait and see
whether it is related to PDFBOX-5413.


(The other one, HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5.pdf, is an improvement; I'll
add it to my own tests.)


Tilman

On 12.04.2022 at 18:25, Tilman Hausherr wrote:

Only
commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
have different text extraction results.

With the other two, the differences are in attachment file names or doc info.

Tilman

On 12.04.2022 at 08:16, Tilman Hausherr wrote:
After looking at the content differences and trying to rule out the /Names
differences, I found 4 files with content in TOP_10_MORE_IN_A that look
suspicious and IMHO need investigation.


commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
govdocs1/365/365260.pdf
commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
govdocs1/150/150282.pdf

Tilman



On 12.04.2022 at 08:09, Andreas Lehmkuehler wrote:

Thanks Tim!

Looks like there are 5 new exceptions left.

I'm going to check the first two:

commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H

The others are thrown within JempBox.


Andreas

On 11.04.22 at 12:40, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz

Haven't had a chance to review. Hot off the VM.

On Sun, Apr 10, 2022 at 9:58 AM Tim Allison  wrote:


Will try to kick off today…first thing Monday morning (EDT) at the latest.

On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler  wrote:


On 09.04.22 at 19:00, Tilman Hausherr wrote:

testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).

I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled
flatten test.

@Tim Is there any chance to re-run the tests?

Andreas



testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)
Time elapsed: 1.083 s  <<< ERROR!
java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)


I'm not creating an issue this time in case this is also related to another
known problem.

Tilman



























Re: 2.0.26 release

2022-04-12 Thread Andreas Lehmkuehler

Thanks Tim!

Looks like there are 5 new exceptions left.

I'm going to check the first two:

commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H

The others are thrown within JempBox.


Andreas

On 11.04.22 at 12:40, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz

Haven't had a chance to review. Hot off the VM.

On Sun, Apr 10, 2022 at 9:58 AM Tim Allison  wrote:


Will try to kick off today…first thing Monday morning (EDT) at the latest.

On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler  wrote:


On 09.04.22 at 19:00, Tilman Hausherr wrote:

testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).

I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled
flatten test.

@Tim Is there any chance to re-run the tests?

Andreas



testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)
Time elapsed: 1.083 s  <<< ERROR!
java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)


I'm not creating an issue this time in case this is also related to another
known problem.

Tilman

















Re: 2.0.26 release

2022-04-10 Thread Andreas Lehmkuehler

On 09.04.22 at 19:00, Tilman Hausherr wrote:

testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).

I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled
flatten test.


@Tim Is there any chance to re-run the tests?

Andreas



testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)
Time elapsed: 1.083 s  <<< ERROR!
java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
  at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)



I'm not creating an issue this time in case this is also related to another 
known problem.


Tilman










Apache PDFBox Board Report April 2022 due

2022-04-10 Thread Andreas Lehmkuehler

Hi,

please find below a quick draft of the board report we're expected to submit this
month. It's based on the report wizard template, which can be found at [1].

Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (12 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

3.0.4 JBIG2 was released on 2022-03-01.
2.0.25 was released on 2021-12-16.
3.0.0-alpha2 was released on 2021-09-10.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- the release process for 2.0.26 just started
- we are planning to cut another alpha release of our next major version 3.0.0



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox




Re: 2.0.26 release

2022-04-07 Thread Andreas Lehmkuehler

Thanks Tim!

I've checked the first files with new exceptions, and there seems to be at
least one new regression:


commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
commoncrawl3/YI/YIEMGIQYGXCQ5AZOE35ESXYCZHWR3V57
commoncrawl3_refetched/5C/5CWAUHFCZMK42IHSMSKNIR3MHXHR4IRN

All of them render fine with 2.0.25 but throw an exception with 2.0.26.

I'm going to have a deeper look later.

On 07.04.22 at 20:27, Tim Allison wrote:

https://corpora.tika.apache.org/base/reports/pdfbox-2.0.26-snapshot-reports.tgz

I haven't had a chance to look at them yet.

On Thu, Apr 7, 2022 at 9:07 AM Andreas Lehmkühler  wrote:


Yes, please

Thanks in advance
Andreas

07.04.2022 11:44:38 Tim Allison :


Sounds great! Should I rerun the regression tests today?

On Thu, Apr 7, 2022 at 1:41 AM Andreas Lehmkuehler  wrote:


Hi,

Sorry for the delay. I'm planning to cut the 2.0.26 release next
Saturday, the day after tomorrow, if nobody objects.

Andreas

P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is
out.















2.0.26 release

2022-04-06 Thread Andreas Lehmkuehler

Hi,

Sorry for the delay. I'm planning to cut the 2.0.26 release next Saturday, the
day after tomorrow, if nobody objects.


Andreas

P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out.




Re: 2.0.26 release? WAS: JBIG2 3.0.4 release?

2022-03-23 Thread Andreas Lehmkuehler

On 23.03.22 at 05:28, Tilman Hausherr wrote:
I have created two issues on parsing exceptions, and it's not PDFBOX-5283. Maybe
it's the same, maybe not. Re text extraction, I looked at one of the files
(414724.pdf) and there's also a parsing warning, so maybe that is related too;
let's just wait.

Thanks for the quick analysis. I'm going to have a look.

Andreas



Tilman

On 22.03.2022 at 18:21, Tilman Hausherr wrote:
I don't have much time right now, but I just tested 077867.pdf and 392443.pdf
and it's definitely a regression. I wonder if it was PDFBOX-5283.


The files in content_diffs_no_exceptions.xls where the T column is non empty 
are suspicious and need more investigation.


Tilman


On 22.03.2022 at 13:29, Tim Allison wrote:

Reports are here:
https://corpora.tika.apache.org/base/reports/tika-2.3-vs-2.4-pdfs.tgz

It looks like there are no significant changes. Some diffs on a few files, but
this was run on ~800k PDFs.

There are a couple of cases where a file is now being detected as
rfc822 instead of PDF.  We have to fix that on the Tika side.

On Mon, Mar 21, 2022 at 12:53 PM Andreas Lehmkuehler  wrote:


On 21.03.22 at 12:21, Tim Allison wrote:

I'm happy to run the tests today if that would be of any interest.

Yes, please.

TIA
Andreas



On Sun, Mar 20, 2022 at 5:01 PM Andreas Lehmkuehler  wrote:

On 13.03.22 at 14:20, Tim Allison wrote:

   From Tika's perspective, there's no rush. We're waiting for a bug fix
in POI (TIKA-3699).

Please let me know if/when I should run the regression tests.

Thanks for the offer. Do we need to run the tests before cutting the release?

Most of the tickets aren't related to text extraction. Those which are related
should decrease the number of exceptions and increase the accuracy.

WDYT?



Thank you, all!

Cheers,

   Tim

On Sat, Mar 12, 2022 at 5:29 AM Andreas Lehmkuehler  wrote:

On 11.03.22 at 08:30, Tilman Hausherr wrote:

On 11.03.2022 at 08:19, Andreas Lehmkuehler wrote:

On 10.03.22 at 20:16, Tilman Hausherr wrote:

I'd agree but that might mean PDFBOX-5384 wouldn't be fixed.
It has been there for quite some time and it seems to be a rare corner case. IMHO
it can wait if we don't find a solution before Monday.
No, that one was created on March 2nd. Oliver has just posted a suggestion, so
maybe that is a solution.

The ticket is quite new, but the issue itself was introduced in 2018 with
2.0.12. ;-)

However, I'll have a look at the proposed solution.

Andreas

Tilman



WDYT?

Andreas


Tilman

On 10.03.2022 at 19:05, Andreas Lehmkuehler wrote:

On 09.03.22 at 17:07, Tim Allison wrote:

All,

I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.

Are there plans for a 2.0.26 release?  We're probably a few weeks out

How about cutting the release next Monday?

Andreas


from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
for later.

Thank you!

Cheers,

    Tim

On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr  wrote:

On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't that
many changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox


Yes please!

Tilman


















Re: 2.0.26 release? WAS: JBIG2 3.0.4 release?

2022-03-21 Thread Andreas Lehmkuehler



On 21.03.22 at 12:21, Tim Allison wrote:

I'm happy to run the tests today if that would be of any interest.

Yes, please.

TIA
Andreas




On Sun, Mar 20, 2022 at 5:01 PM Andreas Lehmkuehler  wrote:


On 13.03.22 at 14:20, Tim Allison wrote:

  From Tika's perspective, there's no rush. We're waiting for a bug fix
in POI (TIKA-3699).

Please let me know if/when I should run the regression tests.

Thanks for the offer. Do we need to run the tests before cutting the release?

Most of the tickets aren't related to text extraction. Those which are related
should decrease the number of exceptions and increase the accuracy.

WDYT?




Thank you, all!

Cheers,

  Tim

On Sat, Mar 12, 2022 at 5:29 AM Andreas Lehmkuehler  wrote:


On 11.03.22 at 08:30, Tilman Hausherr wrote:

On 11.03.2022 at 08:19, Andreas Lehmkuehler wrote:

On 10.03.22 at 20:16, Tilman Hausherr wrote:

I'd agree but that might mean PDFBOX-5384 wouldn't be fixed.

It has been there for quite some time and it seems to be a rare corner case. IMHO
it can wait if we don't find a solution before Monday.


No, that one was created on March 2nd. Oliver has just posted a suggestion, so
maybe that is a solution.

The ticket is quite new, but the issue itself was introduced in 2018 with
2.0.12. ;-)

However, I'll have a look at the proposed solution.

Andreas


Tilman




WDYT?

Andreas



Tilman

On 10.03.2022 at 19:05, Andreas Lehmkuehler wrote:

On 09.03.22 at 17:07, Tim Allison wrote:

All,

I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.

Are there plans for a 2.0.26 release?  We're probably a few weeks out

How about cutting the release next Monday?

Andreas


from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
for later.

Thank you!

Cheers,

   Tim

On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr  wrote:


On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't that
many changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox



Yes please!

Tilman










































Re: 2.0.26 release? WAS: JBIG2 3.0.4 release?

2022-03-20 Thread Andreas Lehmkuehler

On 13.03.22 at 14:20, Tim Allison wrote:

 From Tika's perspective, there's no rush. We're waiting for a bug fix
in POI (TIKA-3699).

Please let me know if/when I should run the regression tests.

Thanks for the offer. Do we need to run the tests before cutting the release?

Most of the tickets aren't related to text extraction. Those which are related 
should decrease the number of exceptions and increase the accuracy.


WDYT?




Thank you, all!

Cheers,

 Tim

On Sat, Mar 12, 2022 at 5:29 AM Andreas Lehmkuehler  wrote:


On 11.03.22 at 08:30, Tilman Hausherr wrote:

On 11.03.2022 at 08:19, Andreas Lehmkuehler wrote:

On 10.03.22 at 20:16, Tilman Hausherr wrote:

I'd agree but that might mean PDFBOX-5384 wouldn't be fixed.

It has been there for quite some time and it seems to be a rare corner case. IMHO
it can wait if we don't find a solution before Monday.


No, that one was created on March 2nd. Oliver has just posted a suggestion, so
maybe that is a solution.

The ticket is quite new, but the issue itself was introduced in 2018 with
2.0.12. ;-)

However, I'll have a look at the proposed solution.

Andreas


Tilman




WDYT?

Andreas



Tilman

On 10.03.2022 at 19:05, Andreas Lehmkuehler wrote:

On 09.03.22 at 17:07, Tim Allison wrote:

All,

I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.

Are there plans for a 2.0.26 release?  We're probably a few weeks out

How about cutting the release next Monday?

Andreas


from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
for later.

Thank you!

Cheers,

  Tim

On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr  wrote:


On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't that
many changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox



Yes please!

Tilman



































Re: 2.0.26 release? WAS: JBIG2 3.0.4 release?

2022-03-13 Thread Andreas Lehmkuehler
Due to a possible issue in ToUnicodeWriter.writeTo (see dev@) I'm going to 
postpone the release for a week. I'd like to have a look at the issue and the 
proposed solution first. IMHO we should solve that issue ASAP to ensure that 
pdfs created with PDFBox follow the specs.
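Some context on the ToUnicodeWriter issue (tracked as PDFBOX-5387 in the 2.0.26 release notes). In a `beginbfrange` entry whose destination is a string, the last byte of the destination is incremented once for each source code in the range. ISO 32000-1 (section 9.10.3) requires that last byte to be at most 255 - (srcCode2 - srcCode1), so it never overflows. Below is a minimal sketch of grouping code-to-Unicode mappings into ranges that respect this rule. It assumes 2-byte source codes and single BMP destination values. It is an illustration only, not PDFBox's ToUnicodeWriter, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; not PDFBox's ToUnicodeWriter. */
final class BfRangeSketch {

    /** One bfrange entry: srcLo..srcHi map to dstStart, dstStart + 1, ... */
    static final class Range {
        final int srcLo;
        final int srcHi;
        final int dstStart;

        Range(int srcLo, int srcHi, int dstStart) {
            this.srcLo = srcLo;
            this.srcHi = srcHi;
            this.dstStart = dstStart;
        }

        /** Formats the entry as it would appear between beginbfrange and endbfrange. */
        String toCMapLine() {
            return String.format("<%04X> <%04X> <%04X>", srcLo, srcHi, dstStart);
        }
    }

    /** ISO 32000-1 condition: last byte of dst <= 255 - (srcHi - srcLo). */
    static boolean isValid(Range r) {
        return (r.dstStart & 0xFF) <= 255 - (r.srcHi - r.srcLo);
    }

    /** Groups sorted code -> Unicode mappings into runs that never overflow the last byte. */
    static List<Range> toRanges(TreeMap<Integer, Integer> codeToUnicode) {
        List<Range> ranges = new ArrayList<>();
        int lo = -1;      // first source code of the current run, -1 = no run yet
        int dstStart = 0; // destination value mapped from lo
        int prevCode = 0;
        int prevUni = 0;
        for (Map.Entry<Integer, Integer> e : codeToUnicode.entrySet()) {
            int code = e.getKey();
            int uni = e.getValue();
            boolean extendsRun = lo >= 0
                    && code == prevCode + 1
                    && uni == prevUni + 1
                    // conservative assumption: keep source codes within one high byte
                    && (code >> 8) == (prevCode >> 8)
                    // the destination's last byte must not pass 0xFF
                    && (uni >> 8) == (prevUni >> 8);
            if (!extendsRun) {
                if (lo >= 0) {
                    ranges.add(new Range(lo, prevCode, dstStart));
                }
                lo = code;
                dstStart = uni;
            }
            prevCode = code;
            prevUni = uni;
        }
        if (lo >= 0) {
            ranges.add(new Range(lo, prevCode, dstStart));
        }
        return ranges;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> map = new TreeMap<>();
        map.put(0x0010, 0x00FE);
        map.put(0x0011, 0x00FF);
        map.put(0x0012, 0x0100); // a naive single range would overflow here
        for (Range r : toRanges(map)) {
            System.out.println(r.toCMapLine() + (isValid(r) ? "" : "  (invalid)"));
        }
        // prints:
        // <0010> <0011> <00FE>
        // <0012> <0012> <0100>
    }
}
```

A single naive entry `<0010> <0012> <00FE>` would fail `isValid`, because 0xFE is greater than 255 - 2. Splitting the run where the destination crosses a 256 boundary avoids that.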


Andreas

On 10.03.22 at 19:05, Andreas Lehmkuehler wrote:

On 09.03.22 at 17:07, Tim Allison wrote:

All,

I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.

Are there plans for a 2.0.26 release?  We're probably a few weeks out

How about cutting the release next Monday?

Andreas


from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
for later.

Thank you!

Cheers,

 Tim

On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr  wrote:


On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't many
changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox



Yes please!

Tilman





Re: COSBase, avoid to have the same hashCode for different objects holding the same value

2022-03-12 Thread Andreas Lehmkuehler

Hi,

@Mel good to see you are still around

On 07.03.22 at 14:55, Martinez, Mel - 0441 - MITLL wrote:

I’m mainly a lurker on PDFBox these days, as I have moved to focus on other 
things (though ironically my latest tasking does indirectly make use of PDFBox 
again!) and I am not familiar with the details of the code but I would offer 
the following advice on this:

If two objects are:

a) the same type and
b) have the same value (i.e., evaluate:  obj1.equals(obj2) == true ) and
c) are immutable (i.e., their value cannot be changed once constructed)

Then they should have the same hashcode.

Conceptually such objects represent the same exact, immutable, information.  
This is why two String objects that both hold the same character sequence have 
the same hashcode.   These sort of immutable objects are considered 
interchangeable when they have the same data value.  Code execution is exactly 
the same regardless of which object instance you use for a given sequence of 
code.  Numbers such as Integers, Longs, Floats and Doubles and Booleans also 
all represent immutable information and the same rules apply.  The number “5” 
is informationally identical throughout the universe and indeed all references 
to “5” are really references to the same immutable information.

A huge advantage of treating such equal-value, same-type objects 
interchangeably (by giving them identical hashcodes) is that they can be used 
with Object Pooling to reduce memory and improve performance.

If the object types are not immutable, however — i.e., if it is possible for 
their values to be modified (such as with setters or other mutators) then 
whether they should have the same hashcode depends on how they are used.  Do 
they have other data fields that are not being taken into consideration when 
the hash is calculated?   Do they have transient fields that are not maintained 
across serialization?

Hashcodes usually (not always) should be persistent through the life cycle of 
the object.  If you put an object in a hashmap, the internal bucket it gets 
dropped into will be based on the hash and you (normally) don’t want that 
changing while the object reference is stored in the map.
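To make the hash map point concrete, here is a minimal Java sketch. `ImmutablePoint`, `MutablePoint` and `HashDemo` are hypothetical classes invented for illustration, not PDFBox types. Two equal immutable instances can be used interchangeably as map keys, whereas mutating a key whose hashCode depends on the mutated field strands the entry in the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical classes for illustration only; they are not PDFBox types.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutablePoint)) return false;
        ImmutablePoint p = (ImmutablePoint) o;
        return x == p.x && y == p.y;
    }

    // Equal values give equal hash codes, and the hash can never change.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

final class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutablePoint)) return false;
        MutablePoint p = (MutablePoint) o;
        return x == p.x && y == p.y;
    }

    // Depends on mutable state: changing x or y changes the hash.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

final class HashDemo {
    /** Two distinct but equal immutable instances are interchangeable as map keys. */
    static boolean immutableKeysInterchangeable() {
        Map<ImmutablePoint, String> map = new HashMap<>();
        map.put(new ImmutablePoint(5, 5), "five");
        return "five".equals(map.get(new ImmutablePoint(5, 5)));
    }

    /** Mutating a key after insertion makes the entry unreachable by lookup. */
    static boolean mutatedKeyStillFound() {
        Map<MutablePoint, String> map = new HashMap<>();
        MutablePoint key = new MutablePoint(1, 2);
        map.put(key, "value");
        key.x = 42; // the hash changes while the entry stays where it was stored
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(immutableKeysInterchangeable()); // true
        System.out.println(mutatedKeyStillFound());         // false
    }
}
```

With `java.util.HashMap`, the lookup after mutation fails because the key's new hash no longer matches the hash recorded when the entry was inserted.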

I cannot recall enough detail about the PDFBox codebase or the COS 
wrappers in particular to be able to assert how these points apply so I am just 
offering these concepts here for folks to keep in mind.
I'm afraid it is more complicated than that, and I myself didn't think about all 
the pitfalls.


@Mel It is correct that immutable objects sharing the same value and type should 
be equal and have the same hashcode, at least from a programmer's point of view.


Those simple COSBase objects such as COSBoolean, COSInteger, COSFloat and 
COSName seem to match those characteristics. But the base class COSBase has two 
changeable attributes ("key" and "direct"), so instances of those classes 
aren't immutable and don't qualify under your definition.


But after reading your and Maruan's answers I have reconsidered my proposal and 
withdraw it. The whole thing isn't that easy. I guess we have to overhaul the 
whole direct/indirect handling and some more. And maybe we will find a more 
elegant way to represent COS objects, so that we don't have to think about equals 
and hashCode anymore.


Looks like there are still a lot of interesting challenges left ;-)

Andreas



I’m sure you guys will make the right design decision.   And thanks a ton for 
all the work you guys have done over the years.

Mel

Dr. Mel Martinez
m.marti...@ll.mit.edu


On Mar 5, 2022, at 10:30 AM, Andreas Lehmkuehler <andr...@lehmi.de> wrote:

Hi,

I'm not sure if we discussed that topic in the past or if I simply mixed it up with a discussion 
about "equals" and "="

However, PDFBOX-5286 shows that we have an issue with objects which aren't the 
same but are treated as the same because of the same hash. This is true for all 
simple objects such as COSInteger, COSFloat, COSBoolean and COSName.

Think about the following two indirect /Length objects

100 0 obj
512
endobj


200 0 obj
512
endobj

* there are two different COSObjects "100 0" and "200 0"
* both COSObjects have different hashes
* both COSObjects are referencing a COSInteger holding the same value "512"
* both COSIntegers are different objects
* both COSIntegers have the SAME hash, as the current implementation of 
hashCode is based on the value of the COSInteger

Or some pseudo code

COSObject(100,0) != COSObject(200,0)
COSInteger(100,0) != COSInteger(200,0)
COSObject(100,0).hashCode != COSObject(200,0).hashCode
COSInteger(100,0).hashCode == COSInteger(200,0).hashCode
COSInteger(100,0).equals(COSInteger(200,0)) == true

IMHO we should change the implementation of hashCode so that different objects 
will have different hashCodes.

I expect some side effects
* we are using a lot of hash-based collections and I'm afraid there may be 

Re: 2.0.26 release? WAS: JBIG2 3.0.4 release?

2022-03-12 Thread Andreas Lehmkuehler

On 11.03.22 at 08:30, Tilman Hausherr wrote:

On 11.03.2022 at 08:19, Andreas Lehmkuehler wrote:

On 10.03.22 at 20:16, Tilman Hausherr wrote:

I'd agree but that might mean PDFBOX-5384 wouldn't be fixed.
It's been there for quite some time and it seems to be a rare corner case. IMHO 
it can wait if we don't find a solution before Monday.


No, that one was created on March 2nd. Oliver has just posted a suggestion so 
maybe that is a solution.
The ticket is quite new, but the issue itself was introduced in 2018 with 
2.0.12. ;-)


However, I'll have a look at the proposed solution.

Andreas


Tilman




WDYT?

Andreas



Tilman

On 10.03.2022 at 19:05, Andreas Lehmkuehler wrote:

On 09.03.22 at 17:07, Tim Allison wrote:

All,

I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.

Are there plans for a 2.0.26 release?  We're probably a few weeks out

How about cutting the release next Monday?

Andreas


from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
for later.

Thank you!

Cheers,

 Tim

On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr  wrote:


On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't many
changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox



Yes please!

Tilman





Re: Suspected bug in and proposed fix for ToUnicodeWriter.writeTo

2022-03-12 Thread Andreas Lehmkuehler

Hi,

On 11.03.22 at 21:49, Ryan Jackson wrote:

Dear Apache Devs:

I believe that I have identified a bug in the creation of the
(begin/end)bfrange operator used when embedding fonts with the
PDCIDFontType2Embedder class.

The bug exists (as best I can tell) in both the main trunk and in the 2.0
branch. The code in question may be found here
.
The portion of the PDF specification (version 1.7) that bears upon this
code is Section 5.9, Example 5.16.

The existing code attempts to limit the range logic to changes less than or
equal to 255 code points, but it fails to account for at least the
following situation by allowing this (for example):

[srcCode1 srcCode2 dstString]
03FF 0400 0036

The overflow between srcCode1 and srcCode2 is not allowed by the
specification and any text extraction will fail. The glyphs themselves
render fine so it is not immediately obvious there is a problem until one
tries to examine the text by using the Content Panel or by copy/pasting
from Acrobat (Pro) to some other document. By contrast the following
bfrange operator does allow the text extraction to work as intended:

[srcCode1 srcCode2 dstString]
03FE 03FF 0035

Notice that no overflow exists, and as such the requirements of the
specification are met.
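A minimal Java sketch of the splitting rule described above: a run of consecutive codes is cut into bfrange entries so that neither the low byte of `srcCode` nor the last byte of `dstString` wraps past 0xFF within one entry. (The PDF 1.7 Reference states that the last byte of `dstString` must be less than or equal to 255 − (srcCode2 − srcCode1).) `BfRangeSplitter` is a hypothetical helper, not PDFBox's `ToUnicodeWriter`; it assumes 2-byte source codes and BMP destination values and deliberately ignores surrogate pairs, which the report lists as an open question.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not PDFBox's ToUnicodeWriter): splits a run of consecutive
 * 2-byte source codes mapped to consecutive BMP code points into bfrange entries
 * that never cross a low-byte boundary in either srcCode or dstString.
 */
final class BfRangeSplitter {

    /** One "<srcCode1> <srcCode2> <dstString>" entry, 2-byte values only. */
    static final class Range {
        final int srcStart;
        final int srcEnd;
        final int dstStart;

        Range(int srcStart, int srcEnd, int dstStart) {
            this.srcStart = srcStart;
            this.srcEnd = srcEnd;
            this.dstStart = dstStart;
        }

        @Override
        public String toString() {
            return String.format("<%04X> <%04X> <%04X>", srcStart, srcEnd, dstStart);
        }
    }

    /**
     * @param srcStart first source code (0x0000..0xFFFF)
     * @param dstStart Unicode value mapped to srcStart (BMP only, no surrogates)
     * @param count    number of consecutive codes in the run
     */
    static List<Range> split(int srcStart, int dstStart, int count) {
        List<Range> ranges = new ArrayList<>();
        int src = srcStart;
        int dst = dstStart;
        int remaining = count;
        while (remaining > 0) {
            // Codes left before the low byte of srcCode would wrap (e.g. 03FF -> 0400).
            int srcRoom = 0x100 - (src & 0xFF);
            // Same limit for the last byte of dstString, which is incremented per code.
            int dstRoom = 0x100 - (dst & 0xFF);
            int len = Math.min(remaining, Math.min(srcRoom, dstRoom));
            ranges.add(new Range(src, src + len - 1, dst));
            src += len;
            dst += len;
            remaining -= len;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // The run from the report: 03FE..0400 mapped to 0035..0037.
        for (Range r : split(0x03FE, 0x0035, 3)) {
            System.out.println(r);
        }
        // <03FE> <03FF> <0035>
        // <0400> <0400> <0037>
    }
}
```

For the run in the report, the sketch emits `<03FE> <03FF> <0035>` and then starts a new entry at `<0400>` instead of overflowing; a single-code entry like the second one could equally be written with bfchar.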

I'm afraid you are right, good catch.


I've looked briefly at the PDFBOX project in Jira and have found the
following tickets that may be caused by this same problem:

PDFBOX-4785 
PDFBOX-5350 
Yes, somehow. Those are about reading malformed PDFs containing the very same 
issue you have described above.
Fun fact: we complain about other PDF writers not following the spec while doing 
the very same thing ourselves. It never occurred to me to check our own code :-(



I have put together a proposed solution here
 in my fork of the PDFBox
GH mirror. With your permission I'd like to open a new Jira ticket for this
and collaborate with whomever would like to help drive this work to get it
reviewed and merged. I do have some open questions about how surrogates are
to be handled. I'm also open to changes in the proposed code.
You don't have to wait for permission. Please create a JIRA ticket including a 
link to your PR.




Thank you for your time.

Thanks for your time and the proposed solution.

Andreas



Sincerely,

Ryan Jackson
Senior Software Engineer
Workiva Inc.







Re: 2.0.26 release? WAS: JBIG2 3.0.4 release?

2022-03-10 Thread Andreas Lehmkuehler

On 10.03.22 at 20:16, Tilman Hausherr wrote:

I'd agree but that might mean PDFBOX-5384 wouldn't be fixed.
It's been there for quite some time and it seems to be a rare corner case. IMHO it 
can wait if we don't find a solution before Monday.


WDYT?

Andreas



Tilman

On 10.03.2022 at 19:05, Andreas Lehmkuehler wrote:

On 09.03.22 at 17:07, Tim Allison wrote:

All,

I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.

Are there plans for a 2.0.26 release?  We're probably a few weeks out

How about cutting the release next Monday?

Andreas


from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
for later.

Thank you!

Cheers,

 Tim

On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr  wrote:


On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't many
changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox



Yes please!

Tilman





Re: 2.0.26 release? WAS: JBIG2 3.0.4 release?

2022-03-10 Thread Andreas Lehmkuehler

On 09.03.22 at 17:07, Tim Allison wrote:

All,

I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.

Are there plans for a 2.0.26 release?  We're probably a few weeks out

How about cutting the release next Monday?

Andreas


from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
for later.

Thank you!

Cheers,

 Tim

On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr  wrote:


On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't many
changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox



Yes please!

Tilman





COSBase, avoid to have the same hashCode for different objects holding the same value

2022-03-05 Thread Andreas Lehmkuehler

Hi,

I'm not sure if we discussed that topic in the past or if I simply mixed it up 
with a discussion about "equals" and "="


However, PDFBOX-5286 shows that we have an issue with objects which aren't the 
same but are treated as the same because of the same hash. This is true for all 
simple objects such as COSInteger, COSFloat, COSBoolean and COSName.


Think about the following two indirect /Length objects

100 0 obj
512
endobj


200 0 obj
512
endobj

* there are two different COSObjects "100 0" and "200 0"
* both COSObjects have different hashes
* both COSObjects are referencing a COSInteger holding the same value "512"
* both COSIntegers are different objects
* both COSIntegers have the SAME hash, as the current implementation of hashCode 
is based on the value of the COSInteger


Or some pseudo code

COSObject(100,0) != COSObject(200,0)
COSInteger(100,0) != COSInteger(200,0)
COSObject(100,0).hashCode != COSObject(200,0).hashCode
COSInteger(100,0).hashCode == COSInteger(200,0).hashCode
COSInteger(100,0).equals(COSInteger(200,0)) == true

IMHO we should change the implementation of hashCode so that different objects 
will have different hashCodes.
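A minimal Java sketch of the effect described above, using a hypothetical `SimpleInt` class as a stand-in for a simple COS value (it is not `COSInteger`). With value-based `equals`/`hashCode`, a `HashSet` collapses the values of "100 0 obj" and "200 0 obj" into one entry; an identity-based set built from `IdentityHashMap`, which compares keys with `==` and hashes them with `System.identityHashCode`, keeps them apart. It only illustrates the difference between the two semantics, not how PDFBox would implement the proposal.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical stand-in for a simple COS value; NOT the real COSInteger. */
final class SimpleInt {
    final long value;

    SimpleInt(long value) {
        this.value = value;
    }

    // Value-based equality, as the thread describes for the simple COS types.
    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleInt && ((SimpleInt) o).value == value;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(value);
    }
}

final class HashCollapseDemo {
    /** Counts entries in a HashSet, which relies on equals/hashCode. */
    static int countByValue(SimpleInt... items) {
        Set<SimpleInt> set = new HashSet<>();
        Collections.addAll(set, items);
        return set.size();
    }

    /** Counts entries in an identity-based set, which relies on == only. */
    static int countByIdentity(SimpleInt... items) {
        Set<SimpleInt> set = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(set, items);
        return set.size();
    }

    public static void main(String[] args) {
        SimpleInt lengthOf100 = new SimpleInt(512); // value of "100 0 obj"
        SimpleInt lengthOf200 = new SimpleInt(512); // value of "200 0 obj"
        System.out.println(countByValue(lengthOf100, lengthOf200));    // 1
        System.out.println(countByIdentity(lengthOf100, lengthOf200)); // 2
    }
}
```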


I expect some side effects
* we are using a lot of hash-based collections and I'm afraid there may be some 
cases where the fact of having the same hash for different objects is wanted 
(knowingly or not)
* we have to remove the static instances for COSInteger values in the range from 
-100 to 256, which will result in an increased number of COSInteger instances
* there are just two static instances of COSBoolean ("true" and "false"), which 
have to be replaced too
* COSName caches a lot of values as static instances too; those should 
be removed as well

* looks like COSFloat shouldn't be a problem
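Why the static caches would have to go can be sketched in Java with a hypothetical `CachedInt` class that shares instances for the range -100 to 256 mentioned above (the class is illustrative only, not PDFBox's `COSInteger`). Once two indirect objects receive the same cached instance, even identity-based hashing cannot tell them apart, while values outside the cache get distinct instances.

```java
/**
 * Hypothetical value type with a static instance cache for -100..256,
 * modelled loosely on the description above; not PDFBox code.
 */
final class CachedInt {
    private static final int LOW = -100;
    private static final int HIGH = 256;
    private static final CachedInt[] CACHE = new CachedInt[HIGH - LOW + 1];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new CachedInt(LOW + i);
        }
    }

    final long value;

    private CachedInt(long value) {
        this.value = value;
    }

    /** Returns a shared instance inside the cached range, a fresh one outside it. */
    static CachedInt of(long value) {
        if (value >= LOW && value <= HIGH) {
            return CACHE[(int) (value - LOW)];
        }
        return new CachedInt(value);
    }

    public static void main(String[] args) {
        // Two indirect objects holding 100 end up sharing ONE instance,
        // so even identity-based hashing cannot tell them apart.
        System.out.println(of(100) == of(100)); // true
        // Outside the cache each call creates a distinct instance.
        System.out.println(of(512) == of(512)); // false
    }
}
```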

WDYT? Should we simply start with COSFloat and COSInteger and see how it ends 
up?

Andreas













[ANNOUNCE] Apache PDFBox JBIG2 ImageIO plugin 3.0.4 released

2022-03-01 Thread Andreas Lehmkuehler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox JBIG2 ImageIO plugin version 3.0.4. The release is
available for download at:

https://pdfbox.apache.org/download.cgi

See the full release notes below for details about this release.

Release Notes -- Apache JBIG2 ImageIO -- Version 3.0.4

Introduction


The Java ImageIO plugin for JBIG2 enables access to images encoded using the
JBIG2 image compression standard. This component is part of the Apache PDFBox®
project.

This is an incremental bugfix release based on the earlier 3.0.3 release.
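Because the plugin hooks into the standard Java ImageIO registry, one quick way to check whether it is visible on a given classpath is to ask ImageIO for a reader by format name. This is a minimal sketch; `ImageReaderCheck` is a hypothetical name, and the format name `"jbig2"` is an assumption about how the plugin registers itself that should be checked against the plugin's documentation.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

/** Hypothetical helper: reports whether an ImageIO reader is registered for a format. */
final class ImageReaderCheck {
    static boolean hasReader(String formatName) {
        // Rescan the classpath for plugins added after ImageIO was first used.
        ImageIO.scanForPlugins();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        System.out.println("png reader:   " + hasReader("png"));   // built into the JDK
        // "jbig2" is an assumed format name; requires the plugin jar on the classpath.
        System.out.println("jbig2 reader: " + hasReader("jbig2"));
    }
}
```

Running it with and without the plugin jar on the classpath shows whether the reader is picked up; PNG is always available in a standard JDK and serves as a sanity check.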

For more details on all fixes and improvements included in this release, please
refer to the following issues on the PDFBox issue tracker at
https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-4671] - NoClassDefFoundError: Could not initialize class org.apache.pdfbox.jbig2.JBIG2ImageReader

[PDFBOX-5242] - LoggerBridge loading under the wrong class loader

Improvement

[PDFBOX-5220] - Optimizations for Bitmaps

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.md file for instructions on how to build this release.

The source archive is accompanied by SHA512 checksums and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://svn.apache.org/repos/asf/pdfbox/KEYS.

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/




[RESULT][VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.4

2022-03-01 Thread Andreas Lehmkuehler

On 26.02.22 at 16:38, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.4.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas





Re: [VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.4

2022-02-28 Thread Andreas Lehmkuehler
Just a friendly reminder. Is there anyone who can spare some cycles to check the 
release? There are roughly 20 hours left.


Thanks in advance
Andreas


On 26.02.22 at 16:38, Andreas Lehmkuehler wrote:

Hi,

a candidate for the Apache PDFBox JBIG2 ImageIO 3.0.4 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio/3.0.4/

The release candidate is a zip archive of the sources in:

     https://github.com/apache/pdfbox-jbig2/tree/3.0.4/

The SHA-512 checksum of the archive is 
382acb53e0bb56595f7eb8c382369a48a000ced22ff4d101ec89316c749b5afd344c6303a3e6c75b12e949f1efe688e18bd1b8b0b5deb449a581b1c97c35e672. 



Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.4.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.4
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




[VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.4

2022-02-26 Thread Andreas Lehmkuehler

Hi,

a candidate for the Apache PDFBox JBIG2 ImageIO 3.0.4 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio/3.0.4/

The release candidate is a zip archive of the sources in:

https://github.com/apache/pdfbox-jbig2/tree/3.0.4/

The SHA-512 checksum of the archive is 
382acb53e0bb56595f7eb8c382369a48a000ced22ff4d101ec89316c749b5afd344c6303a3e6c75b12e949f1efe688e18bd1b8b0b5deb449a581b1c97c35e672.
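One way to check a downloaded archive against this checksum before voting is a small Java program built on `java.security.MessageDigest`. This is a minimal sketch: `Sha512Check` is a hypothetical name, the archive path is whatever file you downloaded from the dist area above, and it does not replace verifying the PGP signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: computes the SHA-512 of an archive for comparison with a published value. */
final class Sha512Check {
    /** Streams the input through SHA-512 and returns the digest as lowercase hex. */
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    /** Compares the archive's digest with the expected hex string, ignoring case. */
    static boolean matches(Path archive, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(archive)) {
            return sha512Hex(in).equalsIgnoreCase(expectedHex.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Sha512Check <archive> <expected-sha512-hex>");
            return;
        }
        boolean ok = matches(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    }
}
```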


Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.4.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.4
[ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: JBIG2 3.0.4 release?

2022-02-23 Thread Andreas Lehmkuehler

On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't many 
changes, but I think the fixes are worth releasing. [1]

I'm going to cut the release next weekend, if nobody objects.

Once it is done we should think about a 2.0.26 release of PDFBox

Andreas



WDYT?

Andreas

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%2012310760%20AND%20fixVersion%20%3D%2012346618%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC 






JBIG2 3.0.4 release?

2022-02-21 Thread Andreas Lehmkuehler

Hi,

I'm planning to cut a new JBIG2 release next week. There aren't many 
changes, but I think the fixes are worth releasing. [1]


WDYT?

Andreas

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%2012310760%20AND%20fixVersion%20%3D%2012346618%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC





Re: Apache PDFBox Board Report January 2022 due

2022-01-11 Thread Andreas Lehmkuehler

Hi,

thanks for the feedback. I've submitted the report as proposed.

Andreas


On 09.01.22 at 14:19, Andreas Lehmkuehler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...



## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (12 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     2.0.25 was released on 2021-12-16.
     3.0.0-alpha2 was released on 2021-09-10.
     2.0.24 was released on 2021-06-10.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
   mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are working on finalizing 3.0.0 and have released another alpha version
- PDFBox isn't affected by the log4j2 vulnerability as we are using commons
   logging and don't ship any logging library
- Maruan activated GitHub CodeQL scans for our codebase



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox




[REPORT] PDFBox - January 2022

2022-01-11 Thread Andreas Lehmkuehler

## Description:
The mission of PDFBox is the creation and maintenance of software related to 
Java library for working with PDF documents


## Issues:
There are no issues requiring board attention at this time.
## Membership Data:
Apache PDFBox was founded 2009-10-21 (12 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

2.0.25 was released on 2021-12-16.
3.0.0-alpha2 was released on 2021-09-10.
2.0.24 was released on 2021-06-10.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are working on finalizing 3.0.0 and have released another alpha version
- PDFBox isn't affected by the log4j2 vulnerability as we are using commons
  logging and don't ship any logging library
- Maruan activated GitHub CodeQL scans for our codebase




Apache PDFBox Board Report January 2022 due

2022-01-09 Thread Andreas Lehmkuehler

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...



## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (12 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

2.0.25 was released on 2021-12-16.
3.0.0-alpha2 was released on 2021-09-10.
2.0.24 was released on 2021-06-10.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are working on finalizing 3.0.0 and have released another alpha version
- PDFBox isn't affected by the log4j2 vulnerability as we are using commons
  logging and don't ship any logging library
- Maruan activated GitHub CodeQL scans for our codebase



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox




2.0.25 javadocs

2021-12-20 Thread Andreas Lehmkuehler

Hi,

I forgot to create and upload the javadocs for 2.0.25 to Maven Central. :-(

I've put a reminder on my checklist. Please double check if javadocs are present 
next time we prepare a new release.


Thanks in advance!

Andreas




Re: [VOTE] Release Apache PDFBox 2.0.25

2021-12-16 Thread Andreas Lehmkuehler

On 15.12.21 at 17:39, Tilman Hausherr wrote:

On 15.12.2021 at 12:31, Tilman Hausherr wrote:

I ran Tim's regression tests and here are the results:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.24_vs_2.0.25.tar.xz
Thanks Tilman, especially as I totally forgot to ask if someone has the time to 
run those tests.




I have not yet investigated whether there are regressions. We have a 57% 
increase in tokens and 8% increase in common tokens. Probably thanks to

https://issues.apache.org/jira/browse/PDFBOX-5324
https://issues.apache.org/jira/browse/PDFBOX-5331



We have lots of files that extract trash. Some "common tokens" are "lost" 
because the "new trash" is connected to the token.
I expected such results. Changes like the ones mentioned above most likely 
have two sides: there are a lot of improvements, but also more or less "false 
positives". IMHO that's OK.



There is one real regression, that's the file

bug_trackers/poppler/poppler-89422-0.pdf

https://bugs.freedesktop.org/show_bug.cgi?id=89422

but this isn't a real world file so it isn't THAT important. I tried to fix it 
but failed for now.

I agree, it is a corner case and already fixed :-)



Tilman





Tilman


On 13.12.2021 at 20:02, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.25 release is available at:

    https://dist.apache.org/repos/dist/dev/pdfbox/2.0.25/

The release candidate is a zip archive of the sources in:

    http://svn.apache.org/repos/asf/pdfbox/tags/2.0.25/

The SHA-512 checksum of the archive is 
e143b2a9aaa4b1f1be72e16a1c9968dacfcb3e89b4f21fdbd0580d8c9f1c9b54ee38d05fe3e52ff93493c858c51090fdd8256d22153cffba1e9b523fdbd1f2f4. 



Please vote on releasing this package as Apache PDFBox 2.0.25.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

    [ ] +1 Release this package as Apache PDFBox 2.0.25
    [ ] -1 Do not release this package because...

Here is my +1

Andreas



