Hi,
Thank you, the change has been committed.
re 1: we'll see what happens... re "but it is code that needs to be
maintained" - that is a general problem. Sometimes it's even difficult
to maintain one's own code.
re 2: No, because most of the time the faster built-in sort works fine.
The slower mergesort is only used when the exception is thrown. This
happens twice during the build tests, out of over 100 text extractions.
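
For readers following the thread, the "try the fast sort, fall back on exception" pattern described above can be sketched roughly as follows. This is only an illustration under assumptions, not the PDFBox source: the names FallbackSort, sortWithFallback and mergeSort are hypothetical, and the merge sort here is a plain top-down variant chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
final class FallbackSort {

    // Try the built-in sort first. The JDK sort may throw
    // IllegalArgumentException when it detects that the comparator
    // violates its contract (for example, is not transitive).
    static <T> void sortWithFallback(List<T> list, Comparator<? super T> cmp) {
        try {
            Collections.sort(list, cmp);
        } catch (IllegalArgumentException e) {
            // Slower path: a simple merge sort that never performs
            // the contract check, so it always terminates normally.
            List<T> sorted = mergeSort(new ArrayList<>(list), cmp);
            list.clear();
            list.addAll(sorted);
        }
    }

    // Plain top-down merge sort returning a sorted list.
    static <T> List<T> mergeSort(List<T> items, Comparator<? super T> cmp) {
        if (items.size() < 2) {
            return items;
        }
        int mid = items.size() / 2;
        List<T> left = mergeSort(new ArrayList<>(items.subList(0, mid)), cmp);
        List<T> right = mergeSort(new ArrayList<>(items.subList(mid, items.size())), cmp);
        List<T> out = new ArrayList<>(items.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            // Taking from the left on ties keeps the sort stable.
            if (cmp.compare(left.get(i), right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            out.add(left.get(i++));
        }
        while (j < right.size()) {
            out.add(right.get(j++));
        }
        return out;
    }
}
```

With a well-behaved comparator only the try branch runs, so the fallback costs nothing in the common case. With an inconsistent comparator the result of the fallback is still a permutation of the input, but its order is only as meaningful as the comparator allows.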
Tilman
On 16.12.2024 15:55, Kevin Day wrote:
I am attaching the patch file.
And yes, this patch is simply PDFBOX-3774 as an option, a small
cosmetic change to use idiomatic Java for PDFBOX-5487, and a unit test
that demonstrates the overlapping.
A couple of additional thoughts:
1. I feel that PDFBOX-5487 isn't doing very much. The PDFBOX-3774
feature will address the problem fixed by PDFBOX-5487, and the
"problem" of having a space glyph entirely within the previous
character is a very restricted edge-case. In the end, the performance
hit is not a big deal, but it is code that needs to be maintained. I
thought I'd mention it in case the PDFBOX-5487 requester would be
happy with PDFBOX-3774 as a solution.
2. I noticed that there is a note about JDK7+ sorting
requiring transitive comparators. Given that the build requires
JDK8+, I wonder if it is time to remove the Collections.sort path (and
get rid of an exception throw, etc...)?
- K
On Mon, Dec 16, 2024 at 6:21 AM Tilman Hausherr
<thaush...@t-online.de> wrote:
On 16.12.2024 14:02, Kevin Day wrote:
> I just realized that there is an incorrect note in the getter/setter
> Javadocs about the setting only taking effect if sorting is enabled.
>
> That note can be removed. The new setting is valid regardless of
> whether sorting is enabled.
Hi,
Could you please resend the patch as a text attachment? Somehow the
mail program messed this up.
From what I understand, the patch is the suggestion from PDFBOX-3774
but as an option, plus a test. The other change (re PDFBOX-5487) is a
(useful) cosmetic change. I wonder why I missed that when I
committed it.
Tilman