Hi,

Thank you, the change has been committed.
re 1: We'll see what happens. Re "but it is code that needs to be maintained": that is a general problem. Sometimes it's difficult to maintain even one's own code.

re 2: No, because most of the time the faster built-in sort works fine. The slower mergesort is only used when the exception is thrown. This happens twice during the build tests, out of over 100 text extractions.
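
As a rough illustration of the fallback described in re 2, here is a minimal sketch. It is not the actual PDFBox code: the class name SortFallbackSketch and the methods sortWithFallback and mergeSort are hypothetical. It assumes the exception in question is the IllegalArgumentException that the built-in sort may throw when it detects a comparator that violates its contract.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the PDFBox implementation.
final class SortFallbackSketch {

    // Try the fast built-in sort first; use a plain merge sort only if it fails.
    static <T> void sortWithFallback(List<T> list, Comparator<? super T> cmp) {
        // Work on a copy so a failed attempt leaves the caller's list untouched.
        List<T> sorted = new ArrayList<>(list);
        try {
            Collections.sort(sorted, cmp);
        } catch (IllegalArgumentException e) {
            // The built-in sort may throw this when it detects an
            // inconsistent (e.g. non-transitive) comparator.
            sorted = mergeSort(new ArrayList<>(list), cmp);
        }
        for (int i = 0; i < sorted.size(); i++) {
            list.set(i, sorted.get(i));
        }
    }

    // Simple top-down merge sort; it terminates for any comparator and
    // never throws because of comparator inconsistencies.
    static <T> List<T> mergeSort(List<T> items, Comparator<? super T> cmp) {
        if (items.size() < 2) {
            return items;
        }
        int mid = items.size() / 2;
        List<T> left = mergeSort(new ArrayList<>(items.subList(0, mid)), cmp);
        List<T> right = mergeSort(new ArrayList<>(items.subList(mid, items.size())), cmp);
        List<T> merged = new ArrayList<>(items.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            // "<= 0" takes from the left half on ties, which keeps the sort stable.
            if (cmp.compare(left.get(i), right.get(j)) <= 0) {
                merged.add(left.get(i++));
            } else {
                merged.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            merged.add(left.get(i++));
        }
        while (j < right.size()) {
            merged.add(right.get(j++));
        }
        return merged;
    }
}
```

With a well-behaved comparator the catch block is never entered, so the cost of the fallback is only paid in the rare cases where the exception is thrown.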

Tilman

On 16.12.2024 15:55, Kevin Day wrote:
I am attaching the patch file.

And yes, this patch is simply PDFBOX-3774 as an option, a small cosmetic change to use idiomatic Java for PDFBOX-5487, and a unit test that demonstrates the overlapping.


A couple of additional thoughts:

1.  I feel that PDFBOX-5487 isn't doing very much.  The PDFBOX-3774 feature will address the problem fixed by PDFBOX-5487, and the "problem" of having a space glyph entirely within the previous character is a very restricted edge-case.  In the end, the performance hit is not a big deal, but it is code that needs to be maintained.  I thought I'd mention it in case the PDFBOX-5487 requester would be happy with PDFBOX-3774 as a solution.

2.  I noticed that there is a note about JDK7+ sorting requiring transitive comparators.  Given that the build requires JDK8+, I wonder if it is time to remove the Collections.sort path (and get rid of an exception throw, etc...)?

- K



On Mon, Dec 16, 2024 at 6:21 AM Tilman Hausherr <thaush...@t-online.de> wrote:

    On 16.12.2024 14:02, Kevin Day wrote:
    > I just realized that there is an incorrect note in the getter/setter
    > Javadocs about the setting only taking effect if sorting is enabled.
    >
    > That note can be removed. The new setting is valid regardless of whether
    > sorting is enabled.

    Hi,

    Could you please resend the patch as a text attachment? Somehow the mail
    program messed this up.

    From what I understand, the patch is the suggestion from PDFBOX-3774 but
    as an option, plus a test. The other change (re PDFBOX-5487) is a
    (useful) cosmetic change. I wonder why I missed that when I committed it.

    Tilman


