On 22.03.2025 20:51, Niranjan Rao wrote:
Greetings,
Having glanced through the PDF specification, I understand the kind of
challenges the PDFBox team faces. Scaling, transformations and matrix
manipulations can drive anyone crazy. The PDF debugger is another
valuable tool that I find myself using more and more; it's extremely
useful.
My use case is to extract text from PDFs, but we're very picky about
what we want to read, and ideally we don't want to scan the whole PDF,
as we can usually figure out which page our changes will be on. I found
some limitations in the current text extraction logic and ended up
copy/pasting the class and modifying it.
I can fork the repository and submit pull requests if the team is
willing to accept them. Most of the changes so far make methods
accessible or wrap them in other function calls while leaving the core
concept the same. I'm willing to discuss my needs and see if there is a
better or already supported way.
Would the PDFBox team be open to PRs and/or discussion? If so, what
would the process be? I work in a corporate environment and managed to
get approval for our organization to submit the changes even before
broaching the subject here.
Hi,
We're always open to discussions. We try to avoid "making everything
accessible" because this brings the risk that people come up with weird
ideas with unexpected side effects, and it also prevents us from making
some changes under the hood in the future. Sometimes we also add things
based on user requests. We're not fully on GitHub; it is only a
read-only mirror. You can still create PRs. The best approach would be
to discuss your ideas first and try to break them into small parts.
Tilman