On 22.03.2025 20:51, Niranjan Rao wrote:
Greetings,


Having glanced through the PDF specification, I understand the kind of challenges the PDFBox team faces. Scaling, transformations and matrix manipulations can drive anyone crazy. The PDF debugger is another tool that I find myself using more and more; it's extremely useful.

My use case is extracting text from PDFs, but we're very picky about what we want to read. Ideally we don't want to scan the whole PDF, because we can usually figure out which page our changes will be on. I found some limitations in the current text extraction logic and ended up copying the class and modifying it.

I can fork the repository and submit pull requests if the team is willing to accept them. So far, most of the changes make methods accessible or wrap them in other function calls while leaving the core concept the same. I'm willing to discuss my needs and see whether there is a better or already supported way.

Would the PDFBox team be open to PRs and/or discussion? If so, what is the process? I work in a corporate environment, and before broaching the subject here I got approval that our organization is OK with submitting the changes.


Hi,

We're always open to discussions. We try to avoid "making everything accessible" because it brings the risk that people come up with weird ideas that have unexpected side effects, and it also prevents us from making some changes under the hood in the future. Sometimes we also add things based on user requests. We're not fully on GitHub; it is only a read-only mirror, but you can still create PRs there. It would be best to discuss your ideas first and to try to break them into small parts.

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
