Greetings,

Having glanced through the PDF specification, I understand the kind of challenges the PDFBox team faces. Scaling, transformations, and matrix manipulations can drive anyone crazy. The PDF debugger is another valuable tool that I find myself using more and more; it's extremely useful.

My use case is extracting text from PDFs, but we're very picky about what we want to read. Ideally we don't want to scan the whole PDF, since we can usually figure out which page our changes will be on. I found some limitations in the current text extraction logic and ended up copying the class and modifying it.

I can fork the repository and submit pull requests if the team is willing to accept them. Most of the changes so far make methods accessible or wrap them in other function calls, while leaving the core concept the same. I'm also happy to discuss my needs and find out whether there is a better or already supported way.

Would the PDFBox team be open to PRs and/or discussion? If so, what would the process be? I work in a corporate environment and, even before broaching the subject here, obtained approval from our organization to submit the changes.


Regards,


Niranjan

