Re: installing and running PDFBox within Python

2022-10-02 Thread Peter Murray-Rust
Many thanks, Tres and Tilman. (I am also delighted to see that PDFBox has integrated picocli; Remko Popma has been very helpful to us.) FWIW we (#semanticClimate) are converting the 10,000 pp of the IPCC's report on climate change, one of the most important PDFs on the planet, to semantic form;

Re: installing and running PDFBox within Python

2022-10-02 Thread Tilman Hausherr
On 26.09.2022 10:53, Peter Murray-Rust wrote:
> * Does PDFBox3 have more functionality than PDFBox2 that would help?
I don't think so; the main thing is the on-demand parser. The rest are API changes and memory management changes. See also https://pdfbox.apache.org/3.0/migration.html
Tilman

Re: installing and running PDFBox within Python

2022-09-26 Thread Tres Finocchiaro
Hi,
On Mon, Sep 26, 2022 at 4:53 AM Peter Murray-Rust wrote:
> TL;DR how to integrate PDFBox into a Python framework for installation and
> use by non-computer-scientists?
I did a bit of digging and found the JPype project, which seems to work quite well for basic Python-to-Java functionality.
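For readers following the thread, here is a rough sketch of what calling PDFBox through JPype could look like. It is not taken from any message in this thread, and it rests on several assumptions: JPype is installed (`pip install JPype1`), a Java runtime is available, and a PDFBox 2.x "app" jar has been placed in a local directory. The helper names `find_jars` and `extract_text` are hypothetical. With PDFBox 3.x the loading call would be `org.apache.pdfbox.Loader.loadPDF(file)` rather than `PDDocument.load(file)`.

```python
from pathlib import Path


def find_jars(directory):
    """Return the sorted paths of all .jar files directly inside `directory`."""
    return sorted(str(p) for p in Path(directory).glob("*.jar"))


def extract_text(pdf_path, jar_dir):
    """Extract plain text from a PDF by calling PDFBox 2.x through JPype.

    Assumes `jar_dir` holds a PDFBox 2.x app jar (hypothetical location).
    """
    # Imported lazily so the module loads even where JPype is not installed.
    import jpype

    # A JVM can only be started once per Python process.
    if not jpype.isJVMStarted():
        jpype.startJVM(classpath=find_jars(jar_dir))

    File = jpype.JClass("java.io.File")
    PDDocument = jpype.JClass("org.apache.pdfbox.pdmodel.PDDocument")
    PDFTextStripper = jpype.JClass("org.apache.pdfbox.text.PDFTextStripper")

    # PDFBox 2.x API; PDFBox 3.x replaces this with Loader.loadPDF(File).
    document = PDDocument.load(File(str(pdf_path)))
    try:
        # Convert the returned java.lang.String to a Python str.
        return str(PDFTextStripper().getText(document))
    finally:
        document.close()
```

A call such as `extract_text("report.pdf", "lib")` would then return the document text as a Python string, where `report.pdf` and `lib` are placeholder names. Bundling the jar alongside the Python package is one way to keep installation simple for non-programmers, though a Java runtime is still required.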

installing and running PDFBox within Python

2022-09-26 Thread Peter Murray-Rust
TL;DR how to integrate PDFBox into a Python framework for installation and use by non-computer-scientists? I have used PDFBox for at least 10 years and love it and the community. I use it to make (scientific) PDFs semantic, by trapping the events and saving as SVG, after which I can assemble stru