date:20240628

Re: Separating PDF Content in to Layers (OCGs)

2024-06-28 Thread PDF Developer

Hello, thanks to all who responded. For anyone with a passing interest, here is how things turned out. I ended up splitting the PDF into Text Only and Image Only files. For that I based my code on examples generously provided by others (thanks to mkl -> RemoveImages.java) and (thanks Ben Litchfie
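A split like the one described above can be approximated by rewriting each page's content stream with PDFBox. The sketch below is not the poster's code and not mkl's RemoveImages.java. It assumes PDFBox 3.x (`Loader.loadPDF`, and `PDFStreamParser.parse()` returning the token list), and the class `SplitTextAndImages` and interface `OperatorFilter` are hypothetical names. The same document is processed twice: once dropping image painting (`Do` on an image XObject, plus inline `BI` images) to get a text-only copy, and once dropping the text-showing operators `Tj`, `TJ`, `'` and `"` to get an image-only copy.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical helper: writes a text-only and an image-only copy of a PDF (PDFBox 3.x assumed). */
public class SplitTextAndImages {

    /** Hypothetical callback deciding whether an operator (with its operands) is removed. */
    @FunctionalInterface
    public interface OperatorFilter {
        boolean drop(Operator op, List<Object> operands, PDResources resources) throws IOException;
    }

    /** Operators that paint text: Tj, TJ, ' and ". */
    public static final Set<String> TEXT_SHOWING = Set.of("Tj", "TJ", "'", "\"");

    /** True for inline images (BI) and for Do operators that paint an image XObject. */
    public static boolean isImageOperator(Operator op, List<Object> operands, PDResources resources)
            throws IOException {
        String name = op.getName();
        if ("BI".equals(name)) {
            return true; // the parser stores the inline image data inside the BI operator itself
        }
        if ("Do".equals(name) && resources != null && operands.size() == 1
                && operands.get(0) instanceof COSName) {
            PDXObject xobject = resources.getXObject((COSName) operands.get(0));
            return xobject instanceof PDImageXObject; // form XObjects are kept as-is
        }
        return false;
    }

    /** Re-parses every page's content stream and writes back only the operators the filter keeps. */
    public static void rewrite(PDDocument doc, OperatorFilter filter) throws IOException {
        for (PDPage page : doc.getPages()) {
            List<Object> tokens = new PDFStreamParser(page).parse();
            List<Object> kept = new ArrayList<>();
            List<Object> operands = new ArrayList<>();
            for (Object token : tokens) {
                if (token instanceof Operator) {
                    Operator op = (Operator) token;
                    if (!filter.drop(op, operands, page.getResources())) {
                        kept.addAll(operands);
                        kept.add(op);
                    }
                    operands.clear();
                } else {
                    operands.add(token); // in PDF syntax, operands precede their operator
                }
            }
            kept.addAll(operands); // keep any trailing operands unchanged
            PDStream contents = new PDStream(doc);
            try (OutputStream out = contents.createOutputStream(COSName.FLATE_DECODE)) {
                new ContentStreamWriter(out).writeTokens(kept);
            }
            page.setContents(contents);
        }
    }

    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);
        try (PDDocument textOnly = Loader.loadPDF(input)) {
            rewrite(textOnly, SplitTextAndImages::isImageOperator);
            textOnly.save(new File(args[1]));
        }
        try (PDDocument imageOnly = Loader.loadPDF(input)) {
            rewrite(imageOnly, (op, operands, res) -> TEXT_SHOWING.contains(op.getName()));
            imageOnly.save(new File(args[2]));
        }
    }
}
```

Usage would be along the lines of `java SplitTextAndImages in.pdf text-only.pdf image-only.pdf`. Under these assumptions the sketch has clear limits: text or images nested inside form XObjects, annotations and patterns are left untouched, and removed images stay in the page resources, so the text-only file is not necessarily smaller.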

Assistance Requested for Optimizing PDF Processing Pipeline Using PDFBox

2024-06-28 Thread Rohit Kohli

Hello, I hope this message finds you well. I am ROHIT KOHLI, and I am currently working on developing a robust PDF processing pipeline for extracting structured data from system-generated PDF documents, particularly bank statements. We aim to handle and analyze large volumes of data efficiently ha

Re: Assistance Requested for Optimizing PDF Processing Pipeline Using PDFBox

2024-06-28 Thread Tilman Hausherr

1. *Optimizing Data Extraction*: Best practices for configuring PDFBox to extract text and data most efficiently from system-generated PDFs. Any specific configurations or methods that enhance accuracy would be extremely helpful. Depending on the input, you should decide on
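As a hedged illustration of the kind of input-dependent configuration discussed above (not the reply's actual recommendation), the sketch below extracts text one page at a time with PDFBox's `PDFTextStripper`, exposing `setSortByPosition` as a switch. It assumes PDFBox 3.x, and the class `StatementTextExtractor` and its methods are hypothetical. Sorting by position reorders text by its coordinates on the page, which can help when a generator writes table cells out of reading order but can also interleave neighbouring columns, so whether to enable it should be tested against each type of input.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

/** Hypothetical per-page text extractor for system-generated statements (PDFBox 3.x assumed). */
public class StatementTextExtractor {

    /** Builds a stripper; sortByPosition reorders text by page coordinates instead of stream order. */
    public static PDFTextStripper newStripper(boolean sortByPosition) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setSortByPosition(sortByPosition);
        return stripper;
    }

    /** Returns one string per page (pages are 1-based in PDFTextStripper). */
    public static List<String> extractPages(PDDocument doc, boolean sortByPosition) throws IOException {
        PDFTextStripper stripper = newStripper(sortByPosition);
        List<String> pages = new ArrayList<>();
        for (int p = 1; p <= doc.getNumberOfPages(); p++) {
            stripper.setStartPage(p); // restrict extraction to a single page
            stripper.setEndPage(p);
            pages.add(stripper.getText(doc));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
            List<String> pages = extractPages(doc, true);
            for (int i = 0; i < pages.size(); i++) {
                System.out.println("--- page " + (i + 1) + " ---");
                System.out.print(pages.get(i));
            }
        }
    }
}
```

Extracting page by page keeps each result small and makes it easy to map extracted rows back to the statement page they came from.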