Hello, I have a question about extracting text from a PDF file.

2016-05-17 Thread Kay_Lee
Hello, I live in South Korea, in Far-East Asia, and I'm using Apache PDFBox to extract text from PDF files. Name: Su-Sang, Lee (English name: Kay Lee) Cell Phone: +82-10-3180-7976 Residence: Seoul, South Korea, Asia E-mail: heruri...@hotmail.com (or heruri...@gmail.com) My software

Re: Overlay 2 files partially

2016-05-17 Thread Romain Guillaume
Hi Tilman, Thank you for your advice. I didn't know about PDFDebugger and it's very useful. My problem is less tricky than you imagine because the problematic PDF is always the same and it's just for one document from one piece of software from one company :-) And we must deal with this old software because nobody

Re: After using 'PrinterJob' to print a PDF form, the form fields are empty (PDFBox 2.0.0)

2016-05-17 Thread Maruan Sahyoun
Hi, > On 17.05.2016 at 17:04, Timo Rumland wrote: > > Hello, > > after migrating from PDFBox 1.8 to 2.0.0, printing a PDF form via > 'PrinterJob' and 'PDFPrintable', in which I filled out the form fields via > PDFBox, results in a PDF file (or a physical print)

RE: OCRing extracted inline images vs. fully rendered pages?

2016-05-17 Thread Allison, Timothy B.
> We have an experimental integration with Tesseract which was created a while ago by a GSoC student. Because it requires building C++ we’ve not integrated it into trunk, but do have it on the todo list for 2.1.
Ah, very cool. Yes, I'd trust you all to do a better job of integrating OCR for

Re: After using 'PrinterJob' to print a PDF form, the form fields are empty (PDFBox 2.0.0)

2016-05-17 Thread John Hewson
> On 17 May 2016, at 08:04, Timo Rumland wrote: > > Hello, > > after migrating from PDFBox 1.8 to 2.0.0, printing a PDF form via > 'PrinterJob' and 'PDFPrintable', in which I filled out the form fields via > PDFBox, results in a PDF file (or a physical print) with

Re: OCRing extracted inline images vs. fully rendered pages?

2016-05-17 Thread John Hewson
> On 17 May 2016, at 05:25, Allison, Timothy B. wrote: > > All, > On Tika, users can choose to run OCR on inline images (and attached images, > of course). Would it be better for us to render each full page and then run > OCR on that? We have an experimental integration

OCRing extracted inline images vs. fully rendered pages?

2016-05-17 Thread Allison, Timothy B.
All, On Tika, users can choose to run OCR on inline images (and attached images, of course). Would it be better for us to render each full page and then run OCR on that? Best, Tim

Re: Overlay 2 files partially

2016-05-17 Thread Tilman Hausherr
On 17.05.2016 at 04:20, Romain Guillaume wrote: Hi everyone, I would like to overlay 2 PDF files but with particular modifications. I know how to overlay 2 PDFs but sometimes I need to remove some elements of one of them during the overlay operation. For example, imagine an invoice composed of 2

Overlay 2 files partially

2016-05-17 Thread Romain Guillaume
Hi everyone, I would like to overlay 2 PDF files but with particular modifications. I know how to overlay 2 PDFs but sometimes I need to remove some elements of one of them during the overlay operation. For example, imagine an invoice composed of 2 files: - one is the background page (containing logo