Re: Bad text extraction result

2016-03-01 Thread Tilman Hausherr
On 01.03.2016 at 21:53, Francisco Andrés Fernández wrote: I'm sorry. That was only the case when you use pdftotext to extract text. My apologies. No problem... now I understand what this /ActualText thing is about. This /Span << /ActualText (\376\377\000\255) >> BDC ( ) Tj EMC
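For readers unfamiliar with the notation, the /ActualText string quoted above can be decoded by hand. The sketch below is not PDFBox code; the class and method names are hypothetical. It assumes the four octal escapes are raw bytes of a UTF-16 string that starts with a big-endian byte order mark (\376\377), which is how PDF text strings mark Unicode content.

```java
import java.nio.charset.StandardCharsets;

public class ActualTextDecode {

    /** Decodes raw bytes as UTF-16; a leading byte order mark is consumed by the decoder. */
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // The bytes written as \376\377\000\255 in the content stream (octal escapes).
        byte[] raw = {(byte) 0376, (byte) 0377, (byte) 0000, (byte) 0255};
        String text = decode(raw);
        // Prints the code point of the single decoded character: U+00AD (soft hyphen).
        System.out.printf("U+%04X%n", (int) text.charAt(0));
    }
}
```

Under that assumption the replacement text is a single soft hyphen, the same character the later workaround in this thread filters out.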

Re: Bad text extraction result

2016-03-01 Thread Francisco Andrés Fernández
I'm sorry. That was only the case when you use pdftotext to extract text. My apologies. Francisco On Tue, 1 Mar 2016 at 16:56, Francisco Andrés Fernández <fra...@gmail.com> wrote: > Hi Tilman, regarding this issue, I've found a workaround that does not > solve pdfbox problem

Re: Bad text extraction result

2016-03-01 Thread Francisco Andrés Fernández
Hi Tilman, regarding this issue, I've found a workaround that does not solve the PDFBox problem but might help. I've filtered my documents by replacing matches of the regex '[\xAD]', which is the hex code for the soft hyphen, as that seems to be the symbol that gets inserted between normal characters. After that, text appears to
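The filtering step described in this message can be sketched in Java as follows. This is only an illustration of the regex replacement, not the poster's actual code; the class name `SoftHyphenFilter` and the method `strip` are hypothetical, and the input is assumed to be text that has already been extracted.

```java
import java.util.regex.Pattern;

public class SoftHyphenFilter {

    // Same pattern as in the message: \xAD is the soft hyphen (U+00AD).
    private static final Pattern SOFT_HYPHEN = Pattern.compile("[\\xAD]");

    /** Returns the extracted text with every soft hyphen removed. */
    public static String strip(String extracted) {
        return SOFT_HYPHEN.matcher(extracted).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample string with soft hyphens between normal characters.
        String sample = "ex\u00ADtrac\u00ADtion";
        System.out.println(strip(sample));
    }
}
```

As the message says, this post-processes the output and leaves the extraction itself unchanged.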

Re: memory consumption PDFBox 2.0.0

2016-03-01 Thread Tilman Hausherr
On 01.03.2016 at 12:35, Felix Benz-Baldas wrote: Hello, we plan to use PDFBox 2.0.0 for converting PDFs to JPEG. We want to convert a very large number of documents (more than one million). One question: Is it possible to control the memory consumption? When I start my Java program with

Re: ScratchFileBuffer not closed

2016-03-01 Thread Peter Prusinowski
Thank you. On 28.02.2016 at 17:38, Andreas Lehmkuehler wrote: Hi, On 22.02.2016 at 18:35, Tilman Hausherr wrote: On 22.02.2016 at 10:33, Peter Prusinowski wrote: Hello, I have a method that prints an image to a document. When calling this method multiple times, I get a lot of debug

Re: Rotating a new annotation to match the page's rotation

2016-03-01 Thread Gilad Denneboom
OK, here's a file that demonstrates the issue. I'm attaching the original as well as the version highlighted using PDFBox and the one I highlighted manually in Acrobat, for comparison purposes. The details of the highlight added in PDFBox are: Rect:[104.73436,751.22327,147.8024,757.2535]

Re: memory consumption PDFBox 2.0.0

2016-03-01 Thread Andreas Lehmkühler
Hi, > Felix Benz-Baldas wrote on 1 March 2016 at 12:35: > > > Hello, > > we plan to use PDFBox 2.0.0 for converting PDFs to JPEG. We want to convert a > very large number of documents (more than one million). > > One question: Is it possible to control