RE: Reading page using PDFTextStripper

2020-11-25 Thread Hesham Gneady
-Original Message- From: Hesham Gneady Sent: Saturday, November 21, 2020 11:11 PM To: users@pdfbox.apache.org Subject: RE: Reading page using PDFTextStripper I've tried it now, but it made no difference. I've actually explained the problem wrong, here's what actually

RE: Reading page using PDFTextStripper

2020-11-25 Thread Hesham Gneady
Thanks Maruan! Are you sure you're reading the PDF using PDFTextStripper.processTextPosition()? Because I've tried it again now using PDFBox 2.0.21 and I'm getting different results even after calling setSortByPosition(true): Comments are made from 1905, / See: Certain Neurotic

RE: Reading page using PDFTextStripper

2020-11-23 Thread James Kelly
rue; } } The sort routine is pretty simple: internal void Sort() { Words = Words.OrderBy(w => w.BoundingBox.Left).ToList(); } -Original Message- From: sahy...@fileaffairs.de Sent: Monday, November 23, 2020 8:59 AM To: users@pdfbo
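The sort routine quoted above is C#. Since PDFBox itself is a Java library, a rough Java counterpart may be useful. This is only a sketch: the `Word` class and its `left` field are hypothetical stand-ins for whatever word and bounding-box types the poster uses, and nothing here is a PDFBox API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: Word is a hypothetical stand-in, not a PDFBox type.
final class WordSortSketch {
    static final class Word {
        final String text;
        final float left; // x coordinate of the bounding box's left edge

        Word(String text, float left) {
            this.text = text;
            this.left = left;
        }
    }

    // Returns a new list ordered by left edge; the input list is not modified.
    static List<Word> sortByLeft(List<Word> words) {
        List<Word> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingDouble((Word w) -> w.left));
        return sorted;
    }

    // Joins the word texts with single spaces, in list order.
    static String join(List<Word> words) {
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Words arrive out of reading order; the coordinates are made up.
        List<Word> line = List.of(
                new Word("Comments", 40f),
                new Word("131", 10f),
                new Word("are", 95f));
        System.out.println(join(sortByLeft(line))); // prints "131 Comments are"
    }
}
```

Sorting by the left edge alone only helps once the words have already been assigned to the correct line, so it does not by itself decide which line a word belongs to.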

Re: Reading page using PDFTextStripper

2020-11-23 Thread sahy...@fileaffairs.de
Hi, On Sunday, 22.11.2020, at 07:10 +0200, Hesham Gneady wrote: > I've tried it now, but it made no difference. I've actually explained the problem wrong, here's what actually happens: > > The 1st line in the PDF file is: > > 131 Comments are made from 1905, / See: Certain Neurotic

RE: Reading page using PDFTextStripper

2020-11-23 Thread James Kelly
the words in a line by x coordinates. I'm not sure if my boss will allow me to share some code snippets, but I'll ask. -Original Message- From: Hesham Gneady Sent: Saturday, November 21, 2020 11:11 PM To: users@pdfbox.apache.org Subject: RE: Reading page using PDFTextStripper

RE: Reading page using PDFTextStripper

2020-11-21 Thread Hesham Gneady
I've tried it now, but it made no difference. I've actually explained the problem wrong, here's what actually happens: The 1st line in the PDF file is: 131 Comments are made from 1905, / See: Certain Neurotic Mechanisms in Where "131" is normal text, while the rest of the line has "Subscript"
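One common way to handle a line that mixes normal and subscript text is to group fragments into lines with a vertical tolerance before sorting each line by x. The sketch below illustrates that idea and is not a confirmed fix for this PDF. It assumes the subscript run sits at a slightly different y than the normal text, that y grows down the page, and that the `Fragment` class, the coordinates, and the tolerance value are all hypothetical. It uses no PDFBox types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: Fragment is a hypothetical stand-in for a positioned text run.
final class LineGroupingSketch {
    static final class Fragment {
        final String text;
        final float x;
        final float y; // assumed to grow downward on the page

        Fragment(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Groups fragments whose y lies within yTolerance of the first fragment
    // of the current line, then orders each line left to right.
    static List<String> toLines(List<Fragment> fragments, float yTolerance) {
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> lines = new ArrayList<>();
        float lineY = 0f; // y of the fragment that opened the current line
        for (Fragment f : byY) {
            if (lines.isEmpty() || Math.abs(f.y - lineY) > yTolerance) {
                lines.add(new ArrayList<>());
                lineY = f.y;
            }
            lines.get(lines.size() - 1).add(f);
        }

        List<String> result = new ArrayList<>();
        for (List<Fragment> line : lines) {
            line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            StringBuilder sb = new StringBuilder();
            for (Fragment f : line) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(f.text);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up coordinates: the subscript run is 3 units lower than "131".
        List<Fragment> page = List.of(
                new Fragment("Comments are made from 1905,", 30f, 103f),
                new Fragment("(text of a second line)", 10f, 115f),
                new Fragment("131", 10f, 100f));
        for (String line : toLines(page, 5f)) {
            System.out.println(line);
        }
    }
}
```

The tolerance has to be smaller than the line spacing and larger than the subscript offset, so in practice it would probably be derived from the font size rather than hard-coded.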

Re: Reading page using PDFTextStripper

2020-11-21 Thread Andreas Lehmkuehler
On 17.11.20 at 07:54, Hesham Gneady wrote: Hi, I am trying to read this PDF file using PDFTextStripper.processTextPosition(): https://dl.dropboxusercontent.com/s/o660xrp4sgp9tbv/PDFTextStripper%20reading%20sample.pdf?dl=0 But when I do that it reads it in the wrong order. It reads

Reading page using PDFTextStripper

2020-11-16 Thread Hesham Gneady
Hi, I am trying to read this PDF file using PDFTextStripper.processTextPosition(): https://dl.dropboxusercontent.com/s/o660xrp4sgp9tbv/PDFTextStripper%20reading%20sample.pdf?dl=0 But when I do that it reads it in the wrong order. It reads the 2nd line before the 1st line because the 1st