As an addendum, I didn't realize when I sent this out that the numbers are a
combination of regular and superscript digits. Since email won't display
superscript, I'll use the ^ operator instead. The numbers should be:
8^5 (INSTEAD OF 85)
9^6 (INSTEAD OF 96)
4^7 (INSTEAD OF 47)
10^4 (INSTEAD OF 104)
________________________________________
From: Hawkins, Thomas A. - Student [[email protected]]
Sent: Friday, May 18, 2012 1:21 AM
To: [email protected]
Subject: PDFBox and superscript format .NET
I am using the .NET version of PDFBox, and I have a PDF that contains data such
as this:
Name              Location
Jim Daviees       85
Herschel Walker   96
Vince Gogh        47
Andrew Lincoln    104
I need both the name value and the location value. When I use the following
code:
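Imports org.apache.pdfbox.pdmodel
Imports org.apache.pdfbox.util

Dim p As PDDocument = PDDocument.load(fi.FullName)
Dim r As PDFTextStripper = New PDFTextStripper()
Dim stringVal As String = r.getText(p)
p.close() ' release the document once the text has been extracted
Dim bytes As Byte() = System.Text.Encoding.ASCII.GetBytes(stringVal)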
I get the following in the .txt file (and the same in HTML when I convert it to
that format):
Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
85
96
47
104
I'm okay with the layout, since I have a workaround for that. My problem is
that the extraction loses every trace of the superscript exponents: 8^5 comes
out as 85. Is there a way to locate these superscript parts and wrap them in
brackets (or some other marker) so that the returned value looks more like this:
Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
8[5]
9[6]
4[7]
10[4]
So, in a nutshell: can I use PDFBox (.NET version) to locate the instances of
superscript in a PDF file (like locating <sup></sup> in HTML) and replace them
with an easily recognized marker in my output file? I picked brackets because my
source file contains no brackets whatsoever, so they would be very easy for me
to code around. Thanks in advance.
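In a PDF, a superscript digit is usually just an ordinary character drawn with a raised baseline (and often a smaller font size), which is why plain text extraction flattens 8^5 into 85. One possible approach is therefore to look at per-character positions rather than the finished string. The sketch below shows only that bracketing step, under stated assumptions: it uses a hypothetical Glyph record (character, baseline height, font size) instead of any real PDFBox class, and an assumed threshold of 0.2 times the normal font size. Filling those values from the per-character TextPosition objects that PDFTextStripper works with (for example in a subclass) depends on the PDFBox version and is not shown. It is written in Java because PDFBox is a Java library; the logic carries over directly to VB.NET.

```java
import java.util.List;

class SuperscriptBracketer {

    // Hypothetical per-character record, NOT a PDFBox type. A real extractor would
    // have to fill it from its own glyph data. baseline is assumed to be in PDF user
    // space, where larger values are higher on the page.
    record Glyph(char ch, float baseline, float fontSize) {}

    // Assumed heuristic: a glyph raised by more than this fraction of the current
    // normal font size is treated as superscript.
    static final float RAISE_THRESHOLD = 0.2f;

    /** Wraps each run of raised glyphs on one line of text in [ and ]. */
    static String bracketSuperscripts(List<Glyph> line) {
        if (line.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        // The first glyph is assumed to sit on the normal baseline.
        float refBaseline = line.get(0).baseline();
        float refSize = line.get(0).fontSize();
        boolean inSuperscript = false;
        for (Glyph g : line) {
            boolean raised = g.baseline() - refBaseline > RAISE_THRESHOLD * refSize;
            if (raised && !inSuperscript) {
                out.append('[');            // entering a superscript run
                inSuperscript = true;
            } else if (!raised && inSuperscript) {
                out.append(']');            // back on the normal baseline
                inSuperscript = false;
            }
            if (!raised) {
                // Track the latest normal glyph as the reference baseline.
                refBaseline = g.baseline();
                refSize = g.fontSize();
            }
            out.append(g.ch());
        }
        if (inSuperscript) {
            out.append(']');                // close a run that ends the line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "8" on the baseline, "5" raised by 4 units at a smaller size.
        List<Glyph> eightToFifth = List.of(
                new Glyph('8', 700f, 10f), new Glyph('5', 704f, 7f));
        // "10" on the baseline, "4" raised.
        List<Glyph> tenToFourth = List.of(
                new Glyph('1', 650f, 10f), new Glyph('0', 650f, 10f),
                new Glyph('4', 654f, 7f));
        System.out.println(bracketSuperscripts(eightToFifth)); // prints 8[5]
        System.out.println(bracketSuperscripts(tenToFourth));  // prints 10[4]
    }
}
```

Comparing each glyph against the most recent normal glyph, rather than a fixed page-wide baseline, keeps the check local to one line of text. As written, it does not detect a superscript that starts a line and it ignores subscripts (lowered glyphs); both would need extra rules.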