As an addendum: I didn't realize when I sent this out that the numbers, which 
are a combination of regular and superscript digits, would be flattened, since 
email doesn't support superscript. I'll mark the superscript part with a caret 
instead. The numbers should be:
8^5       (INSTEAD OF 85)
9^6       (INSTEAD OF 96)
4^7       (INSTEAD OF 47)
10^4     (INSTEAD OF 104)
________________________________________
From: Hawkins, Thomas A. - Student [[email protected]]
Sent: Friday, May 18, 2012 1:21 AM
To: [email protected]
Subject: PDFBox and superscript format .NET

I am using the .NET version of PDFBox, and I have a PDF that contains data such 
as this:

Name                  Location
Jim Daviees           85
Herschel Walker       96
Vince Gogh            47
Andrew Lincoln        104

I need both the name value and the location value. When I use the following 
code:

    Dim p As PDDocument = PDDocument.load(fi.FullName)
    Dim r As PDFTextStripper = New PDFTextStripper

    Dim stringVal As String = r.getText(p)
    Dim bytes As Byte() = System.Text.Encoding.ASCII.GetBytes(stringVal)

I get the following in the .txt file (and also in HTML when I convert it to 
that instead):
Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
85
96
47
104

I'm okay with the layout, as I've got a workaround for that. My problem is that 
the extraction loses all trace of the superscript: the superscript digits come 
out as ordinary digits. Is there a way that I can locate these superscript 
parts and wrap them in brackets or something similar, so that the returned 
value is more like this:
Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
8[5]
9[6]
4[7]
10[4]

So, nutshell time: can I use PDFBox (.NET version) to locate the instances of 
superscript in a PDF file (like locating <sup></sup> in HTML) and swap each one 
for an easily recognized marker in the output to my destination file? I picked 
brackets because I have no brackets in my source file whatsoever, so they would 
be very easy for me to code around. Thanks in advance.
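
To make the desired output concrete, here is a minimal sketch of the kind of 
bracket-wrapping I have in mind. It is plain Java (the language PDFBox itself 
is written in) and does not call PDFBox at all. The Glyph class, the 
markSuperscripts method and the 0.85 size ratio are hypothetical names and 
values made up for illustration. It assumes that each character's text, 
baseline position and font size can be collected from the per-character 
position data the text stripper works with, and that y is measured downward 
from the top of the page. A character is treated as superscript when its font 
is noticeably smaller than the preceding normal-sized character and its 
baseline sits higher on the page.

```java
import java.util.List;

public class SuperscriptMarker {

    /** One extracted character. Hypothetical holder, not a PDFBox type. */
    public static final class Glyph {
        final String text;
        final float baselineY; // assumption: measured downward from page top
        final float fontSize;  // font size in points

        public Glyph(String text, float baselineY, float fontSize) {
            this.text = text;
            this.baselineY = baselineY;
            this.fontSize = fontSize;
        }
    }

    // Assumed threshold: a glyph under 85% of the base size counts as "small".
    static final float SIZE_RATIO = 0.85f;

    /** Joins the glyphs of one line, wrapping superscript runs in [ ]. */
    public static String markSuperscripts(List<Glyph> line) {
        StringBuilder out = new StringBuilder();
        Glyph base = null;       // most recent normal-sized glyph
        boolean inSuper = false; // true while inside an open bracket

        for (Glyph g : line) {
            // Superscript = smaller font AND raised baseline relative to base.
            boolean isSuper = base != null
                    && g.fontSize < base.fontSize * SIZE_RATIO
                    && g.baselineY < base.baselineY;

            if (isSuper && !inSuper) {
                out.append('[');
            }
            if (!isSuper && inSuper) {
                out.append(']');
            }
            if (!isSuper) {
                base = g;
            }
            inSuper = isSuper;
            out.append(g.text);
        }
        if (inSuper) {
            out.append(']'); // close a run that reaches the end of the line
        }
        return out.toString();
    }
}
```

With that heuristic, a 12 pt "8" followed by a raised 8 pt "5" would come out 
as 8[5], while two same-sized digits would be left alone. Whether the .NET 
build exposes the position data needed to feed something like this is exactly 
what I am unsure about.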
