PDFBox and superscript format .NET

Hawkins, Thomas A. - Student Thu, 17 May 2012 22:22:10 -0700

I am using the .NET version of PDFBox and I have a PDF that contains data such 
as this:


Name                  Location
Jim Daviees           85
Herschel Walker       96
Vince Gogh            47
Andrew Lincoln        104

I need both the name value and the location value. In the PDF, the last digit 
of each location is a superscript (the 5 in 85, the 4 in 104). When I use the 
following code:

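    ' fi is defined elsewhere (presumably a FileInfo for the source PDF)
    Dim p As PDDocument = PDDocument.load(fi.FullName)
    Dim r As PDFTextStripper = New PDFTextStripper

    ' Extract the text of the whole document as one string
    Dim stringVal As String = r.getText(p)
    ' Note: ASCII encoding replaces any non-ASCII character with "?"
    Dim bytes As Byte() = System.Text.Encoding.ASCII.GetBytes(stringVal)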

I get the following in the .txt file (and also in HTML when I convert it to 
that instead):
Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
85
96
47
104

I'm okay with the layout, as I've got a workaround for that. My problem is 
that the extraction loses any trace of the superscripts. Is there a way that 
I can locate these superscript parts and enclose them in brackets or 
something similar, so that the returned value looks more like this:
Jim Daviees
Herschel Walker
Vince Gogh
Andrew Lincoln
8[5]
9[6]
4[7]
10[4]

So, nutshell time. Can I use PDFBox (.NET version) to locate the instances of 
superscript in a PDF file (like locating <sup></sup> in HTML) and mark them 
with an easily recognized symbol in the text written to my destination file? 
I picked brackets because I have no brackets in my source file whatsoever and 
they would be very easy for me to code around. Thanks in advance.
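
A minimal sketch of the bracketing step described above, under stated 
assumptions rather than as PDFBox's own API. A PDF page typically carries no 
"superscript" flag; a raised glyph is usually just drawn in a smaller font 
with its baseline shifted up. The sketch therefore assumes that each extracted 
glyph is available with its text, baseline position and font size (for 
instance by subclassing PDFTextStripper and reading the TextPosition objects 
it processes, which is an assumption about how the data would be obtained). 
The Glyph structure and the MarkSuperscripts function are hypothetical names 
introduced for illustration, and Y is assumed to grow downward from the top 
of the page.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module SuperscriptSketch

    ' Hypothetical stand-in for the per-glyph data a text extractor reports.
    ' Assumption: Y is the baseline measured downward from the top of the
    ' page, so a raised glyph has a SMALLER Y than its neighbours.
    Public Structure Glyph
        Public Text As String
        Public Y As Single
        Public FontSize As Single

        Public Sub New(glyphText As String, baselineY As Single, size As Single)
            Text = glyphText
            Y = baselineY
            FontSize = size
        End Sub
    End Structure

    ' Wraps each run of smaller, raised glyphs on one line in [ ].
    ' Simplification: the first glyph on the line is taken as the reference
    ' for the normal baseline and font size.
    Public Function MarkSuperscripts(line As IList(Of Glyph)) As String
        If line.Count = 0 Then Return String.Empty

        Dim sb As New StringBuilder()
        Dim baseY As Single = line(0).Y
        Dim baseSize As Single = line(0).FontSize
        Dim inSuper As Boolean = False

        For Each g As Glyph In line
            Dim raised As Boolean = g.FontSize < baseSize AndAlso g.Y < baseY
            If raised AndAlso Not inSuper Then sb.Append("[")   ' run starts
            If Not raised AndAlso inSuper Then sb.Append("]")   ' run ends
            sb.Append(g.Text)
            inSuper = raised
        Next

        ' Close a run that extends to the end of the line
        If inSuper Then sb.Append("]")
        Return sb.ToString()
    End Function

    Sub Main()
        ' "104" where the final 4 is drawn smaller and higher
        Dim line As New List(Of Glyph) From {
            New Glyph("1", 100.0F, 12.0F),
            New Glyph("0", 100.0F, 12.0F),
            New Glyph("4", 95.0F, 8.0F)
        }
        Console.WriteLine(MarkSuperscripts(line))
    End Sub

End Module
```

Main prints 10[4] for the sample line. Using the first glyph as the reference 
is a simplification: a line that begins with a superscript, or that mixes font 
sizes for other reasons, would need a sturdier reference such as the most 
common font size on the line.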
