Hi,

I already have all the TextPosition objects for a particular PDF page, so I
can always retrieve the font and font size for a particular string. For
instance, consider the earlier example:

String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001
width=108.87001]Primary Diagnosis: elder

Previously, to find the x-coordinate of the word "Diagnosis", I performed
the following steps (using the example above):

1. Find the PDFont object using the TextPosition.

2. Use the stringWidth function to calculate the string width of
"Primary ". Let's call it sw. The current x-coordinate is x, the x-scale
is xs and the font size is fs.

3. To calculate the x-coordinate at which the word "Diagnosis" starts, use
the following formula:
         New X-Coordinate = x + ((sw / 1000) * xs * fs)

4. Similarly, find the string width of the word "Diagnosis" itself.
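
The arithmetic in steps 2 to 4 can be sketched roughly as below. This is only
an illustration under assumptions: the widths are passed in as plain numbers
in 1/1000 units (as used in the formula above), the value 3500 for "Primary "
is made up, and the class and method names are hypothetical, not PDFBox API.

```java
final class SubstringOffset {

    // Converts a width given in 1/1000 units into page units,
    // i.e. the (sw / 1000) * xs * fs part of the formula.
    static float toPageUnits(float widthInThousandths, float xScale, float fontSize) {
        return (widthInThousandths / 1000f) * xScale * fontSize;
    }

    // Step 3: x-coordinate at which the substring starts, given the
    // x-coordinate of the whole string and the width of the text before it.
    static float startX(float x, float prefixWidth, float xScale, float fontSize) {
        return x + toPageUnits(prefixWidth, xScale, fontSize);
    }

    public static void main(String[] args) {
        float x = 75.0f;   // from the String[...] line in the example
        float xs = 1.0f;   // xscale
        float fs = 10.0f;  // font size
        float sw = 3500f;  // assumed width of "Primary ", not a measured value

        System.out.println("start x of \"Diagnosis\": " + startX(x, sw, xs, fs));
    }
}
```

The width of "Diagnosis" in page units (step 4) would come from the same
toPageUnits conversion applied to that word's own string width.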

These steps worked satisfactorily for substrings in many PDFs, but they
fail for some. When they succeeded, the string width reported by the
TextPosition object was very close to the one calculated with the above
formula. When they failed, the string width returned by the PDFont object
was either zero or incorrect.
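
The comparison between the two widths can be expressed as a small check like
the one below. It is a sketch only: the tolerance is an arbitrary assumption,
the sample font width 10887 is invented to match the example, and the class
and method names are hypothetical.

```java
final class WidthCheck {

    // Returns true when the width derived from the font (in 1/1000 units)
    // agrees with the width reported for the whole string, within the
    // given tolerance in page units. A zero or negative font width is
    // treated as unreliable.
    static boolean fontWidthLooksReliable(float reportedWidth, float fontWidth,
                                          float xScale, float fontSize,
                                          float tolerance) {
        if (fontWidth <= 0f) {
            return false;
        }
        float computed = (fontWidth / 1000f) * xScale * fontSize;
        return Math.abs(computed - reportedWidth) <= tolerance;
    }

    public static void main(String[] args) {
        float reported = 108.87001f; // width= value from the example
        float tolerance = 0.5f;      // assumed threshold, tune as needed

        System.out.println(fontWidthLooksReliable(reported, 10887f, 1.0f, 10.0f, tolerance));
        System.out.println(fontWidthLooksReliable(reported, 0f, 1.0f, 10.0f, tolerance));
    }
}
```

Running the check on the whole string first would at least show whether the
font's widths can be trusted before using them for a substring.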

Can anyone suggest a way to accurately calculate the starting x-coordinate
of a substring or, in other words, the actual width of any string in a
particular font?




On Tue, Mar 24, 2009 at 6:52 PM, Dexter Mishra <[email protected]> wrote:

> Shishir,
>
> PDF does not store the coordinates of the individual words in this string;
> "Primary Diagnosis: elder" will be a single entry in the PDF. The
> information you can get is the string length, width, height, etc. So if you
> know the font point size, you need to calculate the x-coordinate of
> "Diagnosis:" yourself. But beware: this is quite tricky with variable-pitch
> fonts (Arial, Times New Roman, etc.).
>
> ~Thanks
> Dexter
>
> On Tue, Mar 24, 2009 at 11:42 AM, Shishir Mane-Patil
> <[email protected]> wrote:
>
> > Hi,
> >
> > I wish to accurately find the width of a substring using the
> > PDFTextStripper. For example, part of the output of the PDF text
> > extraction example is as follows:
> >
> >
> >
> > String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001
> > width=108.87001]Primary Diagnosis: elder
> >
> >
> >
> > Here the width calculated for the entire string “Primary Diagnosis: elder”
> > is 108.87001. I wish to find the starting x-coordinate of just the word
> > ‘Diagnosis’ and the width of that word. How can I find the exact
> > x-coordinate and width of such substrings?
> >
> >
> >
> >
> >
> > Thanks and Regards,
> >
> > Shishir Mane
> >
>
