Hi Dexter,

 

First of all, thanks for replying so promptly.

 

As I mentioned in my earlier mail, using the PDFTextStripper I get a
series of TextPosition objects. Each TextPosition object gives me its
respective PDFont object, so I know which font is used to render the
corresponding text in the PDF. Going forward, I need to understand this:
for any font returned by a TextPosition object (PDFont), what are the
ways in which I can calculate the width of an arbitrary string?

 

Thanks,

Shishir.

 

From: Dexter Mishra [mailto:[email protected]] 
Sent: Wednesday, March 25, 2009 9:03 PM
To: Shishir Mane-Patil
Subject: Re: Finding the x-coordinate and width of a sub-string

 

Hi Shishir,
As I said before, the width of a string depends on the font you are
using. It will be different for the strings "AAA" and "lll", unless it is a
monospaced font like Courier. What font are you using? I haven't looked at
the implementation of the stringWidth function yet. I will have a look at it
and see what I can do; if there is a bug, I will try to fix it. But again,
you need to be very careful when you are calculating these widths.
~Thanks
Dexter
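
To illustrate the "AAA" versus "lll" point, here is a small Java sketch. It
is not PDFBox code. The class name and the two width tables are assumptions
made for illustration: the numbers are the glyph advances (in 1/1000 of the
font size) as commonly published for the standard Helvetica and Courier
metrics, and should be checked against the actual font before relying on
them.

```java
import java.util.Map;

public class PitchComparison {
    // Assumed advance widths in 1/1000 em for a proportional font
    // (values as commonly published for Helvetica; illustrative only).
    static final Map<Character, Integer> PROPORTIONAL = Map.of('A', 667, 'l', 222);

    // Assumed fixed advance for a monospaced font such as Courier.
    static final int FIXED_ADVANCE = 600;

    /** Sums the per-glyph advances of s using the proportional table. */
    public static int proportionalWidth(String s) {
        int total = 0;
        for (char c : s.toCharArray()) {
            Integer w = PROPORTIONAL.get(c);
            if (w == null) {
                throw new IllegalArgumentException("no width for '" + c + "'");
            }
            total += w;
        }
        return total;
    }

    /** In a monospaced font the width depends only on the character count. */
    public static int monospacedWidth(String s) {
        return s.length() * FIXED_ADVANCE;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"AAA", "lll"}) {
            System.out.println(s + ": proportional=" + proportionalWidth(s)
                    + " monospaced=" + monospacedWidth(s));
        }
    }
}
```

With these assumed tables the two strings get very different widths in the
proportional font and identical widths in the monospaced one, which is why
character counts alone cannot locate a substring.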

On Wed, Mar 25, 2009 at 4:27 PM, Shishir Mane-Patil <[email protected]>
wrote:

Hi,

I already have all the TextPosition objects for a particular PDF page,
so I can always retrieve the font and font size for a particular string. For
instance, if we consider the earlier example:

 

String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001
width=108.87001]Primary Diagnosis: elder

 

Earlier, to find the x-coordinate of the word "Diagnosis", I performed
the following steps (using the above example):

 

1. Find the PDFont object using the TextPosition

 

2. Then use the stringWidth function to calculate the string width of
"Primary ". Let's say it is sw. The current value of x-coordinate is x, the
x-scale is xs and the font size is fs. 

 

3. Then, to calculate the new x-coordinate of, let's say, the word
"Diagnosis", I use the following formula:
         New X-Coordinate = x + ((sw / 1000) * xs * fs)

 

4. Similarly, I also computed the string width for the word "Diagnosis".
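
The steps above can be sketched in Java as follows. This is a minimal
illustration, not PDFBox code: the class name, the locate method, and the
ToDoubleFunction standing in for the font's stringWidth call are all
hypothetical, and the 600-units-per-character metric in main is a placeholder
rather than a real font.

```java
import java.util.function.ToDoubleFunction;

public class SubstringPosition {
    /** Converts a width in 1/1000 font units to page units: (sw / 1000) * xs * fs. */
    public static double toPageUnits(double sw, double xs, double fs) {
        return (sw / 1000.0) * xs * fs;
    }

    /**
     * Returns {startX, width} for text.substring(begin, end).
     * stringWidth is a stand-in for the font's width function and must
     * return widths in 1/1000 font units.
     */
    public static double[] locate(String text, int begin, int end,
                                  double x, double xs, double fs,
                                  ToDoubleFunction<String> stringWidth) {
        // Step 2: width of everything before the substring ("Primary ").
        double prefixUnits = stringWidth.applyAsDouble(text.substring(0, begin));
        // Step 4: width of the substring itself ("Diagnosis").
        double wordUnits = stringWidth.applyAsDouble(text.substring(begin, end));
        // Step 3: New X-Coordinate = x + ((sw / 1000) * xs * fs).
        double startX = x + toPageUnits(prefixUnits, xs, fs);
        return new double[] {startX, toPageUnits(wordUnits, xs, fs)};
    }

    public static void main(String[] args) {
        String text = "Primary Diagnosis: elder";
        int begin = text.indexOf("Diagnosis");
        int end = begin + "Diagnosis".length();
        // Placeholder metric (assumption): every character is 600 units wide.
        ToDoubleFunction<String> fixedPitch = s -> 600.0 * s.length();
        double[] r = locate(text, begin, end, 75.0, 1.0, 10.0, fixedPitch);
        System.out.println("x=" + r[0] + " width=" + r[1]);
    }
}
```

Note that this formula only sums glyph widths. Character spacing and word
spacing set in the PDF text state are not included, so the result can drift
from the rendered position when those are non-zero.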

 

The above steps worked satisfactorily for substrings in many PDFs, but they
fail for some. When they succeeded, the string width reported by the
TextPosition object was very close to the one calculated by the above
formula. When they failed, the string width returned by the PDFont object
was either zero or calculated incorrectly.
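
One way to detect the failing case is to compare the width computed from the
font with the width the TextPosition reports for the whole string, and to
fall back to a cruder estimate when they disagree. The Java sketch below is
only an illustration of that idea: the class, both method names, and the
1% tolerance are assumptions, and the character-count fallback is a rough
heuristic that is exact only for monospaced fonts.

```java
public class WidthSanityCheck {
    /**
     * True when the font-derived width is within relTol (a fraction such as
     * 0.01) of the width reported for the whole string.
     */
    public static boolean isConsistent(double computed, double reported, double relTol) {
        return reported > 0 && Math.abs(computed - reported) <= relTol * reported;
    }

    /**
     * Rough fallback: split the reported width evenly over the characters
     * and return the offset of the character at index begin.
     */
    public static double fallbackOffset(String text, int begin, double reportedWidth) {
        if (text.isEmpty()) {
            return 0.0;
        }
        return reportedWidth * begin / text.length();
    }

    public static void main(String[] args) {
        String text = "Primary Diagnosis: elder";
        double reported = 108.87001; // width from the TextPosition output
        double computed = 0.0;       // the failing case: the font returned zero
        if (!isConsistent(computed, reported, 0.01)) {
            double offset = fallbackOffset(text, text.indexOf("Diagnosis"), reported);
            System.out.println("font width unreliable, estimated x=" + (75.0 + offset));
        }
    }
}
```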

 

So can anyone suggest a way in which I can accurately calculate the
starting x-coordinate of a substring or, in other words, the actual width of
any string in a particular font?

 

 

 

On Tue, Mar 24, 2009 at 6:52 PM, Dexter Mishra <[email protected]>
wrote:

Shishir,

PDF does not store separate coordinates for the words in this string.
"Primary Diagnosis: elder" will be a single entry in the PDF. The
information you can get is the string length, width, height, etc. So if you
know the font's point size, you have to calculate the x-coordinate of
"Diagnosis:" yourself. But beware: this is quite tricky with variable-pitch
fonts (Arial, Times New Roman, etc.).

~Thanks
Dexter

On Tue, Mar 24, 2009 at 11:42 AM, Shishir Mane-Patil
<[email protected]> wrote:


> Hi,
>
> I wish to accurately find the width of a sub-string using the
> PDFTextStripper. For example, part of the output of the PDF text
> extraction example is as follows:
>
>
>
> String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001
> width=108.87001]Primary Diagnosis: elder
>
>
>
> Here the width calculated for the entire string "Primary Diagnosis: elder"
> is 108.87001. I wish to find the starting x-coordinate of just the word
> "Diagnosis" and the width of that word. How can I find the exact
> x-coordinate and width of such substrings?
>
>
>
>
>
> Thanks and Regards,
>
> Shishir Mane
>

 

 
