Hello VR,


I read the link you sent me, but it is beyond my understanding of PDFs
and the PDFBox PDFTextStripper. I am trying to parse this content from the PDF.
With 0.8, PDFTextStripper.processTextPosition() was called once for every
column value (e.g. "Mt. Pleasant, SC 29466-8583"), so I planned to use
the getYDirAdj and getXDirAdj methods to sort the values and read them out.
Now I do not know where each of those column values ends. For example, how
will I know that "Mt. Pleasant, SC 29466-8583" is a single "field" if I get
one character at a time? Also, setSortByPosition(true) doesn't work with
processTextPosition(). Could you please tell me if there is a better way
to do this? Thank you.
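
To make the question concrete, here is a minimal sketch of the kind of
grouping described above. It does not use PDFBox itself: Glyph is a
hypothetical holder for the values one would copy out of each
TextPosition (the getXDirAdj() and getYDirAdj() results, a width, and the
character), and lineTolerance and maxGap are made-up thresholds that would
need tuning for a real document. It also assumes that spaces inside a
field arrive as ordinary glyphs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldGrouper {

    /** Hypothetical stand-in for the values read from one TextPosition. */
    public static final class Glyph {
        final float x;      // e.g. from getXDirAdj()
        final float y;      // e.g. from getYDirAdj()
        final float width;  // horizontal extent of the glyph
        final String text;  // the character itself

        public Glyph(float x, float y, float width, String text) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.text = text;
        }
    }

    /**
     * Buckets glyphs into lines by y, then splits each line into fields
     * wherever the horizontal gap between neighbours exceeds maxGap.
     */
    public static List<String> group(List<Glyph> glyphs,
                                     float lineTolerance, float maxGap) {
        List<Glyph> byY = new ArrayList<>(glyphs);
        byY.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        List<String> fields = new ArrayList<>();
        List<Glyph> line = new ArrayList<>();
        for (Glyph g : byY) {
            // A jump in y larger than the tolerance starts a new line.
            if (!line.isEmpty()
                    && Math.abs(g.y - line.get(0).y) > lineTolerance) {
                flushLine(line, maxGap, fields);
                line.clear();
            }
            line.add(g);
        }
        flushLine(line, maxGap, fields);
        return fields;
    }

    private static void flushLine(List<Glyph> line, float maxGap,
                                  List<String> out) {
        if (line.isEmpty()) {
            return;
        }
        line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
        StringBuilder field = new StringBuilder();
        float prevEnd = 0f;
        for (Glyph g : line) {
            // A wide gap after the previous glyph closes the current field.
            if (field.length() > 0 && g.x - prevEnd > maxGap) {
                out.add(field.toString());
                field.setLength(0);
            }
            field.append(g.text);
            prevEnd = g.x + g.width;
        }
        out.add(field.toString());
    }
}
```

Calling FieldGrouper.group(glyphs, 2f, 10f) on the collected glyphs would
return one string per field, top to bottom and left to right. The gap
threshold is the fragile part: it has to be wider than any spacing inside
a value but narrower than the space between columns.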

Regards,
Rekha




From: Villu Ruusmann <[email protected]>
To: [email protected]
Date: 02/19/2010 05:42 AM
Subject: Re: PDFTextStripper.processTextPosition



Hello there,

>
> I was using PDFBox version 0.8, and
> PDFTextStripper.processTextPosition(TextPosition text) was called for
> every "field". With 1.0 it looks like it is called for every
> character. Could you please tell me how to get it to be called only once
> per "field"? Thank you.
>

In short, your PDF document contains a "character spacing"
instruction, which PDFTextStripper now correctly honors.

The change is detailed here:
https://issues.apache.org/jira/browse/PDFBOX-520

Since this change didn't have a negative impact on the correctness of
the output of PDFTextStripper (quite the contrary!), could you please
elaborate on what the downside of this solution is for you? A noticeable
performance degradation?


VR



