Hi Yan, thanks for the answer, but that is not my issue. What I'm saying is
that the PDFBox components responsible for extracting text
(PDFTextStripper and PDFTextStripperByArea) don't seem to consider the
current clipping area the way PageDrawer does. At least that's how it looks
to me, so I'm asking: is this the case? Shouldn't it be taken into account?
As for the form XObject I mentioned, it's just an example to explain my issue.
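To make the question concrete, here is a minimal sketch of the kind of check meant. It uses only java.awt.geom so it runs without PDFBox. It mimics what the engine appears to do when it enters a form XObject (map the form's BBox through the CTM and intersect it with the current clip), then drops glyphs whose origin falls outside the result. All class and method names are hypothetical. Inside PDFBox this would presumably live in a PDFTextStripper subclass that overrides processTextPosition(TextPosition) and tests against getGraphicsState().getCurrentClippingPath(); that wiring is an assumption and is not shown. Testing only the glyph origin is also a simplification: a glyph that is partly inside the clip is kept or dropped based on that single point.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Area;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper showing how a clip-aware text extractor could decide
 * whether a glyph is visible. Coordinates are PDF user space (y axis up).
 */
public class ClipFilterSketch {

    /** Page-level clip: the crop box rectangle. */
    static Area pageClip(double llx, double lly, double urx, double ury) {
        return new Area(new Rectangle2D.Double(llx, lly, urx - llx, ury - lly));
    }

    /** Intersect the current clip with a form BBox mapped through the CTM. */
    static Area clipToFormBBox(Area currentClip, Rectangle2D bbox, AffineTransform ctm) {
        Area result = new Area(currentClip); // copy, so the caller's clip is untouched
        result.intersect(new Area(ctm.createTransformedShape(bbox)));
        return result;
    }

    /** Keep only the glyph origins that fall inside the clip. */
    static List<Point2D> visibleGlyphs(Area clip, List<Point2D> origins) {
        List<Point2D> kept = new ArrayList<>();
        for (Point2D p : origins) {
            if (clip.contains(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Area clip = pageClip(0, 0, 612, 792); // US Letter crop box
        // Form with a 100x20 BBox, placed at (50, 700) by the CTM
        AffineTransform ctm = AffineTransform.getTranslateInstance(50, 700);
        clip = clipToFormBBox(clip, new Rectangle2D.Double(0, 0, 100, 20), ctm);

        List<Point2D> origins = Arrays.<Point2D>asList(
                new Point2D.Double(60, 705),   // inside the form BBox
                new Point2D.Double(140, 705),  // inside, near the right edge
                new Point2D.Double(200, 705)); // past the right edge: clipped out
        System.out.println(visibleGlyphs(clip, origins).size()); // prints 2
    }
}
```

The third glyph lies past the right edge of the 100-unit-wide BBox, which is the situation described in the original message: text that the BBox clips out still ends up in the stripper's output.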

On Thu, Dec 1, 2016 at 10:22 AM, fx YAN BING <yan.b...@fujixerox.co.jp>
wrote:

> Hi, this is Yan from Japan.
> I'm also a user of PDFBox.
>
> I'm not sure I clearly understand your problem.
> Do you want to process the contents inside a form?
>
> I can share some sample code from my project.
> It uses PDFStreamEngine to get the form objects in a PDF.
> I hope it can help you.
>
> -----Original Message-----
> From: Andrea Vacondio [mailto:andrea.vacon...@gmail.com]
> Sent: Thursday, December 1, 2016 6:02 PM
> To: users@pdfbox.apache.org
> Subject: Text extraction and clip area
>
> Hi, I had a couple of issues with text extraction and I tried to dig a bit
> into the code. As far as I can see, the "current clipping area" is never
> used during text extraction; is this correct? My issue is with a form
> XObject whose bounding box clips out part of the text, yet that text is
> returned by the text stripper.
>
