[jira] [Updated] (PDFBOX-3970) x,y co-ordinates of the text inside the cell are not getting correctly.

2017-10-18 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3970:

Attachment: paragraphNextToTable-marked-1.png

You didn't attach any code so I don't know how you got your values. I have 
attached the result file of the DrawPrintTextLocations example.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Created] (PDFBOX-3970) x,y co-ordinates of the text inside the cell are not getting correctly.

2017-10-18 Thread Navnath Kumbhar (JIRA)
Navnath Kumbhar created PDFBOX-3970:
---

 Summary: x,y co-ordinates of the text inside the cell are not 
getting correctly.
 Key: PDFBOX-3970
 URL: https://issues.apache.org/jira/browse/PDFBOX-3970
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 2.0.7
 Environment: Operating system: Windows 7 (64 bit).
Reporter: Navnath Kumbhar
 Attachments: paragraphNextToTable.pdf

Hello Support Team,

I am working on a project that parses a whole PDF document and stores the 
extracted text in a .txt file that can be read by another product.

My issue concerns extracting the text inside a table cell: 
*The x,y coordinates of the text inside the cell are not reported correctly.*
The Y value of the last text line in the cell is larger than the cell's max-Y 
value.

I have attached a test file that shows this bug.

As you can see in the test document, there is one cell with text in it 
and a text paragraph next to that cell.

The x,y coordinates that I get from PDFBox for all the paths (two vertical and 
two horizontal lines) of the cell are (in x1,y1,x2,y2 format):
Horizontal line 1: [100,88,220,88]
Horizontal line 2: [100,120,220,120]
Vertical line 1: [100,88,100,120]
Vertical line 2: [220,88,220,120]

(The Y values of the above paths are final values, obtained by subtracting the 
value given by PDFBox from the height of the page, because for paths the 
y-values run from bottom to top.)

The bounding box of the last line in that cell is [102,114,59,7], so the 
max-Y of that line is 121 (min-Y + height = 114 + 7).

Comparing the max-Y value of the cell (120) with that of the last line in the 
cell (121), the line clearly extends outside the cell.

What could be the reason for this?

Thank you in advance!
Regards,
Navnath Kumbhar
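
The comparison described in the report can be reproduced with plain geometry, 
without PDFBox. The following is a minimal sketch, assuming the bounding box 
[102,114,59,7] is in [x, min-Y, width, height] order (the report only implies 
this through "min-Y + height") and that all values are already in the same 
top-down coordinate system. The class and method names are hypothetical and 
used only for illustration.

```java
import java.awt.geom.Rectangle2D;

class CellContainmentCheck {

    /** Builds a rectangle from two opposite corners (x1,y1) and (x2,y2). */
    static Rectangle2D fromCorners(double x1, double y1, double x2, double y2) {
        return new Rectangle2D.Double(
                Math.min(x1, x2), Math.min(y1, y2),
                Math.abs(x2 - x1), Math.abs(y2 - y1));
    }

    /** Distance by which the text box passes the cell's max-Y edge; 0 if it does not. */
    static double overshootPastMaxY(Rectangle2D cell, Rectangle2D text) {
        return Math.max(0.0, text.getMaxY() - cell.getMaxY());
    }

    public static void main(String[] args) {
        // Cell spanned by the four reported lines: x from 100 to 220, y from 88 to 120.
        Rectangle2D cell = fromCorners(100, 88, 220, 120);

        // Last text line, assumed to be [x, min-Y, width, height].
        Rectangle2D lastLine = new Rectangle2D.Double(102, 114, 59, 7);

        System.out.println("text max-Y:  " + lastLine.getMaxY());
        System.out.println("cell max-Y:  " + cell.getMaxY());
        System.out.println("inside cell: " + cell.contains(lastLine));
        System.out.println("overshoot:   " + overshootPastMaxY(cell, lastLine));
    }
}
```

With the reported values the text box ends at 121 while the cell ends at 120, 
so the containment check fails by one unit. The sketch only confirms the 
arithmetic; it does not say where the extra unit comes from.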








[jira] [Created] (PDFBOX-3969) Splitting starts counting for cutting out pages wrongly

2017-10-18 Thread freddi fred (JIRA)
freddi fred created PDFBOX-3969:
---

 Summary: Splitting starts counting for cutting out pages wrongly
 Key: PDFBOX-3969
 URL: https://issues.apache.org/jira/browse/PDFBOX-3969
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.6
Reporter: freddi fred
 Attachments: 1000pages.pdf

The current behaviour of 'splitAtPage' is wrong. Assume a document with 1000 
pages and startPage=238, endPage=977, splitAtPage=17. PDFBox then starts 
counting for splitAtPage at page #0 instead of at startPage! This leads to the 
following
groups: 1, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 
17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 8

I would have expected: 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 
17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 
9

This prevents, for example, cutting out selected parts of a document.
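
The two group lists above can be reproduced with a small simulation that does 
not use PDFBox at all. This is only a sketch under stated assumptions: page 
numbers are 1-based, startPage and endPage are inclusive, and the observed 
behaviour is modelled as "start a new group whenever the zero-based index of 
the page within the whole document is a multiple of splitAtPage". That model 
is an assumption chosen because it matches the reported groups; it is not 
taken from the PDFBox source. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SplitGroupSizes {

    /** Assumed model of the reported behaviour: counting starts at page #0 of the document. */
    static List<Integer> countingFromDocumentStart(int startPage, int endPage, int splitAtPage) {
        List<Integer> groups = new ArrayList<>();
        int current = 0;
        for (int page = startPage; page <= endPage; page++) {
            int zeroBasedIndex = page - 1;
            // Close the running group when the absolute index hits a multiple of splitAtPage.
            if (current > 0 && zeroBasedIndex % splitAtPage == 0) {
                groups.add(current);
                current = 0;
            }
            current++;
        }
        if (current > 0) {
            groups.add(current);
        }
        return groups;
    }

    /** Expected behaviour from the report: counting starts at startPage. */
    static List<Integer> countingFromStartPage(int startPage, int endPage, int splitAtPage) {
        List<Integer> groups = new ArrayList<>();
        int remaining = endPage - startPage + 1;
        while (remaining > 0) {
            int size = Math.min(splitAtPage, remaining);
            groups.add(size);
            remaining -= size;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println("reported: " + countingFromDocumentStart(238, 977, 17));
        System.out.println("expected: " + countingFromStartPage(238, 977, 17));
    }
}
```

Both lists cover the same 740 pages (238 to 977 inclusive); they differ only 
in where the first boundary falls, which is why the first variant begins with 
a group of 1 and ends with a group of 8 instead of 9.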







Re: Pdf cannot rendered correct/ current Trunk-Version

2017-10-18 Thread Manfred Pock
Hello,

No, it was not created at our company. We receive all kinds of PDFs from 
different companies to display in our software, and we have no influence on 
the PDF sources. Thanks for your answer.

Manfred



On 2017-10-16 19:13, Tilman Hausherr wrote:
> On 16.10.2017 at 11:32, Manfred Pock wrote:
> > Hello,
> >
> > the PDF at http://cloud.directupload.net/97hE is rendered incorrectly 
> > with the current trunk version. Might there be a solution?
> >
> > Best regards, Manfred
>
> Did you create this file yourself / in your company?
>
> The font has the "symbolic" bit set but it is not symbolic.
>
> Tilman