RE: PDFTextStripper repeats Chinese characters 4 times.

2017-08-10 Thread Zubiri, Tomas
Hey Tilman,
We are not the same person; Tretonio must have mixed things up.
I am using 1.8.13. I will upgrade my code to use PDFBox 2.
Thanks!

Tomas Zubiri
Research Associate, Ownership
S&P Global Market Intelligence
Buenos Aires, Argentina
tomas.zub...@spglobal.com
www.spglobal.com/marketintelligence




-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Wednesday, August 09, 2017 4:28 PM
To: users@pdfbox.apache.org
Subject: Re: PDFTextStripper repeats Chinese characters 4 times.

On 09.08.2017 at 21:25, Tretonio Tretis wrote:
> no

But if you're not the same person, how can you know he's using 2.0.7?
Or did you mix up threads?

Tilman



Re: PDFTextStripper repeats Chinese characters 4 times.

2017-08-09 Thread Tilman Hausherr

On 09.08.2017 at 21:25, Tretonio Tretis wrote:

no


But if you're not the same person, how can you know he's using 2.0.7?
Or did you mix up threads?


Tilman





-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org








Re: PDFTextStripper repeats Chinese characters 4 times.

2017-08-09 Thread Tretonio Tretis
no

2017-08-09 16:19 GMT-03:00 Tilman Hausherr :

> On 09.08.2017 at 21:02, Tretonio Tretis wrote:
>
>> Version is PDFBox 2.0.7 release <https://pdfbox.apache.org/download.cgi#20x>
>>
>
> Are you two the same person?


Re: PDFTextStripper repeats Chinese characters 4 times.

2017-08-09 Thread Tilman Hausherr

On 09.08.2017 at 21:02, Tretonio Tretis wrote:

Version is PDFBox 2.0.7 release 


Are you two the same person?

Anyway, I just tried with 2.0.7 (my previous test was with 3.0) and I get
this; no NaN there.


String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,79.6803 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,103.80029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,127.92029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,152.04028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,176.16028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,200.28027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,224.40027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]1
String[445.5554,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]0
String[456.71097,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]3
String[473.0396,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
String[493.08762,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
String[513.1356,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
String[533.1836,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
String[552.8387,254.6402 fs=20.04 xscale=20.027977 height=11.553061 space=5.0069942 width=5.007019]
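The "no NaN there" check can also be done mechanically instead of by eye. A minimal sketch in plain Java that scans the `space=` field of every entry (in this PrintTextLocations-style output that field presumably carries `TextPosition.getWidthOfSpace()`) and reports entries whose value is not a finite number. `SpaceWidthCheck` and its method are hypothetical names; the only assumption about the input is the `String[... space=... width=...]` shape shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags dump entries whose "space=" value is NaN or infinite.
final class SpaceWidthCheck {
    // Captures the token after "space=" up to the next whitespace or ']'.
    private static final Pattern SPACE = Pattern.compile("space=([^\\s\\]]+)");

    // Returns the 0-based indices (in order of appearance) of non-finite space widths.
    static List<Integer> nonFiniteSpaceWidths(String dump) {
        List<Integer> bad = new ArrayList<>();
        Matcher m = SPACE.matcher(dump);
        int index = 0;
        while (m.find()) {
            // Float.parseFloat accepts "NaN" and "Infinity" as Java prints them
            float value = Float.parseFloat(m.group(1));
            if (!Float.isFinite(value)) {
                bad.add(index);
            }
            index++;
        }
        return bad;
    }
}
```

Because it scans the whole text rather than individual lines, it also works on the mail-wrapped form of a dump; for the 16 entries above it returns an empty list.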




Tilman







Re: PDFTextStripper repeats Chinese characters 4 times.

2017-08-09 Thread Tretonio Tretis
Version is PDFBox 2.0.7 release 

2017-08-09 15:47 GMT-03:00 Tilman Hausherr :

> What version are you using?


Re: PDFTextStripper repeats Chinese characters 4 times.

2017-08-09 Thread Tilman Hausherr

On 08.08.2017 at 22:36, Zubiri, Tomas wrote:


Good evening!

I am having trouble with the following Chinese file:
http://www.filedropper.com/1327415361


Page 2 contains only 7 characters (3 digits and 4 Chinese characters),
but PDFTextStripper reports 19 TextPositions.
Each Chinese character appears 4 times, sometimes with different x
coordinates.
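PDFTextStripper has a related switch, `setSuppressDuplicateOverlappingText(boolean)`, which (as we understand it, enabled by default) discards a character that repeats the same text at nearly the same position, as happens with "fake bold" output. Copies at clearly different x coordinates, as described here, would not be caught by that kind of filter. A minimal sketch of the idea in plain Java, using a hypothetical `Glyph` record rather than PDFBox's `TextPosition`; the tolerance (a third of the glyph width) is an assumption of the sketch, not PDFBox's exact rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a text position: the character, its origin (x, y)
// and its advance width, as printed in the dumps in this thread.
record Glyph(String unicode, float x, float y, float width) {}

final class DuplicateGlyphFilter {
    private DuplicateGlyphFilter() {}

    // Keeps the first occurrence of each character and drops later copies of
    // the same character whose origin lies within the tolerance of a kept one.
    // Assumption for this sketch: tolerance = one third of the glyph width.
    static List<Glyph> suppressOverlapping(List<Glyph> glyphs) {
        List<Glyph> kept = new ArrayList<>();
        for (Glyph g : glyphs) {
            float tolerance = g.width() / 3f;
            boolean duplicate = false;
            for (Glyph k : kept) {
                if (k.unicode().equals(g.unicode())
                        && Math.abs(k.x() - g.x()) <= tolerance
                        && Math.abs(k.y() - g.y()) <= tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(g);
            }
        }
        return kept;
    }
}
```

A copy shifted by half a point is dropped, while a copy of the same character 47 points to the right is kept, which is the situation Tomas describes.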




That is page 3.



It is worth noting that TextPosition.getWidthOfSpace() returns NaN
for any of these characters.
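Code that derives word breaks from `TextPosition.getWidthOfSpace()` can guard against the NaN case reported here. A minimal sketch, assuming a fallback to the glyph's own advance width whenever the reported space width is NaN, infinite or not positive. That fallback rule is an assumption of the sketch, not taken from PDFBox; it merely happens to fit the Chinese rows in Tilman's output in this thread, where `space=20.027977` and the width is about 20.03. `SpaceWidth` is a hypothetical name.

```java
// Hypothetical helper: picks a usable space width for word-break heuristics.
final class SpaceWidth {
    private SpaceWidth() {}

    // reportedSpace: value from TextPosition.getWidthOfSpace()
    // glyphWidth:    the glyph's advance width, e.g. from getWidthDirAdj()
    // Assumption of this sketch: if the reported value is NaN, infinite or <= 0,
    // fall back to the glyph width (plausible for full-width CJK glyphs).
    static float effective(float reportedSpace, float glyphWidth) {
        if (Float.isFinite(reportedSpace) && reportedSpace > 0f) {
            return reportedSpace;
        }
        return glyphWidth;
    }
}
```

A caller could then treat a horizontal gap larger than some fraction of `effective(...)` as a word break; the fraction would be another tuning assumption.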




What version are you using?

Here's what I get:


String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,79.6803 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,103.80029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,127.92029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,152.04028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,176.16028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,200.28027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,224.40027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]1
String[445.5554,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]0
String[456.71097,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]3
String[473.0396,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
String[493.08762,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
String[513.1356,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
String[533.1836,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
String[552.8387,254.6402 fs=20.04 xscale=20.027977 height=11.553061 space=5.0069942 width=5.007019]
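To compare a count like Tomas's 19 TextPositions with what another version prints, the dump can be tallied mechanically. A minimal sketch, assuming each entry sits on a single line in the `String[x,y fs=.. xscale=.. height=.. space=.. width=..]text` shape shown here; the `Entry` record and `DumpParser` are hypothetical, not part of PDFBox. Since trailing blanks (the space characters mentioned below) are easily lost when output is pasted into mail, an empty text is treated as a space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One parsed dump line: origin, font size, x scale, sizes, and the text itself.
record Entry(float x, float y, float fontSize, float xScale,
             float height, float space, float width, String text) {
    // Blank text is taken to be a space character whose trailing blank was stripped.
    boolean isSpace() { return text.isBlank(); }
}

final class DumpParser {
    private DumpParser() {}

    private static final Pattern LINE = Pattern.compile(
        "String\\[([^,]+),(\\S+) fs=(\\S+) xscale=(\\S+) height=(\\S+)"
        + " space=(\\S+) width=([^\\]]+)\\](.*)");

    // Parses every line that matches the dump format; other lines are skipped.
    static List<Entry> parse(String dump) {
        List<Entry> entries = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            Matcher m = LINE.matcher(line.strip());
            if (!m.matches()) {
                continue;
            }
            entries.add(new Entry(
                Float.parseFloat(m.group(1)), Float.parseFloat(m.group(2)),
                Float.parseFloat(m.group(3)), Float.parseFloat(m.group(4)),
                Float.parseFloat(m.group(5)), Float.parseFloat(m.group(6)),
                Float.parseFloat(m.group(7)), m.group(8)));
        }
        return entries;
    }
}
```

With each of the 16 entries above on its own line, this yields 9 blank entries (taken as spaces), the digits `1`, `0`, `3` and four `?` glyphs, i.e. one copy of each Chinese character. Tomas's 19 would correspond to 3 digits plus 4 copies of each of the 4 Chinese characters (3 + 4 * 4 = 19).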


There are some space characters. You can see their position with the 
DrawPrintTextLocations example from the source code download.
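DrawPrintTextLocations makes such problems visible by drawing rectangles around each TextPosition on a rendered page. The same question (do any glyph boxes overlap?) can be answered numerically from the printed fields. A minimal sketch using `java.awt.geom.Rectangle2D`, assuming y is the baseline measured downward from the top of the page, so a box spans `y - height` to `y`; the `Box` record and `OverlapCheck` are hypothetical names.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical glyph box built from the x, y, width and height fields of a dump line.
record Box(String text, float x, float y, float width, float height) {
    // Assumption: y is the baseline in top-down coordinates, so the box spans y - height .. y.
    Rectangle2D toRect() {
        return new Rectangle2D.Float(x, y - height, width, height);
    }
}

final class OverlapCheck {
    private OverlapCheck() {}

    // Returns index pairs {i, j} (i < j) of boxes whose interiors intersect.
    static List<int[]> overlappingPairs(List<Box> boxes) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < boxes.size(); i++) {
            Rectangle2D a = boxes.get(i).toRect();
            for (int j = i + 1; j < boxes.size(); j++) {
                // intersects() is false for boxes that only touch along an edge
                if (a.intersects(boxes.get(j).toRect())) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }
}
```

The neighbouring Chinese glyphs in the dump (x = 473.0396 and 493.08762, width about 20.03) do not overlap, so any reported pair would point at a repeated or misplaced glyph.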


The only oddity is that the cyan rectangle is too wide for some of them.
Maybe a bug in getBounds2D(), or an invisible point...
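The "invisible point" guess can be illustrated with plain `java.awt.geom`: an outline that contains a zero-length subpath far from the visible shape paints nothing extra when filled, yet that point still counts toward `getBounds2D()`, so the box comes out too wide. A hypothetical sketch with made-up coordinates, not taken from the font in this file; to our understanding PDFBox 2.0 represents glyph outlines as `java.awt.geom.GeneralPath`, which is why the standard `Shape` method is relevant here.

```java
import java.awt.geom.GeneralPath;

// Hypothetical demo of a stray, invisible point widening an outline's bounds.
final class InvisiblePointDemo {
    private InvisiblePointDemo() {}

    // Builds a 10x10 square outline, optionally followed by a zero-length
    // subpath at x = 100 that adds no area when the outline is filled.
    static GeneralPath glyph(boolean withStrayPoint) {
        GeneralPath p = new GeneralPath();
        p.moveTo(0f, 0f);
        p.lineTo(10f, 0f);
        p.lineTo(10f, 10f);
        p.lineTo(0f, 10f);
        p.closePath();
        if (withStrayPoint) {
            p.moveTo(100f, 5f);
            p.lineTo(100f, 5f); // degenerate segment: invisible, but still a path point
            p.closePath();
        }
        return p;
    }
}
```

`glyph(false).getBounds2D().getWidth()` is 10.0, while `glyph(true).getBounds2D().getWidth()` is 100.0 even though both fill the same area.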


Tilman