Hi, all:
I'm using PDFBox 2.0.8. The test PDF file can be downloaded from
http://proj.gz-yibo.com:2880/nk7.pdf
For example, this text line on page 19:
7.放射性核素扫描应用133 氙或99m 锝-二乙三胺五乙酸(99mTc-DTPA)雾化吸人。99m 锝
becomes two lines after extraction, with the superscripts moved onto a line of their own:
133 99m 99m 99m
7.放射性核素扫描应用 氙或 锝-二乙三胺五乙酸(Tc-DTPA)雾化吸人。 锝
------------------
With best regards
Daniel
------------------ Original ------------------
From: "139250065";<[email protected]>;
Date: Wed, Dec 20, 2017 10:39 AM
To: "users"<[email protected]>;
Subject: 1 text line becomes 2 line after extraction
For example:
1 line: 肺具有广泛的呼吸面积,成人的总呼吸面积约有100m2(3 亿-7.5 亿肺泡),在呼吸过程中,
becomes 2 lines after extraction:
2
肺具有广泛的呼吸面积,成人的总呼吸面积约有100m(3 亿-7.5 亿肺泡),在呼吸过程中,
because the y coordinate of the character '2' is smaller than that of the other characters.
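
A possible workaround, as a rough sketch only: group glyphs into lines using a vertical tolerance instead of requiring identical y values, so a slightly raised superscript stays on its base line. The code below does not use PDFBox. `Glyph`, `mergeLines` and the `tolerance` parameter are hypothetical names made up for illustration, and it assumes y grows downward, as in the example above. With PDFBox, the inputs could presumably be collected from `TextPosition` objects (`getUnicode()`, `getXDirAdj()`, `getYDirAdj()`, `getHeightDir()`) in a `PDFTextStripper` subclass, but that part is not shown or tested here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SuperscriptMerge {
    /** Hypothetical stand-in for one positioned glyph (not a PDFBox class). */
    static final class Glyph {
        final String text;
        final double x;
        final double y;      // assumed to grow downward
        final double height;

        Glyph(String text, double x, double y, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.height = height;
        }
    }

    /**
     * Groups glyphs into lines. A glyph joins the current line when its y
     * differs from the previous glyph's y (in y order) by at most
     * tolerance * its own height; otherwise it starts a new line.
     */
    static List<String> mergeLines(List<Glyph> glyphs, double tolerance) {
        List<Glyph> byY = new ArrayList<>(glyphs);
        byY.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        List<List<Glyph>> lines = new ArrayList<>();
        double lastY = 0;
        for (Glyph g : byY) {
            boolean sameLine = !lines.isEmpty()
                    && (g.y - lastY) <= tolerance * g.height;
            if (!sameLine) {
                lines.add(new ArrayList<>());
            }
            lines.get(lines.size() - 1).add(g);
            lastY = g.y;
        }

        // Within each line, restore reading order by x and join the text.
        List<String> out = new ArrayList<>();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            for (Glyph g : line) {
                sb.append(g.text);
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

The tolerance is a guess that would need tuning per document: too small and superscripts split off again, too large and tightly spaced lines get merged.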
------------------
with best regards
daniel