Hi all,

I'm using PDFBox 2.0.8. The test PDF file can be downloaded from
http://proj.gz-yibo.com:2880/nk7.pdf

Another example, a text line on page 19:
7.放射性核素扫描应用133 氙或99m 锝-二乙三胺五乙酸(99mTc-DTPA)雾化吸人。99m 锝
becomes the following after extraction (the superscripts end up on a line of their own):
133 99m 99m 99m
7.放射性核素扫描应用 氙或 锝-二乙三胺五乙酸(Tc-DTPA)雾化吸人。 锝
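
In both examples the superscript glyphs seem to be split off because their y coordinate differs from the rest of the line. Below is a rough sketch of the kind of grouping I would expect instead, not a tested fix. It does not use PDFBox at all: Glyph and SuperscriptLineMerger are hypothetical names of my own, and I assume x, y (baseline measured from the top of the page) and height have been collected per character, for instance from TextPosition in an overridden PDFTextStripper.writeString. The tolerance value is a guess as well.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SuperscriptLineMerger {

    /** Hypothetical stand-in for one extracted character; not a PDFBox class. */
    public static final class Glyph {
        final String text;
        final double x;      // left edge of the glyph
        final double y;      // baseline, measured from the top of the page
        final double height; // glyph (font) height

        public Glyph(String text, double x, double y, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.height = height;
        }
    }

    /** One output line; baseline and height come from the glyph that opened it. */
    private static final class Line {
        final double baseline;
        final double height;
        final List<Glyph> glyphs = new ArrayList<>();

        Line(Glyph first) {
            this.baseline = first.y;
            this.height = first.height;
            this.glyphs.add(first);
        }
    }

    /**
     * Groups glyphs into lines. A glyph joins the nearest existing line whose
     * baseline is within tolerance * lineHeight of its own y; otherwise it
     * opens a new line. Tallest glyphs are processed first so that full-size
     * characters, not superscripts, define the baselines.
     */
    public static List<String> toLines(List<Glyph> glyphs, double tolerance) {
        List<Glyph> byHeight = new ArrayList<>(glyphs);
        byHeight.sort(Comparator.comparingDouble((Glyph g) -> g.height).reversed());

        List<Line> lines = new ArrayList<>();
        for (Glyph g : byHeight) {
            Line target = null;
            double best = Double.MAX_VALUE;
            for (Line line : lines) {
                double distance = Math.abs(line.baseline - g.y);
                if (distance <= tolerance * line.height && distance < best) {
                    target = line;
                    best = distance;
                }
            }
            if (target == null) {
                lines.add(new Line(g));
            } else {
                target.glyphs.add(g);
            }
        }

        // Top-to-bottom line order, left-to-right glyph order within a line.
        lines.sort(Comparator.comparingDouble((Line l) -> l.baseline));
        List<String> result = new ArrayList<>();
        for (Line line : lines) {
            line.glyphs.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            for (Glyph g : line.glyphs) {
                sb.append(g.text);
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

With a tolerance of around 0.6 a raised '2' stays inside "100m2(", while a very small tolerance reproduces the split I am seeing. The right value surely depends on the font size and line spacing of the document.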


------------------
  With best regards


Daniel


------------------ Original ------------------
From:  "139250065";<[email protected]>;
Date:  Wed, Dec 20, 2017 10:39 AM
To:  "users"<[email protected]>;

Subject:  1 text line becomes 2 line after extraction



For example, this single line:
肺具有广泛的呼吸面积,成人的总呼吸面积约有100m2(3 亿-7.5 亿肺泡),在呼吸过程中,

becomes 2 lines after extraction:
2
肺具有广泛的呼吸面积,成人的总呼吸面积约有100m(3 亿-7.5 亿肺泡),在呼吸过程中,

This happens because the y coordinate of the character '2' is smaller than that of the other characters on the line.


------------------


with best regards


daniel
