On 04.06.2017 at 16:45, 二川村田 wrote:
Thank you for your reply, Mr. Hausherr.

I am sending my code.

It looks similar to the code you sent.

Hi,

The difference is that you are subclassing PDFTextStripper to get the actual text positions from the PDF. That way you won't get any spaces, because the PDF contains none.

To illustrate this, I've uploaded page 3 processed with the DrawPrintImageLocations.java example from the source code download. See its source code for an explanation of the colors.

http://imgur.com/a/H5CNR

The spaces in the extracted text (which you get e.g. with "stripper.getText(doc);") are added by PDFBox, but they have no TextPosition object.
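
To make the point about inferred spaces concrete, here is a minimal sketch in plain Java. It is a simplified illustration, not PDFBox's actual algorithm: the Glyph class, the joinWithInferredSpaces method and the gap threshold are all hypothetical names and values made up for this example. It only shows the general idea that a space can be derived from the horizontal gap between two glyphs, so the resulting string has one more character than there are glyph objects.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; this is NOT how PDFBox is implemented.
class GapSpaceSketch {

    // Hypothetical stand-in for a positioned glyph (similar in spirit to TextPosition).
    static final class Glyph {
        final String text;
        final double x;      // left edge of the glyph
        final double width;  // advance width of the glyph

        Glyph(String text, double x, double width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins glyphs in the given order and inserts a space wherever the
    // horizontal gap to the previous glyph exceeds gapThreshold (an assumed value).
    static String joinWithInferredSpaces(List<Glyph> glyphs, double gapThreshold) {
        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                double gap = g.x - (prev.x + prev.width);
                if (gap > gapThreshold) {
                    out.append(' '); // this space has no glyph object behind it
                }
            }
            out.append(g.text);
            prev = g;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("H", 10.0, 6.0));
        glyphs.add(new Glyph("i", 16.0, 3.0));
        glyphs.add(new Glyph("P", 24.0, 6.0)); // 5 units after the end of "i"
        glyphs.add(new Glyph("D", 30.0, 6.0));
        glyphs.add(new Glyph("F", 36.0, 6.0));

        String text = joinWithInferredSpaces(glyphs, 2.0);
        System.out.println(text);
        System.out.println("glyphs: " + glyphs.size() + ", characters: " + text.length());
    }
}
```

In this sketch the list holds five glyphs while the joined string has six characters. The same mismatch appears when you collect TextPosition objects in processTextPosition and compare them with the string returned by getText.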

Tilman


I want to do this from a Java program, not from the command-line application.

I use the library pdfbox-2.0.6.jar.

=====================
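// Subclass of PDFTextStripper that records every TextPosition it processes
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

class PDFTextCordinateStripper extends PDFTextStripper {

    public List<TextPosition> list_text = new ArrayList<TextPosition>();

    public PDFTextCordinateStripper() throws IOException {
        super();
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        // Let the stripper do its normal work, then keep the glyph position
        super.processTextPosition(text);
        list_text.add(text);
    }

}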


=====================
// main (omitted); doc is the already loaded PDDocument
PDFTextCordinateStripper stripper = new PDFTextCordinateStripper();

int len_page = doc.getNumberOfPages();
for (int ind = 1; ind <= len_page; ind++) {

    // getPage() is 0-based, setStartPage()/setEndPage() are 1-based
    PDPage pg = doc.getPage(ind - 1);

    String str_page_num = "PageNum: " + ind;

    // Page size taken from the media box
    String str_page_size =
            "Width: " + pg.getMediaBox().getWidth()
            + "\tHeight: " + pg.getMediaBox().getHeight();

    System.out.println(str_page_num + "\t" + str_page_size);

    // Collect the text positions of this page only
    stripper.list_text.clear();
    stripper.setStartPage(ind);
    stripper.setEndPage(ind);
    stripper.getText(doc);

    for (TextPosition rec : stripper.list_text) {
        String str_rec
                = "Text: " + rec.getUnicode()
                + "\tx: " + rec.getX()
                + "\ty: " + rec.getY()
                + "\tw: " + rec.getWidth()
                + "\th: " + rec.getHeight()
                + "\tfont_size: " + rec.getFontSizeInPt();
        System.out.println(str_rec);
    }
}

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


