On 04.06.2017 at 16:45, 二川村田 wrote:
Thank you for your reply, Mr. Hausherr.
I am sending my code.
It looks similar to the code you sent.
Hi,
The difference is that you're subclassing PDFTextStripper to get the
actual text positions from the PDF. That way you won't get any spaces,
because there are none in the PDF.
To illustrate this, I've uploaded page 3 processed with the
DrawPrintTextLocations.java example from the source code download. See
its source code for an explanation of the colors.
http://imgur.com/a/H5CNR
The spaces in the extracted text (which you get e.g. with
"stripper.getText(doc);") are added by PDFBox, but they have no
TextPosition object.
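
To make the idea concrete, here is a minimal, self-contained sketch of
how a space can be inferred from the gap between two glyphs. It is only
an illustration and not PDFBox's actual algorithm, which is more
involved. The names GapSpaceDemo, Glyph, joinWithSpaces and gapFactor
are hypothetical and do not exist in PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; this is NOT PDFBox's actual algorithm.
class GapSpaceDemo {

    // Hypothetical stand-in for a positioned glyph
    // (PDFBox's TextPosition carries much more information).
    static final class Glyph {
        final String text;
        final float x;      // left edge of the glyph
        final float width;  // horizontal extent of the glyph

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins glyphs in the given order and inserts a space wherever the
    // horizontal gap to the previous glyph exceeds gapFactor times the
    // previous glyph's width. The inserted space has no Glyph of its own.
    static String joinWithSpaces(List<Glyph> glyphs, float gapFactor) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > gapFactor * prev.width) {
                    sb.append(' ');
                }
            }
            sb.append(g.text);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("a", 0f, 5f));
        glyphs.add(new Glyph("b", 5f, 5f));
        glyphs.add(new Glyph("c", 14f, 5f)); // 4-unit gap after "b"
        glyphs.add(new Glyph("d", 19f, 5f));
        System.out.println(joinWithSpaces(glyphs, 0.3f)); // prints "ab cd"
    }
}
```

The space in the output comes purely from the position arithmetic, which
is why there is no glyph object to report coordinates for.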
Tilman
I want to use a Java program, not a command-line application.
I use the library pdfbox-2.0.6.jar.
=====================
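import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Subclass of PDFTextStripper that records every TextPosition it processes
class PDFTextCordinateStripper extends PDFTextStripper {
    public List<TextPosition> list_text = new ArrayList<TextPosition>();

    public PDFTextCordinateStripper() throws IOException {
        super();
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        super.processTextPosition(text);
        list_text.add(text);
    }
}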
=====================
// main (omitted); doc is the already loaded PDDocument
PDFTextCordinateStripper stripper = new PDFTextCordinateStripper();
int len_page = doc.getNumberOfPages();
for (int ind = 1; ind <= len_page; ind++) {
    // getPage() is 0-based, setStartPage()/setEndPage() are 1-based
    PDPage pg = doc.getPage(ind - 1);
    // Assumption: pg_w and pg_h (not defined in the posted excerpt)
    // are the media box dimensions
    float pg_w = pg.getMediaBox().getWidth();
    float pg_h = pg.getMediaBox().getHeight();
    String str_page_num = "PageNum: " + ind;
    String str_page_size =
            "Width: " + pg_w
            + "\tHeight: " + pg_h;
    System.out.println(str_page_num + "\t" + str_page_size);

    // Collect the text positions of this page only
    stripper.list_text.clear();
    stripper.setStartPage(ind);
    stripper.setEndPage(ind);
    stripper.getText(doc);

    for (TextPosition rec : stripper.list_text) {
        String str_rec
                = "Text: " + rec.getUnicode()
                + "\tx: " + rec.getX()
                + "\ty: " + rec.getY()
                + "\tw: " + rec.getWidth()
                + "\th: " + rec.getHeight()
                + "\tfont_size: " + rec.getFontSizeInPt();
        System.out.println(str_rec);
    }
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]