Hello, thank you for your reply. I updated the PDFBox library to 2.0.7.
However, I still could not get the character positions. I have tried to attach an image file showing the result.

2017-08-23 0:57 GMT+09:00 Tilman Hausherr <[email protected]>:
> Hi,
>
> Sorry about that.
>
> What PDFBox version are you using? The current one is 2.0.7. The generic
> example is PrintTextLocations.java, and DrawPrintTextLocations.java is the
> same visually (see output: http://imgur.com/a/1awtu )
>
> For which characters were you not able to retrieve the location? Please
> describe where they are, e.g. "top left", whatever, or please explain what
> you were expecting and missed.
>
> Tilman
>
>
> On 22.08.2017 at 17:44, 二川村田 wrote:
>>
>> Hello
>>
>> I tried to get the text from the PDF below.
>>
>> http://jpdb.nihs.go.jp/jp17e/000217651.pdf
>>
>> On the first page, there were some characters whose locations I could
>> retrieve, but there were also characters whose locations I couldn't.
>>
>> What is the reason for this problem?
>>
>>
>> ========================
>> my source to retrieve character's locations
>> ========================
>>
>> =====================
>> //class extends PDFTextStripper
>> class PDFTextCordinateStripper extends PDFTextStripper {
>>
>>     public List<TextPosition> list_text = new ArrayList<TextPosition>();
>>
>>     public PDFTextCordinateStripper() throws IOException {
>>         super();
>>     }
>>
>>     protected void processTextPosition(TextPosition text) {
>>         super.processTextPosition(text);
>>         list_text.add(text);
>>     }
>>
>> }
>>
>>
>> =====================
>> // main (omitted)
>> PDFTextCordinateStripper stripper = new PDFTextCordinateStripper();
>>
>> int len_page = doc.getNumberOfPages();
>> for (int ind = 1; ind <= len_page; ind++) {
>>
>>     PDPage pg = doc.getPage(ind - 1);
>>
>>     String str_page_num = "PageNum: " + ind;
>>
>>     String str_page_size =
>>         "Width: " + pg_w
>>         + "\tHeight: " + pg_h;
>>
>>     System.out.println(str_page_num + "\t" + str_page_size);
>>
>>     stripper.list_text.clear();
>>     stripper.setStartPage(ind);
>>     stripper.setEndPage(ind);
>>     stripper.getText(doc);
>>
>>     String p_text = stripper.getText(doc);
>>
>>     Iterator<String> it_str = Arrays.asList(p_text.split("")).iterator();
>>     int ind_tp = 0;
>>     List<TextPosition> list_tp = stripper.list_text;
>>     int len_list_tp = list_tp.size();
>>     while (it_str.hasNext()) {
>>         String ch = it_str.next();
>>         String str_rec = "Text: " + ch;
>>
>>         if (ind_tp < len_list_tp) {
>>             TextPosition tp = list_tp.get(ind_tp);
>>             if (ch.equals(tp.toString())){
>>                 str_rec += "\tx: " + tp.getX()
>>                     + "\ty: " + tp.getY()
>>                     + "\tw: " + tp.getWidth()
>>                     + "\th: " + tp.getHeight()
>>                     + "\tfont_size: " + tp.getFontSizeInPt();
>>                 ind_tp++;
>>             }
>>         }
>>
>>         System.out.println(str_rec);
>>     }
>>
>
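One possible explanation for the quoted loop finding positions for only some characters, offered as an untested guess and not confirmed anywhere in this thread: the loop advances its index into the collected list only on an exact match. If that list ever holds an entry that never appears in the extracted text at that point (for example a glyph that the stripper left out of its output, or one whose string form is longer than one character), the index can stay stuck and later characters get no position. The sketch below uses only the Java standard library. The class `LockstepMatchDemo`, the method `countMatchedLockstep`, and the plain strings standing in for `TextPosition.toString()` are hypothetical names for illustration, not PDFBox API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the quoted loop: each "glyph" is just the string
// that a collected text position would print. No PDFBox classes are used.
final class LockstepMatchDemo {

    // Walks the extracted text one character at a time and advances the
    // glyph index only when the character equals the current glyph,
    // which is the same rule the quoted loop applies.
    static int countMatchedLockstep(String text, List<String> glyphs) {
        int glyphIndex = 0;
        int matched = 0;
        for (String ch : text.split("")) {
            if (glyphIndex < glyphs.size() && ch.equals(glyphs.get(glyphIndex))) {
                matched++;
                glyphIndex++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String text = "ab cd";

        // The extra space in the text is harmless: it is skipped and
        // matching resumes with "c".
        List<String> clean = Arrays.asList("a", "b", "c", "d");
        System.out.println(countMatchedLockstep(text, clean)); // prints 4

        // Assumption: the collected list holds one extra "b" that is not in
        // the text. After "a" and "b" the index is stuck on that entry,
        // so "c" and "d" never receive a position.
        List<String> withExtra = Arrays.asList("a", "b", "b", "c", "d");
        System.out.println(countMatchedLockstep(text, withExtra)); // prints 2
    }
}
```

If this is what is happening, printing the collected list on its own (without matching it against the result of `getText`) should show whether the positions for the "missing" characters were in fact collected.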
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

