Hello. I have a problem with the class "org.apache.pdfbox.util.PDFTextStripperByArea":
If I add several regions to this class, text is extracted from only one of them. To reproduce this, I created two regions with identical coordinates but different names, added both to the text stripper, and called "extractRegions". I would really appreciate it if someone could tell me what I am doing wrong, or whether this is a bug in the tool.

The code that produces the issue is at the end of this message. The final results (localResult1 and localResult2) have different content: one of them is empty.

If you need a PDF document to reproduce this, please ask me for it.

Thanks in advance,
Ismael

import java.awt.geom.Rectangle2D;
import java.io.ByteArrayInputStream;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFTextStripperByArea;

// Opening the document and getting the page
// (documentInBytes and pageNumber are defined elsewhere)
PDFParser parser = new PDFParser(new ByteArrayInputStream(documentInBytes));
parser.parse();
PDDocument doc = parser.getPDDocument();
PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(pageNumber);

// Creating the stripper
PDFTextStripperByArea areaStripper = new PDFTextStripperByArea();

// Creating the regions and adding them to the stripper
Rectangle2D rectangle = new Rectangle2D.Float();
rectangle.setRect(0, 0, 500, 100);
areaStripper.addRegion("1", rectangle);

Rectangle2D rectangle2 = new Rectangle2D.Float();
rectangle2.setRect(0, 0, 500, 100);
areaStripper.addRegion("2", rectangle2);

// Extracting the regions and getting the results
areaStripper.extractRegions(page);
String localResult1 = areaStripper.getTextForRegion("1");
String localResult2 = areaStripper.getTextForRegion("2");
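
As a sanity check that does not involve PDFBox at all, the two rectangles can be compared using only the JDK's java.awt.geom classes. The sketch below is a minimal illustration; the class name RegionSanityCheck and its helper methods are hypothetical and are not part of PDFBox. It only shows that two rectangles built with the same values are equal and cover the same points, which suggests that the difference between the two results comes from the stripper rather than from the regions themselves.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox: checks the region geometry only.
public class RegionSanityCheck {

    // Builds a region the same way as in the code above.
    static Rectangle2D region(float x, float y, float w, float h) {
        Rectangle2D r = new Rectangle2D.Float();
        r.setRect(x, y, w, h);
        return r;
    }

    // True if both rectangles agree on whether the point (x, y) is inside.
    static boolean sameCoverage(Rectangle2D a, Rectangle2D b, double x, double y) {
        return a.contains(x, y) == b.contains(x, y);
    }

    public static void main(String[] args) {
        Rectangle2D first = region(0, 0, 500, 100);
        Rectangle2D second = region(0, 0, 500, 100);

        System.out.println("equal: " + first.equals(second));
        System.out.println("same coverage at (10, 50): "
                + sameCoverage(first, second, 10, 50));
    }
}
```

If both checks report true, the two regions are geometrically identical, so any text position that falls inside region "1" also falls inside region "2".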