Hello, I'm using PDFBox to go through a list of PDFs and extract a phrase from each file. So far everything has worked well, until I scaled it up to a large list of files. I run into a problem with a list of 112 files: after successfully processing a handful of them (maybe 20), the subsequent files cannot be opened. I have tried running the same method on just the offending files, one at a time, and they open fine.
Here is the method that is causing the problem. It is called once for each PDF file:

    PDDocument doc = null;
    String regExDoi = "[Dd][Oo][Ii]:[0-9\\s]*\\.[\\n\\r0-9]*/[A-Za-z0-9\\.\\-;\\(\\)/]*";
    String regExDoiSplit = "[Dd][Oo][Ii]:";
    Pattern findDoiString = Pattern.compile(regExDoi);
    try {
        try {
            doc = PDDocument.load(file);
            System.out.println("======= " + file.getName() + " loaded ========");
            decrypt(doc);
            if (!isFailedFile) {
                PDFTextStripper strip = new PDFTextStripper();
                int pageCount = doc.getNumberOfPages();
                System.out.println("Pages: " + pageCount);
                // pages are 1-based, so include the last page
                for (int page = 1; page <= pageCount; page++) {
                    // restrict PDFTextStripper to the current page
                    strip.setStartPage(page);
                    strip.setEndPage(page);
                    // get the text on this page
                    String text = strip.getText(doc);
                    // try to find the DOI string
                    Matcher m = findDoiString.matcher(text);
                    if (m.find()) {
                        String foundGroup = m.group();
                        // split at regExDoiSplit; should give String[] = {"", "the doi numbers"}
                        String[] foundIt = foundGroup.split(regExDoiSplit);
                        // foundIt[1] only exists if the split produced at least two parts
                        if (foundIt.length > 1) {
                            System.out.println("\tDOI: '" + foundIt[1] + "'");
                            if (doc != null) {
                                System.out.println("Closing document, found doi.");
                                doc.close();
                            }
                            // return the DOI numbers, stripping any whitespace
                            return foundIt[1].replaceAll("[\\s]*", "");
                        }
                    }
                }
            } else {
                System.out.println(isFailedFile + failedReason.toString());
            }
        } finally {
            if (doc != null) {
                doc.close();
            }
        }
    } catch (IOException e) {
        isFailedFile = true;
        failedReason = FailedReason.BADFILE;
        if (doc != null) {
            doc.close();
        }
    }
    if (doc != null) {
        doc.close();
        System.out.println("close it again");
    }
    return null;

I think the problem arises because I keep getting a "warning, you did not close the pdf" message, and in such a long list, after that warning has appeared many times, the remaining files can no longer be opened. I thought I closed the document everywhere it needed to be closed. Did I forget something else?

Thank you.
-Sophia

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~
Aim for the moon. If you miss, you may hit a star. -W. Clement Stone
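For comparison, here is a minimal sketch of a close-exactly-once structure, assuming Java 7+ (try-with-resources). So that it runs without PDFBox on the classpath, it uses a hypothetical `TrackedDoc` class as a stand-in for `PDDocument`, with a counter of documents still open. With the real library, PDFBox 2.x's `PDDocument` implements `java.io.Closeable`, so the equivalent would be `try (PDDocument doc = PDDocument.load(file)) { ... }`. The single `try` block closes the document on every path: an early return with a DOI, a search that finds nothing, or an exception. The sketch also checks every page (`page <= getNumberOfPages()`, since pages are 1-based) and only reads `parts[1]` when the split produced two parts. It does not establish that unclosed documents are the cause of the failures after about 20 files.

```java
import java.io.Closeable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoiSketch {

    // DOI pattern from the post, compiled once and reused for every page and file.
    private static final Pattern DOI = Pattern.compile(
            "[Dd][Oo][Ii]:[0-9\\s]*\\.[\\n\\r0-9]*/[A-Za-z0-9\\.\\-;\\(\\)/]*");
    private static final String DOI_PREFIX = "[Dd][Oo][Ii]:";

    /** Returns the DOI found in the text with whitespace removed, or null if there is none. */
    static String extractDoi(String text) {
        Matcher m = DOI.matcher(text);
        if (!m.find()) {
            return null;
        }
        // Splitting "doi: 10.x/y" at the prefix gives {"", " 10.x/y"}.
        String[] parts = m.group().split(DOI_PREFIX);
        if (parts.length < 2) {
            return null;
        }
        return parts[1].replaceAll("\\s+", "");
    }

    /** Hypothetical stand-in for PDDocument; counts how many instances are still open. */
    static final class TrackedDoc implements Closeable {
        static int open = 0;
        private final String[] pages;
        private boolean closed = false;

        TrackedDoc(String[] pages) {
            this.pages = pages;
            open++;
        }

        int getNumberOfPages() {
            return pages.length;
        }

        // Pages are numbered from 1, as with PDFTextStripper.setStartPage/setEndPage.
        String getPageText(int page) {
            return pages[page - 1];
        }

        @Override
        public void close() {
            if (!closed) { // closing twice is harmless but counted only once
                closed = true;
                open--;
            }
        }
    }

    /** Opens one document per call; try-with-resources closes it exactly once on every path. */
    static String findDoi(String[] pages) {
        try (TrackedDoc doc = new TrackedDoc(pages)) {
            for (int page = 1; page <= doc.getNumberOfPages(); page++) {
                String doi = extractDoi(doc.getPageText(page));
                if (doi != null) {
                    return doi; // close() still runs before the method returns
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        // Simulate 112 files, each with the DOI on its last page.
        for (int i = 0; i < 112; i++) {
            findDoi(new String[] {"title page", "doi: 10.1000/abc" + i});
        }
        System.out.println("documents left open: " + TrackedDoc.open);
    }
}
```

Running `main` simulates 112 files and prints `documents left open: 0`. Because a single `try` owns the document, no separate `close()` calls are needed in the return branch, the `catch` block, or after the outer `try`.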