> The question is, do you close the input files properly?
Yes, I do, but only at the very end of the operation. I am merging all
these individual files into one large one, so I have to keep the originals
open until I save the merged file for the last time; otherwise it throws an
exception about the PDDocument being closed.
I know this is not the best way of merging documents, by the way. I might
try switching to PDFMergerUtility instead.
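
A minimal sketch of one way to keep the number of open handles bounded,
assuming the merge can be done in chunks: open a limited batch of inputs,
write them to an intermediate result, and close that batch before opening
the next one. The sketch is library-agnostic and does not call PDFBox;
ChunkedMerge, Source, mergeInChunks, and chunkSize are hypothetical names,
and the string concatenation only stands in for importing pages and saving
an intermediate file.

```java
import java.util.ArrayList;
import java.util.List;

/** Library-agnostic sketch: bound open handles by merging in chunks. */
public class ChunkedMerge {

    /** Stand-in for an opened input document (hypothetical, not a PDFBox type). */
    static final class Source implements AutoCloseable {
        static int open = 0;   // handles currently open
        static int peak = 0;   // highest value 'open' has reached
        final String name;

        Source(String name) {
            this.name = name;
            open++;
            peak = Math.max(peak, open);
        }

        @Override
        public void close() {
            open--;
        }
    }

    /**
     * Processes 'names' in chunks of 'chunkSize'. Each chunk is opened, written
     * to one intermediate result, and closed before the next chunk starts.
     */
    static List<String> mergeInChunks(List<String> names, int chunkSize) {
        List<String> intermediates = new ArrayList<>();
        for (int start = 0; start < names.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, names.size());
            List<Source> opened = new ArrayList<>();
            try {
                StringBuilder merged = new StringBuilder();
                for (String name : names.subList(start, end)) {
                    Source s = new Source(name);
                    opened.add(s);
                    merged.append(s.name); // stands in for importing pages
                }
                // "Save" the intermediate while its sources are still open.
                intermediates.add(merged.toString());
            } finally {
                // Release this chunk's handles before opening the next chunk.
                for (Source s : opened) {
                    s.close();
                }
            }
        }
        return intermediates;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            names.add("f" + i);
        }
        List<String> parts = mergeInChunks(names, 4);
        System.out.println(parts.size() + " intermediates, peak open = " + Source.peak);
    }
}
```

With this shape, the peak number of simultaneously open inputs is at most
chunkSize, regardless of how many files are processed in total. The
intermediates would then be combined in a final pass, which itself opens
far fewer files than the original inputs.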

On Wed, Mar 15, 2023 at 8:30 AM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:

> Hi Gilad,
>
> PDFBox uses one scratch file per document as long as you are using
> setupTempFileOnly. Handling thousands of documents results in thousands of
> scratch files. Those scratch files should be closed once the corresponding
> documents are closed.
>
> The question is, do you close the input files properly?
>
> Andreas
>
> On 14.03.23 at 19:16, Gilad Denneboom wrote:
> > Hi Maruan,
> >
> > Yes, I saw that, but it would be nice if this issue can be solved within
> > PDFBox, too.
> >
> > Gilad
> >
> > On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sahy...@fileaffairs.de>
> > wrote:
> >
> >> You can raise the ulimit on Linux; the default is 1024 open files.
> >>
> >> BR
> >> Maruan
> >>
> >>> On 14.03.2023 at 16:05, Gilad Denneboom <gilad.denneb...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I created an application that opens many files (I'm talking thousands),
> >>> searches them for specific pages, and then merges those pages into new
> >>> PDF files. I do this by calling importPage to copy pages from the
> >>> original files into the new ones.
> >>> However, I'm getting an IOException ("Too many open files") from
> >>> ScratchFile after several thousand files have been processed. I had a
> >>> look at the source code for that class, and I think it might have to do
> >>> with a RandomAccessFile variable ("raf") not being properly closed.
> >>> All of the documents are opened with MemoryUsageSetting set to
> >>> setupTempFileOnly, by the way.
> >>> Could someone confirm this is the issue, and maybe help solve it? I'm
> >>> using PDFBox 2.0.26, by the way, and the app runs on a Mac.
> >>>
> >>> The stack-trace:
> >>> Exception in thread "main" java.io.IOException: Too many open files
> >>> at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
> >>> at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
> >>> at java.base/java.io.File.createTempFile(File.java:2179)
> >>> at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
> >>> at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
> >>> at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
> >>> at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
> >>> at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
> >>> at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
> >>> at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
> >>> at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
> >>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
> >>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
> >>> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
> >>> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
> >>> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
> >>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
> >>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
> >>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
> >>> at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)
> >>>
> >>> Thanks in advance!
> >>>
> >>> Gilad
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org
> >>
> >>
> >
>
>
