Hi,

I have an issue with PDFBox leaving tmp files (+~JF*.tmp) in the JVM temp
folder.
After enough runs, that temp folder just gets full...

Here are the Groovy source and unit test to reproduce it (ok.pdf is a file
parsed with no error, ko.pdf is a file with one corrupted object, but
parsing can continue):

package com.c4soft.mvntest

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripper

/**
 * Extracts the text of a PDF file and writes it to a text file.
 */
class MvnTest {
    static main(args) {
        if (!args || args.size() != 2) {
            println 'Exactly two arguments are required:\n- path to the pdf to read\n' +
                    '- path to the file where to put the pdf content'
            return
        }
        File pdfFile = new File(args[0])
        if (!pdfFile.exists()) {
            println "${args[0]} does not exist"
            return
        }
        InputStream pdfStream = new FileInputStream(pdfFile)
        OutputStream text = new FileOutputStream(args[1])
        PDDocument document
        try {
            // force = true so that parsing continues past the corrupted object
            document = PDDocument.load(pdfStream, true)
            PDFTextStripper stripper = new PDFTextStripper('UTF-8')
            text << stripper.getText(document)
        } finally {
            // document is still null if load() threw
            document?.close()
            text.close()
            pdfStream.close()
        }
    }
}




package com.c4soft.mvntest

import static org.junit.Assert.*
import java.util.regex.Pattern
import groovy.io.FileType
import org.junit.Test

class MvnTestTest {
    @Test
    public void testMain() {
        File tmpDir = new File(System.getProperty('java.io.tmpdir'))
        Pattern p = ~/\+~JF.+\.tmp/
        // Start from a clean state: delete temp files left by previous runs
        tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
            tmpFile.delete()
        }
        Integer cnt = 0
        tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
            cnt++
        }
        assertEquals(0, cnt)
        OutputStream out = new FileOutputStream('out.txt')
        try {
            // Parse the valid PDF 20 times, logging the temp file count after each run
            Integer i = 1
            while (i < 21) {
                MvnTest.main('src/test/resources/ok.pdf', 'ok.txt')
                cnt = 0
                tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
                    cnt++
                }
                out << "${i} ok: ${cnt}" << '\n'
                i++
            }
            out << '\n'
            // Same thing with the PDF containing a corrupted object
            i = 1
            while (i < 21) {
                MvnTest.main('src/test/resources/ko.pdf', 'ko.txt')
                cnt = 0
                tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
                    cnt++
                }
                out << "${i} ko: ${cnt}" << '\n'
                i++
            }
        } finally {
            out.close()
        }
    }
}


And here is the corresponding output (count of PDFBox temp files in the JVM
temp folder after each run):
1 ok: 35
2 ok: 36
3 ok: 37
4 ok: 42
5 ok: 43
6 ok: 46
7 ok: 47
8 ok: 50
9 ok: 50
10 ok: 53
11 ok: 53
12 ok: 56
13 ok: 65
14 ok: 68
15 ok: 73
16 ok: 75
17 ok: 80
18 ok: 81
19 ok: 84
20 ok: 85

1 ko: 62
2 ko: 62
3 ko: 62
4 ko: 62
5 ko: 62
6 ko: 62
7 ko: 62
8 ko: 62
9 ko: 71
10 ko: 65
11 ko: 67
12 ko: 68
13 ko: 68
14 ko: 68
15 ko: 68
16 ko: 68
17 ko: 68
18 ko: 77
19 ko: 71
20 ko: 73

After JUnit returns, 3 tmp files are left in the temp folder.
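
In case someone wants to check the count outside of the Groovy test, here is a minimal Java sketch of the same measurement. The TmpFileCounter class and its count method are made-up names for illustration (they are not part of PDFBox), and the glob +~JF*.tmp is only an assumption based on the file names I see in my temp folder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of PDFBox): counts leftover temp files. */
public class TmpFileCounter {

    /** Counts the regular files directly inside dir whose name matches the glob. */
    public static int count(Path dir, String glob) throws IOException {
        int n = 0;
        // The stream is closed automatically by try-with-resources
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, glob)) {
            for (Path f : files) {
                // Skip directories that happen to match the glob
                if (Files.isRegularFile(f)) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Same folder as the one inspected by the unit test
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(count(tmpDir, "+~JF*.tmp"));
    }
}
```

Running it before and after a parsing run should show whether the number of matching files grows. Note that the glob also matches a file named exactly +~JF.tmp, which the regex in the test does not.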

Has anyone ever noticed something like that?
Am I misusing the API?

Regards,
Ch4mp
