Hi,
I have an issue with PDFBox leaving tmp files (+~JF*.tmp) in the JVM temp
folder.
After enough runs, that temp folder fills up.
Here are the Groovy source and the unit test to reproduce it (ok.pdf is a
file parsed with no error, ko.pdf is a file with one corrupted object that
does not prevent parsing from continuing):

package com.c4soft.mvntest

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripper

/**
 * Example Groovy class: extracts the text of a PDF into a text file.
 */
class MvnTest {

    static main(args) {
        if (!args || args.size() != 2) {
            println 'Exactly two arguments are required:\n- path to the pdf to read\n- path to the file where to put the pdf content'
            return
        }

        File pdfFile = new File(args[0])
        if (!pdfFile.exists()) {
            println "${args[0]} does not exist"
            return
        }

        InputStream pdfStream = new FileInputStream(pdfFile)
        OutputStream text = new FileOutputStream(args[1])
        PDDocument document
        try {
            // second argument: keep parsing even if an object is corrupted
            document = PDDocument.load(pdfStream, true)
            PDFTextStripper stripper = new PDFTextStripper('UTF-8')
            text << stripper.getText(document)
        } finally {
            // document is still null if load() threw
            document?.close()
            text.close()
            pdfStream.close()
        }
    }
}

package com.c4soft.mvntest

import static org.junit.Assert.*

import java.util.regex.Pattern

import groovy.io.FileType
import org.junit.Test

class MvnTestTest {

    @Test
    public void testMain() {
        File tmpDir = new File(System.getProperty('java.io.tmpdir'))
        Pattern p = ~/\+~JF.+\.tmp/

        // Delete leftover temp files, then check the folder is clean
        tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
            tmpFile.delete()
        }
        Integer cnt = 0
        tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
            cnt++
        }
        assertEquals(0, cnt)

        OutputStream out = new FileOutputStream('out.txt')

        // 20 runs on the valid PDF, counting temp files after each run
        Integer i = 1
        while (i < 21) {
            MvnTest.main('src/test/resources/ok.pdf', 'ok.txt')
            cnt = 0
            tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
                cnt++
            }
            out << "${i} ok: ${cnt}" << '\n'
            i++
        }
        out << '\n'

        // 20 runs on the PDF with one corrupted object
        i = 1
        while (i < 21) {
            MvnTest.main('src/test/resources/ko.pdf', 'ko.txt')
            cnt = 0
            tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
                cnt++
            }
            out << "${i} ko: ${cnt}" << '\n'
            i++
        }
        out.close()
    }
}

And here is the corresponding output (count of PDFBox temp files in the JVM
temp folder after each run):
1 ok: 35
2 ok: 36
3 ok: 37
4 ok: 42
5 ok: 43
6 ok: 46
7 ok: 47
8 ok: 50
9 ok: 50
10 ok: 53
11 ok: 53
12 ok: 56
13 ok: 65
14 ok: 68
15 ok: 73
16 ok: 75
17 ok: 80
18 ok: 81
19 ok: 84
20 ok: 85
1 ko: 62
2 ko: 62
3 ko: 62
4 ko: 62
5 ko: 62
6 ko: 62
7 ko: 62
8 ko: 62
9 ko: 71
10 ko: 65
11 ko: 67
12 ko: 68
13 ko: 68
14 ko: 68
15 ko: 68
16 ko: 68
17 ko: 68
18 ko: 77
19 ko: 71
20 ko: 73
After JUnit returns, 3 tmp files are left in the temp folder.
Has anyone else noticed something like this?
Am I misusing the API?
Regards,
Ch4mp