With debug log levels set in the tika config,
cat tika-server-config-custom.xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <server>
    <params>
      <logLevel>debug</logLevel>
      <port>9998</port>
      <host>127.0.0.1</host>
      <javaPath>/usr/bin/java</javaPath>
      <noFork>false</noFork>
      <forkedJvmArgs>
        <arg>-Xms1g</arg>
        <arg>-Xmx1g</arg>
        <arg>-Dpdfbox.fontcache=/var/tika</arg>
        <arg>-Dlog4j2.debug</arg>
      </forkedJvmArgs>
...
I don't get any more useful info on failure:
--> https://pastebin.com/raw/DsrLxbeg
Unless there's more relevant debug info to squeeze out from config alone,
On 7/15/22 10:43 PM, Tilman Hausherr wrote:
The next thing that could be done is to debug this, if possible. Tim suggested the
file might be truncated.
I don't know if it is possible, if you can run tika in a debugger, then stop at
org.apache.pdfbox.pdfparser.PDFParser.initialParse() where the exception "Page tree
root must be a dictionary" happens. There try to access this.fileLen . Compare that
number to your file length.
I'll figure out how to debug the tika-server Java backend while it's being fed by
the dovecot attachment submission task.
I'm guessing 'jdb',
jdb -version
This is jdb version 18.0 (Java SE version 18.0.1)
is the right tool for that.
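
Short of a full debugger session, the truncation theory can also be checked roughly from outside tika. The sketch below assumes the exact bytes that dovecot submits can be saved to a file first. It prints the on-disk length, which is the number to compare against this.fileLen, and looks for the "%%EOF" marker that a complete PDF conventionally carries near its end. The class name PdfTruncationCheck, the hasEofMarker helper and the 1024-byte window are illustrative choices, not anything from tika or pdfbox. A missing marker is only a hint that the file is cut short, not proof.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class PdfTruncationCheck {

    // Assumption: scan the last 1024 bytes, the window PDF readers
    // conventionally search for the end-of-file marker.
    static final int TAIL_BYTES = 1024;

    /** Returns true if "%%EOF" appears within the last TAIL_BYTES of the file. */
    static boolean hasEofMarker(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int n = (int) Math.min(len, TAIL_BYTES);
            byte[] tail = new byte[n];
            raf.seek(len - n);
            raf.readFully(tail);
            // ISO-8859-1 maps every byte to exactly one char, so binary
            // content cannot break the string search.
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfTruncationCheck.java <file.pdf>");
            System.exit(2);
        }
        // This is the value to compare with this.fileLen in the debugger.
        System.out.println("length on disk: " + new File(args[0]).length() + " bytes");
        System.out.println("%%EOF in last " + TAIL_BYTES + " bytes: " + hasEofMarker(args[0]));
    }
}
```

With Java 11 or later this runs as a single source file, e.g. `java PdfTruncationCheck.java attachment.pdf`, where attachment.pdf stands for wherever the saved attachment ends up.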