With debug log levels set in the Tika config,

cat tika-server-config-custom.xml
                <?xml version="1.0" encoding="UTF-8"?>
                <properties>
                  <server>
                    <params>
                      <logLevel>debug</logLevel>
                      <port>9998</port>
                      <host>127.0.0.1</host>
                      <javaPath>/usr/bin/java</javaPath>
                      <noFork>false</noFork>
                      <forkedJvmArgs>
                        <arg>-Xms1g</arg>
                        <arg>-Xmx1g</arg>
                        <arg>-Dpdfbox.fontcache=/var/tika</arg>
                        <arg>-Dlog4j2.debug</arg>
                      </forkedJvmArgs>
                ...

I don't get any more useful info on failure:

        --> https://pastebin.com/raw/DsrLxbeg

Unless there's more relevant debug info to squeeze out from the config alone,

On 7/15/22 10:43 PM, Tilman Hausherr wrote:
The next thing that could be done is to debug this, if possible. Tim suggested the
file might be truncated.

I don't know if it is possible, but if you can run Tika in a debugger, stop at
org.apache.pdfbox.pdfparser.PDFParser.initialParse(), where the exception "Page tree
root must be a dictionary" happens. There, try to access this.fileLen and compare that
number to your file length.

I'll figure out how to debug the tika-server Java backend while it is being fed by
the Dovecot attachment submission task.
I'm guessing 'jdb',

        jdb -version
          This is jdb version 18.0 (Java SE version 18.0.1)

is the right tool for that.
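
A rough sketch of how attaching jdb might look, not something verified against this setup. It assumes the forked JVM will accept a standard JDWP agent argument as one more <arg> under <forkedJvmArgs>. Port 5005 and the helper names jdwp_arg and file_len are made up for illustration. The breakpoint location and this.fileLen come from Tilman's suggestion above.

```shell
#!/bin/sh
# Sketch only: port 5005 and the helper names are assumptions, not Tika defaults.
JDWP_PORT=5005

# Print the JVM argument that would go into an extra <arg>...</arg> under
# <forkedJvmArgs>, so the forked JVM listens for a debugger on JDWP_PORT.
jdwp_arg() {
    printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=${JDWP_PORT}"
}

# Print the size in bytes of a file on disk, for comparison with this.fileLen.
file_len() {
    wc -c < "$1" | tr -d ' '
}

# After restarting tika-server with that argument, attach from another shell:
#   jdb -attach 5005
# Then, at the jdb prompt:
#   stop in org.apache.pdfbox.pdfparser.PDFParser.initialParse
#   (let dovecot submit the failing attachment and wait for the breakpoint)
#   print this.fileLen
```

If the number printed by jdb is smaller than what file_len reports for the original attachment, that would point toward the truncation theory. If the two match, the file probably reached PDFBox intact and the problem is elsewhere.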
