Right. I think Tilman was suggesting adding new debug logging to
tika-server.
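
Separately from any new logging, a cheaper first check of the truncation theory might be possible without a debugger, assuming the attachment that dovecot submits can be saved to disk. The sketch below is a hypothetical standalone helper (PdfTailCheck, hasEofMarker, readTail and the 1024-byte window are made-up names and values for illustration, not part of Tika or PDFBox). It prints the on-disk file length, which is the number one would compare against this.fileLen, and reports whether a "%%EOF" marker shows up near the end of the file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tika or PDFBox.
class PdfTailCheck {

    // How many trailing bytes to inspect; 1024 is an arbitrary choice for this sketch.
    static final int TAIL_BYTES = 1024;

    // Returns true if the marker "%%EOF" occurs anywhere in the given bytes.
    // ISO-8859-1 maps every byte to one char, so binary content cannot break decoding.
    static boolean hasEofMarker(byte[] tail) {
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        return text.contains("%%EOF");
    }

    // Reads up to TAIL_BYTES from the end of the file at the given path.
    static byte[] readTail(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int n = (int) Math.min(len, TAIL_BYTES);
            byte[] buf = new byte[n];
            raf.seek(len - n);
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfTailCheck.java <file.pdf>");
            return;
        }
        byte[] tail = readTail(args[0]);
        // The on-disk length is the value to compare with what the parser saw.
        System.out.println("file length: " + new File(args[0]).length());
        System.out.println("%%EOF in last " + tail.length + " bytes: " + hasEofMarker(tail));
    }
}
```

A missing marker would point toward truncation. A present marker does not prove the file is intact, since a PDF with incremental updates can contain more than one %%EOF, and what tika-server receives over HTTP may differ from what is on disk.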

On Sat, Jul 16, 2022 at 12:43 PM PGNet Dev <[email protected]> wrote:

> With debug log levels set in the tika config,
>
> cat tika-server-config-custom.xml
>         <?xml version="1.0" encoding="UTF-8"?>
>         <properties>
>           <server>
>             <params>
>               <logLevel>debug</logLevel>
>               <port>9998</port>
>               <host>127.0.0.1</host>
>               <javaPath>/usr/bin/java</javaPath>
>               <noFork>false</noFork>
>               <forkedJvmArgs>
>                 <arg>-Xms1g</arg>
>                 <arg>-Xmx1g</arg>
>                 <arg>-Dpdfbox.fontcache=/var/tika</arg>
>                 <arg>-Dlog4j2.debug</arg>
>               </forkedJvmArgs>
>         ...
>
> I don't get any more useful info on failure:
>
>         --> https://pastebin.com/raw/DsrLxbeg
>
> Unless there's more relevant debug info to squeeze out from config
> alone,
>
> On 7/15/22 10:43 PM, Tilman Hausherr wrote:
> > The next thing that could be done is to debug this, if possible. Tim
> > suggested the file might be truncated.
> >
> > I don't know if it is possible, but if you can run tika in a debugger,
> > stop at org.apache.pdfbox.pdfparser.PDFParser.initialParse(), where the
> > exception "Page tree root must be a dictionary" happens. There, try to
> > access this.fileLen. Compare that number to your file length.
>
> I'll figure out how to debug the tika-server Java backend while it is
> being fed by the dovecot attachment submission task.
> I'm guessing 'jdb',
>
>         jdb -version
>           This is jdb version 18.0 (Java SE version 18.0.1)
>
> is the right tool for that.
>
>
