Right. I think Tilman was suggesting adding new debug logging to tika-server.
On Sat, Jul 16, 2022 at 12:43 PM PGNet Dev <[email protected]> wrote:

> with debug log levels set in tika config
>
>     cat tika-server-config-custom.xml
>         <?xml version="1.0" encoding="UTF-8"?>
>         <properties>
>           <server>
>             <params>
>               <logLevel>debug</logLevel>
>               <port>9998</port>
>               <host>127.0.0.1</host>
>               <javaPath>/usr/bin/java</javaPath>
>               <noFork>false</noFork>
>               <forkedJvmArgs>
>                 <arg>-Xms1g</arg>
>                 <arg>-Xmx1g</arg>
>                 <arg>-Dpdfbox.fontcache=/var/tika</arg>
>                 <arg>-Dlog4j2.debug</arg>
>               </forkedJvmArgs>
>         ...
>
> i don't get any more useful info on failure,
>
>     --> https://pastebin.com/raw/DsrLxbeg
>
> . unless there's more relevant debug info to squeeze out from config alone,
>
> On 7/15/22 10:43 PM, Tilman Hausherr wrote:
> > The next that could be done is to debug this, if possible. Tim suggested
> > the file might be truncated.
> >
> > I don't know if it is possible, if you can run tika in a debugger, then
> > stop at org.apache.pdfbox.pdfparser.PDFParser.initialParse() where the
> > exception "Page tree root must be a dictionary" happens. There try to
> > access this.fileLen . Compare that number to your file length.
>
> , I'll figure out how to debug the tika-server java backend, while being
> fed by the dovecot attachment submission task.
> guessing 'jdb',
>
>     jdb -version
>         This is jdb version 18.0 (Java SE version 18.0.1)
>
> is the right tool for that.
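
For the "compare that number to your file length" step, here is a rough sketch of a standalone check, using only the JDK, that prints the size of a saved copy of the attachment and looks for the "%%EOF" marker that a complete PDF carries near its end. The class name PdfTruncationCheck and the 1024-byte tail window are my own choices for illustration, not anything from Tika or PDFBox. It only describes the file as it sits on disk, so the number still has to be compared by hand with the this.fileLen value seen in the debugger.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika or PDFBox.
public class PdfTruncationCheck {

    // How many trailing bytes to scan for the marker (an arbitrary choice).
    private static final int TAIL_BYTES = 1024;

    /** Returns true if "%%EOF" appears within the last TAIL_BYTES bytes of the file. */
    public static boolean hasEofMarker(Path pdf) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(pdf.toFile(), "r")) {
            long len = raf.length();
            int n = (int) Math.min(TAIL_BYTES, len);
            byte[] tail = new byte[n];
            raf.seek(len - n);
            raf.readFully(tail);
            // ISO-8859-1 maps every byte to one char, so binary data cannot break the search.
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfTruncationCheck <file.pdf>");
            System.exit(2);
        }
        Path pdf = Paths.get(args[0]);
        // Compare this number with this.fileLen observed in PDFParser.initialParse().
        System.out.println("length on disk: " + Files.size(pdf) + " bytes");
        System.out.println("%%EOF in tail:  " + hasEofMarker(pdf));
    }
}
```

With Java 11 or later this can be run without a separate compile step, e.g. "java PdfTruncationCheck.java attachment.pdf". A missing marker would be consistent with Tim's truncation theory, but it does not prove it, and a present marker does not rule out that tika-server received fewer bytes than are in the saved copy.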
