On 7/15/22 10:43 PM, Tilman Hausherr wrote:
That's what I also get.
The next thing that could be done is to debug this, if possible. Tim suggested the
file might be truncated.
I don't know if it is possible, but if you can run tika in a debugger, stop at
org.apache.pdfbox.pdfparser.PDFParser.initialParse(), where the exception "Page tree
root must be a dictionary" happens. There, try to access this.fileLen and compare that
number to your file length.
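
For the on-disk side of that comparison, a minimal standalone sketch (not part of PDFBox or Tika; the class name PdfTruncationCheck and the 1024-byte tail window are assumptions made for illustration) that prints the file length and checks whether the "%%EOF" marker, which a complete PDF normally ends with, shows up near the end of the file. A missing marker hints at truncation; a present marker does not prove the file is intact.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class PdfTruncationCheck {

    // How many trailing bytes to scan; an arbitrary choice for this sketch.
    private static final int TAIL_BYTES = 1024;

    /** Returns true if "%%EOF" appears within the last TAIL_BYTES bytes of the file. */
    public static boolean hasEofMarker(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int n = (int) Math.min(len, TAIL_BYTES);
            byte[] tail = new byte[n];
            raf.seek(len - n);      // jump to the start of the tail window
            raf.readFully(tail);    // read exactly n bytes
            // ISO-8859-1 maps every byte to one char, so binary data is safe to search
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfTruncationCheck.java <file.pdf>");
            System.exit(2);
        }
        // This is the number to compare against this.fileLen in the debugger
        System.out.println("file length: " + new File(args[0]).length() + " bytes");
        System.out.println("%%EOF near end: " + hasEofMarker(args[0]));
    }
}
```

On Java 11 or later this should run without a separate compile step, e.g. java PdfTruncationCheck.java attachment.pdf, where attachment.pdf stands for the attachment as saved to disk.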
1st stab at debugging this: i launch tika with debug tooling,
/usr/bin/java \
-agentlib:jdwp=transport=dt_socket,address=127.0.0.1:8080,server=y,suspend=n \
-jar /srv/tika/tika-server.jar \
-c /etc/tika/tika-server-config-custom.xml
in another shell, attach the debugger
jdb -attach 127.0.0.1:8080
then set the breakpoint
> stop in org.apache.pdfbox.pdfparser.PDFParser.initialParse
Deferring breakpoint org.apache.pdfbox.pdfparser.PDFParser.initialParse.
It will be set after the class is loaded.
i then send/receive the email with PDF attachment -- through dovecot>tika -- as
above
i again see the scan-fail error in tika logs, but never see a
Breakpoint hit: ...
dumping at prompt anyway,
> dump this.fileLen
No current thread
this.fileLen = null
> threads
Group system:
  (java.lang.ref.Reference$ReferenceHandler)2788 Reference Handler   running
  (java.lang.ref.Finalizer$FinalizerThread)2789  Finalizer           cond. waiting
  (java.lang.Thread)2790                         Signal Dispatcher   running
  (java.lang.Thread)2791                         Notification Thread running
  (java.lang.Thread)2792                         process reaper      running
Group main:
  (java.lang.Thread)1                            main                cond. waiting
  (java.lang.Thread)2780                         pool-2-thread-1     cond. waiting
  (java.lang.Thread)2795                         Thread-2            running
Group InnocuousThreadGroup:
  (jdk.internal.misc.InnocuousThread)2796        Common-Cleaner      cond. waiting
am i even setting the breakpoint correctly, in order to get at the failure?
An alternative would be that 1) I add the file length to the PDFBox exception, and 2)
you create a Tika build with the PDFBox snapshot.
atm, i'm not building tika-server myself; rather, i'm just using the downloaded
runnable jar from
https://dlcdn.apache.org/tika/2.4.1/tika-server-standard-2.4.1.jar