On 7/18/22 11:56 AM, Tilman Hausherr wrote:
Something doesn't work properly on your side; I get a lot of "DEBUG" lines. I
opened tika-server-standard-2.4.2-SNAPSHOT.jar with 7zip, extracted log4j2.xml,
changed it, and put it back. This is how it looks (comment removed):
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_ERR">
      <PatternLayout pattern="%-5p [%t] %d{HH:mm:ss,SSS} %c %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
editing log4j2.xml directly in the jar and repacking works. no idea why the
other method doesn't.
D="/srv/tika"
F="tika-server-standard-2.4.2-20220718.165252-94.jar"
cd ${D}
rm -rf TMP
mkdir -p TMP/mod
cd TMP
rm -f ${F}*
wget https://repository.apache.org/content/groups/snapshots/org/apache/tika/tika-server-standard/2.4.2-SNAPSHOT/${F}
cd mod
jar -xfv ../${F}
perl -pi -e 's|Root level="info"|Root level="debug"|g' log4j2.xml
jar -cvmf META-INF/MANIFEST.MF ../mod.jar *
launch tika using 'mod.jar'
verify
ls -al /srv/tika/tika-server.jar
lrwxrwxrwx 1 root root 11 Jul 18 14:46 /srv/tika/tika-server.jar -> TMP/mod.jar
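fwiw, the symlink only shows which jar is in place, not what is inside it. a minimal sketch for confirming that the repacked jar really carries the edited config; it assumes 'unzip' is installed, and root_level is a made-up helper name, not anything shipped with tika or log4j:

```shell
# Sketch only: print the Root logger level found in a log4j2.xml read from stdin.
# root_level is a hypothetical helper name, not part of Tika or log4j.
root_level() {
    sed -n 's/.*<Root level="\([^"]*\)".*/\1/p'
}

# Usage (assumes unzip is installed and log4j2.xml sits at the jar root,
# which is where the perl edit found it after extraction):
#   unzip -p /srv/tika/tika-server.jar log4j2.xml | root_level
```

if the edit took, that should print 'debug'.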
systemctl status tika -ln0
● tika.service - Apache Tika server
Loaded: loaded (/etc/systemd/system/tika.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2022-07-18 21:24:40 EDT; 18s ago
Main PID: 18935 (java)
Tasks: 54 (limit: 8811)
Memory: 174.0M
CPU: 24.491s
CGroup: /system.slice/tika.service
├─ 18935 /usr/bin/java -jar /srv/tika/tika-server.jar -c /etc/tika/tika-server-config-custom.xml
└─ 18970 /usr/bin/java -Xms1g -Xmx1g -Dpdfbox.fontcache=/var/tika -Dlog4j2.debug -Djava.awt.headless=true -cp /srv/tika/tika-server.jar -Dtika.server.id= org.apache.tika.server.core.TikaServerProcess -h 127.0.0.1 -p 9998 -i "" -c /etc/tika/tika-server-config-custom.xml -forkedStatusFile /tmp/apache-tika-server-forked-tmp-1104448251575803884 -numRestarts 0
re-send message with attachment ...
verbose/DEBUG logs
journalctl -f -u dovecot
-> https://pastebin.com/raw/sk5xevAM
The output contains a line with "DEBUG" and
"org.apache.tika.parser.pdf.PDFParser".
I've just improved the output: I'm adding an MD5 checksum. This would be
another indicator that something is wrong (or not).
indeed.
i now see in the logs
Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
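to avoid copying those two values by hand, here is a rough sketch that pulls length and md5 out of that kind of line. the pattern is based only on the one log line above, so treat the format as an assumption; tika_len_md5 is a made-up helper name:

```shell
# Sketch only: extract "length" and "md5" from a PDFParser debug line on stdin.
# tika_len_md5 is a hypothetical helper name; the line format is assumed from
# the single log line quoted in this thread.
tika_len_md5() {
    sed -n 's/.*length: \([0-9][0-9]*\), md5: \([0-9a-f]\{32\}\).*/\1 \2/p'
}

# Usage (unit name 'tika' as in the systemctl output in this thread):
#   journalctl -u tika --no-pager | grep PDFParser | tika_len_md5
```

it prints one "length md5" pair per matching line and nothing for lines that don't match.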
checking the original attachment
ls -al Get_Started_With_Smallpdf.pdf
-rw-r--r-- 1 root root 68K Jul 15 12:16 Get_Started_With_Smallpdf.pdf
file Get_Started_With_Smallpdf.pdf
Get_Started_With_Smallpdf.pdf: PDF document, version 1.7
md5sum Get_Started_With_Smallpdf.pdf
14266e428c6a5f371c5abe164026c762 Get_Started_With_Smallpdf.pdf
checking,
ls -al /tmp/apache-tika-9115808773791090696.tmp
ls: cannot access '/tmp/apache-tika-9115808773791090696.tmp': No such file or directory
so the temp file is not persisted.
in any case, the /tmp file is NOT the same size as the orig pdf: at 104932
bytes it is, oddly, LARGER than the original (68K), and its md5 differs too.
dunno what to make of that yet.
fwiw, the received attachment is verified to be identical to the sent original.
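for repeating that comparison on other attachments, a minimal sketch that checks a local file against a logged length/md5 pair. it assumes coreutils 'md5sum' is available; matches_logged and its argument order are made up for illustration:

```shell
# Sketch only: compare a local file with a logged "length"/"md5" pair.
# matches_logged is a hypothetical helper name. Returns 0 only if both match.
matches_logged() {
    file=$1
    want_len=$2
    want_md5=$3
    # Byte count of the file; tr strips padding some wc implementations add.
    have_len=$(wc -c < "$file" | tr -d ' ')
    # md5sum prints "<hash>  <name>"; keep only the hash.
    have_md5=$(md5sum "$file" | cut -d' ' -f1)
    echo "length: have=$have_len want=$want_len"
    echo "md5:    have=$have_md5 want=$want_md5"
    [ "$have_len" = "$want_len" ] && [ "$have_md5" = "$want_md5" ]
}

# Usage, with the values from the log line in this thread:
#   matches_logged Get_Started_With_Smallpdf.pdf 104932 092bf24b2cac33fac27965549c99613a
```

a non-zero exit status means at least one of the two values differs.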