On 7/18/22 11:56 AM, Tilman Hausherr wrote:
Something doesn't work properly on your side, I get a lot of "DEBUG" lines. I 
opened tika-server-standard-2.4.2-SNAPSHOT.jar with 7zip, extracted it, changed it, and 
put it back. This is how it looks (comment removed):

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Configuration status="WARN">
   <Appenders>
     <Console name="Console" target="SYSTEM_ERR">
       <PatternLayout pattern="%-5p [%t] %d{HH:mm:ss,SSS} %c %m%n"/>
     </Console>
   </Appenders>
   <Loggers>
     <Root level="debug">
       <AppenderRef ref="Console"/>
     </Root>
   </Loggers>
</Configuration>

editing log4j2.xml directly in the jar and repacking works. no idea why the
other method doesn't.

        D="/srv/tika"
        F="tika-server-standard-2.4.2-20220718.165252-94.jar"
        cd ${D}
        rm -rf TMP
        mkdir -p TMP/mod
        cd TMP
        rm -f ${F}*
        wget https://repository.apache.org/content/groups/snapshots/org/apache/tika/tika-server-standard/2.4.2-SNAPSHOT/${F}
        cd mod
        jar -xfv ../${F}
        perl -pi -e 's|Root level="info"|Root level="debug"|g' log4j2.xml
        jar -cvmf META-INF/MANIFEST.MF ../mod.jar *
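
before launching, it may be worth confirming that the repacked jar really carries the changed level. a minimal sketch, assuming `unzip` is installed and that mod.jar ended up in /srv/tika/TMP as in the steps above; `root_level` is just a helper name made up for this example:

```shell
# Print the level attribute of the <Root> logger from a log4j2.xml read on stdin.
root_level() {
    sed -n 's/.*<Root level="\([^"]*\)".*/\1/p'
}

# Assumed location of the repacked jar; the check is skipped if it is absent.
JAR="/srv/tika/TMP/mod.jar"
if [ -f "$JAR" ]; then
    # unzip -p writes the named archive member to stdout without extracting it
    unzip -p "$JAR" log4j2.xml | root_level
fi
```

if this prints anything other than "debug", the substitution did not match or a different jar got packed.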

launch tika using 'mod.jar'

verify

        ls -al /srv/tika/tika-server.jar
                lrwxrwxrwx 1 root root 11 Jul 18 14:46 /srv/tika/tika-server.jar -> TMP/mod.jar

        systemctl status tika -ln0
                ● tika.service - Apache Tika server
                     Loaded: loaded (/etc/systemd/system/tika.service; enabled; vendor preset: disabled)
                     Active: active (running) since Mon 2022-07-18 21:24:40 EDT; 18s ago
                   Main PID: 18935 (java)
                      Tasks: 54 (limit: 8811)
                     Memory: 174.0M
                        CPU: 24.491s
                     CGroup: /system.slice/tika.service
                             ├─ 18935 /usr/bin/java -jar /srv/tika/tika-server.jar -c /etc/tika/tika-server-config-custom.xml
                             └─ 18970 /usr/bin/java -Xms1g -Xmx1g -Dpdfbox.fontcache=/var/tika -Dlog4j2.debug -Djava.awt.headless=true -cp /srv/tika/tika-server.jar -Dtika.server.id= org.apache.tika.server.core.TikaServerProcess -h 127.0.0.1 -p 9998 -i "" -c /etc/tika/tika-server-config-custom.xml -forkedStatusFile /tmp/apache-tika-server-forked-tmp-1104448251575803884 -numRestarts 0

re-send message with attachment ...

verbose/DEBUG logs

        journalctl -f -u dovecot

                ->   https://pastebin.com/raw/sk5xevAM

The output contains a line with "DEBUG" and 
"org.apache.tika.parser.pdf.PDFParser".

I've just improved the output, I'm adding an MD5 checksum. This would be 
another indicator that something is wrong (or not).

indeed.

i now see in the logs

        Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
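
to pull just the length and checksum out of lines like this one, something along these lines might do. it's a sketch that assumes the "length: N, md5: HEX" layout shown in the line above stays as is; `tika_len_md5` is a made-up helper name:

```shell
# Extract "length md5" from Tika PDFParser DEBUG lines read on stdin.
tika_len_md5() {
    sed -n 's/.*length: \([0-9][0-9]*\), md5: \([0-9a-f]\{32\}\).*/\1 \2/p'
}

# Example using the log line from this message (as a single line).
echo 'Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a' | tika_len_md5
```

on a live system the journal output could be piped through it instead, e.g. `journalctl -u tika --no-pager | tika_len_md5` (unit name assumed from the tika.service shown earlier).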

checking the original attachment

        ls -al Get_Started_With_Smallpdf.pdf
                -rw-r--r-- 1 root root 68K Jul 15 12:16 Get_Started_With_Smallpdf.pdf

        file Get_Started_With_Smallpdf.pdf
                Get_Started_With_Smallpdf.pdf: PDF document, version 1.7

        md5sum Get_Started_With_Smallpdf.pdf
                14266e428c6a5f371c5abe164026c762  Get_Started_With_Smallpdf.pdf

checking,

        ls -al /tmp/apache-tika-9115808773791090696.tmp
                ls: cannot access '/tmp/apache-tika-9115808773791090696.tmp': No such file or directory

the temp file is not persisted.

in any case, the /tmp file is NOT the same size as the orig pdf -- oddly, it's
LARGER (104932 bytes vs. 68K), and the md5 doesn't match either.
dunno what to make of that yet.

fwiw, the received attachment is verified to be identical to the sent original.
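
since the temp file is gone by the time i look, the comparison has to go through the logged values. a rough sketch of that check, assuming `md5sum` and `wc` from coreutils; `matches_logged` is a hypothetical helper, and the length/md5 arguments are the ones from the DEBUG line:

```shell
# Return 0 if FILE has the given byte length and md5; otherwise print
# what differs and return 1.
matches_logged() {
    file=$1 want_len=$2 want_md5=$3
    have_len=$(wc -c < "$file" | tr -d ' ')
    have_md5=$(md5sum < "$file" | cut -d' ' -f1)
    rc=0
    if [ "$have_len" != "$want_len" ]; then
        echo "length differs: file=$have_len logged=$want_len"
        rc=1
    fi
    if [ "$have_md5" != "$want_md5" ]; then
        echo "md5 differs: file=$have_md5 logged=$want_md5"
        rc=1
    fi
    return $rc
}

# Values from the log line; skipped if the PDF is not in the current directory.
if [ -f Get_Started_With_Smallpdf.pdf ]; then
    matches_logged Get_Started_With_Smallpdf.pdf 104932 092bf24b2cac33fac27965549c99613a
fi
```

this only tells whether what tika received is byte-identical to a local file; it says nothing about where the extra bytes come from.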
