On 7/17/22 11:52 AM, Tilman Hausherr wrote:
> https://issues.apache.org/jira/browse/TIKA-3819
> This will show filename and length but only if logging is in DEBUG log level.
> The modified version will appear at
> https://repository.apache.org/content/groups/snapshots/org/apache/tika/
> in a few hours.

thx o/

checking

        https://issues.apache.org/jira/browse/TIKA-3819

i see

        Fix Version/s: 2.4.2
        https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/697/
        Build #697 (Jul 17, 2022, 3:47:56 PM)

i installed

        tika-server-standard-2.4.2-20220717.154907-90.jar

set

        cat tika-server-config-custom.xml
                <?xml version="1.0" encoding="UTF-8"?>

                <properties>
                  <server>
                    <params>
!                     <logLevel>debug</logLevel>
                      ...
                      <forkedJvmArgs>
                        ...
!                       <arg>-Dlog4j2.debug</arg>
                        ...

and launched,

        systemctl status tika -l
                ● tika.service - Apache Tika server
                     Loaded: loaded (/etc/systemd/system/tika.service; enabled; vendor preset: disabled)
                     Active: active (running) since Sun 2022-07-17 20:51:36 EDT; 5min ago
                   Main PID: 25001 (java)
                      Tasks: 54 (limit: 8811)
                     Memory: 208.3M
                        CPU: 31.115s
                     CGroup: /system.slice/tika.service
                             ├─ 25001 /usr/bin/java -jar /srv/tika/tika-server.jar -c /etc/tika/tika-server-config-custom.xml
                             └─ 25039 /usr/bin/java -Xms1g -Xmx1g -Dpdfbox.fontcache=/var/tika -Dlog4j2.debug -Djava.awt.headless=true -cp /srv/tika/tika-server.jar -Dtika.server.id= org.apache.tika.server.core.TikaServerProcess -h 127.0.0.1 -p 9998 -i "" -c /etc/tika/tika-server-config-custom.xml -forkedStatusFile /tmp/apache-tika-server-forked-tmp-8013562591697588923 -numRestarts 0

                Jul 17 20:52:15 mx-test tika[25039]:         at 
org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:52:15 mx-test tika[25039]:         at 
org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:52:15 mx-test tika[25039]:         at 
org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:52:15 mx-test tika[25039]:         at 
org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1204) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:52:15 mx-test tika[25039]:         at 
org.apache.tika.parser.pdf.PDFParser.getPDDocument(PDFParser.java:291) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:52:15 mx-test tika[25039]:         at 
org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:178) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:52:15 mx-test tika[25039]:         at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:52:15 mx-test tika[25039]:         ... 37 more
                Jul 17 20:52:15 mx-test tika[25039]: ERROR [qtp1401737458-25] 
20:52:15,597 org.apache.cxf.jaxrs.utils.JAXRSUtils Problem with writing the 
data, class 
org.apache.tika.server.core.resource.TikaResource$$Lambda$344/0x0000000800eb2e78,
 ContentType: text/plain
                Jul 17 20:52:15 mx-test tika[25039]: TRACE StatusLogger 
Log4jLoggerFactory.getContext() found anchor class 
org.apache.cxf.common.logging.Slf4jLogger

on receipt of an email with a pdf attachment, extraction FAILs as before,

        journalctl -f -u tika

                Jul 17 20:59:42 mx-test tika[25039]: INFO  [qtp1401737458-25] 20:59:42,066 org.apache.tika.server.core.resource.TikaResource /tika (application/pdf)
                Jul 17 20:59:42 mx-test tika[25039]: WARN  [qtp1401737458-25] 20:59:42,243 org.apache.pdfbox.pdfparser.COSParser The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 104319, length: 366, expected end position: 104685
                Jul 17 20:59:42 mx-test tika[25039]: ERROR [qtp1401737458-25] 20:59:42,245 org.apache.pdfbox.filter.FlateFilter FlateFilter: stop reading corrupt stream due to a DataFormatException
                Jul 17 20:59:42 mx-test tika[25039]: WARN  [qtp1401737458-25] 20:59:42,467 org.apache.pdfbox.pdfparser.COSParser The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 101704, length: 1475, expected end position: 103179
                Jul 17 20:59:42 mx-test tika[25039]: ERROR [qtp1401737458-25] 20:59:42,469 org.apache.pdfbox.filter.FlateFilter FlateFilter: stop reading corrupt stream due to a DataFormatException
                Jul 17 20:59:42 mx-test tika[25039]: WARN  [qtp1401737458-25] 20:59:42,481 org.apache.pdfbox.pdfparser.COSParser The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 101514, length: 66, expected end position: 101580
                Jul 17 20:59:42 mx-test tika[25039]: ERROR [qtp1401737458-25] 20:59:42,482 org.apache.pdfbox.filter.FlateFilter FlateFilter: stop reading corrupt stream due to a DataFormatException
                Jul 17 20:59:42 mx-test tika[25039]: WARN  [qtp1401737458-25] 20:59:42,493 org.apache.pdfbox.pdfparser.COSParser The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 2011, length: 2482, expected end position: 4493
                Jul 17 20:59:42 mx-test tika[25039]: WARN  [qtp1401737458-25] 20:59:42,495 org.apache.tika.server.core.resource.TikaResource tika/: Text extraction failed (Get_Started_With_Smallpdf.pdf)
                Jul 17 20:59:42 mx-test tika[25039]: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@4f3e230b
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:152) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:55) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.server.core.resource.TikaResource.lambda$produceText$1(TikaResource.java:502)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:177)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1616) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:249)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.Server.handle(Server.java:516) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
 ~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
java.lang.Thread.run(Thread.java:833) ~[?:?]
                Jul 17 20:59:42 mx-test tika[25039]: Caused by: 
java.io.IOException: Page tree root must be a dictionary
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1204) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.pdf.PDFParser.getPDDocument(PDFParser.java:291) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:178) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) 
~[tika-server-standard-2.4.2-20220717.154907-90.jar:2.4.2-SNAPSHOT]
                Jul 17 20:59:42 mx-test tika[25039]:         ... 37 more
                Jul 17 20:59:42 mx-test tika[25039]: ERROR [qtp1401737458-25] 
20:59:42,499 org.apache.cxf.jaxrs.utils.JAXRSUtils Problem with writing the 
data, class 
org.apache.tika.server.core.resource.TikaResource$$Lambda$344/0x0000000800eb2e78,
 ContentType: text/plain


where the attachment is,

        pdfinfo Get_Started_With_Smallpdf.pdf
                Creator:         Adobe InDesign 15.1 (Macintosh)
                Producer:        Adobe PDF Library 15.0
                CreationDate:    Wed Oct 14 11:08:10 2020 EDT
                ModDate:         Wed Oct 14 11:08:10 2020 EDT
                Custom Metadata: no
                Metadata Stream: yes
                Tagged:          no
                UserProperties:  no
                Suspects:        no
                Form:            none
                JavaScript:      no
                Pages:           1
                Encrypted:       no
                Page size:       595.276 x 841.89 pts (A4)
                Page rot:        0
                File size:       69451 bytes
                Optimized:       no
                PDF version:     1.7
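
as a sanity check on the numbers above, here is a small python sketch (my own, not anything from tika or pdfbox) that compares the stream offsets in the COSParser WARN lines with the file size pdfinfo reports. it assumes the offsets are byte positions within whatever tika received, and that the local copy i ran pdfinfo on is the same file that was attached; both are assumptions on my part.

```python
# byte counts copied from the pdfinfo output and the COSParser WARN lines above
FILE_SIZE = 69451  # "File size" reported by pdfinfo for the local copy

# (stream start position, length, expected end position), one tuple per WARN line
STREAMS = [
    (104319, 366, 104685),
    (101704, 1475, 103179),
    (101514, 66, 101580),
    (2011, 2482, 4493),
]


def streams_past_eof(file_size, streams):
    """Return the streams whose expected end lies beyond file_size."""
    past = []
    for start, length, end in streams:
        # each WARN line is internally consistent: start + length == end
        assert start + length == end
        if end > file_size:
            past.append((start, length, end))
    return past


for start, length, end in streams_past_eof(FILE_SIZE, STREAMS):
    print(f"stream at {start} ends at {end}, "
          f"{end - FILE_SIZE} bytes past the local file size")
```

if those assumptions hold, three of the four streams end past byte 69451, so what tika parsed may not be byte-identical to the file on disk. that is the kind of thing the logged length would help confirm.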

i don't see any additional DEBUG output, or the file length that TIKA-3819 is meant to log.

are additional steps/config needed to enable the DEBUG output from the snapshot?
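
to take the mail pipeline out of the picture, one option is to send the local file straight to the server. a minimal python sketch, assuming the server is reachable at the 127.0.0.1:9998 shown in the systemctl output and that a PUT to /tika with an application/pdf body is the same call the mail side makes (i have not confirmed the latter). the function names are mine, for illustration only.

```python
import urllib.request

# assumption: same host/port as the -h / -p arguments in the systemctl output
TIKA_URL = "http://127.0.0.1:9998/tika"


def build_request(pdf_bytes, url=TIKA_URL):
    """Build a PUT request carrying the raw PDF bytes, asking for plain text back."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )


def extract_text(path, url=TIKA_URL):
    """Read a local PDF, report its size, and return the text tika-server extracts."""
    with open(path, "rb") as fh:
        pdf_bytes = fh.read()
    # print the byte count so it can be compared with pdfinfo's "File size"
    print(f"sending {len(pdf_bytes)} bytes from {path}")
    with urllib.request.urlopen(build_request(pdf_bytes, url)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

calling `extract_text("Get_Started_With_Smallpdf.pdf")` on the tika host should either return text or raise an HTTP error; if it succeeds where the mail path fails, that would point at the attachment being altered before it reaches tika.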
