On 7/17/22 10:24 AM, Tilman Hausherr wrote:
That is in PDFBox, not in Tika.
There's also a PDFParser.parse() in Tika, which then calls PDDocument.load().
However, I don't know whether it will use the InputStream overload or the one
taking a File. If it uses the File one, check the length and content of the
file (Tika sometimes stores streams in a temporary file).
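A minimal sketch of that length-and-content check, using only java.nio. It spools an InputStream to a temp file (a stand-in assumed for illustration, not Tika's actual temp-file handling) and then reports the file's length and whether it begins with the "%PDF-" header. The class and method names (PdfTempFileCheck, spoolToTempFile, looksLikePdf) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class PdfTempFileCheck {

    // Hypothetical helper: copy a stream into a fresh temp file.
    // Assumption: this only mimics "store the stream in a temporary file";
    // it is not Tika's implementation.
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".pdf");
        // createTempFile already created the file, so allow overwriting it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // True if the file has at least 5 bytes and starts with "%PDF-".
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (Files.size(file) < magic.length) {
            return false; // empty or truncated file
        }
        byte[] head = new byte[magic.length];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(head, 0, head.length);
            return n == magic.length && Arrays.equals(head, magic);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file;
        boolean cleanup = false;
        if (args.length == 1) {
            file = Path.of(args[0]); // check an existing file
        } else {
            // No argument: spool a small in-memory sample instead.
            byte[] sample = "%PDF-1.7\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
            file = spoolToTempFile(new ByteArrayInputStream(sample));
            cleanup = true;
        }
        try {
            System.out.println("length=" + Files.size(file)
                    + " looksLikePdf=" + looksLikePdf(file));
        } finally {
            if (cleanup) {
                Files.deleteIfExists(file);
            }
        }
    }
}
```

With Java 11 or later it can be run directly as `java PdfTempFileCheck.java <path>` against a file caught while stopped in the debugger; a zero length or a missing header would point at the spooling step rather than at PDFBox.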
I see the same results (i.e., nothing) with an explicit stop in PDFParser.parse.
Re the failed build: remove the segment with ossindex-maven-plugin from the
parent pom.xml. That plugin (or rather, the company behind it) has gone crazy;
we've partly disabled it in the current trunk.
I have no idea what specifically to do there.
Building 'main' with those partial disables, rather than '2.4.1', also
fails:
INFO [pool-6-thread-1] 10:59:03,890 org.apache.tika.pipes.PipesClient pipesClientId=2 parse success: myId in 58 ms
ERROR [main] 10:59:03,907 org.apache.tika.pipes.PipesServer oom: myId
java.lang.OutOfMemoryError: oom message
    at jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67) ~[?:?]
    at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:483) ~[?:?]
    at org.apache.tika.parser.mock.MockParser.throwIt(MockParser.java:428) ~[test-classes/:?]
    at org.apache.tika.parser.mock.MockParser.throwIt(MockParser.java:374) ~[test-classes/:?]
    at org.apache.tika.parser.mock.MockParser.executeAction(MockParser.java:155) ~[test-classes/:?]
    at org.apache.tika.parser.mock.MockParser.parse(MockParser.java:134) ~[test-classes/:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[classes/:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[classes/:?]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167) ~[classes/:?]
    at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:163) ~[classes/:?]
    at org.apache.tika.pipes.PipesServer.parseRecursive(PipesServer.java:540) ~[classes/:?]
    at org.apache.tika.pipes.PipesServer.parse(PipesServer.java:473) ~[classes/:?]
    at org.apache.tika.pipes.PipesServer.parseIt(PipesServer.java:420) ~[classes/:?]
    at org.apache.tika.pipes.PipesServer.actuallyParse(PipesServer.java:340) ~[classes/:?]
    at org.apache.tika.pipes.PipesServer.parseOne(PipesServer.java:311) ~[classes/:?]
    at org.apache.tika.pipes.PipesServer.processRequests(PipesServer.java:232) ~[classes/:?]
    at org.apache.tika.pipes.PipesServer.main(PipesServer.java:168) ~[classes/:?]
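The top frames show the OutOfMemoryError being constructed through reflection (Constructor.newInstance) inside MockParser.throwIt, which is loaded from test-classes, with the literal message "oom message". A minimal sketch of that reflective pattern follows; the helper name newThrowable is hypothetical, and this is an assumed illustration of how such a trace arises, not MockParser's actual code.

```java
import java.lang.reflect.Constructor;

public class ReflectiveThrowSketch {

    // Hypothetical helper (not MockParser's code): instantiate the named
    // Throwable subclass through its public (String) constructor.
    // asSubclass throws ClassCastException for non-Throwable class names.
    static Throwable newThrowable(String className, String message)
            throws ReflectiveOperationException {
        Class<? extends Throwable> type =
                Class.forName(className).asSubclass(Throwable.class);
        Constructor<? extends Throwable> ctor = type.getConstructor(String.class);
        return ctor.newInstance(message);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Throwable t = newThrowable("java.lang.OutOfMemoryError", "oom message");
        // Throwable.toString() is "<class name>: <message>", the same header
        // line that appears in the log.
        System.out.println(t);
        // The recorded frames begin in the JDK's reflection code and then this
        // helper; exact frame names vary between JDK versions.
        StackTraceElement[] frames = t.getStackTrace();
        for (int i = 0; i < Math.min(4, frames.length); i++) {
            System.out.println("    at " + frames[i]);
        }
    }
}
```

Note that creating the error this way allocates nothing large; the message and the reflection frames are what distinguish it from a heap that actually ran out.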
My first priority is a stable Dovecot search environment, so I've stopped using
Tika and removed its config.
For now, I'll have to pass this on to an admin here who works regularly in a
full Java environment and won't have to keep guessing at how to debug the app.