How to enable "enableUnsecureFeatures" in Tika Server 2.0.0?

2021-06-02 Thread David Martinez
Hi, I noticed that the "enableFileUrl" and "enableUnsecureFeatures" parameters for tika-server.jar don't work in version 2.0.0. I tried enabling them with a config file instead, but it fails: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP)
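A minimal sketch for readers with the same question follows. It assumes the Tika 2.x server reads `enableUnsecureFeatures` from a `<properties><server><params>` section of a config file passed with `-c`. That layout, the jar name `tika-server-standard-2.0.0.jar` and the file name `tika-config.xml` are assumptions to check against the documentation for your exact 2.0.0 build; this thread does not confirm them. The script only writes the config and prints the command it would run. The two SLF4J lines are the standard warning SLF4J prints when no logging binding is on the classpath. They mean log output is discarded, and they are not necessarily the cause of the failure.

```python
import xml.etree.ElementTree as ET
from typing import List


# Assumption: Tika 2.x server settings live under <properties><server><params>.
def build_server_config(enable_unsecure: bool = True) -> str:
    """Return a minimal tika-config.xml body that toggles enableUnsecureFeatures."""
    props = ET.Element("properties")
    params = ET.SubElement(ET.SubElement(props, "server"), "params")
    flag = ET.SubElement(params, "enableUnsecureFeatures")
    flag.text = "true" if enable_unsecure else "false"
    return ET.tostring(props, encoding="unicode")


def launch_command(jar: str, config_path: str) -> List[str]:
    """Build the java command line; -c is assumed to be the server's config-file option."""
    return ["java", "-jar", jar, "-c", config_path]


if __name__ == "__main__":
    # Hypothetical file names; adjust them to your installation.
    with open("tika-config.xml", "w", encoding="utf-8") as fh:
        fh.write(build_server_config())
    print(" ".join(launch_command("tika-server-standard-2.0.0.jar", "tika-config.xml")))
```

Generating the XML with `ElementTree` rather than pasting a string keeps the nesting well-formed if more `<params>` entries are added later.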

Re: sandboxing Tika

2021-06-02 Thread Cristian Zamfir
Processing untrusted content in the presence of motivated attackers means that the JVM plus Docker may not be sufficient to stop an attacker from moving laterally from a compromised container or exfiltrating data from it. There are of course measures to do network-level isolation of the container if that happens, but some

Re: Tika app JAR 2.0.0 shows warnings that makes output unparseable

2021-06-02 Thread David Martinez
Hi again, I downloaded and compiled the master branch and all tests passed. Thanks for your help!

Re: sandboxing Tika

2021-06-02 Thread Tim Allison
Interesting. I haven't done this personally. What are your goals/fears? How is Docker not enough to, erm, contain Tika?

sandboxing Tika

2021-06-02 Thread Cristian Zamfir
Hi, I was looking at options to sandbox Tika (running in Docker). One option is seccomp, but I suspect that the JVM uses so many syscalls that it will not be very useful. Another option is gVisor https://gvisor.dev/docs Has anyone tried either of these, or does anyone have experience with them?

Re: Tika app JAR 2.0.0 shows warnings that makes output unparseable

2021-06-02 Thread David Martinez
Thanks, Tim. I will try to build and test it, and will keep you posted.

Re: Tika app JAR 2.0.0 shows warnings that makes output unparseable

2021-06-02 Thread Tim Allison
Hi David, I just fixed this in our main branch. If you're able to build locally or grab an artifact from Jenkins and let us know if this fixes the issue for you, I'd appreciate it. Please let us know what else you find! Thank you! Best, Tim

Fwd: best practices for avoiding OOM for tika docker

2021-06-02 Thread Tim Allison
dropped cc... > I noticed that Tika prints in the logs OOM (null), but seems to recover by itself even when not using -spawnChild. Is this the expected behavior? When not in -spawnChild mode, Tika is catching OOM exceptions (when it can), but it isn't "recovering"... the JVM may be in an

Re: best practices for avoiding OOM for tika docker

2021-06-02 Thread Nick Burch
On Wed, 2 Jun 2021, Cristian Zamfir wrote: > 1. Do you have a recommendation for a stress test that would allow me to easily test OOM behavior? It depends on what kind of OOM you're interested in. If you fire a lot of memory-hungry documents at a single server at once, you can trigger an OOM.
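Nick's suggestion can be sketched as a small load generator. This is a hypothetical script, not a tool from the Tika project. It assumes tika-server is listening on its default port 9998 and accepts `PUT /tika`, and it uses a local file `big.pdf` as a stand-in for a memory-hungry document. It submits every request at once from a thread pool and records whether each one succeeded, so failures (connection resets, HTTP 5xx, timeouts) can be lined up against the server's OOM log entries.

```python
import concurrent.futures
import urllib.request
from typing import Callable, Iterable, List

# Assumption: tika-server on its default port; PUT /tika returns extracted text.
TIKA_URL = "http://localhost:9998/tika"


def put_document(url: str, data: bytes, timeout: float = 120.0) -> int:
    """Send one document via HTTP PUT and return the HTTP status code."""
    req = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # drain the body so the server finishes the parse
        return resp.status


def fire_concurrently(payloads: Iterable[bytes], workers: int,
                      send: Callable[[bytes], int]) -> List[str]:
    """Submit all payloads at once from `workers` threads; record each outcome in order."""
    outcomes: List[str] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send, p) for p in payloads]
        for fut in futures:
            try:
                outcomes.append(f"status {fut.result()}")
            except Exception as exc:  # e.g. connection reset, HTTP 5xx, timeout
                outcomes.append(f"error {type(exc).__name__}")
    return outcomes


if __name__ == "__main__":
    # Hypothetical input: one large file sent 50 times; substitute your own documents.
    with open("big.pdf", "rb") as fh:
        doc = fh.read()
    results = fire_concurrently([doc] * 50, workers=50,
                                send=lambda d: put_document(TIKA_URL, d))
    print(results)
```

Catching each exception per request keeps one failed document from aborting the run; raising `workers` and the payload count until requests start failing gives a rough threshold for that server's heap setting.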

Re: best practices for avoiding OOM for tika docker

2021-06-02 Thread Cristian Zamfir
Hi! I noticed that Tika prints OOM (null) in the logs but seems to recover by itself even when not using -spawnChild. Is this the expected behavior? I am trying to figure out when log entries containing "OOM" are critical and require a container restart. I also wanted to bring up two of my