Hello! *tl;dr*: settings in `env.java.opts` seem to stop taking effect when a job is canceled or fails and is then restarted (with or without savepoints/checkpoints). If I restart the task managers, `env.java.opts` takes effect again and our job runs without failure. More below.
We consume Snappy-compressed sequence files in our Flink job, which requires access to the Hadoop native libraries. In the `flink-conf.yaml` for both the task managers and the job manager, we set:

```
env.java.opts: -Djava.library.path=/usr/local/hadoop/lib/native
```

If I launch our job on freshly restarted task managers, it runs fine. If at some point I cancel the job, or if the job restarts for some other reason, it begins to crashloop: it tries to open a Snappy-compressed file but no longer has access to the codec from the native Hadoop libraries in `/usr/local/hadoop/lib/native`. If I then restart the task managers while the job is crashlooping, the job starts running without any codec failures.

The only explanation I can come up with for the Snappy codec failure is that `env.java.opts` is somehow not being passed through to the job on restart. Does anyone know what's going on? Am I missing some additional configuration? I really appreciate any help!

About our setup:

- Flink version: 1.7.0
- Deployment: standalone in HA
- Hadoop/S3 setup: we do *not* set `HADOOP_CLASSPATH`. We use Flink's shaded jars to access our files in S3. We do not use the `bundled-with-hadoop` distribution of Flink.

Best,

Aaron Levin
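In case it helps anyone reproduce this: a quick way to check from inside a task (e.g. in a map function or in `open()`) whether the option actually reached the JVM is to inspect the `java.library.path` system property. This is just a diagnostic sketch; the class and method names are mine, and the directory is the one from our `flink-conf.yaml`, not a Flink default:

```java
import java.io.File;

public class LibraryPathCheck {

    // Returns true if the given directory is an entry on the JVM's
    // native library search path (the value of -Djava.library.path).
    static boolean onLibraryPath(String dir) {
        String libPath = System.getProperty("java.library.path", "");
        for (String entry : libPath.split(File.pathSeparator)) {
            if (entry.equals(dir)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // /usr/local/hadoop/lib/native is the directory we configure
        // via env.java.opts; if the option was dropped on restart,
        // this should print false inside the restarted task managers.
        System.out.println("hadoop native dir on path: "
                + onLibraryPath("/usr/local/hadoop/lib/native"));
    }
}
```

Logging this once at job start and again after a restart-from-failure would show whether the restarted tasks are running in a JVM that lost the option.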