Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-29 Thread Ufuk Celebi
Hey Aaron, I'm glad to hear that you resolved the issue. I think a docs contribution for this would be very helpful and could update this page: https://github.com/apache/flink/blob/master/docs/monitoring/debugging_classloading.md. If you want to create a separate JIRA ticket for this, ping me

2019-01-29 Thread Aaron Levin
Hi Ufuk, I'll answer your question, but first I'll give you an update on how we resolved the issue:
* adding `org.apache.hadoop.io.compress.SnappyCodec` to `classloader.parent-first-patterns.additional` in `flink-conf.yaml` (though putting `org.apache.hadoop.util.NativeCodeLoader` there also worked)
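
The fix described above relies on Flink's child-first user-code classloading: class names that match a parent-first pattern are resolved through the parent classloader before the job jar is searched. Below is a minimal sketch of that decision, not Flink's actual `ChildFirstClassLoader`. It assumes the option value is a semicolon-separated list of class-name prefixes. The class `ParentFirstPatterns` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Flink API) mirroring the prefix check a
 * child-first user-code classloader applies to parent-first patterns.
 */
final class ParentFirstPatterns {

    // Parse a semicolon-separated option value, as assumed for
    // classloader.parent-first-patterns.additional, dropping blank entries.
    static List<String> parse(String optionValue) {
        List<String> patterns = new ArrayList<>();
        for (String p : optionValue.split(";")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    // A class goes parent-first if its fully qualified name starts with any
    // configured pattern; otherwise the job jar is searched first.
    static boolean isParentFirst(String className, List<String> patterns) {
        for (String pattern : patterns) {
            if (className.startsWith(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> additional = parse("org.apache.hadoop.io.compress.SnappyCodec");
        String[] candidates = {
            "org.apache.hadoop.io.compress.SnappyCodec",
            "org.apache.hadoop.util.NativeCodeLoader",
            "com.example.MyJob"
        };
        for (String c : candidates) {
            String mode = isParentFirst(c, additional) ? "parent-first" : "child-first";
            System.out.println(c + " -> " + mode);
        }
    }
}
```

With only `SnappyCodec` listed, the sketch reports `NativeCodeLoader` as child-first. That is not a contradiction. The JVM resolves classes referenced from a parent-loaded class through that class's own (parent) loader. So, assuming the Hadoop classes are also on the parent classpath, `NativeCodeLoader` and its native-library load would likely end up in the parent as well. This would explain why either entry worked.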

2019-01-28 Thread Ufuk Celebi
Hey Aaron, sorry for the late reply (again). (1) I think that your final result is in line with what I have reproduced in https://issues.apache.org/jira/browse/FLINK-11402. (2) I think renaming the file would not help as it will still be loaded multiple times when the job restarts (as it

2019-01-24 Thread Guowei Ma
This may be caused by the fact that a JVM process can only load a `.so` once. So a tricky workaround is to rename it. Sent from my iPhone > On 25 Jan 2019, at 7:12 AM, Aaron Levin wrote: > > Hi Ufuk, > > Update: I've pinned down the issue. It's multiple classloaders loading > `libhadoop.so`: > > ``` > failed to load native hadoop
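
The "only once" rule can be stated a bit more precisely. The JVM binds a loaded native library, identified by its absolute path, to the classloader that loaded it. It refuses to load the same file from a different classloader while the first one still holds it, failing with an `UnsatisfiedLinkError` ("already loaded in another classloader"). The toy model below simulates that rule for a job restart that creates a new user-code classloader. It is plain Java, not the JVM's implementation. `NativeLibraryModel`, its methods, the loader names and the paths are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (not JVM code): a native library path can be bound to only one
 * classloader at a time. Reloading from the same loader succeeds; loading
 * from a different loader is refused until the owner is released.
 */
final class NativeLibraryModel {

    // absolute library path -> name of the classloader currently holding it
    private final Map<String, String> boundTo = new HashMap<>();

    /** Returns true if the load succeeds, false if another loader holds the path. */
    boolean load(String libraryPath, String loaderName) {
        String owner = boundTo.putIfAbsent(libraryPath, loaderName);
        return owner == null || owner.equals(loaderName);
    }

    /** Models the owning classloader being garbage-collected. */
    void release(String loaderName) {
        boundTo.values().removeIf(loaderName::equals);
    }

    public static void main(String[] args) {
        String lib = "/usr/lib/libhadoop.so"; // hypothetical path
        NativeLibraryModel jvm = new NativeLibraryModel();
        // First job attempt: user-code classloader #1 loads the library.
        System.out.println("attempt 1, user-loader-1: " + jvm.load(lib, "user-loader-1"));
        // After a restart, a new user-code classloader tries again while
        // loader #1 has not been garbage-collected: the load is refused.
        System.out.println("attempt 2, user-loader-2: " + jvm.load(lib, "user-loader-2"));
        // A renamed copy (a new path) succeeds once more...
        System.out.println("attempt 2, renamed copy: " + jvm.load("/tmp/libhadoop-copy.so", "user-loader-2"));
        // ...but the next restart collides on the copy as well.
        System.out.println("attempt 3, renamed copy: " + jvm.load("/tmp/libhadoop-copy.so", "user-loader-3"));

        // Parent-first: the same parent loader performs every load.
        NativeLibraryModel parentFirst = new NativeLibraryModel();
        System.out.println("parent-first, attempt 1: " + parentFirst.load(lib, "parent"));
        System.out.println("parent-first, attempt 2: " + parentFirst.load(lib, "parent"));
    }
}
```

In this model, renaming buys exactly one extra successful attempt. That matches Ufuk's point that the library would still be loaded multiple times across restarts. Routing the load through the parent classloader avoids the conflict altogether.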

2019-01-24 Thread Aaron Levin
Hi Ufuk, I'm starting to believe the bug is much deeper than the originally reported error because putting the libraries in `/usr/lib` or `/lib` does not work. This morning I dug into why putting `libhadoop.so` into `/usr/lib` didn't work, despite that being in the `java.library.path` at the call
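
A small diagnostic can reproduce the `java.library.path` check described above. The sketch lists which directories on a library path contain the file that `System.loadLibrary("hadoop")` would typically look for. `System.mapLibraryName` gives the platform-specific file name, e.g. `libhadoop.so` on Linux. The class `LibraryPathCheck` is a hypothetical name. Finding the file only shows that the library is locatable. It does not rule out the per-classloader loading conflict discussed elsewhere in this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic helper: returns the files on a library path that
 * match the platform-specific name of a native library.
 */
final class LibraryPathCheck {

    static List<File> findOnPath(String libraryPath, String libName) {
        String fileName = System.mapLibraryName(libName); // e.g. libhadoop.so on Linux
        List<File> hits = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        System.out.println("java.library.path = " + path);
        System.out.println("hadoop found at: " + findOnPath(path, "hadoop"));
    }
}
```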

2019-01-23 Thread Aaron Levin
Hi Ufuk, One more update: I tried copying all the hadoop native `.so` files (mainly `libhadoop.so`) into `/lib` and I am still experiencing the issue I reported. I also tried naively adding the `.so` files to the jar with the flink application and am still experiencing the issue I reported

2019-01-23 Thread Aaron Levin
Hi Ufuk, Two updates: 1. As suggested in the ticket, I naively copied every `.so` in `hadoop-3.0.0/lib/native/` into `/lib/` and this did not seem to help. My knowledge of how shared libs get picked up is hazy, so I'm not sure if blindly copying them like that should work. I did check what

2019-01-22 Thread Aaron Levin
Hey Ufuk, So, I looked into this a little bit: 1. clarification: my issues are with the hadoop-related snappy libraries and not libsnappy itself (this is my bad for not being clearer, sorry!). I already have `libsnappy` on my classpath, but I am looking into including the hadoop snappy

2019-01-22 Thread Aaron Levin
Hey, Thanks so much for the help! This is awesome. I'll start looking into all of this right away and report back. Best, Aaron Levin On Mon, Jan 21, 2019 at 5:16 PM Ufuk Celebi wrote: > Hey Aaron, > > sorry for the late reply. > > (1) I think I was able to reproduce this issue using

2019-01-21 Thread Ufuk Celebi
Hey Aaron, sorry for the late reply. (1) I think I was able to reproduce this issue using snappy-java. I've filed a ticket here: https://issues.apache.org/jira/browse/FLINK-11402. Can you check the ticket description whether it's in line with what you are experiencing? Most importantly, do you