> I am already using Tez (sorry, forgot to mention this), and my goal is indeed to build the instance once per container.
Put a log line in your UDF init() and check whether it is being called multiple times per container. If you're loading the data every time, then that might be something to fix.

The other aspect is that GC pauses can happen because of that repeated loading, and those and similar extraneous factors can account for the slow-down.

But first, look at how many times you are loading the distributed cache data per container.

Cheers,
Gopal
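
P.S. A rough sketch of the check described above, in plain Java with no Hive dependency so it can be run on its own. Everything named here (LookupUdfSketch, INIT_COUNT, LOAD_COUNT, loadLookup) is hypothetical, and java.util.logging stands in for whatever logger your UDF already uses. In a real UDF the log line would go in your own init hook, and the static field only helps if the container actually reuses the JVM and the class is loaded by the same classloader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

// Hypothetical stand-in for a UDF class; a real one would extend Hive's UDF base class.
class LookupUdfSketch {
    private static final Logger LOG = Logger.getLogger(LookupUdfSketch.class.getName());

    // How many times init() was called in this JVM (that is, in this container).
    static final AtomicInteger INIT_COUNT = new AtomicInteger();
    // How many times the expensive load actually ran in this JVM.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // Shared by every instance of this class in the JVM, so it is built at most once.
    private static volatile Map<String, String> lookup;

    // Stand-in for the UDF init hook: log every call, load only when nothing is cached.
    void init() {
        int calls = INIT_COUNT.incrementAndGet();
        LOG.info("init() call #" + calls + " in this JVM");
        if (lookup == null) {
            synchronized (LookupUdfSketch.class) {
                if (lookup == null) {
                    lookup = loadLookup();
                }
            }
        }
    }

    // Placeholder for reading the distributed cache file; the data here is made up.
    private static Map<String, String> loadLookup() {
        int loads = LOAD_COUNT.incrementAndGet();
        LOG.info("loading lookup data, load #" + loads + " in this JVM");
        Map<String, String> data = new HashMap<>();
        data.put("k", "v");
        return data;
    }

    String evaluate(String key) {
        return lookup.get(key);
    }
}
```

If the container logs show the "loading lookup data" line more than once per container, the data is being rebuilt and that is the thing to fix. Seeing several init() lines with only one load line would mean the caching is working and the slow-down is coming from somewhere else.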