Best way to load heavy object into memory on nodes (python sdk)

Vilhelm von Ehrenheim Wed, 24 May 2017 06:19:19 -0700

Hi all!
I would like to load a heavy object (think ML model) into memory that
should be available in a ParDo for quick predictions.
What is the preferred way of doing this without loading the model on every
ParDo call (which is slow and floods memory on the nodes)? I can't seem to
do it in the DoFn's __init__ either. My guess is that __init__ runs only
once, when the pipeline is constructed, and the DoFn is then pickled so it
can be replicated internally. That breaks even on the DirectRunner, because
the model object cannot be pickled. If I load it as a side input, it still
seems to be loaded into memory separately for each ParDo.
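A minimal sketch of the pattern I have in mind, in plain Python so it runs without Beam installed: keep only cheap, picklable configuration in __init__, load the model lazily on the worker the first time it is needed, and cache it at module level so every DoFn instance in the same process shares one copy. HeavyModel, get_shared_model, PredictFn and the "model.bin" path are hypothetical stand-ins. In a real pipeline PredictFn would subclass apache_beam.DoFn, and the lazy load could instead go in the DoFn's setup() method where the SDK version provides it. A module-level cache is shared per worker process, not per machine.

```python
import pickle
import threading


class HeavyModel:
    """Hypothetical stand-in for an expensive, unpicklable ML model."""

    load_count = 0  # how many times a model was actually built

    def __init__(self):
        HeavyModel.load_count += 1
        # A lock cannot be pickled, mimicking an unpicklable model handle.
        self._lock = threading.Lock()

    def predict(self, x):
        with self._lock:
            return x * 2


# Process-wide cache: all PredictFn instances in this process reuse one model.
_MODEL = None
_MODEL_LOCK = threading.Lock()


def get_shared_model():
    global _MODEL
    with _MODEL_LOCK:
        if _MODEL is None:
            _MODEL = HeavyModel()
        return _MODEL


class PredictFn:
    """Hypothetical DoFn-like class: only lightweight config gets pickled."""

    def __init__(self, model_path):
        self.model_path = model_path  # cheap, picklable configuration
        self._model = None  # deliberately NOT loaded in __init__

    def __getstate__(self):
        # Drop the model before pickling so serialization never touches it.
        state = self.__dict__.copy()
        state["_model"] = None
        return state

    def process(self, element):
        if self._model is None:  # lazy load on first element, on the worker
            self._model = get_shared_model()
        yield self._model.predict(element)
```

Two pickled copies of the same PredictFn (standing in for replicated DoFn instances) each produce predictions, yet HeavyModel is built only once in the process, and the instances stay picklable after loading because __getstate__ strips the model.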

If there is a better way to handle this in Java, I'm happy to do it there
instead. It was just easier to attack the problem with Python, as the models
were developed in Python.

Any sort of pointers or tips are welcome!

Thanks!
Vilhelm von Ehrenheim
