Hi all!
I would like to load a heavy object (think ML model) into memory that
should be available in a ParDo for quick predictions.

What is the preferred way of doing this without loading the model on every
ParDo call (which is slow and would flood memory on the nodes)? I can't
seem to do it in the DoFn's __init__ block either. My guess is that
__init__ runs only once for all nodes, and the object then breaks when the
DoFn is replicated internally (even on the DirectRunner). I suspect the DoFn
is pickled and this object cannot be pickled. If I load it as a side input,
it still seems to be loaded into memory separately for each ParDo.
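To illustrate what I suspect is happening, here is a stdlib-only sketch (no
Beam involved). FakeModel, EagerPredictor and LazyPredictor are made-up
names, and the lock inside FakeModel just stands in for whatever part of a
real model refuses to pickle. Loading in __init__ makes the model part of
the pickled object; loading lazily on first use and leaving it out of the
pickled state is the kind of "once per worker copy" behaviour I'm after.

```python
import pickle
import threading


class FakeModel:
    """Hypothetical stand-in for a heavy ML model that cannot be pickled."""

    def __init__(self):
        # A lock is not picklable, mimicking e.g. a native model handle.
        self._lock = threading.Lock()

    def predict(self, x):
        return x * 2


class EagerPredictor:
    """Loads the model in __init__, so the model is pickled with the object."""

    def __init__(self):
        self.model = FakeModel()

    def process(self, element):
        return self.model.predict(element)


class LazyPredictor:
    """Loads the model on first use and leaves it out of the pickled state."""

    def __init__(self):
        self.model = None

    def __getstate__(self):
        state = self.__dict__.copy()
        state["model"] = None  # never serialize the model itself
        return state

    def process(self, element):
        if self.model is None:
            # Loaded once per deserialized copy, not once per element.
            self.model = FakeModel()
        return self.model.predict(element)


if __name__ == "__main__":
    try:
        pickle.dumps(EagerPredictor())
    except TypeError as err:
        print("eager copy failed:", err)

    copy = pickle.loads(pickle.dumps(LazyPredictor()))
    print([copy.process(x) for x in range(3)])  # [0, 2, 4]
```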

If there is a better way to handle this in Java, I'm happy to do it there
instead. It was just easier to attack the problem with Python, since the
models were developed in Python.

Any sort of pointers or tips are welcome!

Thanks!
Vilhelm von Ehrenheim
