Awesome! I will remember your offer next time I visit Stockholm :)

On Thu, May 25, 2017 at 1:48 PM, Vilhelm von Ehrenheim <[email protected]> wrote:
> Wow! That answer truly solved my problem. I would never have thought of
> using threading.local for this. Thank you so much! If you ever stop by
> Stockholm I'll be happy to buy you guys a beer!
>
> On Wed, May 24, 2017 at 6:38 PM, Ahmet Altay <[email protected]> wrote:
>
>> You can see an example implementation of Luke's suggestion in the
>> tensorflow-transform project [1]. Thread local is used in that case; this
>> will work for runners that reuse the same thread to execute bundles.
>>
>> [1] https://github.com/tensorflow/transform/blob/master/tensorflow_transform/beam/impl.py#L253
>>
>> On Wed, May 24, 2017 at 8:00 AM, Lukasz Cwik <[email protected]> wrote:
>>
>>> Why not use a singleton-like pattern and have a function which either
>>> loads and caches the ML model or returns the singleton if it has
>>> already been loaded?
>>> You'll want to use some form of locking to ensure that you really only
>>> load the ML model once.
>>>
>>> On Wed, May 24, 2017 at 6:18 AM, Vilhelm von Ehrenheim <[email protected]> wrote:
>>>
>>>> Hi all!
>>>> I would like to load a heavy object (think ML model) into memory that
>>>> should be available in a ParDo for quick predictions.
>>>>
>>>> What is the preferred way of doing this without loading the model for
>>>> each ParDo call (slow, and it will flood memory on the nodes)? I don't
>>>> seem to be able to do it in the DoFn's __init__ block either, as this
>>>> is only done once for all nodes (my guess, though) and it then breaks
>>>> when replicated internally (even on the DirectRunner; I suspect the
>>>> DoFn is pickled and this object cannot be pickled). If I load it as a
>>>> side input, it seems to still be loaded into memory separately for
>>>> each ParDo.
>>>>
>>>> If there is a better way to handle it in Java I'm happy to do it there
>>>> instead. It was just easier to attack the problem with Python as the
>>>> models were developed in Python.
>>>>
>>>> Any sort of pointers or tips are welcome!
>>>>
>>>> Thanks!
>>>> Vilhelm von Ehrenheim
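
For reference, a minimal Python sketch of the singleton-with-locking pattern Lukasz describes; load_model here is a hypothetical stand-in for however the real model is deserialized:

    import threading

    _model = None
    _model_lock = threading.Lock()


    def load_model():
        # Hypothetical stand-in for the expensive deserialization of the
        # real model (e.g. unpickling it from disk).
        return object()


    def get_model():
        # Double-checked locking: the fast path skips the lock once the
        # model is loaded, and the lock guarantees only one thread pays
        # the loading cost per process.
        global _model
        if _model is None:
            with _model_lock:
                if _model is None:
                    _model = load_model()
        return _model

Calling get_model() inside the DoFn's process method then returns the cached instance on every call after the first.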

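And a sketch of the threading.local variant Ahmet points to, wired into a DoFn. The cache lives at module level so it is not pickled along with the DoFn, which sidesteps the pickling problem Vilhelm hit; the dummy model and its predict method are assumptions for illustration, and the cache only avoids reloads on runners that reuse the same thread across bundles:

    import threading

    import apache_beam as beam

    # Module-level cache: one slot per worker thread, and it is not
    # serialized along with the DoFn when the runner pickles it.
    _thread_local = threading.local()


    class _DummyModel(object):
        # Placeholder so the sketch is runnable; replace with the real model.
        def predict(self, element):
            return element


    class PredictDoFn(beam.DoFn):
        def process(self, element):
            # Load at most once per thread; runners that reuse threads
            # across bundles hit the cached copy on later bundles.
            if not hasattr(_thread_local, 'model'):
                _thread_local.model = _DummyModel()
            yield _thread_local.model.predict(element)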