This is neat.

On Fri, Sep 29, 2017 at 1:26 PM, Vilhelm von Ehrenheim <[email protected]> wrote:
> Hi Steve!
>
> I have several pipelines that successfully use both numpy and scikit-learn
> models without any problems. I don't think I use Pandas atm, but I'm sure
> that is fine too.
>
> However, you might have to do some special stuff if you encounter
> serializability problems. I also have TensorFlow models in use, which were
> a bit trickier to get working because of serialization problems, as you
> mention. For those I needed to load one model instance per thread using
> threading.local, as is done here:
>
> https://github.com/tensorflow/transform/blob/master/tensorflow_transform/beam/impl.py
>
> (I realize that this file has evolved a bit since I last looked at it.
> It might be worth looking at an older version of the file, as it's quite
> advanced now.)
>
> So, when serializability is not possible, you can still initialize objects
> locally in each thread and let bundles that are executed in the same thread
> use the locally instantiated objects, instead of sharing one instantiation
> across all bundles and threads.
>
> Br,
> Vilhelm
>
> On 29 Sep 2017 17:17, "Steven DeLaurentis" <[email protected]> wrote:
>
> > Hi everyone,
> >
> > Came across this interesting project recently. Read through some of the
> > docs and still had a question: is it possible to use NumPy/Pandas in the
> > DoFn of a Beam pipeline? Or does the requirement of a serializable function
> > preclude this possibility?
> >
> > Thanks,
> > Steve
