Hi Eila,

You can find a list of the dependencies installed on Dataflow workers in [1]. Dataflow workers will have a set of dependencies that satisfies the requirements from setup.py.
Which BigQuery library are you using? There is a google-cloud-bigquery==0.25.0 dependency; I am not sure where 0.23.0 is coming from.

Workers do not pick up libraries from the client environment as part of the job submission. I am not sure how the Datalab UI integration works; however, you have a few options for installing any set of dependencies on the workers. Using requirements.txt is one of those options.

Ahmet

[1] https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies#version-250_1

On Thu, Jul 12, 2018 at 8:51 AM, OrielResearch Eila Arich-Landkof <[email protected]> wrote:

> Hi all,
>
> I am running python pipeline with google.cloud.bigquery library.
> on the local runner, everything runs great
> bigquery.__version__ is 0.28.0
>
> on the dataflow runner, the version is 0.23.0 bigquery.__version__ is
> 0.23.0
> and there are many API changes between these versions.
>
> What will be the best way to change the installed version on the workers?
> I was assuming the worker has all the master machine libraries
> installed when the execution is done from datalab - is that true?
> I am not generating any requirements.txt, the execution is done through
> the run button on the datalab UI.
>
> please help me solve that issue.
> Thanks,
> --
> Eila
> www.orielresearch.org
> https://www.meetup.com/Deep-Learning-In-Production/
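
To illustrate the requirements.txt option mentioned in the reply above, here is a minimal sketch. It assumes the job is launched from Python and that 0.28.0 is the version wanted on the workers. The helper names write_requirements and dataflow_flags are made up for illustration; --requirements_file is the Beam Python SDK option for staging a requirements file. A real job would also need its usual project and staging flags, which are omitted here.

```python
import os
import tempfile


def write_requirements(directory, pins):
    """Write a requirements.txt that pins each package to an exact version.

    Hypothetical helper: `pins` maps package name -> version string.
    """
    path = os.path.join(directory, "requirements.txt")
    with open(path, "w") as f:
        for name, version in sorted(pins.items()):
            f.write("{}=={}\n".format(name, version))
    return path


def dataflow_flags(requirements_path):
    """Build the flags that ask Beam to install the pinned packages on workers.

    Hypothetical helper: only the runner and requirements flags are set here.
    """
    return [
        "--runner=DataflowRunner",
        "--requirements_file={}".format(requirements_path),
    ]


# Pin the version that works locally (0.28.0 in the original question).
workdir = tempfile.mkdtemp()
req_path = write_requirements(workdir, {"google-cloud-bigquery": "0.28.0"})
flags = dataflow_flags(req_path)
print(flags)

# Assumption: with apache_beam installed, the flags would be passed along as
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(flags)
# and `options` handed to beam.Pipeline(options=options).
```

Printing bigquery.__version__ from inside a DoFn is one way to confirm which version the workers actually picked up.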
