Hi Matthias,

Glad you are trying out the Python Beam SDK. The datastoreio module should work on any runner.
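For reference, a minimal read pipeline looks roughly like the sketch below. Note this is just a sketch: 'my-project' and 'MyKind' are placeholders, the println helper mirrors the one in your snippet, and the import paths follow the current python-sdk branch, so they may differ slightly depending on the SDK version you have installed.

    import sys

    import apache_beam as beam
    from apache_beam.io.datastore.v1.datastoreio import ReadFromDatastore
    from google.datastore.v1 import query_pb2

    def println(element):
        # Simple helper that prints each entity and passes it through.
        print element
        return element

    # Build a query that selects every entity of the given kind.
    query = query_pb2.Query()
    query.kind.add().name = 'MyKind'  # placeholder kind name

    # Runner, project, etc. come from the command-line flags.
    p = beam.Pipeline(argv=sys.argv[1:])
    (p
     | 'read from datastore' >> ReadFromDatastore(project='my-project',
                                                  query=query)
     | 'printing' >> beam.Map(println))
    p.run()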
I was able to run the datastore wordcount example <https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py> successfully, both locally (DirectRunner) and on GCP. Here are the commands:

Locally:

    python -m apache_beam.examples.cookbook.datastore_wordcount --output gs://<my output dir> --project '<my project>' --kind '<my kind>' --read_only

GCP:

    python -m apache_beam.examples.cookbook.datastore_wordcount --output gs://<my output dir> --project '<my project>' --kind '<my kind>' --read_only --staging_location gs://<my staging loc> --runner DataflowPipelineRunner --job_name <my job name>

Did you get a chance to look at the job worker logs for any errors the pipeline is throwing? That should give us a better idea.

Regards,
Vikas

On Thu, Jan 12, 2017 at 7:39 AM, Matthias Baetens <[email protected]> wrote:

> Hi all,
>
> Using the Python SDK (the one installed with pip install
> google-cloud-dataflow) I have implemented a very simple pipeline that
> tries to read from Datastore and print the results for a dataset with
> just 3 entities:
>
>     entities = p \
>         | 'read from datastore' >> ReadFromDatastore(project='project-name', query=ds_query) \
>         | 'printing' >> beam.Map(lambda row: println(row))
>
> Running this locally seems to work fine. Running it on the cloud results
> in the following graph:
>
> [image: Inline image 1]
>
> but the execution stops at the GroupByKey step, after which the rest of
> the pipeline fails. Is there anything that should be added code-wise to
> make this work? Or does this only work locally for now?
>
> Thanks :)
>
> Matthias
