Hi all,
Using the Python SDK (installed with pip install google-cloud-dataflow),
I have implemented a very simple pipeline that reads from Datastore and
prints the result, run on a dataset with just 3 entities:
    entities = p \
        | 'read from datastore' >> ReadFromDatastore(project='project-name', query=ds_query) \
        | 'printing' >> beam.Map(lambda row: println(row))
Run locally, this seems to work fine. Run on the cloud, it results in
the following graph:
[image: Inline image 1]
but execution stops in the GroupByKey step, after which the rest of the
pipeline fails. Is there anything that should be added code-wise to make
this work?
Or does this only work locally for now?
Thanks :)
Matthias