Hi Vikas,

Thanks for your reply! I was able to run both as well, and I think I figured out why it wasn't working: the project I was staging in was different from the one I was reading Datastore from. There was a PERMISSION_DENIED warning in the logs, but the pipeline only failed with an error in the read/flatten step.
But it's all working now, thanks for your help! :)

Matthias

On Fri, Jan 13, 2017 at 6:51 AM, Vikas Kedigehalli <[email protected]> wrote:

> Hi Matthias,
>
> Glad you are trying out the Python Beam SDK. The datastoreio should work
> on any runner.
>
> I was able to run the datastore wordcount example
> <https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py>
> successfully, both locally (DirectRunner) and on GCP. Here are the commands:
>
> Locally:
>
>   python -m apache_beam.examples.cookbook.datastore_wordcount \
>     --output gs://<my output dir> --project '<my project>' \
>     --kind '<my kind>' --read_only
>
> GCP:
>
>   python -m apache_beam.examples.cookbook.datastore_wordcount \
>     --output gs://<my output dir> --project '<my project>' \
>     --kind '<my kind>' --read_only \
>     --staging_location gs://<my staging loc> \
>     --runner DataflowPipelineRunner --job_name <my job name>
>
> Did you get a chance to look at the job worker logs for any errors that
> the pipeline is throwing? That should give us a better idea.
>
> Regards,
> Vikas
>
> On Thu, Jan 12, 2017 at 7:39 AM, Matthias Baetens <[email protected]> wrote:
>
>> Hi all,
>>
>> Using the Python SDK (the one installed via pip install
>> google-cloud-dataflow) I have implemented a very simple pipeline trying
>> to read from Datastore and print the result on a dataset with just 3
>> entities:
>>
>>   entities = p \
>>     | 'read from datastore' >> ReadFromDatastore(project='project-name', query=ds_query) \
>>     | 'printing' >> beam.Map(lambda row: println(row))
>>
>> Running this locally, this seems to work fine. Running it on the cloud,
>> this results in the following graph:
>>
>> [image: Inline image 1]
>>
>> but the execution stops in the GroupByKey step, after which the rest of
>> the pipeline fails. Is there anything that should be added code-wise to
>> make this work? Or does this only work locally for now?
>>
>> Thanks :)
>>
>> Matthias
