Great, yw.

On Jan 13, 2017 2:22 AM, "Matthias Baetens" <[email protected]> wrote:
> Hi Vikas,
>
> Thanks for your reply! I was able to run both as well, and I think I
> figured out why it wasn't working: the project I was staging in was
> different from the one I was reading Datastore from. There was a warning
> in the logs (saying PERMISSION_DENIED), while it failed with an error only
> in the read/flatten step.
>
> But it's all working now, thanks for your help! :)
>
> Matthias
>
> On Fri, Jan 13, 2017 at 6:51 AM, Vikas Kedigehalli <[email protected]>
> wrote:
>
>> Hi Matthias,
>>
>> Glad you are trying out the Python Beam SDK. The datastoreio should
>> work on any runner.
>>
>> I was able to run the datastore wordcount example
>> <https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py>
>> successfully, both locally (DirectRunner) and on GCP. Here are the
>> commands:
>>
>> Locally -> "python -m apache_beam.examples.cookbook.datastore_wordcount
>> --output gs://<my output dir> --project '<my project>' --kind '<my kind>'
>> --read_only"
>>
>> GCP -> "python -m apache_beam.examples.cookbook.datastore_wordcount
>> --output gs://<my output dir> --project '<my project>' --kind '<my kind>'
>> --read_only --staging_location gs://<my staging loc> --runner
>> DataflowPipelineRunner --job_name <my job name>"
>>
>> Did you get a chance to look at the job worker logs for any errors the
>> pipeline is throwing? (That should give us a better idea.)
>>
>> Regards,
>> Vikas
>>
>> On Thu, Jan 12, 2017 at 7:39 AM, Matthias Baetens <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Using the Python SDK (the one installed with pip install
>>> google-cloud-dataflow), I have implemented a very simple pipeline that
>>> tries to read from Datastore and print the result, on a dataset with
>>> just 3 entities:
>>>
>>> entities = p \
>>>     | 'read from datastore' >> ReadFromDatastore(project='project-name', query=ds_query) \
>>>     | 'printing' >> beam.Map(lambda row: println(row))
>>>
>>> Running this locally seems to work fine. Running it on the cloud
>>> results in the following graph:
>>>
>>> [image: Inline image 1]
>>>
>>> but the execution stops in the GroupByKey step, after which the rest of
>>> the pipeline fails. Is there anything that should be added code-wise to
>>> make this work? Or does this only work locally for now?
>>>
>>> Thanks :)
>>>
>>> Matthias
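For reference, here is a minimal, self-contained sketch of the pipeline discussed above. It assumes the v1new Datastore API found in current Beam releases (the thread predates it, and import paths and the ReadFromDatastore signature have changed across SDK versions), so this is not the exact code Matthias ran; the project id and kind are placeholders. It also bakes in the fix from the thread: the project the pipeline runs against must match the project that owns the Datastore kind, or the read fails with PERMISSION_DENIED.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
    from apache_beam.io.gcp.datastore.v1new.types import Query

    # Placeholder project id: it must be the project that owns the
    # Datastore kind, and the same project the pipeline itself uses.
    PROJECT = 'my-project'

    options = PipelineOptions(project=PROJECT)

    with beam.Pipeline(options=options) as p:
        # Placeholder kind name; in the v1new API the query carries the
        # project instead of ReadFromDatastore taking a project argument.
        ds_query = Query(kind='MyKind', project=PROJECT)
        (p
         | 'read from datastore' >> ReadFromDatastore(ds_query)
         # print, not println: Python has no built-in println.
         | 'printing' >> beam.Map(print))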
