Hi Matthias,

  Glad you are trying out the Python Beam SDK. The datastoreio module should
work on any runner.

I was able to run the datastore wordcount example
<https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py>
successfully, both locally (DirectRunner) and on GCP. Here are the commands:

Locally -> "python -m apache_beam.examples.cookbook.datastore_wordcount
--output gs://<my output dir> --project '<my project>' --kind '<my kind>'
--read_only"

GCP -> "python -m apache_beam.examples.cookbook.datastore_wordcount
 --output gs://<my output dir> --project '<my project>' --kind '<my kind>'
--read_only --staging_location gs://<my staging loc> --runner
DataflowPipelineRunner --job_name <my job name>"
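
In case it helps to compare against your own code, below is a minimal sketch
of the same kind of read-only pipeline built directly in Python rather than
through the example's command-line flags. The import paths, option names, and
the ReadFromDatastore signature are assumptions based on the linked
datastore_wordcount example and may have moved between SDK versions, so
please double-check them against the example:

    import apache_beam as beam
    # These import paths follow the datastore_wordcount example; treat them
    # as assumptions, since they have differed across SDK versions.
    from apache_beam.io.datastore.v1.datastoreio import ReadFromDatastore
    from google.cloud.proto.datastore.v1 import query_pb2

    project = 'my-project'  # substitute '<my project>'

    # Query all entities of a single kind, e.g. '<my kind>'.
    query = query_pb2.Query()
    query.kind.add().name = 'MyKind'

    # These options mirror the GCP command above; drop the runner, staging
    # and job name flags to run locally on the DirectRunner instead.
    p = beam.Pipeline(argv=[
        '--project', project,
        '--runner', 'DataflowPipelineRunner',
        '--staging_location', 'gs://my-staging-bucket/staging',
        '--job_name', 'datastore-read-test',
    ])

    entities = (p
                | 'read from datastore' >> ReadFromDatastore(project, query)
                | 'format' >> beam.Map(lambda entity: str(entity))
                | 'write' >> beam.io.WriteToText('gs://my-output-bucket/out'))

    p.run()

Writing the entities out to GCS instead of printing them also sidesteps
relying on stdout on the Dataflow workers, where print output only shows up
in the worker logs.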

Did you get a chance to look at the job's worker logs for any errors the
pipeline is throwing? That should give us a better idea.
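
If it is easier than digging through the Cloud Console, the worker logs can
also be pulled from the command line (assuming the beta Dataflow commands are
available in your gcloud installation), with something like ->
"gcloud beta dataflow logs list <my job id> --importance=detailed"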

Regards,
Vikas

On Thu, Jan 12, 2017 at 7:39 AM, Matthias Baetens <
[email protected]> wrote:

> Hi all,
>
> Using the Python SDK (the one installed via pip install
> google-cloud-dataflow), I have implemented a very simple pipeline that
> reads from Datastore and prints the result, on a dataset with just 3 entities:
>
> entities = p \
>     | 'read from datastore' >> ReadFromDatastore(project='project-name', query=ds_query) \
>     | 'printing' >> beam.Map(lambda row: println(row))
>
>
> Running this locally seems to work fine. Running it on the cloud results
> in the following graph:
>
> [image: Dataflow pipeline graph]
>
> but the execution stops at the GroupByKey step, after which the rest of the
> pipeline fails. Is there anything that should be added code-wise to make
> this work? Or does this only work locally for now?
>
> Thanks :)
>
> Matthias
>
>
