Hi Vikas,

Thanks for your reply! I was able to run both as well, and I think I
figured out why it wasn't working: the project I was staging in was
different from the one I was reading Datastore from. There was a warning
in the logs (saying PERMISSION_DENIED), while it failed with an error only
in the read/flatten step.

But it's all working now, thanks for your help! :)
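For anyone hitting the same issue: the project the Dataflow job runs in and the project Datastore is read from are configured independently, so they can silently diverge. Below is a minimal, hedged sketch (all option names and project IDs are illustrative, not from this thread, and not part of the Beam SDK) of parsing the two separately so a mismatch is visible up front:

```python
import argparse

# Sketch: keep the job's project and the Datastore project as distinct,
# explicit options so a mismatch can be flagged before the pipeline runs.
# Option names here are illustrative assumptions.
def parse_options(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('--project', required=True,
                        help='GCP project the Dataflow job runs in')
    parser.add_argument('--datastore_project', default=None,
                        help='Project holding the Datastore entities '
                             '(defaults to --project)')
    args = parser.parse_args(argv)
    if args.datastore_project is None:
        args.datastore_project = args.project
    elif args.datastore_project != args.project:
        # Cross-project reads require the job's service account to have
        # Datastore read access in the other project; without it, the
        # read step fails with PERMISSION_DENIED.
        print('warning: job project %r != datastore project %r'
              % (args.project, args.datastore_project))
    return args

args = parse_options(['--project', 'job-proj',
                      '--datastore_project', 'data-proj'])
print(args.datastore_project)
```

This only surfaces the mismatch; the actual fix is granting the job's service account Datastore read access in the source project, or pointing both options at the same project.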

Matthias

On Fri, Jan 13, 2017 at 6:51 AM, Vikas Kedigehalli <vikasrk....@gmail.com>
wrote:

> Hi Matthias,
>
>   Glad you are trying out the Python Beam SDK. The datastoreio should work
> on any runner.
>
> I was able to run the datastore wordcount example
> <https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py>
> successfully (both locally (DirectRunner) and on GCP). Here is the command,
>
> Locally -> "python -m apache_beam.examples.cookbook.datastore_wordcount
> --output gs://<my output dir> --project '<my project>' --kind '<my kind>'
> --read_only"
>
> GCP -> "python -m apache_beam.examples.cookbook.datastore_wordcount
>  --output gs://<my output dir> --project '<my project>' --kind '<my kind>'
> --read_only --staging_location gs://<my staging loc> --runner
> DataflowPipelineRunner --job_name <my job name>"
>
> Did you get a chance to look at the job worker logs for any errors the
> pipeline is throwing? (That should give us a better idea.)
>
> Regards,
> Vikas
>
> On Thu, Jan 12, 2017 at 7:39 AM, Matthias Baetens <
> matthias.baet...@datatonic.com> wrote:
>
>> Hi all,
>>
>> Using the Python SDK (the one installed via pip install
>> google-cloud-dataflow), I have implemented a very simple pipeline that
>> reads from Datastore and prints the results, on a dataset with just 3 entities:
>>
>> entities = p \
>>     | 'read from datastore' >> ReadFromDatastore(project='project-name',
>>                                                  query=ds_query) \
>>     | 'printing' >> beam.Map(lambda row: println(row))
>>
>>
>> Running this locally seems to work fine. Running it on the cloud results
>> in the following graph:
>>
>> [image: Inline image 1]
>>
>> but the execution stops in the GroupByKey step, after which the rest of
>> the pipeline fails. Is there anything that should be added code-wise to
>> make this work?
>> Or does this only work locally for now?
>>
>> Thanks :)
>>
>> Matthias
>>
>>
