Great, yw.

On Jan 13, 2017 2:22 AM, "Matthias Baetens" <[email protected]> wrote:
> Hi Vikas,
>
> Thanks for your reply! I was able to run both as well, and I think I
> figured out why it wasn't working: the project I was staging in was
> different from the one I was reading Datastore from. There was a warning
> in the logs (saying PERMISSION_DENIED), while it failed with an error only
> in the read/flatten step.
>
> But it's all working now, thanks for your help! :)
>
> Matthias
>
> On Fri, Jan 13, 2017 at 6:51 AM, Vikas Kedigehalli <[email protected]>
> wrote:
>
>> Hi Matthias,
>>
>> Glad you are trying out the Python Beam SDK. The datastoreio should
>> work on any runner.
>>
>> I was able to run the datastore wordcount example
>> <https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py>
>> successfully, both locally (DirectRunner) and on GCP. Here are the
>> commands:
>>
>> Locally -> "python -m apache_beam.examples.cookbook.datastore_wordcount
>> --output gs://<my output dir> --project '<my project>' --kind '<my kind>'
>> --read_only"
>>
>> GCP -> "python -m apache_beam.examples.cookbook.datastore_wordcount
>> --output gs://<my output dir> --project '<my project>' --kind '<my kind>'
>> --read_only --staging_location gs://<my staging loc> --runner
>> DataflowPipelineRunner --job_name <my job name>"
>>
>> Did you get a chance to look at the job worker logs for any errors the
>> pipeline is throwing? (That should give us a better idea.)
>>
>> Regards,
>> Vikas
>>
>> On Thu, Jan 12, 2017 at 7:39 AM, Matthias Baetens <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Using the Python SDK (the one installed with pip install
>>> google-cloud-dataflow), I have implemented a very simple pipeline that
>>> tries to read from Datastore and print the result, on a dataset with
>>> just 3 entities:
>>>
>>> entities = p \
>>>     | 'read from datastore' >> ReadFromDatastore(project='project-name', query=ds_query) \
>>>     | 'printing' >> beam.Map(lambda row: println(row))
>>>
>>> Running this locally seems to work fine. Running it on the cloud
>>> results in the following graph:
>>>
>>> [image: Inline image 1]
>>>
>>> but the execution stops in the GroupByKey step, after which the rest of
>>> the pipeline fails. Is there anything that should be added code-wise to
>>> make this work? Or does this only work locally for now?
>>>
>>> Thanks :)
>>>
>>> Matthias
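For reference, here is a minimal, self-contained sketch of the pipeline discussed above. It assumes the v1new Datastore API found in current Beam releases (the thread predates it, and import paths and the ReadFromDatastore signature have changed across SDK versions), so this is not the exact code Matthias ran; the project id and kind are placeholders. It also bakes in the fix from the thread: the project the pipeline runs against must match the project that owns the Datastore kind, or the read fails with PERMISSION_DENIED.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
    from apache_beam.io.gcp.datastore.v1new.types import Query

    # Placeholder project id: it must be the project that owns the
    # Datastore kind, and the same project the pipeline itself uses.
    PROJECT = 'my-project'

    options = PipelineOptions(project=PROJECT)

    with beam.Pipeline(options=options) as p:
        # Placeholder kind name; in the v1new API the query carries the
        # project instead of ReadFromDatastore taking a project argument.
        ds_query = Query(kind='MyKind', project=PROJECT)
        (p
         | 'read from datastore' >> ReadFromDatastore(ds_query)
         # print, not println: Python has no built-in println.
         | 'printing' >> beam.Map(print))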
