RE: Apache Beam Python word count example is failing for Flink Runner

2019-05-29 Thread Anjana Pydi
Hi Ankur, Thanks for the reply! Please find my response below: 1. The pipeline does not fail for this. It shows a warning message: 'WARN org.apache.beam.runners.fnexecution.environment.DockerCommand - Unable to pull docker image $userId-docker-apache.bintray.io/beam/python:latest, cause:

Re: Apache Beam Python word count example is failing for Flink Runner

2019-05-29 Thread Ankur Goenka
1. I see, the warning is harmless, as it tries to update the image in case it is hosted on a remote repository. In this case, the remote repository does not have the image, hence the warning. It's not an error because the local image is still used. On Wed, May 29, 2019 at 5:32 PM Anjana Pydi wrote: > Hi
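The pull-then-fall-back behaviour Ankur describes can be sketched roughly as below. This is a simplified illustration, not Beam's actual DockerCommand code; the `run` parameter exists only to make the sketch testable, and the image name is a placeholder.

```python
import subprocess


def ensure_image(image, run=subprocess.run):
    """Try to refresh `image` from its remote repository; fall back to a
    local copy when the pull fails (sketch, not Beam's implementation)."""
    pull = run(["docker", "pull", image], capture_output=True, text=True)
    if pull.returncode != 0:
        # A failed pull is only a warning: a local copy may still exist.
        local = run(["docker", "images", "-q", image],
                    capture_output=True, text=True)
        if not local.stdout.strip():
            raise RuntimeError(
                f"image {image} found neither remotely nor locally")
    return image
```

When the remote repository lacks the image but `docker images -q` finds a local one, the function succeeds quietly, matching the "warning, not error" behaviour described above.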

Re: Apache Beam Python word count example is failing for Flink Runner

2019-05-29 Thread Ankur Goenka
+user@beam.apache.org 1. Docker images created are only added to the local docker repository and are not pushed to bintray.io. I have not seen this exact error before, so my best guess would be that the request to fetch the remote docker image fails when you try to do "docker pull -

Re: Question about --environment_type argument

2019-05-29 Thread Robert Bradshaw
I was wondering if there was an error earlier in the logs, e.g. at startup, about this missing parameter (given that it was linked in, I'd assume it at least tried to load). As for the other question, if the Python worker is doing the read, then yes, it needs access to HDFS as well. On Wed, May

Re: Question about --environment_type argument

2019-05-29 Thread Maximilian Michels
The environment for your Python function also needs to configure access for HDFS. Unfortunately, this is separate from Java. I haven't used HDFS with Python myself, but it looks like you have to configure these options in HadoopFileSystemOptions: --hdfs_host, --hdfs_port, --hdfs_user. Now, I
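As a rough illustration of wiring those options together: the flag names match Beam's HadoopFileSystemOptions, but the host, port, and user values are placeholders for your cluster, and the helper below is a sketch, not Beam's own option parser.

```python
# Placeholder HDFS settings mirroring HadoopFileSystemOptions flags.
hdfs_options = {
    "hdfs_host": "namenode.example.com",  # NameNode WebHDFS host (placeholder)
    "hdfs_port": 50070,                   # WebHDFS port (placeholder)
    "hdfs_user": "beam",                  # user to access HDFS as (placeholder)
}


def to_argv(options):
    """Render an options dict as --key=value flags, suitable for passing
    to a pipeline alongside the usual runner flags."""
    return [f"--{key}={value}" for key, value in options.items()]


print(to_argv(hdfs_options))
```

The resulting flags would be appended to the same argument list that carries `--runner`, `--environment_type`, and friends.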

Re: Question about --environment_type argument

2019-05-29 Thread 青雉(祁明良)
Was there any indication in the logs that the hadoop file system attempted to load but failed? Nope, same message “No filesystem found for scheme hdfs” when HADOOP_CONF_DIR is not set. I guess I met the last problem: when I load input data from HDFS, the Python SDK worker fails. It complains about
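The “No filesystem found for scheme hdfs” message falls out of scheme-based lookup: if the Hadoop filesystem module never loads (and so never registers itself), the "hdfs" scheme simply has no entry. A toy model of that lookup, with a hypothetical registry rather than Beam's actual code:

```python
# Hypothetical registry; "hdfs" only appears once the Hadoop filesystem
# module loads successfully and registers itself.
REGISTERED_FILESYSTEMS = {
    "file": "LocalFileSystem",
    "gs": "GCSFileSystem",
}


def get_filesystem(path):
    """Pick a filesystem by the path's scheme, defaulting to local."""
    scheme = path.split("://", 1)[0] if "://" in path else "file"
    if scheme not in REGISTERED_FILESYSTEMS:
        # The error from the thread: nothing registered for this scheme.
        raise ValueError(f"No filesystem found for scheme {scheme}")
    return REGISTERED_FILESYSTEMS[scheme]
```

This is why the fix reported in the thread (getting the Hadoop configuration visible to the worker) makes the error disappear: registration succeeds, and the scheme resolves.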

send Avro messages to Kafka using Beam

2019-05-29 Thread Nicolas Delsaux
Hello all, I have a Beam job that I use to read messages from RabbitMQ and write them to Kafka. As of now, messages are read/written as JSON. Obviously, that's not optimal storage, so I would like to transform the messages to Avro prior to writing them to Kafka. I have the URL of a schema
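For the JSON-to-Avro step, the Avro specification's binary encoding of a record is just its fields encoded in schema order (longs as zig-zag varints, strings as a length followed by UTF-8 bytes). A minimal pure-Python sketch for a hypothetical two-field schema; a real job would more likely use a library such as fastavro or a schema-registry serializer:

```python
import io
import json


def _write_long(buf, n):
    """Avro long: zig-zag encoding, then variable-length base-128 bytes."""
    n = (n << 1) ^ (n >> 63)  # zig-zag maps signed ints to unsigned
    while (n & ~0x7F) != 0:
        buf.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    buf.write(bytes([n]))


def _write_string(buf, s):
    """Avro string: length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    _write_long(buf, len(data))
    buf.write(data)


def json_to_avro(message):
    """Encode a JSON message against a hypothetical record schema with
    fields `id` (long) and `name` (string), in that schema order."""
    record = json.loads(message)
    buf = io.BytesIO()
    _write_long(buf, record["id"])
    _write_string(buf, record["name"])
    return buf.getvalue()
```

The resulting bytes are what would be handed to the Kafka producer as the message value, typically alongside a schema ID when a schema registry is in play.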

Re: Question about --environment_type argument

2019-05-29 Thread Robert Bradshaw
Glad you were able to figure it out! Agree the error message was suboptimal. Was there any indication in the logs that the hadoop file system attempted to load but failed? On Wed, May 29, 2019 at 4:41 AM 青雉(祁明良) wrote: > > Thanks guys, I got it. It was because Flink taskmanager docker missing