Hi Ankur, Thanks for the reply!! Please find my response as below :
1. Pipeline does not fail for this. It shows a warning message as 'WARN org.apache.beam.runners.fnexecution.environment.DockerCommand - Unable to pull docker image $userId-docker-apache.bintray.io/beam/python:latest, cause: Received exit code 1 for command 'docker pull $userId-docker-apache.bintray.io/beam/python:latest'. stderr: Error response from daemon: Get https://$userId-docker-apache.bintray.io/v2/: x509: certificate is valid for *.bintray.io, bintray.io, not $userId-docker-apache.bintray.io ' 2. Logging is working as expected now. I need to set up log levels properly at jobserver and the script side. Regards, Anjana ________________________________ From: Ankur Goenka [[email protected]] Sent: Wednesday, May 29, 2019 3:55 PM To: Anjana Pydi; [email protected] Cc: Richard Amrith Lourdu Subject: Re: Apache Beam Python word count example is failing for Flink Runner [email protected]<mailto:[email protected]> 1. Docker images created are only added to local docker repository and are not pushed to bintray.io<http://bintray.io>. I have not seen this exact error earlier so my best guess would be that the request to fetch the remote docker image failes when you try to do "docker pull <your user name>-docker-apache.bintray.io/beam/python<http://docker-apache.bintray.io/beam/python>". However, as you already have docker image in your local docker repository, you don't have to pull it from bintray. You can check the images in your local docker repository by "docker images". Question: Do you see this error in the pipeline execution? And does your pipeline fail after this error? 2. All the logs from SDK Harness go to Flink. You should see them in the flink log files. Question: Are you referring to SDK Harness console? On Wed, May 29, 2019 at 2:09 PM Anjana Pydi <[email protected]<mailto:[email protected]>> wrote: Hi Ankur, Thank you very much for your suggestions. As you mentioned, it worked for local file system after adding a loopback worker. I still have few more questions like below. It would be great if you can suggest something on these too please. 1. After building SDK harness container using ./gradlew -p sdks/python/container docker , It gives below error when trying to do docker pull : Using default tag: latest Error response from daemon: Get https://$userId-docker- apache.bintray.io/v2/<http://apache.bintray.io/v2/>: x509: certificate is valid for *.bintray.io<http://bintray.io>, bintray.io<http://bintray.io>, not $userId-docker- apache.bintray.io<http://apache.bintray.io> 2. As different log levels can be used to control the verbosity of logging, When using the logging level as 'INFO' and attempt to log any messages using 'logging.info<http://logging.info>' , it does not show the messages on the console. Regards, Anjana ________________________________ From: Ankur Goenka [[email protected]<mailto:[email protected]>] Sent: Tuesday, May 28, 2019 8:27 PM To: [email protected]<mailto:[email protected]> Subject: Re: Apache Beam Python word count example is failing for Flink Runner Hi Anjana, When using local file system, Docker containers started during the pipeline execution can only access container's local filesystem. Also, multiple containers are started during pipeline execution which do not have access to other container's file system. So, in this case, files created by container 1 was not accessible to container 2 which might have been started later. Their are a few ways to solve this problem. 1. Use globally accessible file system HDFS, GCS etc. 2. Use loopback worker which does not use docker containers and hence use host's native local filesystem. Command: python -m apache_beam.examples.wordcount --input=local_input_file --output=local_output_file --job_endpoint=localhost:8099 --experiments beam_fn_api --runner=PortableRunner --environment_cache_millis=10000 3. Add caching to the docker containers. This is not 100% reliable as even after caching, its possible that multiple containers get started. Command: python -m apache_beam.examples.wordcount --input=local_input_file --output=local_output_file --job_endpoint=localhost:8099 --experiments beam_fn_api --runner=PortableRunner --environment_type=LOOPBACK On Tue, May 28, 2019 at 7:30 PM Anjana Pydi <[email protected]<mailto:[email protected]>> wrote: Hi Team, I am trying to run Apache Beam Python word count example on Apache's Flink with PortableRunner using a SDK harness/Job Server via Docker. 1.After building SDK harness container using ./gradlew -p sdks/python/container docker , It gives below error when trying to do docker pull : Using default tag: latest Error response from daemon: Get https://$userId-docker- apache.bintray.io/v2/<http://apache.bintray.io/v2/>: x509: certificate is valid for *.bintray.io<http://bintray.io>, bintray.io<http://bintray.io>, not $userId-docker- apache.bintray.io<http://apache.bintray.io> 2. Started Flink portable Jobservice endpoint using ./gradlew beam-runners-flink_1.5-job-server:runShadow.When trying to run apache beam word count example using below command, python -m apache_beam.examples.wordcount --input=local_input_file --output=local_output_file --job_endpoint=localhost:8099 --experiments beam_fn_api --runner=PortableRunner It gives below error: File "/usr/local/lib/python2.7/site-packages/apache_beam/io/localfilesystem.py", line 134, in _path_open raw_file = open(path, mode) RuntimeError: IOError: [Errno 2] No such file or directory: '/Users/$UserId/Desktop/Beam/output/beam-temp-wordsout.txt-162ea1c67b3311e9aa99025000000001/b6e6490f-9c73-4cae-9344-6400b6798eb7.wordsout.txt' [while running 'write/Write/WriteImpl/FinalizeWrite'] I have few questions like below - 1. When I check in /usr/local/lib, I can not see python2.7 folder there but the error points to that location. I want to understand how it is being done and if there is any way to point it to virtual environment python location? 2. How to fix the docker image x509 certificate issue. I installed certificates from openssl but it dint fix the issue. 3. Any detailed documentation on how to make wordcount example work with PortableRunner via Docker. I have posted a question on stack overflow before on the same context. Below is the link: https://stackoverflow.com/questions/56050608/apache-beam-python-word-count-example-is-failing-for-flink-runner-with-beamioerr Please let me know in case if any clarifications needed. Thanks, Anjana ----------------------------------------------------------------------------------------------------------------------- The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you are not the intended recipient, please notify us immediately by responding to this email and then delete it from your system. Bahwan Cybertek is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt. ----------------------------------------------------------------------------------------------------------------------- The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you are not the intended recipient, please notify us immediately by responding to this email and then delete it from your system. Bahwan Cybertek is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt. ----------------------------------------------------------------------------------------------------------------------- The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you are not the intended recipient, please notify us immediately by responding to this email and then delete it from your system. Bahwan Cybertek is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt.
