Hi Ankur,

Thanks for the reply!! Please find my response as below :

1. Pipeline does not fail for this. It shows a warning message as 'WARN 
org.apache.beam.runners.fnexecution.environment.DockerCommand - Unable to pull 
docker image $userId-docker-apache.bintray.io/beam/python:latest, cause: 
Received exit code 1 for command 'docker pull 
$userId-docker-apache.bintray.io/beam/python:latest'. stderr: Error response 
from daemon: Get https://$userId-docker-apache.bintray.io/v2/: x509: 
certificate is valid for *.bintray.io, bintray.io, not 
$userId-docker-apache.bintray.io
'
2. Logging is working as expected now. I need to set up log levels properly at 
jobserver and the script side.

Regards,
Anjana
________________________________
From: Ankur Goenka [[email protected]]
Sent: Wednesday, May 29, 2019 3:55 PM
To: Anjana Pydi; [email protected]
Cc: Richard Amrith Lourdu
Subject: Re: Apache Beam Python word count example is failing for Flink Runner

[email protected]<mailto:[email protected]>

1. Docker images created are only added to local docker repository and are not 
pushed to bintray.io<http://bintray.io>. I have not seen this exact error 
earlier so my best guess would be that the request to fetch the remote docker 
image failes when you try to do "docker pull <your user 
name>-docker-apache.bintray.io/beam/python<http://docker-apache.bintray.io/beam/python>".
 However, as you already have docker image in your local docker repository, you 
don't have to pull it from bintray. You can check the images in your local 
docker repository by "docker images".
Question: Do you see this error in the pipeline execution? And does your 
pipeline fail after this error?

2. All the logs from SDK Harness go to Flink. You should see them in the flink 
log files.
Question: Are you referring to SDK Harness console?


On Wed, May 29, 2019 at 2:09 PM Anjana Pydi 
<[email protected]<mailto:[email protected]>> wrote:
Hi Ankur,

Thank you very much for your suggestions. As you mentioned, it worked for local 
file system after adding a loopback worker.

I still have few more questions like below. It would be great if you can 
suggest something on these too please.

1.  After building SDK harness container using ./gradlew -p 
sdks/python/container docker , It gives below error when trying to do docker 
pull  :

Using default tag: latest Error response from daemon: Get 
https://$userId-docker- apache.bintray.io/v2/<http://apache.bintray.io/v2/>: 
x509: certificate is valid for *.bintray.io<http://bintray.io>, 
bintray.io<http://bintray.io>, not $userId-docker- 
apache.bintray.io<http://apache.bintray.io>

2. As different log levels can be used to control the verbosity of logging, 
When using the logging level as 'INFO'  and attempt to log any messages using 
'logging.info<http://logging.info>' , it does not show the messages on the 
console.

Regards,
Anjana
________________________________
From: Ankur Goenka [[email protected]<mailto:[email protected]>]
Sent: Tuesday, May 28, 2019 8:27 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: Apache Beam Python word count example is failing for Flink Runner

Hi Anjana,

When using local file system, Docker containers started during the pipeline 
execution can only access container's local filesystem.
Also, multiple containers are started during pipeline execution which do not 
have access to other container's file system.

So, in this case, files created by container 1 was not accessible to container 
2 which might have been started later.

Their are a few ways to solve this problem.
1. Use globally accessible file system HDFS, GCS etc.
2. Use loopback worker which does not use docker containers and hence use 
host's native local filesystem.
Command:
python -m apache_beam.examples.wordcount --input=local_input_file
--output=local_output_file --job_endpoint=localhost:8099
--experiments beam_fn_api --runner=PortableRunner 
--environment_cache_millis=10000
3. Add caching to the docker containers. This is not 100% reliable as even 
after caching, its possible that multiple containers get started.
Command:
python -m apache_beam.examples.wordcount --input=local_input_file
--output=local_output_file --job_endpoint=localhost:8099
--experiments beam_fn_api --runner=PortableRunner --environment_type=LOOPBACK




On Tue, May 28, 2019 at 7:30 PM Anjana Pydi 
<[email protected]<mailto:[email protected]>> wrote:


Hi Team,

I am trying to run Apache Beam Python word count example on Apache's Flink with 
PortableRunner using a SDK harness/Job Server via Docker.

1.After building SDK harness container using ./gradlew -p sdks/python/container 
docker , It gives below error when trying to do docker pull  :

Using default tag: latest Error response from daemon: Get 
https://$userId-docker- apache.bintray.io/v2/<http://apache.bintray.io/v2/>: 
x509: certificate is valid for *.bintray.io<http://bintray.io>, 
bintray.io<http://bintray.io>, not $userId-docker- 
apache.bintray.io<http://apache.bintray.io>

2. Started Flink portable Jobservice endpoint using ./gradlew 
beam-runners-flink_1.5-job-server:runShadow.When trying to run apache beam word 
count example using below command,


python -m apache_beam.examples.wordcount --input=local_input_file
--output=local_output_file --job_endpoint=localhost:8099
--experiments beam_fn_api --runner=PortableRunner

It gives below error:

  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/io/localfilesystem.py", 
line 134, in _path_open
    raw_file = open(path, mode)
RuntimeError: IOError: [Errno 2] No such file or directory: 
'/Users/$UserId/Desktop/Beam/output/beam-temp-wordsout.txt-162ea1c67b3311e9aa99025000000001/b6e6490f-9c73-4cae-9344-6400b6798eb7.wordsout.txt'
 [while running 'write/Write/WriteImpl/FinalizeWrite']

I have few questions like below -

1. When I check in /usr/local/lib, I can not see python2.7 folder there but the 
error points to that location. I want to understand how it is being done and if 
there is any way to point it to virtual environment python location?
2. How to fix the docker image x509 certificate issue. I installed certificates 
from openssl but it dint fix the issue.
3. Any detailed documentation on how to make wordcount example work with 
PortableRunner via Docker.

I have posted a question on stack overflow before on the same context. Below is 
the link:
https://stackoverflow.com/questions/56050608/apache-beam-python-word-count-example-is-failing-for-flink-runner-with-beamioerr

Please let me know in case if any clarifications needed.

Thanks,
Anjana


-----------------------------------------------------------------------------------------------------------------------
 The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. It may contain confidential or legally privileged information. If 
you are not the intended recipient you are hereby notified that any disclosure, 
copying, distribution or taking any action in reliance on the contents of this 
information is strictly prohibited and may be unlawful. If you are not the 
intended recipient, please notify us immediately by responding to this email 
and then delete it from your system. Bahwan Cybertek is neither liable for the 
proper and complete transmission of the information contained in this 
communication nor for any delay in its receipt.
-----------------------------------------------------------------------------------------------------------------------
 The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. It may contain confidential or legally privileged information. If 
you are not the intended recipient you are hereby notified that any disclosure, 
copying, distribution or taking any action in reliance on the contents of this 
information is strictly prohibited and may be unlawful. If you are not the 
intended recipient, please notify us immediately by responding to this email 
and then delete it from your system. Bahwan Cybertek is neither liable for the 
proper and complete transmission of the information contained in this 
communication nor for any delay in its receipt.
-----------------------------------------------------------------------------------------------------------------------
 The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. It may contain confidential or legally privileged information. If 
you are not the intended recipient you are hereby notified that any disclosure, 
copying, distribution or taking any action in reliance on the contents of this 
information is strictly prohibited and may be unlawful. If you are not the 
intended recipient, please notify us immediately by responding to this email 
and then delete it from your system. Bahwan Cybertek is neither liable for the 
proper and complete transmission of the information contained in this 
communication nor for any delay in its receipt.

Reply via email to