On this note, making local files easy to read is something we'd definitely
like to improve, as the current behavior is quite surprising. This could be
useful not just for running with docker and the portable runner locally,
but more generally when running on a distributed system (e.g. a Flink/Spark
cluster or Dataflow). It would be very convenient if we could automatically
stage local files to be read as artifacts that could be consumed by any
worker (possibly via external directory mounting in the local docker case
rather than an actual copy), and conversely copy small outputs back to the
local machine (with a similar optimization for local docker).
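The staging half of this proposal could be sketched as below. Note that
`stage_local_inputs` is a hypothetical helper for illustration, not an
existing Beam API:

```python
import shutil
from pathlib import Path


def stage_local_inputs(paths, staging_dir):
    """Copy local input files into a directory reachable by every worker
    (e.g. an artifact staging area, or a docker-mounted volume in the
    local case) and return the rewritten paths the pipeline should read
    from instead of the originals."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    for p in paths:
        dest = staging / Path(p).name
        shutil.copy(p, dest)
        staged.append(str(dest))
    return staged
```

In the local-docker case the copy could be skipped entirely by mounting the
parent directory into the container instead.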

At the very least, however, we should add obvious messaging when the local
filesystem is used from within docker, since this is often a non-obvious
and hard-to-debug mistake.
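As a sketch of what such a check could look like (a hypothetical helper
using only the standard library, not an existing Beam API; the scheme list
is illustrative):

```python
import warnings

# Illustrative set of "remote" filesystem schemes; a real check would
# consult the registered Beam filesystems instead.
REMOTE_SCHEMES = ("gs://", "s3://", "hdfs://")


def check_output_path(path, environment_type):
    """Return True if `path` is safe for the given SDK worker environment.

    Warn (and return False) when a plain local path is combined with a
    DOCKER environment, since the write would land inside the container
    rather than on the submitting machine."""
    if environment_type == "DOCKER" and not path.startswith(REMOTE_SCHEMES):
        warnings.warn(
            "Path %r is on the local filesystem; with a Docker environment "
            "it will be written inside the container, not on the host."
            % (path,))
        return False
    return True
```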


On Thu, Sep 12, 2019 at 10:34 AM Lukasz Cwik <lc...@google.com> wrote:

> When you use a local filesystem path and a docker environment, "/tmp" is
> written inside the container. You can solve this issue by:
> * Using a "remote" filesystem such as HDFS/S3/GCS/...
> * Mounting an external directory into the container so that any "local"
> writes appear outside the container
> * Using a non-docker environment such as external or process.
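The second option can be illustrated with a small sketch; the helper, the
image name, and the paths below are placeholders rather than what the Beam
runner actually constructs:

```python
def with_volume_mount(docker_cmd, host_dir, container_dir):
    """Insert a -v host_dir:container_dir bind mount right after
    'docker run' in a command list, so writes to container_dir inside
    the container appear under host_dir on the host."""
    i = docker_cmd.index("run") + 1
    return (docker_cmd[:i]
            + ["-v", "%s:%s" % (host_dir, container_dir)]
            + docker_cmd[i:])


# Placeholder command and image name, modeled on the invocation the
# runner logs; not the actual flags Beam generates.
cmd = ["docker", "run", "-d", "--network=host",
       "example/beam-python3-sdk:latest"]
cmd = with_volume_mount(cmd, "/tmp/beam-output", "/tmp")
```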
>
> On Thu, Sep 12, 2019 at 4:51 AM Yu Watanabe <yu.w.ten...@gmail.com> wrote:
>
>> Hello.
>>
>> I would like to ask for help with my sample code, which uses the
>> portable runner with Apache Flink.
>> I was able to get wordcount.py working by following this page.
>>
>> https://beam.apache.org/roadmap/portability/
>>
>> I got below two files under /tmp.
>>
>> -rw-r--r-- 1 ywatanabe ywatanabe    185 Sep 12 19:56
>> py-wordcount-direct-00001-of-00002
>> -rw-r--r-- 1 ywatanabe ywatanabe    190 Sep 12 19:56
>> py-wordcount-direct-00000-of-00002
>>
>> Then I wrote sample code with the steps below.
>>
>> 1. Installed apache_beam using pip3, separate from the source code
>> directory.
>> 2. Wrote the sample code below and named it "test-portable-runner.py",
>> placed in a directory separate from the source code.
>>
>> -----------------------------------------------------------------------------------
>> (python) ywatanabe@debian-09-00:~$ ls -ltr
>> total 16
>> drwxr-xr-x 18 ywatanabe ywatanabe 4096 Sep 12 19:06 beam (<- source code
>> directory)
>> -rw-r--r--  1 ywatanabe ywatanabe  634 Sep 12 20:25
>> test-portable-runner.py
>>
>> -----------------------------------------------------------------------------------
>> 3. Executed the code with "python3 test-portable-runner.py"
>>
>>
>> ==========================================================================================
>> #!/usr/bin/env python3
>>
>> import apache_beam as beam
>> from apache_beam.options.pipeline_options import PipelineOptions
>> from apache_beam.io import WriteToText
>>
>>
>> def printMsg(line):
>>
>>     print("OUTPUT: {0}".format(line))
>>
>>     return line
>>
>> options = PipelineOptions(["--runner=PortableRunner",
>> "--job_endpoint=localhost:8099", "--shutdown_sources_on_final_watermark"])
>>
>> p = beam.Pipeline(options=options)
>>
>> output = ( p | 'create' >> beam.Create(["a", "b", "c"])
>>              | beam.Map(printMsg)
>>          )
>>
>> output | 'write' >> WriteToText('/tmp/sample.txt')
>>
>> =======================================================================================
>>
>> The job seemed to go all the way to the "FINISHED" state.
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------
>> [DataSource (Impulse) (1/1)] INFO
>> org.apache.flink.runtime.taskmanager.Task - Registering task at network:
>> DataSource (Impulse) (1/1) (9df528f4d493d8c3b62c8be23dc889a8) [DEPLOYING].
>> [DataSource (Impulse) (1/1)] INFO
>> org.apache.flink.runtime.taskmanager.Task - DataSource (Impulse) (1/1)
>> (9df528f4d493d8c3b62c8be23dc889a8) switched from DEPLOYING to RUNNING.
>> [flink-akka.actor.default-dispatcher-3] INFO
>> org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSource
>> (Impulse) (1/1) (9df528f4d493d8c3b62c8be23dc889a8) switched from DEPLOYING
>> to RUNNING.
>> [flink-akka.actor.default-dispatcher-3] INFO
>> org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN MapPartition
>> (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at
>> core.py:2415>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0])
>> (1/1) (0b60f1a9e620b061f85e97998aa4b181) switched from CREATED to SCHEDULED.
>> [flink-akka.actor.default-dispatcher-3] INFO
>> org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN MapPartition
>> (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at
>> core.py:2415>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0])
>> (1/1) (0b60f1a9e620b061f85e97998aa4b181) switched from SCHEDULED to
>> DEPLOYING.
>> [flink-akka.actor.default-dispatcher-3] INFO
>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying CHAIN
>> MapPartition (MapPartition at
>> [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2415>),
>> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/1) (attempt #0)
>> to aad5f142-dbb7-4900-a5ae-af8a6fdcc787 @ localhost (dataPort=-1)
>> [flink-akka.actor.default-dispatcher-2] INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Received task CHAIN
>> MapPartition (MapPartition at
>> [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2415>),
>> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/1).
>> [DataSource (Impulse) (1/1)] INFO
>> org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task
>> DataSource (Impulse) (1/1) (d51e4171c0af881342eaa8a7ab06def8) [DEPLOYING].
>> [DataSource (Impulse) (1/1)] INFO
>> org.apache.flink.runtime.taskmanager.Task - Registering task at network:
>> DataSource (Impulse) (1/1) (d51e4171c0af881342eaa8a7ab06def8) [DEPLOYING].
>> [DataSource (Impulse) (1/1)] INFO
>> org.apache.flink.runtime.taskmanager.Task - DataSource (Impulse) (1/1)
>> (9df528f4d493d8c3b62c8be23dc889a8) switched from RUNNING to FINISHED.
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> But I ended up with docker error on client side.
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------
>> (python) ywatanabe@debian-09-00:~$ python3 test-portable-runner.py
>> /home/ywatanabe/python/lib/python3.5/site-packages/apache_beam/__init__.py:84:
>> UserWarning: Some syntactic constructs of Python 3 are not yet fully
>> supported by Apache Beam.
>>   'Some syntactic constructs of Python 3 are not yet fully supported by '
>> ERROR:root:java.io.IOException: Received exit code 125 for command
>> 'docker run -d --network=host --env=DOCKER_MAC_CONTAINER=null
>> ywatanabe-docker-apache.bintray.io/beam/python3:latest --id=13-1
>> --logging_endpoint=localhost:39049 --artifact_endpoint=localhost:33779
>> --provision_endpoint=localhost:34827 --control_endpoint=localhost:36079'.
>> stderr: Unable to find image
>> 'ywatanabe-docker-apache.bintray.io/beam/python3:latest' locally
>> docker: Error response from daemon: unknown: Subject ywatanabe was not
>> found. See 'docker run --help'.
>> Traceback (most recent call last):
>>   File "test-portable-runner.py", line 27, in <module>
>>     result.wait_until_finish()
>>   File
>> "/home/ywatanabe/python/lib/python3.5/site-packages/apache_beam/runners/portability/portable_runner.py",
>> line 446, in wait_until_finish
>>     self._job_id, self._state, self._last_error_message()))
>> RuntimeError: Pipeline
>> BeamApp-ywatanabe-0912112516-22d84d63_d3b5e778-d771-4118-9ba7-c9dde85938e9
>> failed in state FAILED: java.io.IOException: Received exit code 125 for
>> command 'docker run -d --network=host --env=DOCKER_MAC_CONTAINER=null
>> ywatanabe-docker-apache.bintray.io/beam/python3:latest --id=13-1
>> --logging_endpoint=localhost:39049 --artifact_endpoint=localhost:33779
>> --provision_endpoint=localhost:34827 --control_endpoint=localhost:36079'.
>> stderr: Unable to find image
>> 'ywatanabe-docker-apache.bintray.io/beam/python3:latest' locally
>> docker: Error response from daemon: unknown: Subject ywatanabe was not
>> found. See 'docker run --help'.
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> As a result, I got nothing under /tmp. The code works when using
>> DirectRunner.
>> May I ask where I should look in order to get the pipeline to write its
>> results to text files under /tmp?
>>
>> Best Regards,
>> Yu Watanabe
>>
>> --
>> Yu Watanabe
>> Weekend Freelancer who loves to challenge building data platform
>> yu.w.ten...@gmail.com
>> LinkedIn: <https://www.linkedin.com/in/yuwatanabe1>  Twitter:
>> <https://twitter.com/yuwtennis>
>>
>
