Hello.
I would like to ask for help with my sample code using the portable runner
with Apache Flink.
I was able to get wordcount.py working using this page:
https://beam.apache.org/roadmap/portability/
I got the two files below under /tmp.
-rw-r--r-- 1 ywatanabe ywatanabe185 Sep 12 19:56
Hi Cham
Any update on this?
Best
Ziyad
On Thu, Sep 5, 2019 at 5:43 PM Ziyad Muhammed wrote:
> Hi Cham
>
> I tried that before. Apparently it's not accepted by either the direct
> runner or the Dataflow runner. I get the error below:
>
> Exception in thread "main" java.lang.IllegalArgumentException:
When you use a local filesystem path and a docker environment, "/tmp" is
written inside the container. You can solve this issue by:
* Using a "remote" filesystem such as HDFS/S3/GCS/...
* Mounting an external directory into the container so that any "local"
writes appear outside the container
*
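As a concrete sketch of the first option, pointing the wordcount output at a
remote filesystem such as GCS keeps the results out of the container's local
/tmp. The bucket name is a placeholder and the job endpoint assumes a Flink
job server on its default port; adjust both for your setup.

```shell
# Hedged sketch: write wordcount output to GCS instead of the container's
# local filesystem. Worker containers need credentials for the bucket.
python -m apache_beam.examples.wordcount \
  --input=gs://apache-beam-samples/shakespeare/kinglear.txt \
  --output=gs://<your-bucket>/counts \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099
```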
I am new to Beam, and I am pretty excited to get started. I have been
doing quite a bit of research and playing around with the API. But my
use case, unless I am approaching it incorrectly, suggests that I will
need to process multiple PCollections in some parts of my pipeline.
I am
+dev I think we should probably point new users of
the portable Flink/Spark runners to use loopback or some other non-docker
environment, as Docker adds some operational complexity that isn't really
needed to run a word count example. For example, Yu's pipeline errored here
because the expected
Yes, you can create multiple output PCollections using a ParDo with
multiple outputs instead of inserting them into Mongo.
It could be useful to read through the programming guide sections on
PCollections [1] and PTransforms with multiple outputs [2]; feel free to
come back with more questions.
1:
Hi Pawel, could you tell us which version of the Beam Python SDK you are
using?
For the record, this looks like a known issue:
https://issues.apache.org/jira/browse/BEAM-6860
Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
On Wed, Sep 11, 2019 at 6:33 AM Paweł Kordek
This should become much better with 2.16 when we have the Docker images
prebuilt.
Docker is probably still the best option for running Python on a JVM-based
runner in a local environment that does not have a development setup.
On Thu, Sep 12, 2019 at 1:09 PM Kyle Weaver wrote:
> +dev I think we
I'm a bit confused, since we mention
https://issues.apache.org/jira/browse/BEAM-1438 before that error, but that
JIRA was fixed a few years ago.
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L312
+Reuven Lax can you comment on
I got hung up on that issue earlier this week. I was using Flink 1.7 and
Beam 2.15, and wasn't using Kubernetes.
Then I gave up, so I don't have a solution :-/
I don't understand the job server well enough, but I think I was getting the
error when I did not have it running
(I still don't understand portability
I prefer loopback because a) it writes output files to the local
filesystem, as the user expects, and b) you don't have to pull or build
Docker images, or even have Docker installed on your system -- which is one
less point of failure.
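For reference, a loopback run of the Python wordcount against a local Flink
job server might look like this; the port and paths are illustrative, and
the flags follow the portability page linked earlier in the thread.

```shell
# Hedged sketch: LOOPBACK makes the SDK harness run inside the Python
# process that submits the job, so no Docker image is pulled or built
# and output files land on the local filesystem.
python -m apache_beam.examples.wordcount \
  --input=/tmp/kinglear.txt \
  --output=/tmp/counts \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=LOOPBACK
```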
Hi Lukasz,
Thank you for the reply.
I will try apache flink.
Thanks,
Yu
On Sun, Sep 8, 2019 at 11:59 PM Lukasz Cwik wrote:
> Try using Apache Flink.
>
> On Sun, Sep 8, 2019 at 6:23 AM Yu Watanabe wrote:
>
>> Hello.
>>
>> I would like to ask a question related to timely processing, as stated in
Hi All,
I am trying to come up with a framework that would help with debugging a
Dataflow job end to end locally.
Dataflow can work with hundreds of sources and sinks.
Is there already a framework for setting up these sources and sinks locally,
e.g. if my Dataflow job reads from BQ and inserts into Bigtable?
Please let me