Re: Try Beam Katas Today

2020-05-12 Thread Pablo Estrada
Sharing Damon's email with the user@ list as well. Thanks Damon! On Tue, May 12, 2020 at 9:02 PM Damon Douglas wrote: > Hello Everyone, > > If you don't already know, there are helpful instructional tools for > learning the Apache Beam SDKs called Beam Katas hosted on > https://stepik.org. Simi

Webinar: Best practices towards a production-ready Beam pipeline

2020-05-12 Thread Aizhamal Nurmamat kyzy
Hi all, A friendly reminder that the webinar on 'Best practices towards a production ready pipeline' will start tomorrow at 10:00 PST. You can join by signing up here: https://learn.xnextcon.com/event/eventdetails/W20051310 If you cannot get into the meeting room on Zoom, you can go to this Yout

Re: Behavior of KafkaIO

2020-05-12 Thread Eleanore Jin
Hi Alex, Thanks a lot for the suggestion. It seems that in my previous experiment I did not pre-ingest enough messages. So it looks like each partition gets a slice of time to be consumed by the same consumer, and maybe during partition1's time slice it already drains down to zero, an

Re: GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96'

2020-05-12 Thread OrielResearch Eila Arich-Landkof
Thank you! I added WorkerOptions (GoogleCloudOptions fired an error for the worker machine setting); code below for anyone else who might need it: Apache Beam SDK: !pip show apache-beam Name: apache-beam Version: 2.20.0 Summary: Apache Beam SDK for Python Home-page: https://beam.apache.org Author: Apache Softwa

Re: Behavior of KafkaIO

2020-05-12 Thread Alexey Romanenko
Hi Eleanore, Interesting topic, thank you for the additional information. I don’t see this as unexpected behavior for KafkaIO since, as Heejong said before, it relies on the implementation of the KafkaConsumer that is used in your case. According to the KafkaConsumer Javadoc [1], in most cases it should read f

Re: Running NexMark Tests

2020-05-12 Thread Maximilian Michels
A heads-up if anybody else sees this: we have removed the flag (https://jira.apache.org/jira/browse/BEAM-9900). Further contributions are very welcome :) -Max On 11.05.20 17:05, Sruthi Sree Kumar wrote: > I have opened a PR with the documentation change. > https://github.com/apache/beam/pull/11662

Re: GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96'

2020-05-12 Thread Brian Hulette
Hi Eila, It looks like you're attempting to set the option on the GoogleCloudOptions class directly; I think you want to set it on an instance of PipelineOptions that you've viewed as GoogleCloudOptions, like this example from https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#co
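A minimal sketch of the pattern described above, assuming the Python SDK used in this thread; the project, region, and bucket values are placeholders, and exposing --worker_machine_type as the machine_type attribute on WorkerOptions is an assumption about the SDK's option names:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        GoogleCloudOptions, PipelineOptions, WorkerOptions)

    # Build one PipelineOptions instance and set fields on *views* of that
    # instance, rather than on the GoogleCloudOptions class itself.
    options = PipelineOptions()

    gcp = options.view_as(GoogleCloudOptions)
    gcp.project = 'my-project-id'             # placeholder
    gcp.region = 'us-central1'                # placeholder
    gcp.temp_location = 'gs://my-bucket/tmp'  # placeholder

    # The worker machine type is a worker-level option; it can also be
    # passed on the command line as --worker_machine_type.
    options.view_as(WorkerOptions).machine_type = 'n1-highcpu-96'

    with beam.Pipeline(options=options) as p:
        pass  # pipeline transforms go here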

GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96'

2020-05-12 Thread OrielResearch Eila Arich-Landkof
Hello, I am trying to check whether the resource settings are actually being applied. What would be the right way to do it? *the code is:* GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96' and *the dataflow view is* the following (nothing that reflects the highcpu machine. Please advi
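One way to sanity-check what will actually be submitted, assuming the Python SDK and the view_as pattern from the reply above (the 'machine_type' key name is an assumption about how --worker_machine_type is stored), is to print the resolved options before running; the machine type should then also show up on the job's worker pool in the Dataflow console:

    from apache_beam.options.pipeline_options import PipelineOptions, WorkerOptions

    options = PipelineOptions(['--worker_machine_type=n1-highcpu-96'])

    # get_all_options() returns the parsed option values as a plain dict,
    # so the machine type can be checked before the job is submitted.
    print(options.get_all_options().get('machine_type'))
    print(options.view_as(WorkerOptions).machine_type)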

Unbounded stream to FileIO.write

2020-05-12 Thread Nathan Fisher
Hi Folks, Cross-posting from the Slack channel from the other day. I started looking at Beam again over the weekend. I have an unbounded stream with a CassandraIO input and am trying to write files using FileIO and ParquetIO. I'm using the following: Beam: 2.20.0 Flink Runner/Cluster: 1.9(.3)

Re: TextIO. Writing late files

2020-05-12 Thread Jose Manuel
Hi, I would like to clarify that while TextIO is writing, all the data is in the files (shards). The loss happens when the file names emitted by getPerDestinationOutputFilenames are processed by a window. I have created a pipeline to reproduce the scenario in which some filenames are lost after the g

Re: resources management at worker machine or how to debug hanging execution on worker machine

2020-05-12 Thread OrielResearch Eila Arich-Landkof
Hi all, The first error that eventually disconnects the worker from the service is: *root disk error*. I have changed the machine type to highmem and high-cpu. The error still appears. What is required to make sure that the running container will have enough Persistent Disk space? insertId: "s=da
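Not a definitive answer, but one knob commonly used for this is the disk_size_gb worker option (an assumption here, not confirmed in the thread), which requests a larger persistent disk per Dataflow worker; a minimal Python sketch:

    from apache_beam.options.pipeline_options import PipelineOptions, WorkerOptions

    options = PipelineOptions()
    worker = options.view_as(WorkerOptions)
    worker.machine_type = 'n1-highmem-8'  # placeholder machine type
    worker.disk_size_gb = 100             # request a larger persistent disk per worker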

Re: PubSub latency for Beam pipeline on Flink runner

2020-05-12 Thread Reza Ardeshir Rokni
Hi Vincent, Did you mean <=3000 or did you want that to be <=3? Cheers Reza On Fri, 8 May 2020 at 04:23, Vincent Domingues <vincent.doming...@dailymotion.com> wrote: > Hi all, > > We are trying to work with Beam on the Flink runner to consume PubSub messages. > > We are facing a latency issue ev