Sharing Damon's email with the user@ list as well. Thanks Damon!
On Tue, May 12, 2020 at 9:02 PM Damon Douglas
wrote:
> Hello Everyone,
>
> If you don't already know, there are helpful instructional tools for
> learning the Apache Beam SDKs called Beam Katas hosted on
> https://stepik.org. Simi
Hi all,
A friendly reminder that the webinar on 'Best practices towards a
production ready pipeline' will start tomorrow at 10:00 PST. You can join
by signing up here: https://learn.xnextcon.com/event/eventdetails/W20051310
If you cannot get into the meeting room on Zoom, you can go to this Yout
Hi Alex,
Thanks a lot for the suggestion. It seems that in my previous experiment I
did not pre-ingest a large enough number of messages. So it looks like each
partition gets a slice of time to be consumed by the same consumer, and
maybe during partition1's time slice it already drained down to zero, an
Thank you! I added WorkerOptions (GoogleCloudOptions fired an error on the
worker machine).
Code below for anyone else who might need it:
Apache beam SDK:
!pip show apache-beam
Name: apache-beam
Version: 2.20.0
Summary: Apache Beam SDK for Python
Home-page: https://beam.apache.org
Author: Apache Softwa
Hi Eleanore,
Interesting topic, and thank you for the additional information. I don’t see
that this is unexpected behavior for KafkaIO since, as Heejong said before,
it relies on the implementation of the KafkaConsumer that is used in your case.
According to the KafkaConsumer Javadoc [1], in most cases it should read f
A heads-up if anybody else sees this, we have removed the flag:
https://jira.apache.org/jira/browse/BEAM-9900
Further contributions are very welcome :)
-Max
On 11.05.20 17:05, Sruthi Sree Kumar wrote:
> I have opened a PR with the documentation change.
> https://github.com/apache/beam/pull/11662
Hi Eila,
It looks like you're attempting to set the option on the GoogleCloudOptions
class directly; I think you want to set it on an instance of
PipelineOptions that you've viewed as GoogleCloudOptions, like this example
from
https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#co
Hello,
I am trying to check whether the resource settings are actually being
applied.
What would be the right way to do it?
*the code is:*
GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96'
and *the dataflow view is* the following (nothing reflects the high-CPU
machine).
Please advi
Hi Folks,
Cross-posting from the Slack channel from the other day.
I started looking at Beam again over the weekend. I have an unbounded
stream with a CassandraIO input and am trying to write files using FileIO
and ParquetIO.
I'm using the following:
Beam: 2.20.0
Flink Runner/Cluster: 1.9(.3)
Hi,
I would like to clarify that while TextIO is writing, all the data are in
the files (shards). The loss happens when the file names emitted by
getPerDestinationOutputFilenames are processed by a window.
I have created a pipeline to reproduce the scenario in which some filenames
are lost after the g
Hi all,
The first error that eventually disconnects the worker from the service is:
*root disk error*.
I have changed the machine type to highmem and high-cpu. The error
still appears.
What is required to make sure that the running container will have enough
Persistent Disk space?
insertId:
"s=da
Hi Vincent,
Did you mean <=3000 or did you want that to be <=3?
Cheers
Reza
On Fri, 8 May 2020 at 04:23, Vincent Domingues <
vincent.doming...@dailymotion.com> wrote:
> Hi all,
>
> We are trying to work with Beam on Flink runner to consume PubSub messages.
>
> We are facing latency issue ev