We used to run legacy sources using the old-style Read translation.
Changing it to SDF might have broken ReadFromPubSub. Could you check in
the Flink jobs whether they use the SDF code or the Read translation? For
Read you should be seeing the UnboundedSourceWrapper.
Looking at the code, there
This type of stack trace occurs when the downstream operator is blocked
for some reason. Flink maintains a finite number of network buffers for
each network channel. If the receiving downstream operator does not
process incoming network buffers, the upstream operator blocks. This is
also
Which metrics specifically do you mean? Beam metrics (e.g. backlogBytes)
or Flink metrics (e.g. numRecordsOut)?
Quick look at the code (ReaderInvocationUtil) reveals that the Beam
metrics should be reported correctly. The Flink native metrics are
always reported, independently of the type of
Hey Teodor,
Copying is the default behavior. This is tunable via the pipeline option
'objectReuse', i.e. 'objectReuse=true'.
The option is disabled by default because users may not be aware of
object reuse and recycle objects in their process functions which will
have unexpected side
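For reference, a minimal sketch of enabling it from a portable (Python)
pipeline; the snake_case spelling of the Flink option on the Python side is an
assumption:

  from apache_beam.options.pipeline_options import PipelineOptions

  # Assumption: the Flink runner option 'objectReuse' surfaces as
  # '--object_reuse' when options are translated for the job server.
  options = PipelineOptions([
      '--runner=FlinkRunner',
      '--object_reuse=true',
  ])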
Thanks for reporting, Gökhan! Please keep us updated. We'll likely merge
the patch by the end of the week.
-Max
On 03.09.20 08:40, Gökhan Imral wrote:
Thanks for the quick response. I tried with a fix applied build and can
see that memory is much more stable.
Gokhan
On 2 Sep 2020, at 12:51
Looks like you ran into a bug.
You could just run your program without specifying any arguments, since
running with Python's FnApiRunner should be enough.
Alternatively, how about trying to run the same pipeline with the
FlinkRunner? Use: --runner=FlinkRunner and do not specify an endpoint.
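A minimal sketch of both variants (the pipeline body is a placeholder):

  import apache_beam as beam
  from apache_beam.options.pipeline_options import PipelineOptions

  # Variant 1: pass no arguments at all -- Python's FnApiRunner is the default.
  # Variant 2: the FlinkRunner without an endpoint brings up a local cluster.
  options = PipelineOptions(['--runner=FlinkRunner'])

  with beam.Pipeline(options=options) as p:
      p | beam.Create([1, 2, 3]) | beam.Map(print)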
Thanks Pedro! Great to see the program! This is going to be an exciting
event.
Forwarding to the dev mailing list, in case people didn't see this here.
-Max
On 29.07.20 20:25, Pedro Galvan wrote:
Hello!
Just a quick message to let everybody know that we have published the
program for the
Hi Dinh,
The check only de-duplicates in case the consumer processes the same
offset multiple times. It ensures the offset is always increasing.
If this has been fixed in Kafka, which the comment assumes, the
condition will never be true.
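Illustratively, the shape of that check (a sketch, not the actual KafkaIO code):

  def maybe_emit(record, last_offset):
      # Offsets must be strictly increasing; a record at or below the
      # last consumed offset is treated as a duplicate delivery.
      if record.offset <= last_offset:
          return None  # dropped -- should never happen on fixed Kafka versions
      return record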
Which Kafka version are you using?
-Max
On
implementations.
-Original Message-
From: Maximilian Michels
Sent: Monday, July 27, 2020 3:04 PM
To: user@beam.apache.org; Sunny, Mani Kolbe
Subject: Re: Unbounded sources unable to recover from checkpointMark when
withMaxReadTime() is used
nerate batched outputs. Without ability to resume from a
checkpoint, it will be reading entire stream every time.
Regards,
Mani
-Original Message-
From: Maximilian Michels <m...@apache.org>
Sent: Tuesday, July 21, 2020 11:38 AM
To:user@bea
Hi Mani,
BoundedReadFromUnboundedSource was originally intended to be used in
batch pipelines. In batch, runners typically do not perform
checkpointing. In case of failures, they re-run the entire pipeline.
Keep in mind that, even with checkpointing, reading for a finite time in
the
This used to work, but it appears @FinalizeBundle (which KafkaIO
requires) was simply ignored for portable (Python) pipelines. It looks
relatively easy to fix.
-Max
On 07.07.20 03:37, Luke Cwik wrote:
The KafkaIO implementation relies on checkpointing to be able to update
the last
Just to clarify: We could make the AsyncIO operator also available in
Beam but the operator has to be represented by a concept in Beam.
Otherwise, there is no way to know when to produce it as part of the
translation.
On 08.07.20 11:53, Maximilian Michels wrote:
Flink's AsyncIO operator
Flink's AsyncIO operator is useful for processing IO-bound operations,
e.g. sending network requests. Like Luke mentioned, it is not available
in Beam.
-Max
On 07.07.20 22:11, Luke Cwik wrote:
Beam is a layer that sits on top of execution engines like Flink and
provides its own programming
output as
> well but I assume that is just receiving the error message from the flink
> cluster.
>
> Thanks,
> Jesse
>
> On 6/23/20, 8:11 AM, "Maximilian Michels" wrote:
>
> Hey Jesse,
>
> Could you share the context of the error? Wh
Hey Jesse,
Could you share the context of the error? Where does it occur? In the
client code or on the cluster?
Cheers,
Max
On 22.06.20 18:01, Jesse Lord wrote:
> I am trying to run the wordcount quickstart example on a flink cluster
> on AWS EMR. Beam version 2.22, Flink 1.10.
>
>
>
> I
>>>>>> java.lang.RuntimeException: java.lang.RuntimeException:
>>>>>> java.lang.NoSuchMethodException: java.time.Instant.<init>()
>>>>>> at org.apache.avro.specific.SpecificData.newInstance(SpecificData.java)
>
You are using a proprietary connector which only works on Dataflow. You
will have to use io.external.gcp.pubsub.ReadFromPubsub. PubSub support
from Python is experimental.
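A sketch of the suggested replacement (import path as named above; the exact
spelling can vary across Beam versions, and the project/subscription names are
placeholders):

  import apache_beam as beam
  from apache_beam.io.external.gcp.pubsub import ReadFromPubSub

  with beam.Pipeline(options=options) as p:
      messages = (p | 'ReadPubSub' >> ReadFromPubSub(
          subscription='projects/<project>/subscriptions/<sub>'))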
-Max
On 09.06.20 06:40, Pradip Thachile wrote:
> Quick update: this test code works just fine on Dataflow as well as the
>
>
>> Is it not like that?
>>
>> Also I don't understand when you said that a reference to the Beam
>> Coder is saved into the checkpoint, because the error I'm getting is
>> referencing the java model class ("Caused by:
>> java.io.InvalidClassException: internal.model.dimension.POJOMode
ting the AVRO schema
>> using
>> Java reflection, and then that generated schema is saved within the
>> Flink checkpoint, right?
>>
>> On Wed, 2020-06-03 at 18:00 +0200, Maximilian Michels wrote:
>>> Hi Ivan,
>>>
>>> Moving to the new type serializer
ing the AVRO schema using
> Java reflection, and then that generated schema is saved within the
> Flink checkpoint, right?
>
> On Wed, 2020-06-03 at 18:00 +0200, Maximilian Michels wrote:
>> Hi Ivan,
>>
>> Moving to the new type serializer snapshot interface is not going
> KafkaIO or Protobuf.
*I meant to say "Avro or Protobuf".
On 03.06.20 18:00, Maximilian Michels wrote:
> Hi Ivan,
>
> Moving to the new type serializer snapshot interface is not going to
> solve this problem because we cannot version the coder through the Beam
> c
Hi Ivan,
Moving to the new type serializer snapshot interface is not going to
solve this problem because we cannot version the coder through the Beam
coder interface. That is only possible through Flink. However, it is
usually not trivial.
In Beam, when you evolve your data model, the only way
Awesome!
On 02.06.20 22:09, Austin Bennett wrote:
> Hi Beam Users,
>
> Wanted to share the Workshop that I'll give at Berlin Buzzword's next
> week:
> https://berlinbuzzwords.de/session/first-steps-apache-beam-writing-portable-pipelines-using-java-python-go
>
> Do consider joining if you are
The logs indicate that you are not running the Docker-based execution
but the `LOOPBACK` mode. In this mode the Flink cluster needs to connect
to the machine that started the pipeline. That will not be possible
unless you are running the Flink cluster on the same machine (we bind to
`localhost`
>> with beam.Pipeline(options=options) as p:
> (p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs)
> | "WriteMyFile" >> beam.io.WriteToText(output_file_hdfs))
-Max
On 28.05.20 17:00, Ramanan, Buvana (Nokia - US/Murray Hill) wrote:
> Hi Max
Thanks Kyle!
On 28.05.20 13:16, Kyle Weaver wrote:
> The Apache Beam team is pleased to announce the release of version 2.21.0.
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous)
You are using the LOOPBACK environment which requires that the Flink
cluster can connect back to your local machine. Since the loopback
environment binds to localhost by default, that is not possible here.
I'd suggest using the default Docker environment.
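A sketch of switching to the Docker environment (the job endpoint is a
placeholder):

  from apache_beam.options.pipeline_options import PipelineOptions

  options = PipelineOptions([
      '--runner=PortableRunner',
      '--job_endpoint=<jobserver-host>:8099',
      '--environment_type=DOCKER',  # instead of LOOPBACK
  ])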
On 28.05.20 14:06, Ashish Raghav
Potentially a Windows issue. Do you have a Unix environment for testing?
On 28.05.20 13:35, Ashish Raghav wrote:
> Hi Guys,
>
>
>
> I have another issue when I submit the python beam pipeline ( wordcount
> example provided by apache beam team) directly on flink cluster running
> local.
>
>
The configuration looks good but the HDFS file system implementation is
not intended to be used directly.
Instead of:
> lines = p | 'ReadMyFile' >> beam.Create(hdfs_client.open(input_file_hdfs))
Use:
> lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs)
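Putting it together with the HDFS connection options (host, port, user, and
path are placeholders):

  import apache_beam as beam
  from apache_beam.options.pipeline_options import PipelineOptions

  # HDFS access is configured via pipeline options, not a separate client.
  options = PipelineOptions([
      '--hdfs_host=<namenode-host>',
      '--hdfs_port=50070',
      '--hdfs_user=<user>',
  ])
  input_file_hdfs = 'hdfs://<path>/input.txt'

  with beam.Pipeline(options=options) as p:
      lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs)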
Best,
Max
On 28.05.20
Thanks to everyone who joined and asked questions. Really enjoyed this
new format!
-Max
On 28.05.20 08:09, Marta Paes Moreira wrote:
> Thanks for sharing, Aizhamal - it was a great webinar!
>
> Marta
>
> On Wed, 27 May 2020 at 23:17, Aizhamal Nurmamat kyzy
> <aizha...@apache.org> wrote:
> We thank you for your understanding! See you soon!
>
> -Griselda Cuevas, Brittany Hermann, Maximilian Michels, Austin Bennett,
> Matthias Baetens, Alex Van Boxel
>
Hi Ivan,
Beam does not use Java serialization for checkpoint data. It uses Beam
coders which are wrapped in Flink's TypeSerializers. That said, Beam
does not support serializer migration yet.
I'm curious, what do you consider a "backwards-compatible" change? If
you are attempting to upgrade the
gards,
> Sruthi
>
> On Tue, May 12, 2020 at 7:21 PM Maximilian Michels <m...@apache.org> wrote:
>
> A heads-up if anybody else sees this, we have removed the flag:
> https://jira.apache.org/jira/browse/BEAM-9900
>
> Further contributions are very
> Thanks
> Jose
>
>
> On Mon, May 18, 2020 at 18:37, Reuven Lax (<re...@google.com>) wrote:
>
> This is still confusing to me - why would the messages be dropped as
> late in this case?
>
> On Mon, May 18, 2020 at 6:14 AM Maxim
All runners which use the Beam reference implementation drop the
PaneInfo for WriteFilesResult#getPerDestinationOutputFilenames(). That's
why we can observe this behavior not only in Flink but also in Spark.
The WriteFilesResult is returned here:
A heads-up if anybody else sees this, we have removed the flag:
https://jira.apache.org/jira/browse/BEAM-9900
Further contributions are very welcome :)
-Max
On 11.05.20 17:05, Sruthi Sree Kumar wrote:
> I have opened a PR with the documentation change.
>
Hey Robbe,
The issue with a higher parallelism is likely due to the single Python
process which processes the data.
You may want to use the `sdk_worker_parallelism` pipeline option which
brings up multiple Python worker processes.
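A sketch of the option in use:

  from apache_beam.options.pipeline_options import PipelineOptions

  # Start several SDK worker processes instead of funneling all bundles
  # through a single Python process.
  options = PipelineOptions([
      '--runner=FlinkRunner',
      '--sdk_worker_parallelism=4',
  ])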
Best,
Max
On 30.04.20 23:56, Robbe Sneyders wrote:
> Yes, the
Generally, it is to be expected that the main input is buffered until
the side input is available. We really have no other option to correctly
process the data.
Have you tried using RocksDB as the state backend to prevent too much GC
churn?
-Max
On 07.05.20 06:27, Eleanore Jin wrote:
> Please
Beam and its Flink Runner do not allow setting the parallelism at the
operator level. The wish to configure parallelism per operator has come
up numerous times over the years. I'm not opposed to allowing for
special cases, e.g. via a pipeline option.
It doesn't look like it is necessary for the use case
2.17.0, 2.18.0, 2.19.0, 2.20.0)? Or do I have
> to wait for the 2.20.1 release?
>
> Thanks a lot!
> Eleanore
>
> On Wed, Apr 22, 2020 at 2:31 AM Maximilian Michels <m...@apache.org> wrote:
>
> Hi Eleanore,
>
> Exactly-once is not affect
t.
>
> Many thanks,
>
> Steve
>
>
kend (filesystem) using the config
>> file in the config directory specified by the env variable
>> ENV_FLINK_CONF_DIR.
>>
>> Regards,
>> Sruthi
>>
>> On Tue, Apr 28, 2020 at 11:35 AM Maximilian Michels wrote:
>>>
>>> Hi Sruthi,
>
Hi Sruthi,
Not possible out-of-the-box at the moment. You'll have to add the
RocksDB Flink dependency in flink_runner.gradle, e.g.:
compile "org.apache.flink:flink-statebackend-rocksdb_2.11:$flink_version"
Also in the Flink config you have to set
state.backend: rocksdb
Then you can run
Hi Steve,
The Flink Runner buffers data as part of the checkpoint. This was
originally due to a limitation of Flink where we weren't able to end the
bundle before we persisted the state for a checkpoint. This is due to
how checkpoint barriers are emitted; I'll spare you the details*.
Does the data
Looking forward to this!
Cheers,
Max
On 22.04.20 21:09, Austin Bennett wrote:
> Hi All,
>
> We are excited to announce the Beam Digital Summit 2020!
>
> This will occur for partial days during the week of 15-19 June.
>
> CfP is open and found:
The flag is needed when checkpointing is enabled because Flink is unable
to create a new checkpoint when not all operators are running.
By default, operators shut down when all input has been read. That will
trigger sending out the maximum (final) watermark at the sources. The
flag name is a bit
> I assume this will impact the exactly-once semantics that Beam provides
> as in the KafkaExactlyOnceSink, the processElement method is also
> annotated with @RequiresStableInput?
>
> Thanks a lot!
> Eleanore
>
> On Tue, Apr 21, 2020 at 12:58 AM Maximilian Michels
Hi Stephen,
Thanks for reporting the issue! David, good catch!
I think we have to resort to only using a single state cell for
buffering on checkpoints, instead of using a new one for every
checkpoint. I was under the assumption that, if the state cell was
cleared, it would not be checkpointed
our use case.
>
> Thanks a lot!
> Eleanore
>
> On Thu, Mar 5, 2020 at 12:46 AM Maximilian Michels <m...@apache.org> wrote:
>
> Hi Eleanore,
>
> Good question. I think the easiest way is to configure this in the
> Flink
> config
Hi Eleanore,
Good question. I think the easiest way is to configure this in the Flink
configuration file, i.e. flink-conf.yaml. Then you don't need to set
anything in Beam.
If you want to go with your approach, then just use
getClass().getClassLoader() unless you have some custom
PM Maximilian Michels <m...@apache.org> wrote:
Hi Tobi,
That makes sense to me. My argument was coming from having
"exactly-once" semantics for a pipeline. In this regard, the stop
functionality does not help. But I think having the option to
grace
ght now 31), roll out a
new Flink version and start them at the point where they left off with
their last committed offset in Kafka.
Does that make sense?
Best,
Tobi
On Sun, Mar 1, 2020 at 5:23 PM Maximilian Michels <m...@apache.org> wrote:
In some sense, stop is differ
p-alive
content-length: 2137
Best,
Tobias
On Fri, Feb 28, 2020 at 10:29 AM Maximilian Michels <m...@apache.org> wrote:
The stop functionality has been removed in Beam. It was
semantically
identical to using cancel, so we decided to drop suppor
nd test
Best,
Tobi
On Mon, Feb 24, 2020 at 10:13 PM Maximilian
Michels <m...@apache.org> wrote:
Thank you for reporting / filing /
Thank you for reporting / filing / collecting the issues.
There is a fix pending: https://github.com/apache/beam/pull/10950
As for the upgrade issues, the 1.8 and 1.9 upgrade is trivial. I will
check out the Flink 1.10 PR tomorrow.
Cheers,
Max
On 24.02.20 09:26, Ismaël Mejía wrote:
We are
Hi Tison,
Beam has its own set of libraries to implement windowing. Hence, the
Flink Runner does not use Flink's windowing but deploys Beam's windowing
logic within a Flink operator. If you want to look in the code, have a
look at WindowDoFnOperator.
Cheers,
Max
On 21.01.20 10:35, tison
Thanks for the heads-up, Ning! I haven't tried out interactive Beam, but
this puts it back on my radar :)
Cheers,
Max
On 04.12.19 20:45, Ning Kang wrote:
*If you are not an Interactive Beam user, you can ignore this email.*
Hi Interactive Beam users,
We've recently made some changes to
Hi Benjamin,
There is a way to do this yourself:
To unsubscribe a different e-mail - e.g. you used to be subscribed as
user@oldname.example - send a message to
list-unsubscribe-user=oldname.exam...@apache.org
Source: https://apache.org/foundation/mailinglists.html
So send an email to:
"Maximilian Michels" <m...@apache.org> wrote:
Hi Preston,
Sorry about the name mixup, of course I meant to write Preston not
Magnus :) See my reply below.
cheers,
Max
On 25.09.19 08:31, Maximilian Michels wr
Hi Jitendra,
Side inputs are materialized based on their windowing. If you assign a
10 second window to the side inputs, they can be renewed every 10
seconds. Whenever you access the side input, the newest instance of the
side input will be retrieved.
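A sketch of the pattern (assuming `side_source` and `main` are existing
PCollections in your pipeline):

  import apache_beam as beam
  from apache_beam.transforms import window

  # Window the side input into 10-second windows so a fresh value is
  # materialized every 10 seconds.
  side = (side_source
          | beam.WindowInto(window.FixedWindows(10))
          | beam.combiners.ToList())

  result = main | beam.Map(
      lambda elem, lookup: (elem, lookup),
      lookup=beam.pvalue.AsSingleton(side))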
Cheers,
Max
On 22.10.19 10:42,
Probably the most stable is running on Dataflow still. But I’m excited to see
the progress towards a Spark runner, can’t wait to try TFT on it :)
That is debatable. It is also hard to compare because Dataflow is a
managed service, whereas you'll have to spin up your own cluster for
other
Hi Preston,
Sorry about the name mixup, of course I meant to write Preston not
Magnus :) See my reply below.
cheers,
Max
On 25.09.19 08:31, Maximilian Michels wrote:
Hi Magnus,
Your observation seems to be correct. There is an issue with the file
system registration.
The two types
familiar about Flink sure why S3 is not properly being
registered when running the Flink job. Ccing some folks who are more
familiar about Flink.
+Ankur Goenka <goe...@google.com> +Maximilian Michels
<m...@apache.org>
Thanks,
Cham
On Sat, Sep 21, 2019 at 9:18 AM Kopri
That's even better.
On 19.09.19 16:35, Robert Bradshaw wrote:
On Thu, Sep 19, 2019 at 4:33 PM Maximilian Michels wrote:
This is obviously less than ideal for the user... Should we "fix" the
Java SDK? Or is the long-term solution here to have runners do this
rewrite?
I think i
code paths for
Reads.
On 19.09.19 11:46, Robert Bradshaw wrote:
On Thu, Sep 19, 2019 at 11:22 AM Maximilian Michels wrote:
The flag is insofar relevant to the PortableRunner because it affects
the translation of the pipeline. Without the flag we will generate
primitive Reads which are unsupported in p
| Software Engineer | github.com/ibzib | kcwea...@google.com
On Tue, Sep 17, 2019 at 3:38 PM Ahmet Altay <al...@google.com> wrote:
On Tue, Sep 17, 2019 at 2:26 PM Maximilian Michels <m.
here are for the Create and Read transforms.)
> On Tue, Sep 17, 2019 at 11:33 AM Maximilian Michels
<m...@apache.org> wrote:
>>
>> +dev
>>
>>
+dev
The beam_fn_api flag and the way it is automatically set is error-prone.
Is there anything that prevents us from removing it? I understand that
some Runners, e.g. Dataflow Runner have two modes of executing Python
pipelines (legacy and portable), but at this point it seems clear that
Hi Andrea,
You could use the new worker_pool_main.py which was developed for the
Kubernetes use case. It works together with the external environment
factory.
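A sketch of wiring it up (module path from the 2.16-era SDK; the exact flag
names and the worker-pool host are assumptions):

  # On the worker machine/pod, start the pool first:
  #   python -m apache_beam.runners.worker.worker_pool_main --service_port=50000
  from apache_beam.options.pipeline_options import PipelineOptions

  options = PipelineOptions([
      '--environment_type=EXTERNAL',
      '--environment_config=<worker-pool-host>:50000',
  ])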
Cheers,
Max
On 11.09.19 18:51, Lukasz Cwik wrote:
Yes.
On Wed, Sep 11, 2019 at 3:12 AM Andrea Medeghini
Hey,
I'm in as well! Austin and I recently talked about how we could organize
the hackathon. Likely it will be an hour per day for exchanging ideas
and learning about Beam. For example, there has been interest from the
Apache Streams project to discuss points for collaboration.
We will soon
Maximilian Michels <m...@apache.org> wrote:
Hi Catlyn,
This option has never worked outside the Java SDK where it originates
from. For the upcoming Beam 2.16.0 release, we have replaced this
option
with a factory class:
https://github.com/apache/
Hi Catlyn,
This option has never worked outside the Java SDK where it originates
from. For the upcoming Beam 2.16.0 release, we have replaced this option
with a factory class:
> <tobias.kay...@ricardo.ch>
> wrote:
>
> * each time :)
>
> On Mon, Aug 12, 2019 at 9:48 PM Kaymak, Tobias
> <tobias.kay...@ricardo.ch> wrote:
>
>
Dear users of the Flink Runner,
Just a heads-up that we may remove Flink 1.5/1.6 support in future
versions of Beam (>=2.16.0): https://jira.apache.org/jira/browse/BEAM-7962
Why? We support 1.5 - 1.8 and have a 1.9 version in the making. The
increased number of supported versions manifests in a
Hi Tobias!
I've checked if there were any relevant changes to the RocksDB state backend in
1.8.1, but I couldn't spot anything. Could it be that an old version of RocksDB
is still in the Flink cluster path?
Cheers,
Max
On 06.08.19 16:43, Kaymak, Tobias wrote:
> And of course the moment I
Hi Chad,
This stub will only be replaced by the Dataflow service. It's an artifact of
the pre-portability era.
That said, we now have the option to replace ReadFromPubSub with an external
transform which would utilize Java's PubSubIO via the new cross-language
feature.
Thanks,
Max
On
r such crucial behavior I would expect the pipeline
> to fail with a clear message stating the reason, like in the same
> way when you implement a new Coder and forget to override the
> verifyDeterministic method (don't recall the right name of it).
>
> Just my 2
This has come up before: https://issues.apache.org/jira/browse/BEAM-4520
The issue is that checkpoints won't be acknowledged if checkpointing is
disabled in Flink. We throw a WARN when unbounded sources are used without
checkpointing. Not all unbounded sources actually need to finalize
0% sure, but it seems like the translation from Beam to Flink
> could potentially include the windows in the key of `.keyBy` to allow
> parallelizing window computations. Does this sound like a reasonable
> feature request?
>
> Mike
>
c of the data.
>
> If possible, can you share your pipeline code at a high level.
>
> On Mon, Jun 10, 2019 at 12:58 PM Mike Kaplinskiy
> <m...@ladderlife.com> wrote:
>
>
Hi Mingliang,
You can increase the parallelism of the Python SDK Harness via the pipeline
option
--experiments=worker_threads=
Note that the workers are Python threads which suffer from the Global
Interpreter Lock. We currently do not use real processes, e.g. via
multiprocessing.
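As a sketch (the exact experiment spelling may differ across SDK versions):

  from apache_beam.options.pipeline_options import PipelineOptions

  # More harness threads help IO-bound stages; CPU-bound work still
  # serializes on the GIL.
  options = PipelineOptions([
      '--experiments=worker_threads=100',
  ])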
There
Hi Mike,
If you set the number of shards to 1, you should get one shard per window;
unless you have "ignore windows" set to true.
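For illustration (assuming `windowed_pcoll` is your windowed PCollection;
streaming text writes may need extra setup depending on the Beam version):

  import apache_beam as beam

  # num_shards=1 yields exactly one output file per window.
  output = windowed_pcoll | beam.io.WriteToText(
      '/path/to/output', num_shards=1)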
> (The way I'm checking this is via the Flink UI)
I'm curious, how do you check this via the Flink UI?
Cheers,
Max
On 09.06.19 22:33, Mike Kaplinskiy wrote:
> Hi
Thanks for managing the release, Ankur!
@Chad Thanks for the feedback. I agree that we can improve our release notes.
The particular issue you were looking for was part of the detailed list [1]
linked in the blog post: https://jira.apache.org/jira/browse/BEAM-7029
Cheers,
Max
[1]
java#L63
>>
>> On Tue, May 28, 2019 at 11:39 AM 青雉(祁明良) <m...@xiaohongshu.com> wrote:
>>>
>>> Yes, I did (2). Since the job server successfully
>>> created the artifact directory, I think I did it correctly. And
>>>
the artifact directory is successfully created at HDFS by job
server, but fails at task manager when reading.
Best,
Mingliang
On Tue, May 28, 2019 at 11:47 PM +0800, "Maximilian Michels"
<m...@apache.org> wrote:
Recen
Recent versions of Flink do not bundle Hadoop anymore, but they are
still "Hadoop compatible". You just need to include the Hadoop jars in
the classpath.
Beam's Hadoop does not bundle Hadoop either; it just provides Beam file
system abstractions which are similar to Flink "Hadoop
, and when I tested it for the
WordCount example it ran without problems.
Also, it runs successfully if I run on [local]; if I am using incompatible
versions, this should fail, right?
Regards,
Jorik
-Original Message-
From: Maximilian Michels
Sent: Monday, May 13, 2019 12:37 PM
To: user
Just saw that the malformed master URL was due to HTML formatting. It
looks ok.
Please check your Flink JobManager logs. The JobManager might not be
reachable and the submission is just blocked on it becoming available.
Thanks,
Max
On 13.05.19 20:05, Maximilian Michels wrote:
Hi Averell
till
stuck with it.
Again, thanks a lot for your help.
Regards,
Averell
On Sat, 11 May 2019, 12:35 am Maximilian Michels <m...@apache.org> wrote:
Hi Averell,
What you want to do is possible today but at this point is an early
experimental feature
Hi Jorik,
It looks like the version of the Flink Runner and the Flink cluster
version do not match. For example, if you use Flink 1.7, make sure to
use the beam-runners-flink-1.7 artifact.
For more information: https://beam.apache.org/documentation/runners/flink/
Thanks,
Max
On 12.05.19
Hi Averell,
What you want to do is possible today but at this point is an early
experimental feature. The reason for that is that Kafka is a
cross-language Java transform in a Python pipeline. We just recently
enabled cross-language pipelines.
1. First of all, until 2.13.0 is released you
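For reference, a sketch of such a read with today's API (the import path has
moved across versions, so treat it as an assumption for 2.13-era releases;
broker and topic are placeholders):

  import apache_beam as beam
  from apache_beam.io.kafka import ReadFromKafka

  with beam.Pipeline(options=options) as p:
      records = p | ReadFromKafka(
          consumer_config={'bootstrap.servers': '<broker>:9092'},
          topics=['<topic>'])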
5.19 22:44, Jan Lukavský wrote:
Hi Max,
comments inline.
On 5/2/19 3:29 PM, Maximilian Michels wrote:
Couple of comments:
* Flink transforms
It wouldn't be hard to add a way to run arbitrary Flink operators
through the Beam API. Like you said, once you go down that road, you
lose the abil
is this for a typical user? Do
people stop using a particular tool because it is in an X
language? I personally would put features first over language
portability and it's completely fine that may not be in line with
Beam's priorities. All said I can agree Beam focus on language
portability is gre
eam.apache.org
On Sun, Apr 28, 2019 at 7:08 PM Reza Rokni wrote:
+1 I recall a fun afternoon a few years ago figuring this out ...
On Mon, 11 Mar 2019 at 18:36, Maximilian Michels wrote:
Hi,
I have seen several users including myself get confused by the "default"
triggering
Hi Austin,
Thanks for the heads-up! I just want to highlight that this is a great
chance for Beam. There will be a _dedicated_ Beam track which means that
there is potential for lots of new people to learn about Beam. Of
course, there will also be many people already involved in Beam.
-Max
ugusto>
>
> On 2019/03/26 12:31:54, Maximilian Michels <m...@apache.org> wrote:
> > Hi Augusto,
> >
> > Generally speaking Avro should provide very good performance. The calls
> > you are seeing should not be significa
fixes it.
On Fri, Mar 29, 2019 at 11:53 AM Maximilian Michels <m...@apache.org> wrote:
Hi Tobias,
Thanks for reporting. I can confirm this is a regression with the
detection of the execution mode. Everything should work fine if you set
the "streaming
Hi Augusto,
In Beam there is no way to specify how parallel a specific transform
should be. There is only a general indicator for how parallel a pipeline
should be, i.e. Dataflow has "numWorkers", Spark/Flink have "parallelism".
You should see 16 parallel operations for your Read if you
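A sketch of the pipeline-wide knob on the Flink runner:

  from apache_beam.options.pipeline_options import PipelineOptions

  # Applies to every operator; Beam has no per-transform parallelism.
  options = PipelineOptions([
      '--runner=FlinkRunner',
      '--parallelism=16',
  ])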
Hi Tobias,
Thanks for reporting. I can confirm this is a regression with the
detection of the execution mode. Everything should work fine if you set
the "streaming" flag to true. Will be fixed for the 2.12.0 release.
Thanks,
Max
On 28.03.19 17:28, Lukasz Cwik wrote:
+dev