Jan
On 7/16/21 2:00 PM, Jan Lukavský wrote:
Hi,
I hit another issue with the portable Flink runner. Long story short -
reading from Kafka is not working in portable Flink. After solving
issues with expansion service configuration (the ability to add the
use_deprecated_read option), because the Flink portable runner has issues
with SDF [1], [2].
https://github.com/apache/beam/blob/3e933b55f3d2072fb0248050f9091850933f33c7/model/pipeline/src/main/proto/beam_runner_api.proto#L784
On Thu, Jul 15, 2021 at 12:16 PM Jan Lukavský <je...@seznam.cz> wrote:
On 7/15/21 8:44 PM, Jan Lukavský wrote:
Hi,
I hit an issue when using x-lang python pipeline (with ReadFromKafka)
with subsequent WindowInto with trigger=Repeatedly(AfterCount(1)). The
Pipeline looks as follows:
(p | ReadFromKafka(
consumer_config={'bootstrap.servers': bootstrapServer},
topics=[inputTopic],
Alexey Romanenko <aromanenko@gmail.com> wrote:
I agree that we should make this optional. What would be the best
way to do it with Gradle?
On 11 Jul 2021, at 16:40, Jan Lukavský <je...@seznam.cz> wrote:
I'd be +1 to making it optional as well. Looks really like an overhead
for users not using avro.
On 7/11/21 10:36 AM, Alex Van Boxel wrote:
It worked before 2.30. It's fine for when you're using Confluent
Kafka, but feels like a hard dependency for non-Kafka users. Certainly
the requirement
Looks to me to be failing consistently. At least appears so both on PRs
and tested locally on master branch.
On 7/5/21 1:34 PM, Alexey Romanenko wrote:
Hello,
JavaPrecommit fails because of
Thu, Jul 1, 2021 at 2:11 PM Jan Lukavský wrote:
This really does not match my experience. Passing the correct
"use_deprecated_read" flag to the expansion service had the expected impact on
Flink's execution DAG and - most of all - it started to work (at least seems so). The
UI i
tly itself. It could attempt to deserialize
the SDF ptransform and see if there is an UnboundedSource inside and
then do whatever it wants with it.
On Thu, Jul 1, 2021 at 11:39 AM Jan Lukavský <je...@seznam.cz> wrote:
I don't have complete comprehension of the topic, but f
do we need to do that in the expansion service?
On Thu, Jul 1, 2021 at 11:16 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
after today's experience I think I have some arguments about why
we *should* pass (at least some) of the PipelineOptions from SDK
to expansi
https://github.com/apache/beam/pull/15082/commits/5a46664ceb9f03da3089925b30ecd0a802e8b3eb
On 7/1/21 9:33 AM, Jan Lukavský wrote:
On 7/1/21 3:26 AM, Kyle Weaver wrote:
I think it should accept complete list of PipelineOptions (or at
least some defined subset - PortabilityPipelineOptions,
ExperimentalOptions,
the current PR [1]?
[1] https://github.com/apache/beam/pull/15082
On Wed, Jun 30, 2021 at 8:48 AM Jan Lukavský <je...@seznam.cz> wrote:
> Not sure why we need the hacks with NoOpRunner
As noted earlier (and that was why I started this thread in the first
place
' runners. But that is a whole different
story. :)
Jan
On 6/30/21 5:35 PM, Robert Bradshaw wrote:
On Wed, Jun 30, 2021 at 7:41 AM Jan Lukavský wrote:
java -jar beam-sdks-java-io-expansion-service-2.30.0.jar
This does not accept any other parameters than the port. That is the first part of
th
ervice/ExpansionService.java#L398
On 6/30/21 3:57 PM, Chamikara Jayalath wrote:
On Wed, Jun 30, 2021 at 6:54 AM Chamikara Jayalath <chamik...@google.com> wrote:
On Wed, Jun 30, 2021 at 1:20 AM Jan Lukavský <je...@seznam.cz> wrote:
On 6/30/21 1:16 AM, Robert Bradshaw wr
all of this working, mostly when you are starting to evaluate
Beam and don't have much experience with it.
We can get rid of b) (implement LOOPBACK in Flink) and c) (enable Python
SDK Kafka IO to spawn expansion service with the LOOPBACK environment
when submitting to Flink). That is why I still
ent in ExpansionService.
[1] https://github.com/apache/beam/pull/15099/files
<https://github.com/apache/beam/pull/15099/files>
On 6/29/21 11:51 PM, Chamikara Jayalath wrote:
On Tue, Jun 29, 2021 at 2:39 PM Jan Lukavský <je...@seznam.cz> wrote:
On 6/29/21 11:04 PM, Robert Bradshaw wrote:
> You can configure the environment in the current state, you just
have
> to run your own expansion service that has a differ
rs library could be used.)
Yes, I also agree, that expansion service should be runner-dependent (or
at least runner-aware), as that brings optimizations. Runner could
ignore settings from previous point when it can be *sure* it can do so.
On Tue, Jun 29, 2021 at 1:29 PM Jan Lukavský wrote:
sform.
On Tue, Jun 29, 2021 at 11:47 AM Kyle Weaver wrote:
For context, there was a previous thread which touched on many of the same
points:
https://lists.apache.org/thread.html/r6f6fc207ed62e1bf2a1d41deeeab554e35cd2af389ce38289a303cea%40%3Cdev.beam.apache.org%3E
On Tue, Jun 29, 2021 at 11:16 AM J
On Tue, Jun 29, 2021 at 10:08 AM Jan Lukavský <je...@seznam.cz> wrote:
The argument for being able to accept (possibly ordered list of)
execution environments is in that this could make a single
instance of execution service reusable by various clients with
service API.
On Tue, Jun 29, 2021 at 9:42 AM Jan Lukavský <je...@seznam.cz> wrote:
If I understand it correctly, there is currently no place to set the
defaultEnvironmentType - python's KafkaIO uses either
'expansion_service' given by the user (which might be a
that
PortablePipelineOptions.setDefaultEnvironmentType(String) doesn’t work for you?
Or it’s only a specific case while using portable KafkaIO?
On 29 Jun 2021, at 09:51, Jan Lukavský wrote:
Hi,
I have come across an issue with cross-language transforms. My setup is
I have working environment type PROCESS and I cannot use DOCKER. When I
use Python's KafkaIO, it unfortunately - by default - expands to docker
environment, which then fails due to missing 'docker' command. I didn't
watermark can move after the cycle finishes, but the cycle finishes when
the watermark moves). That is where the vector watermark must come in
place and where all the stuff gets complicated.
Thanks for this discussion, it helped me clarify some things.
Jan
On 6/24/21 1:44 AM, Jan Lukavský wrote
at the Beam layer. For
engines where the Beam watermark is implemented more directly (for
example Dataflow & Flink) there would be a lot of added complexity,
probably performance loss, if it could be done at all.
Kenn
On Wed, Jun 23, 2021 at 3:31 PM Reuven Lax <re...@google.com> wrote:
the iteration "gap"? Iteration is for the time being the domain of batch,
which is where the unified approach loses its points.
On 6/24/21 12:31 AM, Reuven Lax wrote:
On Wed, Jun 23, 2021 at 2:33 PM Jan Lukavský <je...@seznam.cz> wrote:
On 6/23/21 11:13 PM, Reuven Lax wrote:
On Wed, Jun 23, 2021 at 2:00 PM Jan Lukavský <je...@seznam.cz> wrote:
The most qualitatively important use case I see is ACID transactions
- transactions naturally involve cycles, because the most natural
implementat
the uses I'm aware of are ML-type algorithms (e.g. clustering) or
iterative algorithms on large graphs (connected components, etc.), and
it's unclear how many of these problems have a natural time component.
On Wed, Jun 23, 2021 at 1:49 PM Jan Lukavský <je...@seznam.cz> wrote:
. But that would only mean, we might need two timeouts - one for
releasing the watermark hold and another for cancelling the iteration
completely.
On 6/23/21 10:43 PM, Jan Lukavský wrote:
Right, one can "outsource" this functionality through an external source,
but that is a sort-of hackish
This was explored in the past, though the design started getting
very complex (watermarks of unbounded dimension, where each
iteration has its own watermark dimension). At the time, the
exploration petered out.
Hi,
I'd like to discuss a very rough idea. I didn't walk through all the
corner cases and the whole idea has a lot of rough edges, so please bear
with me. I was thinking about non-IO applications of splittable DoFn,
and the main idea - and why it is called splittable - is that it can
handle
correctly implementing RequiresStableInput.
On Wed, Jun 16, 2021 at 11:18 AM Jan Lukavský <je...@seznam.cz> wrote:
I think that the support for @RequiresStableInput is rather
limited. AFAIK it is supported by streaming Flink (where it is not
needed in this situation) and by Da
it correctly. In the case of FileIO (which does not use
@RequiresStableInput as it would not be supported on Spark) the
randomness is easily avoidable (hashCode of key?) and it seems to me
it should be preferred.
Jan
On 6/16/21 6:23 PM, Kenneth Knowles wrote:
On Wed, Jun 16, 2021 at 5:18 AM Jan
Hi,
maybe a little unrelated, but I think we definitely should not use
random assignment of shard keys (RandomShardingFunction), at least for
bounded workloads (seems to be fine for streaming workloads). Many batch
runners simply recompute the path in the computation DAG from the failed
node
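The deterministic alternative suggested here (hashing the key instead of picking a random shard) can be sketched in plain Python. This is a toy model, not a Beam API; `stable_shard` is a hypothetical name:

```python
import zlib

def stable_shard(key: str, num_shards: int) -> int:
    """Deterministically map a key to a shard.

    Unlike random assignment, re-processing the same element after a
    failure yields the same shard, so batch retries that recompute a
    path in the DAG stay consistent with the original run.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards

# The same key always lands on the same shard, across retries.
assert stable_shard("user-42", 10) == stable_shard("user-42", 10)
```

`zlib.crc32` is used here because Python's built-in `hash()` is salted per process and would not be stable across workers.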
Hi,
+1 to remove the support for 1.10.
Jan
On 5/28/21 10:00 PM, Ismaël Mejía wrote:
Hello,
With Beam support for Flink 1.13 just merged it is the time to discuss
the end of
support for Flink 1.10 following the agreed policy on supporting only
the latest
three Flink releases [1].
I would
timer extant with that output timestamp doesn't help here.
On Fri, May 21, 2021 at 2:06 AM Jan Lukavský <je...@seznam.cz> wrote:
If I understand it correctly (and what I have observed from the
actual behavior on both FlinkRunner and DirectRunner) a relative
timer
on most runners today).
Reuven
On Thu, May 20, 2021 at 2:11 AM Jan Lukavský <je...@seznam.cz> wrote:
Sounds like you could solve that using second event time timer,
that would be actually used only to hold the output timestamp
(watermark hold). Somet
y the same questions apply to
timers re-setting later timers and +Jan Lukavský has raised this
already (among other people) so we kind of know the answer now,
and I think +Boyuan Zhang's code was good (from my quick read). What
has changed is that we have a better idea of the
+1 (non-binding)
verified correct shading.
Jan
On 5/19/21 8:53 PM, Ismaël Mejía wrote:
This release is only to publish the vendored dependency artifacts. We
need those to integrate it and be able to verify if it causes problems
or not. The PR for this is already opened but it needs the
I'll just remind that Beam already supports (experimental)
@RequiresTimeSortedInput (which has several limitations, mostly in that
it orders only by timestamp and not some - time related - user field;
and of course - missing retractions). An arbitrary sorting seems to be
hard, even per-key, it
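To make the stated limitation concrete, here is a tiny pure-Python model (no Beam APIs) of what @RequiresTimeSortedInput guarantees: per-key delivery ordered by element timestamp only, never by a user-defined field:

```python
from collections import defaultdict

def time_sorted(events):
    """Toy model of @RequiresTimeSortedInput semantics.

    events: iterable of (key, timestamp, value) triples.
    Returns, per key, the values ordered by timestamp.
    """
    by_key = defaultdict(list)
    for key, timestamp, value in events:
        by_key[key].append((timestamp, value))
    # Sorting is by timestamp only -- not by any user field, which is
    # exactly the limitation mentioned above.
    return {k: [v for _, v in sorted(items, key=lambda p: p[0])]
            for k, items in by_key.items()}

events = [("k", 3, "c"), ("k", 1, "a"), ("k", 2, "b"), ("j", 5, "x")]
assert time_sorted(events) == {"k": ["a", "b", "c"], "j": ["x"]}
```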
https://www.apache.org/legal/resolved.html
Regards
JB
On 10 May 2021, at 12:56, Elliotte Rusty Harold <elh...@ibiblio.org> wrote:
Anyone have a link to the official Apache policy about
Hi,
we are bundling dependencies with LGPL-2.1, according to the license header
in META-INF/maven/org.jboss.modules/jboss-modules/pom.xml. I think this
might be an issue, already reported here: [1]. I created [2] to track it
on our side.
Jan
[1] https://issues.apache.org/jira/browse/FLINK-22555
the cleanup timer to be 1 ms past the end of the window (plus allowed
lateness), it's guaranteed to be the last timer that fires for that
window.
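The arithmetic behind that guarantee is simple. A sketch with made-up millisecond values (plain Python, not Beam code):

```python
# Why a cleanup timer set 1 ms past window end + allowed lateness is
# guaranteed to fire last (toy values in milliseconds).
WINDOW_END_MS = 60_000       # end of the window
ALLOWED_LATENESS_MS = 5_000  # configured allowed lateness

# No user timer for this window may be set later than
# window end + allowed lateness, so a cleanup timer 1 ms beyond that
# bound necessarily fires after all of them.
max_user_timer_ms = WINDOW_END_MS + ALLOWED_LATENESS_MS
cleanup_timer_ms = max_user_timer_ms + 1

assert cleanup_timer_ms > max_user_timer_ms
```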
On Wed, May 5, 2021 at 2:41 AM Jan Lukavský <je...@seznam.cz> wrote:
Hm, one thing in the last paragraph seems there might be s
why @OnWindowExpiration is a meaningful feature.
Kenn
On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi Kenn,
I created BEAM-12276 [1] with PR [2].
Jan
[1] https://issues.apache.org/jira/browse/BEAM-12276
full stack trace
and/or a failing unit test in a PR?
Kenn
On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
I have come across a bug with timer output timestamp - when using event
time and relative timers, setting the timer can arbitrarily throw
IllegalArgumentException if the firing timestamp (input watermark) is
ahead of the output timestamp (like java.lang.IllegalArgumentException:
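The failure mode described above can be modelled in a few lines of plain Python. This is a toy model of the reported behaviour, not the actual Beam implementation, and all names are made up:

```python
class IllegalArgument(Exception):
    """Stand-in for java.lang.IllegalArgumentException."""

def set_relative_timer(input_watermark_ms, delay_ms, output_timestamp_ms):
    """Toy model of the check a runner performs when a timer is set.

    A relative event-time timer fires at (input watermark + delay).
    If that firing time is already ahead of the requested output
    timestamp, setting the timer fails -- the surprising behaviour
    reported here.
    """
    firing_ms = input_watermark_ms + delay_ms
    if firing_ms > output_timestamp_ms:
        raise IllegalArgument(
            f"firing time {firing_ms} > output timestamp {output_timestamp_ms}")
    return firing_ms

# Fine: the timer fires at 1500, not after the output timestamp of 2000.
assert set_relative_timer(1000, 500, 2000) == 1500
```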
There may be more
interesting ways to match the side input than simply using windowing. It might
be worth thinking about this topic more deeply.
On Tue, Apr 27, 2021 at 12:51 PM Jan Lukavský wrote:
On 4/27/21 9:26 PM, Robert Bradshaw wrote:
On Tue, Apr 27, 2021 at 12:05 PM Jan Lukavský wrote:
On 4/27/21 8:51 PM, Robert Bradshaw wrote:
On Tue, Apr 27, 2021 at 11:25 AM Jan Lukavský wrote:
Are you asking for a way to ignore early triggers on side input mapping, and
only map to on-time triggered values for the window?
No, that could for sure be done before applying the View transform
On Tue, Apr 27, 2021 at 10:53 AM Jan Lukavský <je...@seznam.cz> wrote:
> If using early triggers, then the side input loads the latest
trigger for that window.
Does not the word 'latest' imply processing time matching? The
watermark of main input might be arbitrari
the FlinkRunner implementation is incomplete, in
which case it should be fixed.
On Tue, Apr 27, 2021 at 9:36 AM Jan Lukavský <je...@seznam.cz> wrote:
It seems to me, that this is true only with matching windows on
both sides and default trigger of the side input. Then it will
(due
time."
Jan
[1] https://beam.apache.org/documentation/patterns/side-inputs/
On 4/27/21 6:31 PM, Reuven Lax wrote:
Side inputs are matched based on event time, not processing time.
On Tue, Apr 27, 2021 at 12:43 AM Jan Lukavský <je...@seznam.cz> wrote:
everything is Event time by default, not Processing
time.
On Tue, Apr 27, 2021, 12:43 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
I have a question about matching side inputs to main input. First a
recap, to make sure I understand the current state correctly:
a) elements from main input are pushed back (stored in state) until a
first side input pane arrives (that might be on time, or early)
b) after that,
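The event-time matching in the recap can be modelled with plain fixed-window arithmetic. This is a simplified stand-in for Beam's default window mapping, not the actual API:

```python
# Simplified model of event-time side-input matching: the main-input
# element's timestamp is mapped into the side input's fixed windowing,
# and the DoFn observes the pane for that window.

def fixed_window(timestamp_ms, size_ms):
    """Return the fixed window [start, end) containing the timestamp."""
    start = timestamp_ms - (timestamp_ms % size_ms)
    return (start, start + size_ms)

# A main-input element at t=65s with 60s side-input windows reads the
# side-input pane for window [60s, 120s) -- matching is by event time,
# regardless of when (in processing time) either element arrived.
assert fixed_window(65_000, 60_000) == (60_000, 120_000)
```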
given
that a default value of 0 would not be very useful).
On Mon, Apr 26, 2021 at 10:05 AM Jan Lukavský <je...@seznam.cz> wrote:
> Hi Kenn,
> On 4/26/21 5:59 PM, Kenneth Knowles wrote:
> In +Reza Rokni's example of looping timer
timer has already passed. It
will fire immediately then. In the @OnTimer you output from the
buffer. I think there may be more efficient ways to achieve this output.
Kenn
On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
I have come across a "problem" while implementing some toy Pipeline. I
would like to split input PCollection into two parts - droppable data
(delayed for more than allowed lateness from the end of the window) from
the rest. I will not go into details, as that is not relevant, the
problem
Congrats Yichi!
On 4/22/21 4:58 AM, Ahmet Altay wrote:
Congratulations Yichi!
On Wed, Apr 21, 2021 at 6:48 PM Chamikara Jayalath <chamik...@google.com> wrote:
Congrats Yichi!
On Wed, Apr 21, 2021 at 6:14 PM Heejong Lee <heej...@google.com> wrote:
somehow part of the proposed LTS support
process.
Jan
On 4/7/21 4:23 PM, Elliotte Rusty Harold wrote:
On Tue, Apr 6, 2021 at 9:43 PM Jan Lukavský wrote:
Hi,
do we know what is the reason users stay on an older version of Beam? My
guess would be that it is not related to API changes, but more likely to
state incompatibility. Maybe if we could figure out a way which would
enable a smooth migration of state (and timers) between Beam versions,
Thanks for filing that. Once it is fixed in IntelliJ, the
annotations actually add value for downstream users.
Kenn
On Thu, Apr 1, 2021 at 1:10 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
I created the issue in JetBrains tr
Ismaël or Reuven - provide them with this issue
report? It sounds like Jan you have an example ready to go.
Kenn
On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je...@seznam.cz> wrote:
Yes, annotations that we add to the code base on purpose (like
@Nullable or @S
there's an IntelliJ setting that
would stop this from happening?
On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
also been annoyed at IntelliJ autogenerating all these
annotations. I believe Kenn said that this was not the intention -
maybe there's an IntelliJ setting that would stop this from happening?
On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
I don't know the details of the checkerframework, but there seems to be a
contradiction between what code is (currently) generated (and what we
therefore release) and what the checkerframework itself states [1]:
@UnknownKeyFor:
Used internally by the type system; should never be written by a
Yes, annotations that we add to the code base on purpose (like @Nullable
or @SuppressWarnings) are absolutely fine. What is worse is that the
checker is not only a checker, but a code generator. :)
For example when one wants to implement Coder by extending CustomCoder
and use auto-generating
10:38 AM Brian Hulette <bhule...@google.com> wrote:
On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
ger with some different trigger than with each pane?
Jan
On 2/25/21 12:44 AM, Kenneth Knowles wrote:
On Wed, Feb 24, 2021 at 12:44 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi Robert,
> Here "sink" is really any observable outside effect, so I
think
in downstream processing. Therefore the final triggering
cannot be "hourly output" at least not with regard to the rate of change
in inputs.
On 2/23/21 5:47 PM, Robert Bradshaw wrote:
On Tue, Feb 23, 2021 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote:
First,
for making
downstream triggering optional (could be useful prep for sink
triggers)
+1
[1] https://s.apache.org/beam-sink-triggers
[2] https://github.com/apache/beam/pull/4742
[3] https://github.com/apache/beam/pull/9199
[4] https://s.apache.org
The same holds true for pane accumulation mode.
Jan
On 2/22/21 10:21 AM, Jan Lukavský wrote:
Hi,
I'm not sure if I got everything from this thread right, but from my
point of view, triggers are property of GBK. They are property of
neither windowing, nor PCollection, but relate solely to GBK. This can
be seen from the fact, that unlike windowFn, triggers are completely
ignored in
ode only.
Jan
On 1/22/21 7:37 PM, Brian Hulette wrote:
On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi,
I'll give my two cents here.
I'm not 100% sure that the 1-5% of bugs are as severe as other types of
bugs. Yes, throwing NPEs at user is not very polite. On the other hand,
many of these actually boil down to user errors. Then we might ask what
a correct solution would be. If we manage
+1 (non-binding).
I've validated the RC against my dependent projects (mainly Java SDK,
Flink and DirectRunner).
Thanks,
Jan
On 1/7/21 2:15 AM, Ahmet Altay wrote:
+1 (binding) - validated python quickstarts.
Thank you Pablo.
On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada
Hi Antonio,
can you please create one?
Thanks,
Jan
On 1/6/21 10:31 PM, Antonio Si wrote:
Thanks for the information. Do we have a jira to track this issue or do you
want me to create a jira for this?
Thanks.
Antonio.
On 2021/01/06 17:59:47, Kenneth Knowles wrote:
Agree with Boyuan &
Sorry for the typo in your name. :-)
On 1/6/21 10:11 AM, Jan Lukavský wrote:
Hi Antonie,
yes, for instance. I'd just like to rule out possibility that a single
DoFn processing multiple partitions (restrictions) brings some
overhead in your case.
Jan
On 12/31/20 10:36 PM, Antonio Si wrote
run with a
parallelism set to 900?
Thanks.
Antonio.
On 2020/12/23 20:30:34, Jan Lukavský wrote:
OK,
could you make an experiment and increase the parallelism to something
significantly higher than the total number of partitions? Say 5 times
higher? Would that have impact on throughput in your
. The highest TPS topic has 180 partitions,
while the lowest TPS topic has 12 partitions.
Thanks.
Antonio.
On 2020/12/23 12:28:42, Jan Lukavský wrote:
Hi Antonio,
can you please clarify a few things:
a) what parallelism you use for your sources
b) how many partitions there is in your topic(s
Hi Antonio,
can you please clarify a few things:
a) what parallelism you use for your sources
b) how many partitions there is in your topic(s)
Thanks,
Jan
On 12/22/20 10:07 PM, Antonio Si wrote:
Hi Boyuan,
Let me clarify, I have tried with and without using
Robert Bradshaw <rober...@google.com> wrote:
If readers are expensive to create, this seems like an important
(and not too difficult) optimization.
On Mon, Dec 21, 2020 at 11:04 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi Boyuan,
I think your analysis is correct - with one exception. It should be
possible to reuse the reader if and only if the last taken
CheckpointMark equals the new CheckpointMark the reader would be
created from. But - this equality is on the happy path and should be
satisfied for
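That reuse condition can be sketched as a small cache. Hypothetical names throughout; this is a model of the rule being discussed, not the Beam implementation:

```python
class ReaderCache:
    """Toy sketch of reusing an expensive reader across restrictions.

    A cached reader is reused only when the CheckpointMark it stopped
    at equals the mark the new reader would be created from; otherwise
    a fresh reader must be created.
    """

    def __init__(self, create_reader):
        self._create = create_reader  # factory: checkpoint mark -> reader
        self._reader = None
        self._last_mark = None

    def reader_for(self, checkpoint_mark):
        if self._reader is not None and self._last_mark == checkpoint_mark:
            return self._reader  # happy path: marks match, resume old reader
        # Marks diverged (or first call): create a fresh reader.
        self._reader = self._create(checkpoint_mark)
        self._last_mark = checkpoint_mark
        return self._reader

    def checkpoint(self, mark):
        # Record the mark the current reader stopped at.
        self._last_mark = mark
```

Avoiding reader re-creation on the happy path matters exactly when readers are expensive to construct, which is the optimization discussed here.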
/org/apache/beam/sdk/io/Read.java#L750
On Thu, Dec 17, 2020 at 2:42 PM Jan Lukavský <je...@seznam.cz> wrote:
Hi Boyuan,
> Several changes could be made to the PubSub SDF implementation
specifically. For example, the PubSub SDF can choose not to respond to
the checkp
other workaround for using PubSub on DirectRunner might be using
--experiments=enable_custom_pubsub_source, This flag will make the
pipeline to use a DoFn to read from PubSub instead of using
UnboundedSource. Hope it will be helpful as well.
On Thu, Dec 17, 2020 at 1:09 PM Jan Lukavský <je...@seznam.cz> wrote:
tigation, Steve! It seems like preventing
the checkpoint from happening so frequently would be one
workaround for you. Making the checkpoint frequency
configurable from pipeline option seems like the way to go.
On Thu, Dec 17, 2020 at 7:35 AM Jan Lukavský wrote:
ing.
[1]
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L959
On Thu, Dec 17, 2020 at 9:43 AM Jan Lukavský <je...@seznam.cz> wrote:
Hi Ismaël,
what I meant by the perfo
not necessarily exclude the other. Direct runner is our
most used runner, basically every Beam user relies on the direct
runners so every regression or improvement on it affects everyone, but
well that's a subject worth its own thread.
On Thu, Dec 17, 2020 at 10:55 AM Jan Lukavský wrote:
Hi,
from my point of view the numbers in DirectRunner are set correctly.
The primary purpose of DirectRunner is testing, not performance, so
DirectRunner intentionally makes frequent checkpoints to easily exercise
potential bugs in user code. It might be possible to make the frequency
thread we could start about requesting a
safe-to-mutate copy of things so it only affects performance local to
that DoFn. Users can do this themselves, but if it is metadata on a
DoFn it may be optimized sometimes.
Kenn
On Tue, Nov 24, 2020 at 12:39 AM Jan Lukavský <je...@seznam.cz> wrote:
and we should remove
that.
Kenn
On Thu, Oct 29, 2020 at 8:03 AM Teodor Spæren wrote:
Thanks Jan, this cleared some things up!
Best regards,
Teodor Spæren
On Thu, Oct 29, 2020 at 02:13:50PM +0100, Jan Lukavský wrote:
Hi Teodor,
the confusion here maybe comes from the fact, that there are two
Jan Lukavský wrote:
Hi,
could this proposal be generalized to annotations of PCollections as
well? Maybe that reduces to several types of annotations of a PTransform
- e.g.
a) runtime annotations of a PTransform (that might be scheduling hints
- i.e. schedule this task to nodes with GPUs, etc.)
b) output
Hi Tobias,
this looks like a bug. The clearGlobalState method was introduced
in 2.25.0, and it seems it might have issues related to RocksDB. Can
you file a Jira for that, please?
Thanks,
Jan
On 11/4/20 9:50 AM, Kaymak, Tobias wrote:
When running our Kafka-To-BigQuery pipeline with
0014 query:SESSION_SIDE_INPUT_JOIN; streamTimeout:60
Performance:
Conf Runtime(sec) (Baseline) Events(/sec) (Baseline)
Results (Baseline)
0.8 122100.1 10
0001 0.6 181488.2 92000
0002 0.3 363636.
Hi Teodor,
the confusion here maybe comes from the fact that there are two
(logical) representations of an element in a PCollection. One
representation is the never-mutable (most probably serialized in a
binary form) form of a PCollection element, where no modifications are
possible. Once a
Hi Teodor and Max,
I think that there is not 100% need for all runners to behave exactly
the same way. The reason for that is that different runners can have
different purposes. The purpose of DirectRunner is to verify code of the
pipeline and (if it succeeds) to validate that it will