Re: Primitive Read not working with Flink portable runner

2021-07-18 Thread Jan Lukavský
Jan On 7/16/21 2:00 PM, Jan Lukavský wrote: Hi, I hit another issue with the portable Flink runner. Long story short - reading from Kafka is not working in portable Flink. After solving issues with expansion service configuration (ability to add the use_deprecated_read option), because flink

Primitive Read not working with Flink portable runner

2021-07-16 Thread Jan Lukavský
Hi, I hit another issue with the portable Flink runner. Long story short - reading from Kafka is not working in portable Flink. After solving issues with expansion service configuration (ability to add the use_deprecated_read option), because the Flink portable runner has issues with SDF [1], [2].

Re: Missing coder in python x-lang transform when using window Fn with AfterCount trigger

2021-07-15 Thread Jan Lukavský
https://github.com/apache/beam/blob/3e933b55f3d2072fb0248050f9091850933f33c7/model/pipeline/src/main/proto/beam_runner_api.proto#L784 On Thu, Jul 15, 2021 at 12:16 PM Jan Lukavský <je...@seznam.cz> wrote:

Re: Missing coder in python x-lang transform when using window Fn with AfterCount trigger

2021-07-15 Thread Jan Lukavský
On 7/15/21 8:44 PM, Jan Lukavský wrote: Hi, I hit an issue when using x-lang python pipeline (with ReadFromKafka) with subsequent WindowInto with trigger=Repeatedly(AfterCount(1)). The Pipeline looks as follows:   (p | ReadFromKafka(   consumer_config={'bootstrap.servers

Missing coder in python x-lang transform when using window Fn with AfterCount trigger

2021-07-15 Thread Jan Lukavský
Hi, I hit an issue when using x-lang python pipeline (with ReadFromKafka) with subsequent WindowInto with trigger=Repeatedly(AfterCount(1)). The Pipeline looks as follows:   (p | ReadFromKafka(   consumer_config={'bootstrap.servers': bootstrapServer},   topics=[inputTopic],  

Re: Dataflow dependencies require non-maven central dependencies (confluent kafka)

2021-07-12 Thread Jan Lukavský
PM Alexey Romanenko <aromanenko@gmail.com> wrote: I agree that we should make this optional. What would be the best way to do it with gradle? On 11 Jul 2021, at 16:40, Jan Lukavský <je...@seznam.cz> wrote: I'd be +1 to making it optional as w

Re: Dataflow dependencies require non-maven central dependencies (confluent kafka)

2021-07-11 Thread Jan Lukavský
I'd be +1 to making it optional as well. Looks really like an overhead for users not using avro. On 7/11/21 10:36 AM, Alex Van Boxel wrote: It worked before 2.30. It's fine for when you're using Confluent Kafka, but feels like a hard dependency for non-Kafka users. Certainly the requirement 

Re: JavaPrecommit fails

2021-07-06 Thread Jan Lukavský
Looks to me to be failing consistently. At least appears so both on PRs and tested locally on master branch. On 7/5/21 1:34 PM, Alexey Romanenko wrote: Hello, JavaPrecommit fails because of

Re: Specifying environment for cross-language transform expansion

2021-07-02 Thread Jan Lukavský
Thu, Jul 1, 2021 at 2:11 PM Jan Lukavský wrote: This really does not match my experience. Passing the correct "use_deprecated_read" flag to the expansion service had the expected impact on the Flink's execution DAG and - most of all - it started to work (at least seems so). The UI i

Re: Specifying environment for cross-language transform expansion

2021-07-01 Thread Jan Lukavský
tly itself. It could attempt to deserialize the SDF ptransform and see if there is an UnboundedSource inside and then do whatever it wants with it. On Thu, Jul 1, 2021 at 11:39 AM Jan Lukavský <je...@seznam.cz> wrote: I don't have complete comprehension of the topic, but f

Re: Specifying environment for cross-language transform expansion

2021-07-01 Thread Jan Lukavský
do we need to do that in the expansion service? On Thu, Jul 1, 2021 at 11:16 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, after today's experience I think I have some arguments about why we *should* pass (at least some) of the PipelineOptions from SDK to expansi

Re: Specifying environment for cross-language transform expansion

2021-07-01 Thread Jan Lukavský
https://github.com/apache/beam/pull/15082/commits/5a46664ceb9f03da3089925b30ecd0a802e8b3eb On 7/1/21 9:33 AM, Jan Lukavský wrote: On 7/1/21 3:26 AM, Kyle Weaver wrote: I think it should accept complete list of PipelineOptions (or at least some defined subset - PortabilityPipelineOptions, ExperimentalOptions,

Re: Specifying environment for cross-language transform expansion

2021-07-01 Thread Jan Lukavský
the current PR [1]? [1] https://github.com/apache/beam/pull/15082 On Wed, Jun 30, 2021 at 8:48 AM Jan Lukavský <je...@seznam.cz> wrote:  > Not sure why we need the hacks with NoOpRunner As noted earlier (and that was why I started this thread in the first place

Re: Specifying environment for cross-language transform expansion

2021-06-30 Thread Jan Lukavský
' runners. But that is a whole different story. :)  Jan On 6/30/21 5:35 PM, Robert Bradshaw wrote: On Wed, Jun 30, 2021 at 7:41 AM Jan Lukavský wrote: java -jar beam-sdks-java-io-expansion-service-2.30.0.jar This does not accept any other parameters than the port. That is the first part of th

Re: Specifying environment for cross-language transform expansion

2021-06-30 Thread Jan Lukavský
:46 PM, Chamikara Jayalath wrote: On Wed, Jun 30, 2021 at 7:41 AM Jan Lukavský <je...@seznam.cz> wrote: > java -jar beam-sdks-java-io-expansion-service-2.30.0.jar This does not accept any other parameters than the port. That is the first part of this thre

Re: Specifying environment for cross-language transform expansion

2021-06-30 Thread Jan Lukavský
ervice/ExpansionService.java#L398 On 6/30/21 3:57 PM, Chamikara Jayalath wrote: On Wed, Jun 30, 2021 at 6:54 AM Chamikara Jayalath <chamik...@google.com> wrote: On Wed, Jun 30, 2021 at 1:20 AM Jan Lukavský <je...@seznam.cz> wrote: On 6/30/21 1:16 AM, Robert Bradshaw wr

Re: Specifying environment for cross-language transform expansion

2021-06-30 Thread Jan Lukavský
all of this working, mostly when you are starting to evaluate Beam and don't have much experience with it. We can get rid of b) (implement LOOPBACK in Flink) and c) (enable Python SDK Kafka IO to spawn expansion service with the LOOPBACK environment when submitting to Flink). That is why I still

Re: Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
ent in ExpansionService. [1] https://github.com/apache/beam/pull/15099/files On 6/29/21 11:51 PM, Chamikara Jayalath wrote: On Tue, Jun 29, 2021 at 2:39 PM Jan Lukavský <je...@seznam.cz> wrote: On 6/29/21

Re: Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
Jayalath wrote: On Tue, Jun 29, 2021 at 2:39 PM Jan Lukavský <je...@seznam.cz> wrote: On 6/29/21 11:04 PM, Robert Bradshaw wrote: > You can configure the environment in the current state, you just have > to run your own expansion service that has a differ

Re: Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
rs library could be used.) Yes, I also agree, that expansion service should be runner-dependent (or at least runner-aware), as that brings optimizations. Runner could ignore settings from previous point when it can be *sure* it can do so. On Tue, Jun 29, 2021 at 1:29 PM Jan Lukavský wrote: T

Re: Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
sform. On Tue, Jun 29, 2021 at 11:47 AM Kyle Weaver wrote: For context, there was a previous thread which touched on many of the same points: https://lists.apache.org/thread.html/r6f6fc207ed62e1bf2a1d41deeeab554e35cd2af389ce38289a303cea%40%3Cdev.beam.apache.org%3E On Tue, Jun 29, 2021 at 11:16 AM J

Re: Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
On Tue, Jun 29, 2021 at 10:08 AM Jan Lukavský <je...@seznam.cz> wrote: The argument for being able to accept (possibly ordered list of) execution environments is in that this could make a single instance of execution service reusable by various clients with

Re: Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
service API. On Tue, Jun 29, 2021 at 9:42 AM Jan Lukavský <je...@seznam.cz> wrote: If I understand it correctly, there is currently no place to set the defaultEnvironmentType - python's KafkaIO uses either 'expansion_service' given by the user (which might be a

Re: Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
that PortablePipelineOptions.setDefaultEnvironmentType(String) doesn’t work for you? Or it’s only a specific case while using portable KafkaIO? On 29 Jun 2021, at 09:51, Jan Lukavský wrote: Hi, I have come across an issue with cross-language transforms. My setup is I have working environment type PROCESS and I

Specifying environment for cross-language transform expansion

2021-06-29 Thread Jan Lukavský
Hi, I have come across an issue with cross-language transforms. My setup is I have working environment type PROCESS and I cannot use DOCKER. When I use Python's KafkaIO, it unfortunately - by default - expands to docker environment, which then fails due to missing 'docker' command. I didn't

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-24 Thread Jan Lukavský
, watermark can move after the cycle finishes, but the cycle finishes when the watermark moves). That is where the vector watermark must come in place and where all the stuff gets complicated. Thanks for this discussion, it helped me clarify some things.  Jan On 6/24/21 1:44 AM, Jan Lukavský wrote

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
at the Beam layer. For engines where the Beam watermark is implemented more directly (for example Dataflow & Flink) there would be a lot of added complexity, probably performance loss, if it could be done at all. Kenn On Wed, Jun 23, 2021 at 3:31 PM Reuven Lax <re...@google.co

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
he iteration "gap"? Iteration is for the time being the domain of batch, which is where the unified approach loses its points. On 6/24/21 12:31 AM, Reuven Lax wrote: On Wed, Jun 23, 2021 at 2:33 PM Jan Lukavský <je...@seznam.cz> wrote: On 6/23/21 11:13 PM, Reuv

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
On 6/23/21 11:13 PM, Reuven Lax wrote: On Wed, Jun 23, 2021 at 2:00 PM Jan Lukavský <je...@seznam.cz> wrote: The most qualitatively important use-case I see are ACID transactions - transactions naturally involve cycles, because the most natural implementat

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
the uses I'm aware of are ML-type algorithms (e.g. clustering) or iterative algorithms on large graphs (connected components, etc.), and it's unclear how many of these problems have a natural time component. On Wed, Jun 23, 2021 at 1:49 PM Jan Lukavský <je...@seznam.cz> wrote:

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
. But that would only mean, we might need two timeouts - one for releasing the watermark hold and another for cancelling the iteration completely. On 6/23/21 10:43 PM, Jan Lukavský wrote: Right, one can "outsource" this functionality through external source, but that is a sort-of hackis

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
, though the design started getting very complex (watermarks of unbounded dimension, where each iteration has its own watermark dimension). At the time, the exploration petered out. On Wed, Jun 23, 2021 at 10:13 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I'd like t

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
<re...@google.com> wrote: This was explored in the past, though the design started getting very complex (watermarks of unbounded dimension, where each iteration has its own watermark dimension). At the time, the exploration petered out. On Wed, Jun 23, 2021 at 10

[DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
Hi, I'd like to discuss a very rough idea. I didn't walk through all the corner cases and the whole idea has a lot of rough edges, so please bear with me. I was thinking about non-IO applications of splittable DoFn, and the main idea - and why it is called splittable - is that it can handle

Re: FileIO with custom sharding function

2021-06-17 Thread Jan Lukavský
rrectly implementing RequiresStableInput. On Wed, Jun 16, 2021 at 11:18 AM Jan Lukavský <je...@seznam.cz> wrote: I think that the support for @RequiresStableInput is rather limited. AFAIK it is supported by streaming Flink (where it is not needed in this situation) and by Da

Re: FileIO with custom sharding function

2021-06-16 Thread Jan Lukavský
it correctly. In the case of FileIO (which does not use @RequiresStableInput as it would not be supported on Spark) the randomness is easily avoidable (hashCode of key?) and it seems to me it should be preferred.  Jan On 6/16/21 6:23 PM, Kenneth Knowles wrote: On Wed, Jun 16, 2021 at 5:18 AM Jan

Re: FileIO with custom sharding function

2021-06-16 Thread Jan Lukavský
Hi, maybe a little unrelated, but I think we definitely should not use random assignment of shard keys (RandomShardingFunction), at least for bounded workloads (seems to be fine for streaming workloads). Many batch runners simply recompute path in the computation DAG from the failed node
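
The deterministic alternative argued for here - a stable function of the key rather than random shard assignment - can be sketched in plain Python. `deterministic_shard` is a hypothetical name for illustration, not Beam's API; the point is that re-executed bundles on a batch runner reproduce the same assignment:

```python
import zlib

def deterministic_shard(key: str, num_shards: int) -> int:
    """Assign a shard from a stable hash of the key, so retried (recomputed)
    bundles on a batch runner produce the same assignment as the first run."""
    # crc32 is stable across processes and runs, unlike Python's builtin hash()
    return zlib.crc32(key.encode("utf-8")) % num_shards

# Recomputing the assignment yields identical shards, which is what makes
# recomputation after a failed node safe; a random assignment would not.
first = {k: deterministic_shard(k, 5) for k in ["a", "b", "c"]}
second = {k: deterministic_shard(k, 5) for k in ["a", "b", "c"]}
assert first == second
```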

Re: [DISCUSS] Drop support for Flink 1.10

2021-05-31 Thread Jan Lukavský
Hi, +1 to remove the support for 1.10.  Jan On 5/28/21 10:00 PM, Ismaël Mejía wrote: Hello, With Beam support for Flink 1.13 just merged it is the time to discuss the end of support for Flink 1.10 following the agreed policy on supporting only the latest three Flink releases [1]. I would

Re: Some questions around GroupIntoBatches

2021-05-22 Thread Jan Lukavský
mer extant with that output timestamp doesn't help here. On Fri, May 21, 2021 at 2:06 AM Jan Lukavský <je...@seznam.cz> wrote: If I understand it correctly (and what I have observed from the actual behavior on both FlinkRunner and DirectRunner) a relative timer

Re: Some questions around GroupIntoBatches

2021-05-21 Thread Jan Lukavský
on most runners today). Reuven On Thu, May 20, 2021 at 2:11 AM Jan Lukavský <je...@seznam.cz> wrote: Sounds like you could solve that using second event time timer, that would be actually used only to hold the output timestamp (watermark hold). Somet

Re: Some questions around GroupIntoBatches

2021-05-20 Thread Jan Lukavský
y the same questions apply to timers re-setting later timers and +Jan Lukavský has raised this already (among other people) so we kind of know the answer now, and I think +Boyuan Zhang code was good (from my quick read). What has changed is that we have a better idea of the

Re: [VOTE] Vendored Dependencies Release Byte Buddy 1.11.0

2021-05-19 Thread Jan Lukavský
+1 (non-binding) verified correct shading.  Jan On 5/19/21 8:53 PM, Ismaël Mejía wrote: This release is only to publish the vendored dependency artifacts. We need those to integrate it and be able to verify if it causes problems or not. The PR for this is already opened but it needs the

Re: Ordered PCollections eventually?

2021-05-11 Thread Jan Lukavský
I'll just remind that Beam already supports (experimental) @RequiresTimeSortedInput (which has several limitations, mostly in that it orders only by timestamp and not some - time related - user field; and of course - missing retractions). An arbitrary sorting seems to be hard, even per-key, it
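
The per-key guarantee of @RequiresTimeSortedInput mentioned here can be illustrated with a small stdlib-only sketch (hypothetical names, not Beam's API): elements are buffered and released in timestamp order as the watermark advances - the user-field ordering and retractions noted above are exactly what this sketch lacks:

```python
import bisect

class TimeSortedBuffer:
    """Per-key sketch of time-sorted delivery: buffer (timestamp, value)
    pairs and release them in timestamp order once the watermark passes."""
    def __init__(self):
        self._buf = []  # kept sorted by timestamp

    def add(self, timestamp, value):
        bisect.insort(self._buf, (timestamp, value))

    def on_watermark(self, watermark):
        """Emit, in timestamp order, everything at or before the watermark."""
        ready = [v for t, v in self._buf if t <= watermark]
        self._buf = [(t, v) for t, v in self._buf if t > watermark]
        return ready

buf = TimeSortedBuffer()
buf.add(5, "b")
buf.add(3, "a")
buf.add(9, "c")
assert buf.on_watermark(6) == ["a", "b"]  # out-of-order arrivals, in-order output
```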

Re: LGPL-2.1 in beam-vendor-grpc

2021-05-10 Thread Jan Lukavský
https://www.apache.org/legal/resolved.html Regards JB On 10 May 2021 at 12:56, Elliotte Rusty Harold <elh...@ibiblio.org> wrote: Anyone have a link to the official Apache policy about t

LGPL-2.1 in beam-vendor-grpc

2021-05-10 Thread Jan Lukavský
Hi, we are bundling dependencies with LGPL-2.1, according to license header in META-INF/maven/org.jboss.modules/jboss-modules/pom.xml. I think it might be an issue, already reported here: [1]. I created [2] to track it on our side.  Jan [1] https://issues.apache.org/jira/browse/FLINK-22555

Re: Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

2021-05-05 Thread Jan Lukavský
he cleanup timer to be 1ms past the end of the window (plus allowed lateness), it's guaranteed to be the last timer that fires for that window. On Wed, May 5, 2021 at 2:41 AM Jan Lukavský <je...@seznam.cz> wrote: Hm, one thing in the last paragraph seems there might be s

Re: Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

2021-05-05 Thread Jan Lukavský
y @OnWindowExpiration is a meaningful feature. Kenn On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz> wrote: Hi Kenn, I created BEAM-12276 [1] with PR [2].  Jan [1] https://issues.apache.org/jira/browse/BEAM-12276

Re: Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

2021-05-04 Thread Jan Lukavský
ull stack trace and/or a failing unit test in a PR? Kenn On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I have come across a bug with timer output timestamp - when using event time and relative timers, setting the timer can arbit

Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

2021-04-29 Thread Jan Lukavský
Hi, I have come across a bug with timer output timestamp - when using event time and relative timers, setting the timer can arbitrarily throw IllegalArgumentException if the firing timestamp (input watermark) is ahead of the output timestamp (like .java.lang.IllegalArgumentException:
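
The invariant behind the exception, as I read this thread, is that a timer's output timestamp must not be after its firing timestamp - and with a relative timer the firing timestamp is derived from the input watermark, so the check can trip unexpectedly. A stdlib-only illustration of that check (hypothetical names, not Beam's API):

```python
def set_relative_timer(input_watermark: int, offset: int, output_timestamp: int) -> int:
    """Illustrative check only: a timer's output timestamp may not exceed its
    firing timestamp (input watermark + offset). With relative timers the
    firing timestamp moves with the watermark, which is what makes the check
    trip unpredictably as described in the thread."""
    firing_timestamp = input_watermark + offset
    if output_timestamp > firing_timestamp:
        # mirrors the IllegalArgumentException reported above
        raise ValueError(
            f"output timestamp {output_timestamp} is after "
            f"firing timestamp {firing_timestamp}")
    return firing_timestamp
```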

Re: Event time matching of side input

2021-04-27 Thread Jan Lukavský
re may be more interesting ways to match the side input than simply using windowing. It might be worth thinking about this topic more deeply. On Tue, Apr 27, 2021 at 12:51 PM Jan Lukavský wrote: On 4/27/21 9:26 PM, Robert Bradshaw wrote: On Tue, Apr 27, 2021 at 12:05 PM Jan Lukavský wrote:

Re: Event time matching of side input

2021-04-27 Thread Jan Lukavský
On 4/27/21 9:26 PM, Robert Bradshaw wrote: On Tue, Apr 27, 2021 at 12:05 PM Jan Lukavský wrote: On 4/27/21 8:51 PM, Robert Bradshaw wrote: On Tue, Apr 27, 2021 at 11:25 AM Jan Lukavský wrote: Are you asking for a way to ignore early triggers on side input mapping, and only map to on-time

Re: Event time matching of side input

2021-04-27 Thread Jan Lukavský
On 4/27/21 8:51 PM, Robert Bradshaw wrote: On Tue, Apr 27, 2021 at 11:25 AM Jan Lukavský wrote: Are you asking for a way to ignore early triggers on side input mapping, and only map to on-time triggered values for the window? No, that could for sure be done before applying the View transform

Re: Event time matching of side input

2021-04-27 Thread Jan Lukavský
021 at 10:53 AM Jan Lukavský <je...@seznam.cz> wrote: > If using early triggers, then the side input loads the latest trigger for that window. Does not the word 'latest' imply processing time matching? The watermark of main input might be arbitrari

Re: Event time matching of side input

2021-04-27 Thread Jan Lukavský
e FlinkRunner implementation is incomplete, in which case it should be fixed. On Tue, Apr 27, 2021 at 9:36 AM Jan Lukavský <je...@seznam.cz> wrote: It seems to me, that this is true only with matching windows on both sides and default trigger of the side input. Then it will (due

Re: Event time matching of side input

2021-04-27 Thread Jan Lukavský
time."  Jan [1] https://beam.apache.org/documentation/patterns/side-inputs/ On 4/27/21 6:31 PM, Reuven Lax wrote: Side inputs are matched based on event time, not processing time. On Tue, Apr 27, 2021 at 12:43 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I ha

Re: Event time matching of side input

2021-04-27 Thread Jan Lukavský
everything is Event time by default, not Processing time. On Tue, Apr 27, 2021, 12:43 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I have a question about matching side inputs to main input. First a recap, to make sure I understand the current state correctly:   a)

Event time matching of side input

2021-04-27 Thread Jan Lukavský
Hi, I have a question about matching side inputs to main input. First a recap, to make sure I understand the current state correctly:  a) elements from main input are pushed back (stored in state) until a first side input pane arrives (that might be on time, or early)  b) after that,

Re: Should WindowFn have a minimal Duration?

2021-04-26 Thread Jan Lukavský
iven that a default value of 0 would not be very useful). On Mon, Apr 26, 2021 at 10:05 AM Jan Lukavský <je...@seznam.cz> wrote: > > Hi Kenn, > > On 4/26/21 5:59 PM, Kenneth Knowles wrote: > > In +Reza Rokni's example of looping timer

Re: Should WindowFn have a minimal Duration?

2021-04-26 Thread Jan Lukavský
er has already passed. It will fire immediately then. In the @OnTimer you output from the buffer. I think there may be more efficient ways to achieve this output. Kenn On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I have come across a "pr

Should WindowFn have a minimal Duration?

2021-04-22 Thread Jan Lukavský
Hi, I have come across a "problem" while implementing some toy Pipeline. I would like to split input PCollection into two parts - droppable data (delayed for more than allowed lateness from the end of the window) from the rest. I will not go into details, as that is not relevant, the problem
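
The droppable-data condition described here - the watermark has passed the end of the window plus the allowed lateness - can be sketched for fixed windows in plain Python (`is_droppable` is a hypothetical helper, not Beam's API):

```python
def is_droppable(element_ts: int, watermark: int,
                 window_size: int, allowed_lateness: int) -> bool:
    """An element is droppable when the watermark is past the end of the
    element's (fixed) window plus the allowed lateness."""
    window_end = (element_ts // window_size + 1) * window_size
    return watermark > window_end + allowed_lateness

# With 10-unit windows and lateness 2, an element at ts=3 lives in [0, 10);
# it becomes droppable only once the watermark passes 12.
assert is_droppable(3, 13, window_size=10, allowed_lateness=2)
assert not is_droppable(3, 12, window_size=10, allowed_lateness=2)
```

Splitting a PCollection on this predicate (droppable vs. the rest) is then a plain filter; the difficulty discussed in the thread is that a WindowFn alone does not expose the duration needed to compute `window_end` generically.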

Re: [ANNOUNCE] New committer: Yichi Zhang

2021-04-22 Thread Jan Lukavský
Congrats Yichi! On 4/22/21 4:58 AM, Ahmet Altay wrote: Congratulations Yichi!  On Wed, Apr 21, 2021 at 6:48 PM Chamikara Jayalath <chamik...@google.com> wrote: Congrats Yichi! On Wed, Apr 21, 2021 at 6:14 PM Heejong Lee <heej...@google.com> wrote:

Re: Long term support versions of Beam Java

2021-04-08 Thread Jan Lukavský
mehow part of the proposed LTS support process.  Jan On 4/7/21 4:23 PM, Elliotte Rusty Harold wrote: On Tue, Apr 6, 2021 at 9:43 PM Jan Lukavský wrote: Hi, do we know what is the reason users stay on an older version of Beam? My guess would be that it is not related to API changes, but m

Re: Long term support versions of Beam Java

2021-04-06 Thread Jan Lukavský
Hi, do we know what is the reason users stay on an older version of Beam? My guess would be that it is not related to API changes, but more likely to state incompatibility. Maybe if we could figure out a way which would enable a smooth migration of state (and timers) between Beam versions,

Re: Null checking in Beam

2021-04-06 Thread Jan Lukavský
Thanks for filing that. Once it is fixed in IntelliJ, the annotations actually add value for downstream users. Kenn On Thu, Apr 1, 2021 at 1:10 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I created the issue in JetBrains tr

Re: Null checking in Beam

2021-04-01 Thread Jan Lukavský
Ismaël or Reuven - provide them with this issue report? It sounds like Jan you have an example ready to go. Kenn On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je...@seznam.cz> wrote: Yes, annotations that we add to the code base on purpose (like @Nullable or @S

Re: Null checking in Beam

2021-03-17 Thread Jan Lukavský
there's an IntelliJ setting that would stop this from happening? On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote: I don't know the details of the checkerframework, but there seems

Re: Null checking in Beam

2021-03-16 Thread Jan Lukavský
also been annoyed at IntelliJ autogenerating all these annotations. I believe Kenn said that this was not the intention - maybe there's an IntelliJ setting that would stop this from happening? On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote: I d

Re: Null checking in Beam

2021-03-16 Thread Jan Lukavský
I don't know the details of the checkerframework, but there seems a contradiction between what code is (currently) generated and what we therefore release and what actually the checkerframework states [1]: @UnknownKeyFor: Used internally by the type system; should never be written by a

Re: Null checking in Beam

2021-03-15 Thread Jan Lukavský
Yes, annotations that we add to the code base on purpose (like @Nullable or @SuppressWarnings) are absolutely fine. What is worse is that the checker is not only a checker, but a code generator. :) For example when one wants to implement Coder by extending CustomCoder and use auto-generating

Re: Null checking in Beam

2021-03-15 Thread Jan Lukavský
10:38 AM Brian Hulette <bhule...@google.com> wrote: On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote: Hi,

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-25 Thread Jan Lukavský
ger with some different trigger than with each pane?  Jan On 2/25/21 12:44 AM, Kenneth Knowles wrote: On Wed, Feb 24, 2021 at 12:44 AM Jan Lukavský <je...@seznam.cz> wrote: Hi Robert, > Here "sink" is really any observable outside effect, so I think 

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-24 Thread Jan Lukavský
in downstream processing. Therefore the final triggering cannot be "hourly output" at least not with regard to the rate of change in inputs. On 2/23/21 5:47 PM, Robert Bradshaw wrote: On Tue, Feb 23, 2021 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote: First,

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-23 Thread Jan Lukavský
for making downstream triggering optional (could be useful prep for sink triggers) +1 [1] https://s.apache.org/beam-sink-triggers [2] https://github.com/apache/beam/pull/4742 [3] https://github.com/apache/beam/pull/9199 [4] https://s.apache.org

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-22 Thread Jan Lukavský
The same holds true for pane accumulation mode.  Jan On 2/22/21 10:21 AM, Jan Lukavský wrote: Hi, I'm not sure if I got everything from this thread right, but from my point of view, triggers are property of GBK. They are property of neither windowing, nor PCollection, but relate solely

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-22 Thread Jan Lukavský
Hi, I'm not sure if I got everything from this thread right, but from my point of view, triggers are property of GBK. They are property of neither windowing, nor PCollection, but relate solely to GBK. This can be seen from the fact, that unlike windowFn, triggers are completely ignored in

Re: Null checking in Beam

2021-01-22 Thread Jan Lukavský
ode only.  Jan On 1/22/21 7:37 PM, Brian Hulette wrote: On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I'll give my two cents here. I'm not 100% sure that the 1-5% of bugs are as severe as other types of bugs. Yes, throwing NPEs a

Re: Null checking in Beam

2021-01-22 Thread Jan Lukavský
Hi, I'll give my two cents here. I'm not 100% sure that the 1-5% of bugs are as severe as other types of bugs. Yes, throwing NPEs at user is not very polite. On the other hand, many of these actually boil down to user errors. Then we might ask what a correct solution would be. If we manage

Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Jan Lukavský
+1 (non-binding). I've validated the RC against my dependent projects (mainly Java SDK, Flink and DirectRunner). Thanks,  Jan On 1/7/21 2:15 AM, Ahmet Altay wrote: +1 (binding) - validated python quickstarts. Thank you Pablo. On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada

Re: Compatibility between Beam v2.23 and Beam v2.26

2021-01-07 Thread Jan Lukavský
Hi Antonio, can you please create one? Thanks,  Jan On 1/6/21 10:31 PM, Antonio Si wrote: Thanks for the information. Do we have a jira to track this issue or do you want me to create a jira for this? Thanks. Antonio. On 2021/01/06 17:59:47, Kenneth Knowles wrote: Agree with Boyuan &

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2021-01-06 Thread Jan Lukavský
Sorry for the typo in your name. :-) On 1/6/21 10:11 AM, Jan Lukavský wrote: Hi Antonie, yes, for instance. I'd just like to rule out possibility that a single DoFn processing multiple partitions (restrictions) brings some overhead in your case. Jan On 12/31/20 10:36 PM, Antonio Si wrote

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2021-01-06 Thread Jan Lukavský
run with a parallelism set to 900? Thanks. Antonio. On 2020/12/23 20:30:34, Jan Lukavský wrote: OK, could you make an experiment and increase the parallelism to something significantly higher than the total number of partitions? Say 5 times higher? Would that have impact on throughput in your

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-23 Thread Jan Lukavský
. The highest TPS topic has 180 partitions, while the lowest TPS topic has 12 partitions. Thanks. Antonio. On 2020/12/23 12:28:42, Jan Lukavský wrote: Hi Antonio, can you please clarify a few things:  a) what parallelism you use for your sources  b) how many partitions there is in your topic(s

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-23 Thread Jan Lukavský
Hi Antonio, can you please clarify a few things:  a) what parallelism you use for your sources  b) how many partitions there is in your topic(s) Thanks,  Jan On 12/22/20 10:07 PM, Antonio Si wrote: Hi Boyuan, Let me clarify, I have tried with and without using

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-21 Thread Jan Lukavský
:15 AM Robert Bradshaw <rober...@google.com> wrote: If readers are expensive to create, this seems like an important (and not too difficult) optimization. On Mon, Dec 21, 2020 at 11:04 AM Jan Lukavský <je...@seznam.cz> wrote: Hi Boyuan,

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-21 Thread Jan Lukavský
Hi Boyuan, I think your analysis is correct - with one exception. It should be possible to reuse the reader if and only if the last taken CheckpointMark equals to the new CheckpointMark the reader would be created from. But - this equality is on the happy path and should be satisfied for
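
The reuse rule proposed here - reuse a reader if and only if the last taken CheckpointMark equals the one the reader would be recreated from - can be sketched as a small cache. All names are hypothetical; the real objects are UnboundedSource readers keyed by their CheckpointMark:

```python
class ReaderCache:
    """Sketch of the reuse rule: keep the current reader, and hand it back
    only when the checkpoint mark we would restore from equals the mark the
    reader last checkpointed (the happy path); otherwise recreate."""
    def __init__(self, create_reader):
        self._create = create_reader  # factory: checkpoint_mark -> reader
        self._reader = None
        self._last_mark = None

    def get(self, checkpoint_mark):
        if self._reader is not None and self._last_mark == checkpoint_mark:
            return self._reader  # marks match: safe to continue the same reader
        # marks differ (e.g. after a failure/rollback): recreate from the mark
        self._reader = self._create(checkpoint_mark)
        self._last_mark = checkpoint_mark
        return self._reader

    def record_checkpoint(self, mark):
        """Called when the runner takes a checkpoint from the live reader."""
        self._last_mark = mark
```

On the happy path every `get` after a `record_checkpoint` sees matching marks and the (expensive) reader is never rebuilt; only a mismatch forces recreation.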

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-18 Thread Jan Lukavský
/org/apache/beam/sdk/io/Read.java#L750 On Thu, Dec 17, 2020 at 2:42 PM Jan Lukavský <je...@seznam.cz> wrote: Hi Boyuan, > Several changes could be made into PubSub SDF implementation specially. For example, the PubSub SDF can choose not respond to the checkp

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-17 Thread Jan Lukavský
other workaround for using PubSub on DirectRunner might be using --experiments=enable_custom_pubsub_source. This flag will make the pipeline use a DoFn to read from PubSub instead of using UnboundedSource. Hope it will be helpful as well. On Thu, Dec 17, 2020 at 1:09 PM Jan Lukavský <mailto:je..

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-17 Thread Jan Lukavský
tigation, Steve! It seems like preventing the checkpoint from happening so frequently would be one workaround for you. Making the checkpoint frequency configurable from a pipeline option seems like the way to go. On Thu, Dec 17, 2020 at 7:35 AM Jan Lukavský mailt
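"Making the checkpoint frequency configurable" can be sketched as a simple throttle that honors a checkpoint request only once a configured interval has elapsed. This is an illustrative stand-in, not the Beam pipeline-option mechanism; the clock is injected so the behavior is deterministic:

```python
class CheckpointThrottle:
    """Honor a checkpoint request only if `interval_sec` has elapsed since
    the last accepted one - a sketch of an option-controlled frequency."""
    def __init__(self, interval_sec, clock):
        self.interval = interval_sec
        self.clock = clock    # injected for testability
        self.last = None

    def should_checkpoint(self):
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False


fake_time = [0.0]
throttle = CheckpointThrottle(interval_sec=5.0, clock=lambda: fake_time[0])
decisions = []
for t in [0, 1, 2, 6, 7, 12]:   # simulated checkpoint requests over time
    fake_time[0] = float(t)
    decisions.append(throttle.should_checkpoint())
print(decisions)
```

Requests at t=1, 2 and 7 are suppressed because fewer than 5 seconds passed since the last accepted checkpoint.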

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-17 Thread Jan Lukavský
ing. [1] https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L959 On Thu, Dec 17, 2020 at 9:43 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote: Hi Ismaël, what I meant by the perfo

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-17 Thread Jan Lukavský
not necessarily exclude the other. The direct runner is our most used runner; basically every Beam user relies on it, so every regression or improvement affects everyone. But that's a subject worth its own thread. On Thu, Dec 17, 2020 at 10:55 AM Jan Lukavský wrote: Hi, from my point

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-17 Thread Jan Lukavský
Hi, from my point of view the numbers in DirectRunner are set correctly. The primary purpose of DirectRunner is testing, not performance, so DirectRunner intentionally makes frequent checkpoints to easily exercise potential bugs in user code. It might be possible to make the frequency
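Why frequent checkpointing exercises bugs can be shown with a toy simulation: a source whose restore logic is off by one silently drops an element at every checkpoint boundary. With rare checkpoints the bug is invisible; with DirectRunner-style frequent checkpoints it surfaces immediately. Everything here is hypothetical illustration, not Beam code:

```python
def read_all(records, resume_fn, checkpoint_every):
    """Read `records`, checkpointing every `checkpoint_every` elements and
    resuming through the source's (possibly buggy) restore logic."""
    out, offset = [], 0
    while offset < len(records):
        taken = 0
        while offset < len(records) and taken < checkpoint_every:
            out.append(records[offset])
            offset += 1
            taken += 1
        offset = resume_fn(offset)   # simulate restore from the checkpoint
    return out


records = list(range(10))
correct_resume = lambda off: off
buggy_resume = lambda off: off + 1   # off-by-one restore: skips one element

# Rare checkpoints hide the bug; frequent ones expose it immediately.
rare = read_all(records, buggy_resume, checkpoint_every=100)
frequent = read_all(records, buggy_resume, checkpoint_every=3)
```

Here `rare` equals the input (the single restore happens after everything was read), while `frequent` is missing the elements dropped at each of the checkpoint boundaries.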

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-11-24 Thread Jan Lukavský
thread we could start about requesting a safe-to-mutate copy of things so it only affects performance local to that DoFn. Users can do this themselves, but if it is metadata on a DoFn it may be optimized sometimes. Kenn On Tue, Nov 24, 2020 at 12:39 AM Jan Lukavský <mailto:je...@seznam

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-11-24 Thread Jan Lukavský
and we should remove that. Kenn On Thu, Oct 29, 2020 at 8:03 AM Teodor Spæren wrote: Thanks Jan, this cleared some things up! Best regards, Teodor Spæren On Thu, Oct 29, 2020 at 02:13:50PM +0100, Jan Lukavský wrote: Hi Teodor, the confusion here maybe comes from the fact, that there are two

Re: PTransform Annotations Proposal

2020-11-16 Thread Jan Lukavský
, Jan Lukavský wrote: Hi, could this proposal be generalized to annotations of PCollections as well? Maybe that reduces to several types of annotations of a PTransform - e.g.  a) runtime annotations of a PTransform (that might be scheduling hints - i.e. schedule this task to nodes with GPUs, etc

Re: PTransform Annotations Proposal

2020-11-16 Thread Jan Lukavský
Hi, could this proposal be generalized to annotations of PCollections as well? Maybe that reduces to several types of annotations of a PTransform - e.g.  a) runtime annotations of a PTransform (that might be scheduling hints - i.e. schedule this task to nodes with GPUs, etc.)  b) output
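The two annotation kinds proposed above - (a) runtime hints on a PTransform, such as scheduling to GPU nodes, and (b) annotations attached to outputs/PCollections - can be sketched as a tiny data model. This is a hypothetical illustration of the proposal's shape, not the actual Beam annotations API:

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTransform:
    """Hypothetical model of a PTransform carrying the two annotation
    kinds discussed: runtime hints and per-output annotations."""
    name: str
    runtime_hints: dict = field(default_factory=dict)       # (a) for the runner
    output_annotations: dict = field(default_factory=dict)  # (b) per PCollection


t = AnnotatedTransform("TrainModel")
t.runtime_hints["resource"] = "gpu"                    # schedule on GPU nodes
t.output_annotations["predictions"] = {"pii": False}   # annotate one output
```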

Re: Beam 2.25.0 / Flink 1.11.2 - Job failing after upgrading from 2.24.0 / Flink 1.10.2

2020-11-04 Thread Jan Lukavský
Hi Tobias, this looks like a bug; the clearGlobalState method was introduced in 2.25.0 and it seems it might have issues related to RocksDB. Can you file a Jira for that, please? Thanks,  Jan On 11/4/20 9:50 AM, Kaymak, Tobias wrote: When running our Kafka-To-BigQuery pipeline with

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-11-03 Thread Jan Lukavský
0014  query:SESSION_SIDE_INPUT_JOIN; streamTimeout:60
Performance:
  Conf  Runtime(sec)  (Baseline)  Events(/sec)  (Baseline)  Results  (Baseline)
        0.8                       122100.1                  10
  0001  0.6                       181488.2                  92000
  0002  0.3                       363636.

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-29 Thread Jan Lukavský
Hi Teodor, the confusion here maybe comes from the fact that there are two (logical) representations of an element in a PCollection. One representation is the immutable form of a PCollection element (most probably serialized in binary form), where no modifications are possible. Once a
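The distinction between the immutable serialized form and the in-memory object can be illustrated with plain `pickle` standing in for a Beam coder: mutating a decoded copy never touches the stored bytes, and each decode yields an independent object. This is a language-agnostic sketch, not Beam's actual coder machinery:

```python
import pickle

# pickle stands in for a coder: encode to the immutable binary form.
element = {"user": "alice", "clicks": 3}
encoded = pickle.dumps(element)

copy1 = pickle.loads(encoded)
copy1["clicks"] = 999           # mutate one deserialized representation

copy2 = pickle.loads(encoded)   # the stored binary form is untouched
print(copy2["clicks"])          # -> 3; each decode is an independent copy
```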

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-29 Thread Jan Lukavský
Hi Teodor and Max, I don't think all runners need to behave exactly the same way. The reason is that different runners can have different purposes. The purpose of DirectRunner is to verify the code of the pipeline and (if it succeeds) to validate that it will
