Re: [EXTERNAL] Re: Hiding logging for beam playground examples

2024-03-08 Thread Vlado Djerek via dev
Hi Joey, We just deployed the 2.54 changes on playground. Can you now check? Thanks, Vlado From: Joey Tran Sent: Friday, March 8, 2024 2:45:32 PM To: Andrey Devyatkin Subject: [EXTERNAL] Re: Hiding logging for beam playground examples Ah thanks Andrey! I had

Re: [EXTERNAL] Re: SDF to read from a Spark custom receiver

2022-06-01 Thread Elizaveta Lomteva
: [EXTERNAL] Re: SDF to read from a Spark custom receiver Though the Spark custom receiver protocol doesn't look rich enough to support SDFs, it does look like the Hubspot APIs do support pagination which could be used to build an SDF (using the page as the offsets) directly. (That being said

Re: [EXTERNAL] Re: [EXTERNAL]

2021-06-15 Thread Raphael Sanamyan
Hello, Is it somehow related to this work [1]? No, this work adds the ability to return values from a sql insert query. There are no improvements to work with row and schema in it. Not sure that I got it. Could you elaborate a bit on this? When we using "Write" with table and without

Re: [EXTERNAL] Re:

2021-06-09 Thread Raphael Sanamyan
. 19:12:41 Кому: dev Копия: Reuven Lax; pabl...@google.com; Ilya Kozyrev Тема: [EXTERNAL] Re: > And also the ticket and "// TODO: BEAM-10396 use writeRows() when it's > available" appeared later than this functionality was added to "JdbcIO.Write". Note that this TODO has

Re: [External] : Re: outputReceiver.output() does not emit the result immediately

2021-01-26 Thread yu . b . zhang
Hi Boyuan, Thanks for replying. We are using beam 2.25.0 and direct runner for testing. We are trying to develop an unbounded streaming service connector with splittable DoFn. In our connector.read(), we want to commit the message back to stream after output the record to downstream user

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-28 Thread Kenneth Knowles
On Wed, Sep 16, 2020 at 8:48 AM Tyson Hamilton wrote: > The use case is to support an unbounded stream-stream join, where the > elements are arriving in roughly time sorted order. Removing a specific > element from the timestamp indexed collection is necessary when a match is > found. > Just

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-16 Thread Tyson Hamilton
The use case is to support an unbounded stream-stream join, where the elements are arriving in roughly time sorted order. Removing a specific element from the timestamp indexed collection is necessary when a match is found. Having clearRange is helpful to expire elements that are no longer

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-15 Thread Reuven Lax
Hi, Currently we only support removing a timestamp range. You can remove a single timestamp of course by removing [ts, ts+1), however if there are multiple elements with the same timestamp this will remove all of those elements. Does this fit your use case? If not, I wonder if MapState is closer

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-15 Thread Tyson Hamilton
Hi Reuven, I noticed that there was an implementation of the in-memory OrderedListState introduced [1]. Where can I find out more regarding the plan and design? Is there a design doc? I'd like to know more details about the implementation to see if it fits my use case. I was hoping it would

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-31 Thread David Gogokhiya
I have a very naive question. I know Jan suggested to use 2 successive fixed overlapping windows with offset as a temporary solution to dedup the events. However, I am wondering whether using a single fixed window of length let's say 1 day followed by a deduplicate function is a good

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-27 Thread Jan Lukavský
> If the user chooses to create a window of 10 years, I'd say it is expected behavior that the state will be kept for as long as this duration. State will be kept, the problem is that each key in the window will carry a cleanup timer, although there might be nothing to clear (there is no

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-27 Thread Maximilian Michels
If the user chooses to create a window of 10 years, I'd say it is expected behavior that the state will be kept for as long as this duration. GlobalWindows are different because they represent the default case where the user does not even use windowing. I think it warrants to be treated

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-26 Thread Jan Lukavský
Window triggering is afaik operation that is specific to GBK. Stateful DoFns can have (as shown in the case of deduplication) timers set for the GC only, triggering has no effect there. And yes, if we have other timers than GC (any user timers), then we have to have GC timer (because timers

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-26 Thread Maximilian Michels
The inefficiency described happens if and only if the following two conditions are met: a) there are many timers per single window (as otherwise they will be negligible) b) there are many keys which actually contain no state (as otherwise the timer would be negligible wrt the state size)

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-26 Thread Jan Lukavský
On 8/25/20 9:27 PM, Maximilian Michels wrote: I agree that this probably solves the described issue in the most straightforward way, but special handling for global window feels weird, as there is really nothing special about global window wrt state cleanup. Why is special handling for the

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-25 Thread Maximilian Michels
I agree that this probably solves the described issue in the most straightforward way, but special handling for global window feels weird, as there is really nothing special about global window wrt state cleanup. Why is special handling for the global window weird? After all, it is a special

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-24 Thread Maximilian Michels
I'd suggest a modified option (2) which does not use a timer to perform the cleanup (as mentioned, this will cause problems with migrating state). Instead, whenever we receive a watermark which closes the global window, we enumerate all keys and cleanup the associated state. This is the

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-15 Thread Maximilian Michels
Awesome! Thanks a lot for the memory profile. Couple remarks: a) I can see that there are about 378k keys and each of them sets a timer. b) Based on the settings for DeduplicatePerKey you posted, you will keep track of all keys of the last 30 minutes. Unless you have much fewer keys, the

Re: [External] Re: DISCUSS: Sorted MapState API

2020-08-03 Thread Catlyn Kong
Hey folks, Sry I'm late to this thread but this might be very helpful for the problem we're dealing with. Do we have a design doc or a jira ticket I can follow? Cheers, Catlyn On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský wrote: > My questions were just an example. I fully agree there is a

Re: [External] Re: Ensuring messages are processed and emitted in-order

2020-06-16 Thread Catlyn Kong
Thanks for all the ideas! We looked around and found out that @RequiresTimeSortedInput is not supported on portable Flink runner for streaming pipelines

Re: [External] Re: Ensuring messages are processed and emitted in-order

2020-06-11 Thread Jan Lukavský
Hi, I'm afraid @RequiresTimeSortedInput currently does not fit this requirement, because it works on event _timestamps_ only. Assigning Kafka offsets as event timestamps is probably not a good idea. In the original proposal [1] there is mention that it would be good to implement sorting by

Re: [External] Re: Ensuring messages are processed and emitted in-order

2020-06-11 Thread Reuven Lax
I would not recommend using RequiresTimeSortedInput in this way. I also would not use ingestion time, as in a distributed environment, time skew between workers might mess up the order. I will ping the discussion on the sorted state API and add you. My hope is that this can be implemented

Re: [External] Re: Ensuring messages are processed and emitted in-order

2020-06-10 Thread Catlyn Kong
Thank y’all for the input! About the RequiresTimeSortedInput, we were thinking of the following 2 potential approaches: 1. Assign kafka offset as the timestamp while doing a GroupByKey on partition_id in a GlobalWindow 2. Rely on the fact that Flink consumes from kafka

Re: [EXTERNAL] Re: Java Build broken

2020-03-05 Thread Maximilian Michels
ache.org>> *Cc:* Stefan Djelekar mailto:stefan.djele...@redbox.com>> *Subject:* [EXTERNAL] Re: Java Build broken __ __ Hi All, __ __ Was this issue resolved? I started to get the sa

Re: [EXTERNAL] Re: Java Build broken

2020-03-04 Thread Thomas Weise
t;>> >>>> It looks like on localhost build references >>>> https://oss.sonatype.org/content/repositories/staging/com/google/errorprone/error_prone_check_api/2.3.4/ >>>> >>>> istead of >>>> >>>> >>>> http

Re: [EXTERNAL] Re: Java Build broken

2020-03-03 Thread Pulasthi Supun Wickramasinghe
fact/com.google.errorprone/error_prone_check_api/2.3.4 >>> >>> >>> >>> and the first link returns 404 >>> >>> >>> >>> >>> >>> Can you please advise? >>> >>> >>> >>> All the be

Re: [EXTERNAL] Re: Java Build broken

2020-03-03 Thread Kamil Wasilewski
_prone_check_api/2.3.4 >> >> >> >> and the first link returns 404 >> >> >> >> >> >> Can you please advise? >> >> >> >> All the best, >> >> Stefan >> >> >> >> *From:* Pulasthi

Re: [EXTERNAL] Re: Java Build broken

2020-02-25 Thread Pulasthi Supun Wickramasinghe
; > Can you please advise? > > > > All the best, > > Stefan > > > > *From:* Pulasthi Supun Wickramasinghe > *Sent:* Tuesday, February 18, 2020 5:11 PM > *To:* dev > *Cc:* Stefan Djelekar > *Subject:* [EXTERNAL] Re: Java Build broken > >

RE: [EXTERNAL] Re: Java Build broken

2020-02-25 Thread Stefan Djelekar
: [EXTERNAL] Re: Java Build broken Hi All, Was this issue resolved? I started to get the same error on my local build suddenly. Best Regards, Pulasthi On Thu, Jan 23, 2020 at 10:17 AM Maximilian Michels mailto:m...@apache.org>> wrote: Do you have any overrides in your ~/.m2/settin

Re: [EXTERNAL] Re: FirestoreIO connector [JavaSDK]

2019-12-02 Thread Chamikara Jayalath
> > > Getting a review would be nice. > > > > All the best, > > Stefan > > > > *From:* Chamikara Jayalath > *Sent:* Wednesday, November 6, 2019 8:05 PM > *To:* dev > *Subject:* Re: [EXTERNAL] Re: FirestoreIO connector [JavaSDK] > > > > BTW

RE: [EXTERNAL] Re: FirestoreIO connector [JavaSDK]

2019-12-02 Thread Stefan Djelekar
Subject: Re: [EXTERNAL] Re: FirestoreIO connector [JavaSDK] BTW, FYI, I'm also talking with folks from Google Firestore team regarding this. I think they had shown some interest in taking this up but I'm not sure. If they are able to contribute here, and if we can coordinate with some of

Re: [EXTERNAL] Re: FirestoreIO connector [JavaSDK]

2019-11-06 Thread Chamikara Jayalath
ll make the PR in the next few days and let’s pick it up from there. > > > > King regards, > > Stefan > > > > > > *From:* Chamikara Jayalath > *Sent:* Tuesday, November 5, 2019 2:24 AM > *To:* dev > *Subject:* [EXTERNAL] Re: FirestoreIO connector [

RE: [EXTERNAL] Re: FirestoreIO connector [JavaSDK]

2019-11-06 Thread Stefan Djelekar
AM To: dev Subject: [EXTERNAL] Re: FirestoreIO connector [JavaSDK] Thanks for the contribution. Happy to help with the review. Also, probably it'll be good to follow patterns employed by the existing Datastore connector when applicable: https://github.com/apache/beam/blob/master/sdks/java/io