Re: [PROPOSAL] Preparing for Beam 2.26.0 release

2020-10-27 Thread Reza Rokni
+1 Thanx! On Wed, Oct 28, 2020 at 7:14 AM Valentyn Tymofieiev wrote: > +1, thanks and good luck! > > On Tue, Oct 27, 2020 at 1:59 PM Tyson Hamilton wrote: > >> Thanks Rebo! SGTM. >> >> On Tue, Oct 27, 2020 at 11:11 AM Udi Meiri wrote: >> >>> +1 sg! >>> >>> On Tue, Oct 27, 2020 at 10:02 AM

Re: Contributor permissions for Beam Jira tickets

2020-10-27 Thread Pablo Estrada
Welcome Teodor! I've added you as a contributor, and assigned the issue to you. Thanks for looking into this. Your analysis was interesting, and the improvement should benefit many. Best -P. On Tue, Oct 27, 2020 at 2:00 PM Teodor Spæren wrote: > Hey! > > My name is Teodor and I'm writing a

Contributor permissions for Beam Jira tickets

2020-10-27 Thread Teodor Spæren
Hey! My name is Teodor and I'm writing a master's thesis comparing the overhead of using Beam versus writing native Flink. I want to contribute fixes for some of the problems I find. So far it's only one, [1], but I would like to assign the ticket to myself. My jira username is rhermes, and I'm

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Kenneth Knowles
It seems that many correct things are said on this thread. 1. Elements of a PCollection are immutable. They should be like mathematical values. 2. For performance reasons, the author of a DoFn is responsible for not mutating input elements and also for not mutating outputs once they have been output.
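The two points above can be illustrated with a minimal plain-Python sketch. It does not use the Beam SDK; `enrich_in_place` and `enrich_copy` are hypothetical stand-ins for the body of a DoFn, and the dictionary elements are invented for illustration.

```python
import copy

def enrich_in_place(element):
    """Anti-pattern: mutates the input element it was handed."""
    element["seen"] = True
    return element

def enrich_copy(element):
    """Preferred: builds a new value and leaves the input untouched."""
    return {**element, "seen": True}

original = {"id": 1}
snapshot = copy.deepcopy(original)  # remember the input as it was

out = enrich_copy(original)
assert original == snapshot         # the input is unchanged
assert out == {"id": 1, "seen": True}

enrich_in_place(original)
assert original != snapshot         # the input was silently modified
```

If a runner hands the same object to more than one consumer without copying it, the in-place variant would leak its change to the other consumers, while the copying variant would not.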

Re: Website Revamp Update - Week 1 (and how to get involved)

2020-10-27 Thread Agnieszka Sell
Hi Tyson, Thank you for your feedback! - We’re thinking about using the mascot – we’ll see how it works with the rest of the design :) - The Beam logo will be added at the top of the screen and perhaps also on the hero image – we’re in the process of designing this piece right now. - Thank you

Re: [PROPOSAL] Preparing for Beam 2.26.0 release

2020-10-27 Thread Udi Meiri
+1 sg! On Tue, Oct 27, 2020 at 10:02 AM Robert Burke wrote: > Hello everyone! > > The next Beam release (2.26.0) is scheduled to be cut on November 4th > according to the release calendar [1]. > > I'd like to volunteer myself to handle this release. I plan on cutting the > branch on November

Re: Website Revamp Update - Week 1 (and how to get involved)

2020-10-27 Thread Tyson Hamilton
Hello, I was unable to attend the sprint meeting but had a couple of comments. - The Beam mascot isn't on the page anywhere; is this intentional? It would be fun to include it somewhere; it's so cute and fast. - The Beam logo doesn't appear until the footer. Maybe there should be one higher

Transform Logging Issues with Spark/Dataproc in GCP

2020-10-27 Thread Rion Williams
Hi all, Recently, I deployed a very simple Apache Beam pipeline to get some insights into how it behaved when executing in Dataproc as opposed to on my local machine. I quickly realized that, after executing it, none of the DoFn or transform-level logging appeared in the job logs within the Google

[PROPOSAL] Preparing for Beam 2.26.0 release

2020-10-27 Thread Robert Burke
Hello everyone! The next Beam release (2.26.0) is scheduled to be cut on November 4th according to the release calendar [1]. I'd like to volunteer myself to handle this release. I plan on cutting the branch on November 5th (since I've had November 4th booked off for months now) and

Re: Website Revamp Update - Week 1 (and how to get involved)

2020-10-27 Thread Agnieszka Sell
Hi there, Tomorrow at 9 am PT we'll have a second sprint review for the Beam Website Revamp project. If you want to join this meeting please use this link: https://meet.google.com/hrk-ngzu-sun. What are you going to see? - UX design for the home page on mobile devices. - UX design for the

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Reuven Lax
Actually I believe that the Beam model does say that input elements should be immutable. If I remember correctly, the DirectRunner even validates this in unit tests, failing tests if the input elements have been mutated. On Tue, Oct 27, 2020 at 3:49 AM David Morávek wrote: > Hi Teodor, > >
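A minimal sketch of what such a mutation check could look like, in plain Python. This is not the DirectRunner's actual implementation, which the thread does not describe; `call_checked`, `well_behaved`, and `misbehaving` are hypothetical names used only for illustration.

```python
import copy

def call_checked(fn, element):
    """Run fn(element) and raise if fn mutated its input (hypothetical helper)."""
    before = copy.deepcopy(element)  # snapshot taken before the call
    result = fn(element)
    if element != before:
        raise ValueError(f"{fn.__name__} mutated its input element")
    return result

def well_behaved(e):
    # Builds a new list; the input list is left as it was.
    return e + [len(e)]

def misbehaving(e):
    # Appends to the input list itself.
    e.append(len(e))
    return e
```

Calling `call_checked(well_behaved, [1, 2])` returns a new list, while `call_checked(misbehaving, [1, 2])` raises `ValueError`. Note that this check itself pays for a deep copy of every element, so a check of this shape is better suited to tests than to production execution.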

Re: Apache Beam case studies

2020-10-27 Thread Karolina Rosół
Hi Gris, Thanks for bringing this up, I'll also let the users@ list know :-) Karolina Rosół Polidea | Head of Cloud & OSS

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Teodor Spæren
@David, I don't know how the direct runner does the validation, so I'm not sure if we could replicate that in the Flink runner without a perf penalty. I actually think your point about writing tests is an argument for removing this as soon as possible, so the prototype doesn't blow up in

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Jan Lukavský
Hi, I tend to be +1 for the flag, but before that, we might want to have a deeper analysis of the performance impact. I believe the penalty will be (in percentage) much lower in cases of more practical jobs (e.g. having at least one shuffle). @Teodor, would you be willing to provide us with

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread David Morávek
you made a really good argument ;) I'm inclined toward an experimental opt-in flag that would enable this. It would be great if we could automatically check for violations - kind of a safety net for mistakes in user code. Just to note, direct runner enforcement may not cover all cases, as it only

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Teodor Spæren
Some more thoughts: As it says on the DirectRunner [1] page, the DirectRunner is meant to check that users don't rely on semantics that are not guaranteed by the Beam model. Programs that rely on the Flink runner deep cloning the inputs between each operator in the pipeline are relying on a

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Teodor Spæren
Hey David, I think I might have worded this poorly, because what I meant is that from what I can see in [1], the Beam model explicitly states that PCollections should be treated as immutable. The direct runner also tests for this. Do the other runners also protect the user from misusing the

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread David Morávek
Hi Teodor, Thanks for bringing this up. This is a known, long-standing "issue". Unfortunately there are a few things we need to consider: - As you correctly noted, the *Beam model doesn't enforce immutability* of input / output elements, so this is the price. - We *cannot break* existing