Re: Joining PCollections to aggregates of themselves

2019-10-11 Thread Kenneth Knowles
This seems a great example of the use of a stateful DoFn. It has essentially the same structure as the example on the Beam blog but is more meaningful. Kenn On Fri, Oct 11, 2019 at 12:38 PM Robert Bradshaw wrote: > OK, the only way to do this would be via a non-deterministic stateful > DoFn that

Re: Joining PCollections to aggregates of themselves

2019-10-11 Thread Robert Bradshaw
OK, the only way to do this would be via a non-deterministic stateful DoFn that buffers elements as they come in and computes averages by looking at the buffer each time. This could also be represented with an extension to window merging and a join, where the trigger would be explicitly used to
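To make the buffering idea concrete, here is a minimal sketch in plain Java. It is not Beam code and uses no Beam API: the class name `BufferedAverageJoin`, the per-key `Map` standing in for per-key state, and the `double[]` output pair are all assumptions made for illustration. It only shows why the result is non-deterministic: each element is joined with the average of whatever has been buffered so far, so the output depends on arrival order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BufferedAverageJoin {
    // Per-key buffer of every value seen so far (a stand-in for per-key state).
    private final Map<String, List<Double>> buffers = new HashMap<>();

    /** Buffers the value, then returns {value, average of the key's buffer so far}. */
    public double[] process(String key, double value) {
        List<Double> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(value);
        double sum = 0.0;
        for (double v : buffer) {
            sum += v;
        }
        // The average covers only the elements that have arrived up to now,
        // so a different arrival order gives different joined results.
        return new double[] {value, sum / buffer.size()};
    }

    public static void main(String[] args) {
        BufferedAverageJoin join = new BufferedAverageJoin();
        for (double v : new double[] {2.0, 4.0, 9.0}) {
            double[] out = join.process("user-1", v);
            System.out.println(out[0] + " joined with average " + out[1]);
        }
    }
}
```

In a real pipeline the buffer would live in runner-managed state rather than in an in-memory map, and it would need some policy for being cleared.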

Re: ETL with Beam?

2019-10-11 Thread Robert Bradshaw
These can be externalized as PTransforms. E.g. the generic ETL pipeline could just be written pipeline .apply(SomeExtractPTransform()) // aka Source .apply(SomeTransformPTransform()) .apply(SomeLoadPTransform()) // aka Sink Any and all of these PTransforms may be composite (i.e
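The shape of that pipeline, three stages where any stage may itself be built from smaller ones, can be sketched in plain Java. This is only an illustration of the composition idea and does not use Beam or its PTransform class: `EtlSketch`, `TRIM`, `DROP_EMPTY`, `UPPER`, and the in-memory sink are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class EtlSketch {
    // Three small transform steps (hypothetical examples).
    static final Function<List<String>, List<String>> TRIM =
        rows -> rows.stream().map(String::trim).collect(Collectors.toList());
    static final Function<List<String>, List<String>> DROP_EMPTY =
        rows -> rows.stream().filter(s -> !s.isEmpty()).collect(Collectors.toList());
    static final Function<List<String>, List<String>> UPPER =
        rows -> rows.stream().map(String::toUpperCase).collect(Collectors.toList());

    // A composite stage: several steps packaged as one reusable unit.
    static final Function<List<String>, List<String>> TRANSFORM =
        TRIM.andThen(DROP_EMPTY).andThen(UPPER);

    /** Runs extract -> transform -> load and returns what the sink received. */
    static List<String> run(Supplier<List<String>> extract) {
        List<String> sink = new ArrayList<>();
        // The load stage writes rows to the in-memory sink.
        Function<List<String>, List<String>> load = rows -> {
            sink.addAll(rows);
            return sink;
        };
        return TRANSFORM.andThen(load).apply(extract.get());
    }

    public static void main(String[] args) {
        System.out.println(run(() -> Arrays.asList(" alice ", "bob", "   ")));
    }
}
```

The caller of `run` only sees three stages; that `TRANSFORM` is built from three smaller steps is an internal detail, which is the property that makes such stages shareable.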

Re: ETL with Beam?

2019-10-11 Thread Steve973
The real benefit of a good ETL framework is being able to externalize your extraction and transformation mappings. If I didn't have to write that part, that would be really cool! On Fri, Oct 11, 2019 at 1:28 PM Robert Bradshaw wrote: > I would like to call out that Beam itself can be directly

Re: ETL with Beam?

2019-10-11 Thread Robert Bradshaw
I would like to call out that Beam itself can be directly used for ETL, no extra framework required (not to say that both of these frameworks don't provide additional value, e.g. GUI-style construction of pipelines). On Fri, Oct 11, 2019 at 9:29 AM Ryan Skraba wrote: > > Hello! Talend has a

Re: ETL with Beam?

2019-10-11 Thread Ryan Skraba
Hello! Talend has a big data ETL product in the cloud called Pipeline Designer, entirely powered by Beam. There was a talk at Beam Summit 2018 (https://www.youtube.com/watch?v=1AlEGUtiQek), but unfortunately the live demo wasn't captured in the video. You can find other videos of Pipeline

Re: Limited join with stop condition

2019-10-11 Thread Alexey Romanenko
Many thanks for your ideas, everybody, I really appreciate it. I’m going to play with Stateful DoFn and see if it will work for us. > And I have to ask, though, can you build indices instead of brute force for > the join? Answering your question, Kenn. Yes, potentially, we can build indices for

Re: Joining PCollections to aggregates of themselves

2019-10-11 Thread Sam Stephens
On 2019/10/10 18:23:46, Eugene Kirpichov wrote: > " input elements can pass through the Joiner DoFn before the sideInput > corresponding to that element is present" > > I don't think this is correct. Runners will evaluate a DoFn with side > inputs on elements in a given window only after all

Re: Feedback on how we use Apache Beam in my company

2019-10-11 Thread Pierre Vanacker
Nice, thanks. I just registered, see you there! Pierre From: Alexey Romanenko Reply-To: "user@beam.apache.org" Date: Friday, 11 October 2019 at 15:29 To: "user@beam.apache.org" Cc: dev Subject: Re: Feedback on how we use Apache Beam in my company Hi Pierre, If you are in Paris region

Re: Feedback on how we use Apache Beam in my company

2019-10-11 Thread Alexey Romanenko
Hi Pierre, If you are in the Paris region (I can guess because of the Dailymotion name =) ) then it would be great to chat about that at the next (2nd) Paris Beam meetup, which will be held very soon, on October 17th. https://www.meetup.com/Paris-Apache-Beam-Meetup/events/264545288/

Re: Feedback on how we use Apache Beam in my company

2019-10-11 Thread Pierre Vanacker
Thanks Etienne & Matthias! Why not, it kinda depends on the location :) What meetup / summit do you have in mind? Pierre From: Matthias Baetens Reply-To: "user@beam.apache.org" Date: Friday, 11 October 2019 at 09:45 To: "user@beam.apache.org" Cc: dev Subject: Re: Feedback on how we

Re: ETL with Beam?

2019-10-11 Thread Steve973
Thank you for your reply. I will check it out! I'm in the evaluation phase, especially since I have some time before I have to implement all of this. On Fri, Oct 11, 2019 at 3:25 AM Dan wrote: > I'm not sure if this will help but kettle runs on beam too. > >

Re: Feedback on how we use Apache Beam in my company

2019-10-11 Thread Alex Van Boxel
Great writeup. You can add an additional benefit of docker vs templates: dynamically reconfigure/rebuild your pipelines from external parameters (for example, arguments), instead of only using the Value placeholders. Alex Van Boxel On Wed, Oct 9, 2019 at 4:55 PM Pierre Vanacker <

Re: Feedback on how we use Apache Beam in my company

2019-10-11 Thread Matthias Baetens
This is great, Pierre! Thank you for sharing, very interesting. Would you and your team be interested in talking about your use case at a meetup (or Summit) in the future? :) All the best, Matthias On Wed, 9 Oct 2019 at 15:59, Etienne Chauchot wrote: > Very nice! > > Thanks > > ccing dev list >

Re: ETL with Beam?

2019-10-11 Thread Dan
I'm not sure if this will help but kettle runs on beam too. https://github.com/mattcasters/kettle-beam https://youtu.be/vgpGrQJnqkM Depends on your use case but kettle rocks for etl. Dan Sent from my phone On Thu, 10 Oct 2019, 10:12 pm Steve973, wrote: > Hello, all. I still have not been