Re: Beam Summits!

2019-01-07 Thread Reza Rokni
Hiya, yes, although we need to work through timing; maybe something around the later part of the year. Let me do some outreach. Cheers, Reza On Mon, 7 Jan 2019 at 15:27, Matthias Baetens wrote: > @Reza Rokni: I was planning to reach out to you for the > Asian edition of the Summit - do you

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
I've just learned that there are these transforms that should be useful: p.apply(FileIO.match().filepattern(...)) .apply(WithKeys.of((Void) null)) .apply(GroupByKey.create()) .apply(Values.create()) .apply(Flatten.iterables()) .apply(FileIO.readMatches()) .apply(ParDo.of(new

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
Hi Matt, I am much more familiar with Python, so I usually answer questions using that SDK. Also, it's quicker to type a fully detailed pipeline in an email, and the SDKs are similar enough that it should not be too difficult to translate to Java from an IDE. To your questions: 1. Grouping like

Re: Single threaded processing

2019-01-07 Thread Matt Casters
Hi Pablo, Apologies, I thought the cases were very simple and clear. Obviously I should have also mentioned I'm in Java land, not used to the script kiddy stuff :-) On the output side: thanks for the grouping "trick". However, doesn't that mean that all rows will end up in a single in-memory

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
Hi Matt, is this computation running as part of a larger pipeline that does run some parallel processing? Otherwise, it's odd that it needs to run on Beam. Nonetheless, you can certainly do this with a pipeline that has a single element. Here's what that looks like in Python: p |

Re: introducing streamy-db

2019-01-07 Thread jan . doms
Hi all, I think I'm going to stick with Scala and Scio :-). I'm curious though: why is there a hard coupling between Scio and Beam versions? I was hoping to use the latest Scio 0.7.0-beta2 with Beam 2.9.0 but that appears to get blocked, which was unexpected to me. Regarding the suggestion to add

Single threaded processing

2019-01-07 Thread Matt Casters
Hi Beam! There's a bunch of stuff that I would like to support and it's probably something silly but I couldn't find it immediately ... or I'm completely dim and making too much of certain things. The thing is, sometimes you just want to do single-threaded operations. For example, we sometimes

Re: Beam + Dataflow + Go article

2019-01-07 Thread Robert Burke
Thank you for this! You're on the cutting edge of known issues around the SDK, and why we still call it experimental. We are looking to add a coder registry in the near term; please review to see if it will

Re: introducing streamy-db

2019-01-07 Thread Gleb Kanterov
Agree with Max that Scio is lagging behind. However, it also has features that significantly reduce boilerplate, and even improve performance. For instance, the latest version (0.7.0) automatically derives binary coders for case classes using a macro at compile time, which is way better than

Re: introducing streamy-db

2019-01-07 Thread Maximilian Michels
Interesting project, Jan! I think we could add your project to this page: https://beam.apache.org/community/integrations/ The benefit of using the Java DSL would be to be able to directly track Beam. The Scio Scala DSL usually lags a bit behind. But since you probably don't require the latest