Re: [DISCUSS] Beam data plane serialization tech

2016-06-17 Thread Lukasz Cwik
In the Runner API proposal doc, there are 10+ different types with several fields each. Is it important to have a code generator for the schema?
* simplify the SDK development process
* reduce errors due to differences in custom implementation
I'm not familiar with tool(s) which can take a JSON sc

Re: Using GroupAlsoByWindowViaWindowSetDoFn for stream of input element

2016-06-17 Thread Thomas Groh
Generally, the above code snippet will work, producing (after trigger firing) an output Iterable containing all of the input elements. It may be notable that timers (and TimerInternals) are also per-key, so that interface must also be updated per element. By specifying the ReduceFn of the ReduceFn

Re: Using GroupAlsoByWindowViaWindowSetDoFn for stream of input element

2016-06-17 Thread Thomas Weise
The source for my windowed groupByKey experiment is here: https://github.com/tweise/incubator-beam/blob/master/runners/apex/src/main/java/org/apache/beam/runners/apex/translators/functions/ApexGroupByKeyOperator.java The result is Iterable. In cases such as counting, what is the recommended way t

Re: [NOTICE] Change on Filter

2016-06-17 Thread Frances Perry
Release notes for each release are being tracked in JIRA. For example: https://issues.apache.org/jira/browse/BEAM/fixforversion/12335764/ Davor is planning to send a follow-up email about how we use this process. And as we redo the website layout, we should figure out how to surface this informatio

Using GroupAlsoByWindowViaWindowSetDoFn for stream of input element

2016-06-17 Thread Thomas Weise
Hi, I'm looking into using the GroupAlsoByWindowViaWindowSetDoFn to accumulate the windowed state with the elements arriving one by one (stream). Once the window is complete, I would like to emit an Iterable or another form of aggregation of the elements. Is the following supposed to lead to merg
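
A minimal sketch of the behaviour described in this question, in plain Java rather than Beam: elements arrive one at a time, are buffered per key and per fixed window, and each completed window is emitted as a single collection. This is not GroupAlsoByWindowViaWindowSetDoFn or any Beam API. The class `WindowBufferSketch`, its methods, the fixed-window assignment, and the "watermark" argument are all hypothetical names and assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Beam. */
final class WindowBufferSketch {
  private final long windowSizeMillis;
  // key -> (window start -> elements buffered so far), windows kept in time order
  private final Map<String, TreeMap<Long, List<String>>> buffers = new HashMap<>();

  WindowBufferSketch(long windowSizeMillis) {
    this.windowSizeMillis = windowSizeMillis;
  }

  /** Buffers one element in the fixed window that contains its timestamp. */
  void add(String key, String element, long timestampMillis) {
    long windowStart = timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    buffers
        .computeIfAbsent(key, k -> new TreeMap<>())
        .computeIfAbsent(windowStart, w -> new ArrayList<>())
        .add(element);
  }

  /** Emits and clears every window of the key that ends at or before the watermark. */
  List<List<String>> advanceWatermark(String key, long watermarkMillis) {
    List<List<String>> completed = new ArrayList<>();
    TreeMap<Long, List<String>> windows = buffers.get(key);
    if (windows == null) {
      return completed;
    }
    Iterator<Map.Entry<Long, List<String>>> it = windows.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, List<String>> entry = it.next();
      if (entry.getKey() + windowSizeMillis <= watermarkMillis) {
        completed.add(entry.getValue());
        it.remove(); // state for a finished window is dropped after emission
      }
    }
    return completed;
  }
}
```

Both the buffer and the completion check are scoped to a key, which is why the sketch takes the key as an argument on every call.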

Re: [DISCUSS] Beam data plane serialization tech

2016-06-17 Thread Amit Sela
+1 on Aljoscha's comment; I'm not sure where the benefit is in having a "schematic" serialization. I know that Spark, and I think Flink as well, use Kryo for serialization (to be accurate, it's Chill for Spark), and I found it ver

Re: [DISCUSS] Beam data plane serialization tech

2016-06-17 Thread Aljoscha Krettek
Hi, am I correct in assuming that the transmitted envelopes would mostly contain coder-serialized values? If so, wouldn't the header of an envelope just be the number of contained bytes and number of values? I'm probably missing something but with these assumptions I don't see the benefit of using
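
A minimal sketch of the framing this message describes, assuming the envelope header carries only the number of values and the number of payload bytes, and that each value has already been serialized by its coder. The class `EnvelopeSketch`, the field order, the fixed-width 4-byte integers, and the per-value length prefix are all assumptions made for illustration; none of them comes from the Runner API proposal.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical envelope layout: [valueCount][payloadBytes]([length][bytes])* */
final class EnvelopeSketch {

  /** Frames already-encoded values behind a two-field header. */
  static byte[] encode(List<byte[]> encodedValues) {
    int payloadBytes = 0;
    for (byte[] value : encodedValues) {
      payloadBytes += Integer.BYTES + value.length; // length prefix plus value bytes
    }
    ByteBuffer buffer = ByteBuffer.allocate(2 * Integer.BYTES + payloadBytes);
    buffer.putInt(encodedValues.size()); // header field 1: number of values
    buffer.putInt(payloadBytes);         // header field 2: number of payload bytes
    for (byte[] value : encodedValues) {
      buffer.putInt(value.length);
      buffer.put(value);
    }
    return buffer.array();
  }

  /** Reads the header, checks it against the payload size, and splits the values. */
  static List<byte[]> decode(byte[] envelope) {
    ByteBuffer buffer = ByteBuffer.wrap(envelope);
    int valueCount = buffer.getInt();
    int payloadBytes = buffer.getInt();
    if (buffer.remaining() != payloadBytes) {
      throw new IllegalArgumentException("header byte count does not match payload size");
    }
    List<byte[]> values = new ArrayList<>(valueCount);
    for (int i = 0; i < valueCount; i++) {
      byte[] value = new byte[buffer.getInt()];
      buffer.get(value);
      values.add(value);
    }
    return values;
  }
}
```

With a header this small, hand-written framing is only a few lines, which is the point the message raises about the benefit of a schema-driven format.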

Re: [NOTICE] Change on Filter

2016-06-17 Thread Jean-Baptiste Onofré
So it will go in the RELEASE NOTES for the next release: fair enough. I was just a bit surprised as I missed this JIRA issue. Thanks! Regards, JB

Re: [NOTICE] Change on Filter

2016-06-17 Thread Aljoscha Krettek
There has been an issue about this for a while now: https://issues.apache.org/jira/browse/BEAM-234

Re: [NOTICE] Change on Filter

2016-06-17 Thread Jean-Baptiste Onofré
Hi Ismaël, I wasn't talking about a change between Dataflow SDK and Beam; I'm talking about a change between two Beam SNAPSHOTs ;) For the naming of the DirectRunner, I saw it also, and we should align the runner naming (we have FlinkPipelineRunner and SparkPipelineRunner). I sent an e-mail to Davor

Re: [thread fork] Apache Beam & Google Cloud Dataflow

2016-06-17 Thread Ismaël Mejía
Hello Frances, Thanks for clearing this up. I hope you (Google) can somehow make this official (maybe in the FAQ too), to the effect that users can 'experiment' with moving their code bases to Beam (without support until the official release). Anyway, it is great to know that this works (at least from

Re: [NOTICE] Change on Filter

2016-06-17 Thread Ismaël Mejía
Do we have a list of breaking changes (from the Google Dataflow SDK to Beam)? This is going to be important considering other recent breaking changes, for example these two that I also found yesterday: DirectPipelineRunner -> DirectRunner DoFnTester.processBatch -> DoFnTester.processBundle (

[NOTICE] Change on Filter

2016-06-17 Thread Jean-Baptiste Onofré
Hi guys, I tested the latest Beam SNAPSHOT this morning, and some code that was working yesterday is now broken by the latest changes. I'm using a filter by predicate: .apply("Filtering", Filter.byPredicate(new SerializableFunction() { public Boolean apply(S

Re: [thread fork] Apache Beam & Google Cloud Dataflow

2016-06-17 Thread Jean-Baptiste Onofré
Hi Frances, thanks for the details (and I like your Google hat ;)). I was talking more "technically speaking" ;) Regards JB On 06/17/2016 07:21 AM, Frances Perry wrote: With my Google employee hat on, I'd like to soften that claim a little ;-) Currently, the Beam SDK runs against Google Cloud