Re: Using GroupAlsoByWindowViaWindowSetDoFn for stream of input element

2016-06-17 Thread Thomas Weise
The source for my windowed groupByKey experiment is here: https://github.com/tweise/incubator-beam/blob/master/runners/apex/src/main/java/org/apache/beam/runners/apex/translators/functions/ApexGroupByKeyOperator.java The result is Iterable. In cases such as counting, what is the recommended way

Re: [NOTICE] Change on Filter

2016-06-17 Thread Frances Perry
Release notes for each release are being tracked in JIRA. For example: https://issues.apache.org/jira/browse/BEAM/fixforversion/12335764/ Davor is planning to send a follow up email about how we use this process. And as we redo the website layout, we should figure out how to surface this

Using GroupAlsoByWindowViaWindowSetDoFn for stream of input element

2016-06-17 Thread Thomas Weise
Hi, I'm looking into using the GroupAlsoByWindowViaWindowSetDoFn to accumulate the windowed state with the elements arriving one by one (stream). Once the window is complete, I would like to emit an Iterable or another form of aggregation of the elements. Is the following supposed to lead to

Re: [DISCUSS] Beam data plane serialization tech

2016-06-17 Thread Amit Sela
+1 on Aljoscha comment, not sure where's the benefit in having a "schematic" serialization. I know that Spark and I think Flink as well, use Kryo for serialization (to be accurate it's Chill for Spark) and I found it

Re: [DISCUSS] Beam data plane serialization tech

2016-06-17 Thread Aljoscha Krettek
Hi, am I correct in assuming that the transmitted envelopes would mostly contain coder-serialized values? If so, wouldn't the header of an envelope just be the number of contained bytes and number of values? I'm probably missing something but with these assumptions I don't see the benefit of using

Re: [NOTICE] Change on Filter

2016-06-17 Thread Jean-Baptiste Onofré
So it will go in the RELEASE NOTES for next release: fair enough. I was just a bit surprised as I missed this jira. Thanks !RegardsJB  Sent from my Samsung device Original message From: Aljoscha Krettek Date: 17/06/2016 10:58 (GMT+01:00) To:

Re: [NOTICE] Change on Filter

2016-06-17 Thread Aljoscha Krettek
There has been an issue about this for a while now: https://issues.apache.org/jira/browse/BEAM-234 On Fri, 17 Jun 2016 at 09:55 Jean-Baptiste Onofré wrote: > Hi Ismaël, > > I didn't talk a change between Dataflow SDK and Beam, I'm talking about > a change between two Beam

Re: [NOTICE] Change on Filter

2016-06-17 Thread Jean-Baptiste Onofré
Hi Ismaël, I didn't talk a change between Dataflow SDK and Beam, I'm talking about a change between two Beam SNAPSHOTs ;) For the naming of the DirectRunner, I saw it also, and we should align the runners naming (we have FlinkPipelineRunner and SparkPipelineRunner). I sent an e-mail to

Re: [NOTICE] Change on Filter

2016-06-17 Thread Ismaël Mejía
Do we have a list of breaking changes (from the Google Dataflow SDK to Beam), because this is going to be important considering other recent breaking changes, for example this two that I found yesterday too: DirectPipelineRunner -> DirectRunner DoFnTester.processBatch -> DoFnTester.processBundle

[NOTICE] Change on Filter

2016-06-17 Thread Jean-Baptiste Onofré
Hi guys, I tested the latest Beam SNAPSHOT this morning and a code which was working yesterday is not broken with the last changes. I'm using a filter by predicate: .apply("Filtering", Filter.byPredicate(new SerializableFunction() {

Re: [thread fork] Apache Beam & Google Cloud Dataflow

2016-06-17 Thread Jean-Baptiste Onofré
Hi Frances, thanks for the details (and I like your Google hat ;)). I was more talking "technically speaking" ;) Regards JB On 06/17/2016 07:21 AM, Frances Perry wrote: With my Google employee hat on, I'd like to soften that claim a little ;-) Currently, the Beam SDK runs again Google