Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time
Either works for me.

On Tue, 12 Apr 2016 at 22:29 Kenneth Knowles wrote:
> Either works for me. Thanks James!
>
> On Tue, Apr 12, 2016 at 11:31 AM, Amit Sela wrote:
> > Anytime works for me.
> >
> > On Tue, Apr 12, 2016, 21:24 Jean-Baptiste Onofré wrote:
> > > Hi James,
> > >
> > > 5/4 works for me !
> > >
> > > Thanks,
> > > Regards
> > > JB
> > >
> > > On 04/12/2016 05:05 PM, James Malone wrote:
> > > > Hey JB,
> > > >
> > > > Sorry for the late reply! That is a good point; apologies I missed noticing
> > > > that conflict. For everyone in the community, how would one of the
> > > > following alternatives work?
> > > >
> > > > 5/4/2016 - 8:00 - 11:00 AM Pacific time
> > > > -or-
> > > > 5/18/2016 - 8:00 - 11:00 AM Pacific time
> > > >
> > > > Best,
> > > >
> > > > James
> > > >
> > > > On Mon, Apr 11, 2016 at 11:17 AM, Lukasz Cwik wrote:
> > > > > That works for me.
> > > > > But it would be best if people just posted when they are available
> > > > > depending on the goal/scope of the meeting and then a date is chosen.
> > > > >
> > > > > On Sun, Apr 10, 2016 at 9:40 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > > > OK, what about the week before ApacheCon ?
> > > > > >
> > > > > > Regards
> > > > > > JB
> > > > > >
> > > > > > On 04/11/2016 04:22 AM, Lukasz Cwik wrote:
> > > > > > > I will be gone May 14th - 31st so would prefer a date before that.
> > > > > > >
> > > > > > > On Fri, Apr 8, 2016 at 10:23 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > > > > > Hi James,
> > > > > > > >
> > > > > > > > May 11th is during the ApacheCon Vancouver.
> > > > > > > >
> > > > > > > > As some Beam current and potential contributors could be busy at
> > > > > > > > ApacheCon, maybe it's better to postpone to May 18th.
> > > > > > > >
> > > > > > > > WDYT ?
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > JB
> > > > > > > >
> > > > > > > > On 04/08/2016 10:37 PM, James Malone wrote:
> > > > > > > > > Hello everyone,
> > > > > > > > >
> > > > > > > > > I'd like to propose holding a meeting in May to discuss a few Apache Beam
> > > > > > > > > topics. This could be a good venue to discuss design proposals, gather
> > > > > > > > > technical feedback, and the state of the Beam community. My thinking is we
> > > > > > > > > will be able to cover two or three Apache Beam topics in depth over the
> > > > > > > > > course of a few hours.
> > > > > > > > >
> > > > > > > > > To make the meeting accessible to the community, I propose a virtual
> > > > > > > > > meeting on:
> > > > > > > > >
> > > > > > > > > Wednesday May 11th (2016/05/11)
> > > > > > > > > 8:00 AM - 11:00 AM Pacific
> > > > > > > > >
> > > > > > > > > Since time may be limited, I propose agenda items recommended by the PPMC
> > > > > > > > > are given preferences. Before the meeting we can finalize the method used
> > > > > > > > > for the virtual meeting (like Google hangouts) and the finalized agenda.
> > > > > > > > > I'm also happy to volunteer myself for taking notes and coordinating the
> > > > > > > > > event.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > James
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jean-Baptiste Onofré
> > > > > > > > jbono...@apache.org
> > > > > > > > http://blog.nanthrax.net
> > > > > > > > Talend - http://www.talend.com
> > > > > >
> > > > > > --
> > > > > > Jean-Baptiste Onofré
> > > > > > jbono...@apache.org
> > > > > > http://blog.nanthrax.net
> > > > > > Talend - http://www.talend.com
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
Re: A question about windowed values
Values should almost always be part of at least one window. WindowFns should place all elements in at least one window, as values that are in no windows will be dropped when they reach a GroupByKey.

Elements in no windows, for example those created by WindowedValue.valueInEmptyWindows(T), are generally an implementation detail of a transform. For example, in the InProcessPipelineRunner, the KV<K, Iterable<WindowedValue<V>>> elements output by a GroupByKeyOnly are in empty windows, but by the time the element reaches the boundary of the GroupByKey, the elements are reassigned to the appropriate window(s).

On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela wrote:
> My instinct tells me that if a value does not belong to a specific window
> (in time) it's a part of a global window, but if so, what's the role of the
> "empty window". When should an element be a "value in an empty window" ?
Re: TextIO.Read.Bound vs Create
This seems wrong. They should both be in the global window. I think your trouble is
https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L472

On Tue, Apr 12, 2016 at 9:43 PM, Amit Sela wrote:
> Why input values from *TextIO.Read.Bound* belong to an empty window while
> values from *Create* belong in a global window ?
>
> Thanks,
> Amit
Re: TextIO.Read.Bound vs Create
Yep. And I got a good answer for this one as well. Thanks!

On Wed, Apr 13, 2016, 19:00 Kenneth Knowles wrote:
> This seems wrong. They should both be in the global window. I think your
> trouble is
> https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L472
Re: A question about windowed values
Actually, my above claim isn't as strong as it can be.

A value in no windows is considered to not exist. Values that are not assigned to any window can be dropped by a runner at *any time*. A WindowFn *must* assign all elements to at least one window. All elements that are produced by any PTransform (including Sources) must be in a window, potentially the GlobalWindow.

On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh wrote:
> Values should almost always be part of at least one window.
> [...]
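
To make the "dropped at a GroupByKey" behaviour concrete, here is a minimal toy model in plain Java. It is only a sketch under assumptions: ToyWindowedValue and ToyGroupByKey are hypothetical names invented for illustration, windows are plain strings, and nothing here is Beam's actual implementation. It shows why an element that carries no windows cannot appear in any per-key, per-window group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are illustrative and are not Beam's classes.
final class ToyWindowedValue {
    final String key;
    final int value;
    final List<String> windows; // names of the windows the element is in

    ToyWindowedValue(String key, int value, List<String> windows) {
        this.key = key;
        this.value = value;
        this.windows = windows;
    }
}

final class ToyGroupByKey {
    // Groups values per (key, window) pair. An element with an empty window
    // list contributes to no group, so it silently disappears from the output.
    static Map<String, List<Integer>> group(List<ToyWindowedValue> input) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (ToyWindowedValue element : input) {
            for (String window : element.windows) {
                grouped.computeIfAbsent(element.key + "@" + window, k -> new ArrayList<>())
                       .add(element.value);
            }
        }
        return grouped;
    }
}
```

In this model the loss is not an explicit "drop" step; it falls out of grouping by window, which is one way to read the statement that a value in no windows is considered to not exist.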
Re: A question about windowed values
It is fine to create a WindowedValue carrying no windows when it is a fully reified WindowedValue. It is when it becomes an element in a PCollection that a value must exist within some window. In a PCollection<WindowedValue<T>> you can have elements that do not *contain* any windows, but exist *within* some window, probably the global window.

But even though I can explain it like that, WindowedValue.valueInEmptyWindows might just be a confusing API that we don't need. It seems there are just 11 files that reference WindowedValue.valueInEmptyWindows [1] that mostly look like they'd be fine with the global window.

Kenn

[1] https://github.com/apache/incubator-beam/search?p=1&q=valueInEmptyWindows&utf8=%E2%9C%93

On Wed, Apr 13, 2016 at 9:06 AM, Thomas Groh wrote:
> Actually, my above claim isn't as strong as it can be.
> [...]
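
The distinction between a value that *contains* no windows and an element that exists *within* a window can be sketched with a toy type. This is an assumption-laden illustration: Reified and ReifiedDemo are hypothetical names, windows are plain strings, and this is not Beam's WindowedValue.

```java
import java.util.Collections;
import java.util.List;

// Toy model only: illustrative names, not Beam's WindowedValue.
final class Reified<T> {
    final T value;
    final List<String> windows;

    Reified(T value, List<String> windows) {
        this.value = value;
        this.windows = windows;
    }

    // True when this object, viewed as a collection element, is in at least one window.
    boolean isInSomeWindow() {
        return !windows.isEmpty();
    }
}

final class ReifiedDemo {
    // Wraps a payload that itself carries no windows inside an element
    // that is assigned to a (toy) global window.
    static Reified<Reified<String>> wrapInGlobalWindow(Reified<String> payload) {
        return new Reified<>(payload, Collections.singletonList("GlobalWindow"));
    }
}
```

Here the outer object plays the role of the PCollection element and must be in a window, while the inner, reified payload is just data and may carry an empty window list.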
Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time
Hi,

5/4/2016 works for me

Regards,
Milindu

On 13 Apr 2016 1:43 p.m., "Aljoscha Krettek" wrote:
> Either works for me.
Re: A question about windowed values
As Thomas says, the fact that we ever produce values in "no window" is an implementation quirk that should probably be fixed. (IIRC, it's used for the output of a GBK before we've done the group-also-by-windows to figure out what window it really should be in, so "value in unknown windows" would be a better choice).

If a WindowFn doesn't assign a value to any windows, the system is free to drop it. There are pros and cons to supporting this degenerate case vs. making it an error. However, this should almost certainly not be in the public API...

- Robert

On Wed, Apr 13, 2016 at 9:06 AM, Thomas Groh wrote:
> Actually, my above claim isn't as strong as it can be.
> [...]
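
The two options mentioned above (let the system drop the element, or treat an empty assignment as an error) can be contrasted in a small sketch. Everything here is hypothetical: ToyWindowAssigner and EmptyAssignmentPolicy are invented names, windows are strings, and the real WindowFn API looks different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: illustrative names only, not Beam's WindowFn API.
final class ToyWindowAssigner {
    enum EmptyAssignmentPolicy { DROP, ERROR }

    // Applies windowFn to each timestamp and records "timestamp@window" entries.
    // What happens when windowFn returns no windows depends on the policy.
    static List<String> assign(
            List<Long> timestamps,
            Function<Long, List<String>> windowFn,
            EmptyAssignmentPolicy policy) {
        List<String> assigned = new ArrayList<>();
        for (Long timestamp : timestamps) {
            List<String> windows = windowFn.apply(timestamp);
            if (windows.isEmpty()) {
                if (policy == EmptyAssignmentPolicy.ERROR) {
                    throw new IllegalStateException(
                        "WindowFn assigned no windows to element at " + timestamp);
                }
                continue; // DROP: the element silently disappears
            }
            for (String window : windows) {
                assigned.add(timestamp + "@" + window);
            }
        }
        return assigned;
    }
}
```

The DROP branch makes data loss invisible to the pipeline author, while the ERROR branch surfaces a misbehaving WindowFn early; which trade-off is preferable is exactly the open question in this thread.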
Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time
Either works for me.

On Wed, Apr 13, 2016 at 9:21 AM, Milindu Sanoj Kumarage <agentmili...@gmail.com> wrote:
> Hi,
>
> 5/4/2016 works for me
>
> Regards,
> Milindu
Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time
Sounds like we have broad consensus on the following:

Date: 5/4/2016
Time: 8:00 - 11:00 AM Pacific time
Location: Virtual

I will submit a PR to update the website (http://beam.incubator.apache.org/public-meetings/) later today.

Best,

James

On Wed, Apr 13, 2016 at 9:21 AM, Milindu Sanoj Kumarage <agentmili...@gmail.com> wrote:
> Hi,
>
> 5/4/2016 works for me
>
> Regards,
> Milindu
Re: A question about windowed values
First of all, thanks for the detailed explanation!

I can say that from my point of view (as a runner developer) this is definitely confusing, especially discovering that an element in an empty window can be dropped at any time, so +1 for Robert's comment on not having this in the public API, and according to Kenneth's lookup it looks like it's not entangled too deep.

So I guess #valueInGlobalWindow should be the "go-to" default window (as long as no "real" windows are involved). Should we consider making this clearer in the public API? Maybe WindowedValue#defaultValue(T), which would probably use the global window. Just a thought.

On Wed, Apr 13, 2016 at 7:29 PM Robert Bradshaw wrote:
> As Thomas says, the fact that we ever produce values in "no window" is
> an implementation quirk that should probably be fixed.
> [...]
Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time
Should work for me as well!

On Wed, Apr 13, 2016 at 7:04 PM, James Malone <jamesmal...@google.com.invalid> wrote:
> Sounds like we have broad consensus on the following:
>
> Date: 5/4/2016
> Time: 8:00 - 11:00 AM Pacific time
> Location: Virtual
>
> I will submit a PR to update the website (
> http://beam.incubator.apache.org/public-meetings/) later today.
>
> Best,
>
> James
Re: A question about windowed values
Good thread. Filed as https://issues.apache.org/jira/browse/BEAM-191.

On Wed, Apr 13, 2016 at 10:08 AM, Amit Sela wrote:
> First of all, Thanks for the detailed explanation!
> [...]
> > > > > > On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh wrote: > > > > > >> Values should almost always be part of at least one window. WindowFns > > >> should place all elements in at least one window, as values that are > in > > no > > >> windows will be dropped when they reach a GroupByKey. > > >> > > >> Elements in no windows, for example those created by > > >> WindowedValue.valueInEmptyWindows(T) are generally an implementation > > >> detail of a transform; for example, in the InProcessPipelineRunner, > the > > KV > >> Iterable>> elements output by a GroupByKeyOnly are in > > >> empty windows - but by the time the element reaches the boundary of > the > > >> GroupByKey, the elements are reassigned to the appropriate window(s). > > >> > > >> On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela > > wrote: > > >> > > >>> My instinct tells me that if a value does not belong to a specific > > window > > >>> (in time) it's a part of a global window, but if so, what's the role > of > > >>> the > > >>> "empty window". When should an element be a "value in an empty > window" > > ? > > >>> > > >> > > >> > > >
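
The rule discussed in this thread (a value in no windows is treated as not existing and is dropped at a GroupByKey, while the global window is the default home for everything else) can be illustrated with a small toy model. This is only a sketch in plain Python, not Beam code: ToyWindowedValue, value_in_global_window, value_in_empty_windows and group_by_key are hypothetical names that loosely mirror the methods mentioned above, and the grouping logic is an assumption about the described behaviour, not the runner's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical stand-in for the single, all-encompassing global window.
GLOBAL_WINDOW = "GlobalWindow"


@dataclass(frozen=True)
class ToyWindowedValue:
    """A value tagged with the set of windows it belongs to (toy model)."""
    value: Any
    windows: FrozenSet[str]


def value_in_global_window(value: Any) -> ToyWindowedValue:
    """Place the value in exactly one window: the global window."""
    return ToyWindowedValue(value, frozenset([GLOBAL_WINDOW]))


def value_in_empty_windows(value: Any) -> ToyWindowedValue:
    """Place the value in no windows at all."""
    return ToyWindowedValue(value, frozenset())


def group_by_key(
    elements: Iterable[ToyWindowedValue],
) -> Dict[Tuple[Any, str], List[Any]]:
    """Group (key, value) pairs per (key, window).

    An element contributes once for each window it is in, so an element
    in zero windows contributes nothing and is silently dropped.
    """
    grouped: Dict[Tuple[Any, str], List[Any]] = defaultdict(list)
    for element in elements:
        key, value = element.value
        for window in element.windows:  # zero iterations for empty windows
            grouped[(key, window)].append(value)
    return dict(grouped)
```

In this model, grouping value_in_global_window(("a", 1)) together with value_in_empty_windows(("a", 2)) yields only the value 1 under key "a"; the second element never reaches the output, which is the surprise Amit describes.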
Re: [jira] [Commented] (BEAM-190) Dead-letter drop for bad BigQuery records
I thought that we were under the impression that rather than losing data
it's likely better to update your pipeline to handle these?

On Wed, Apr 13, 2016 at 10:59 AM, Luke Cwik (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/BEAM-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239701#comment-15239701 ]
>
> Luke Cwik commented on BEAM-190:
>
> I believe this can easily extend beyond BigQuery to having a dead letter
> feature for failing DoFns of any kind.
>
> > Dead-letter drop for bad BigQuery records
> >
> >          Key: BEAM-190
> >          URL: https://issues.apache.org/jira/browse/BEAM-190
> >      Project: Beam
> >   Issue Type: Bug
> >   Components: runner-core
> >     Reporter: Mark Shields
> >     Assignee: Frances Perry
> >
> > If a BigQuery insert fails for data-specific rather than structural
> > reasons (eg cannot parse a date) then the bundle will be retried
> > indefinitely, first by BigQueryTableInserter.insertAll then by the
> > overall production retry logic of the underlying runner.
> > Better would be to allow customer to specify a dead-letter store for
> > records such as those so that overall processing can continue while bad
> > records are quarantined.
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
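
The dead-letter idea in the issue (quarantine records that fail for data-specific reasons so the rest can proceed, instead of retrying the whole bundle indefinitely) can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not Beam or BigQuery code: process_with_dead_letter and parse_date_row are hypothetical names, and treating ValueError as the "data-specific" failure is an assumption made for the example.

```python
from datetime import date, datetime
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def process_with_dead_letter(
    records: Iterable[In],
    process: Callable[[In], Out],
) -> Tuple[List[Out], List[Tuple[In, str]]]:
    """Apply `process` to each record, quarantining data-specific failures.

    Records that raise ValueError are collected with their error message
    instead of failing (and retrying) the whole batch. Other exception
    types still propagate, since they may be structural problems.
    """
    succeeded: List[Out] = []
    dead_letters: List[Tuple[In, str]] = []
    for record in records:
        try:
            succeeded.append(process(record))
        except ValueError as error:
            dead_letters.append((record, str(error)))
    return succeeded, dead_letters


def parse_date_row(row: Dict[str, str]) -> Dict[str, date]:
    """Hypothetical per-record step: parse an ISO date column."""
    return {"day": datetime.strptime(row["day"], "%Y-%m-%d").date()}
```

Calling process_with_dead_letter(rows, parse_date_row) on a batch with one unparseable date returns the parsed rows plus a list holding the bad row and its error. The objection raised in the reply above still applies: a dead-letter store only helps if someone later inspects it, otherwise the data is effectively lost.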
Re: Hive Runner
Great... Please share the details and I am open to help.

./Zahoor@iPhone

> On 12-Apr-2016, at 10:23 PM, Jean-Baptiste Onofré wrote:
>
> Hi,
>
> yes, I started the MapReduce runner. I can share where I am with you.
>
> Regards
> JB
>
>> On 04/12/2016 06:19 PM, Zahoor Mohamed J wrote:
>> That brings me to the MapReduce runner... Is anyone working on that? I am
>> interested in helping and learning here. Any pointers on where to start
>> looking to design a MapReduce runner?
>>
>> ./Zahoor@iPhone
>>
>>> On 12-Apr-2016, at 7:42 PM, Jean-Baptiste Onofré wrote:
>>>
>>> We could imagine translating some Fn (DoPar) into Hive/SQL statements.
>>> I don't think it's super interesting, but why not.
>>>
>>> On the other hand, we will definitely provide an IO for Hive.
>>>
>>> Regards
>>> JB
>>>
>>>> On 04/12/2016 04:04 PM, Aljoscha Krettek wrote:
>>>> Hi,
>>>> what do you mean by Hive Runner? AFAIK Hive provides an SQL-like
>>>> interface to data while execution is handled either by a MapReduce
>>>> backend or the newer Tez backend. Therefore I don't think it makes
>>>> sense to put Beam on Hive.
>>>>
>>>> Cheers,
>>>> Aljoscha
>>>>
>>>>> On Tue, 12 Apr 2016 at 15:38 Jean-Baptiste Onofré wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> you are right: for now, we have runners for Spark, Flink, and Google
>>>>> Cloud Platform.
>>>>>
>>>>> Work is in progress to provide MapReduce and Gearpump runners. We're
>>>>> also preparing a Runner API to simplify the way of writing runners.
>>>>>
>>>>> On the other hand, we will improve the website to provide better
>>>>> visibility on the Beam support (current and coming runners, IOs,
>>>>> SDKs/DSLs).
>>>>>
>>>>> If you are interested in working on a Hive runner, please let me
>>>>> know, we love contributions!
>>>>>
>>>>> Thanks,
>>>>> Regards
>>>>> JB
>>>>>
>>>>>> On 04/12/2016 03:20 PM, Ly, Kiet wrote:
>>>>>> I didn't see a Hive runner in Beam. Is there a plan for a Hive
>>>>>> runner component?
>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
Re: Massive package renaming coming
I've just merged a pull request that includes this project-wide renaming of
Java packages. (Long live Beam!)

At this point, pending pull requests may need to be rebased. We'll try to
fix the Cloud Dataflow runner as soon as possible, but it might take a
little bit to complete that.

There's still a lot left to re-organize in Beam, but I'd expect future
changes not to be this far-reaching.

A special thanks goes to Ben Chambers for pulling off several tricks to get
this done quickly and effectively.

On Tue, Apr 12, 2016 at 9:38 PM, Jean-Baptiste Onofré wrote:

> Hi Davor,
>
> +1 !
>
> I already updated the PullRequest for annotations package.
>
> Thanks !
> Regards
> JB
>
> On 04/13/2016 03:23 AM, Davor Bonaci wrote:
>
>> We are preparing to do a massive, project-wide package rename from
>> "com.google.cloud.dataflow" to "org.apache.beam". At the earliest, this
>> could occur sometime tomorrow afternoon (Pacific time).
>>
>> Unfortunately, there's no way to do this without affecting ongoing work.
>> We'll try to do it as quickly as possible to minimize such impact.
>>
>> We'll ensure that existing automated testing passes before merging the
>> change, with the exception of integration coverage with the Google Cloud
>> Dataflow service. We expect that the code in Beam's master will not work
>> against Cloud Dataflow for a little bit -- we'll accept this breakage on a
>> one-time basis, and try to recover it as soon as possible thereafter.
>>
>> If anybody sees any issues with this plan, I'd love to hear it.
>>
>> This is one of those mandatory things we've been delaying for a while now.
>> Of course, there are more such things to come, but hopefully none that are
>> this wide.
>>
>> Thanks!
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
Re: Massive package renaming coming
Great work guys !

A big step forward.

Thanks
Regards
JB

On 04/14/2016 07:43 AM, Davor Bonaci wrote:

> I've just merged a pull request that includes this project-wide renaming
> of Java packages. (Long live Beam!)
>
> At this point, pending pull requests may need to be rebased. We'll try to
> fix the Cloud Dataflow runner as soon as possible, but it might take a
> little bit to complete that.
>
> There's still a lot left to re-organize in Beam, but I'd expect future
> changes not to be this far-reaching.
>
> A special thanks goes to Ben Chambers for pulling off several tricks to
> get this done quickly and effectively.
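
For anyone rebasing a pending pull request across the rename from "com.google.cloud.dataflow" to "org.apache.beam", the mechanical part of the change can be sketched as a prefix rewrite over Java source text. This is a rough illustration only, assuming the rename is a pure prefix swap; the merged change may have moved or reorganized individual classes, so the result should be checked against master. rename_packages is a hypothetical helper, not a script from the Beam repository.

```python
import re

OLD_PREFIX = "com.google.cloud.dataflow"
NEW_PREFIX = "org.apache.beam"

# Match the old prefix only as a whole package prefix: it must start at a
# word boundary and be followed by '.', ';', whitespace, or end of text,
# so that e.g. "com.google.cloud.dataflowx" is left untouched.
_PATTERN = re.compile(r"\b" + re.escape(OLD_PREFIX) + r"(?=[.;\s]|$)")


def rename_packages(java_source: str) -> str:
    """Return `java_source` with the old package prefix replaced."""
    return _PATTERN.sub(NEW_PREFIX, java_source)
```

Applied to a line such as "import com.google.cloud.dataflow.sdk.Pipeline;" (an illustrative path), this produces "import org.apache.beam.sdk.Pipeline;". Running it over each conflicting file before resolving the rebase should reduce most conflicts to the genuine content changes.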