Re: Massive package renaming coming

2016-04-13 Thread Jean-Baptiste Onofré

Great work guys !

A big step forward.

Thanks
Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Massive package renaming coming

2016-04-13 Thread Davor Bonaci
I've just merged a pull request that includes this project-wide renaming of
Java packages. (Long live Beam!)

At this point, pending pull requests may need to be rebased. We'll try to
fix the Cloud Dataflow runner as soon as possible, but it might take a
little bit to complete that.

There's still a lot left to re-organize in Beam, but I'd expect future
changes not to be this far-reaching.

A special thanks goes to Ben Chambers for pulling off several tricks to get
this done quickly and effectively.
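The proposal quoted below describes the rename as moving everything under "com.google.cloud.dataflow" to "org.apache.beam". For a pending pull request that needs rebasing, a minimal Java sketch of a mechanical rewrite of source text follows. It assumes the rename is a pure prefix substitution; the class name PackagePrefixRewrite is hypothetical and the sample imports are only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper for rebasing a pending change onto the renamed packages.
 * Assumes the rename is a pure prefix substitution of
 * "com.google.cloud.dataflow" by "org.apache.beam"; anything that was also
 * moved or renamed by the merged change still needs manual attention.
 */
public class PackagePrefixRewrite {
  private static final String OLD_PREFIX = "com.google.cloud.dataflow";
  private static final String NEW_PREFIX = "org.apache.beam";

  // Match the old prefix only as a whole package name: not preceded by an
  // identifier character or '.', and not followed by an identifier character.
  private static final Pattern OLD_PACKAGE =
      Pattern.compile("(?<![\\w.])" + Pattern.quote(OLD_PREFIX) + "(?!\\w)");

  /** Returns the source text with every old-prefix reference rewritten. */
  public static String rewrite(String source) {
    return OLD_PACKAGE.matcher(source).replaceAll(Matcher.quoteReplacement(NEW_PREFIX));
  }

  public static void main(String[] args) {
    String before =
        "import com.google.cloud.dataflow.sdk.Pipeline;\n"
            + "import com.google.cloud.dataflow.sdk.transforms.ParDo;\n";
    // Prints the two import lines with the org.apache.beam prefix.
    System.out.print(rewrite(before));
  }
}
```

The lookarounds leave look-alike names such as com.google.cloud.dataflowx untouched. A prefix rewrite cannot catch classes that were moved to a different subpackage (if any), so compiling and running the tests after rebasing remains the real check.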

On Tue, Apr 12, 2016 at 9:38 PM, Jean-Baptiste Onofré 
wrote:

> Hi Davor,
>
> +1 !
>
> I already updated the pull request for the annotations package.
>
> Thanks !
> Regards
> JB
>
>
> On 04/13/2016 03:23 AM, Davor Bonaci wrote:
>
>> We are preparing to do a massive, project-wide package rename from
>> "com.google.cloud.dataflow" to "org.apache.beam". At the earliest, this
>> could occur sometime tomorrow afternoon (Pacific time).
>>
>> Unfortunately, there's no way to do this without affecting ongoing work.
>> We'll try to do it as quickly as possible to minimize such impact.
>>
>> We'll ensure that existing automated testing passes before merging the
>> change, with the exception of integration coverage with the Google Cloud
>> Dataflow service. We expect that the code in Beam's master will not work
>> against Cloud Dataflow for a little bit -- we'll accept this breakage on a
>> one-time basis, and try to recover it as soon as possible thereafter.
>>
>> If anybody sees any issues with this plan, I'd love to hear it.
>>
>> This is one of those mandatory things we've been delaying for a while now.
>> Of course, there are more such things to come, but hopefully none that are
>> this wide.
>>
>> Thanks!
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Hive Runner

2016-04-13 Thread Zahoor Mohamed J
Great! Please share the details, and I am open to helping.

./Zahoor@iPhone

> On 12-Apr-2016, at 10:23 PM, Jean-Baptiste Onofré  wrote:
> 
> Hi,
> 
> yes, I started the MapReduce runner. I can share where I am with you.
> 
> Regards
> JB
> 
>> On 04/12/2016 06:19 PM, Zahoor Mohamed J wrote:
>> That brings me to the MapReduce runner... Is anyone working on that? I am
>> interested in helping and learning here. Any pointers on where to start
>> looking to design a MapReduce runner?
>> 
>> ./Zahoor@iPhone
>> 
>>> On 12-Apr-2016, at 7:42 PM, Jean-Baptiste Onofré  wrote:
>>> 
>>> We could imagine translating some Fns (ParDo) into Hive/SQL statements. I don't
>>> think it's super interesting, but why not.
>>> 
>>> On the other hand, definitely, we will provide an IO for Hive.
>>> 
>>> Regards
>>> JB
>>> 
 On 04/12/2016 04:04 PM, Aljoscha Krettek wrote:
 Hi,
 what do you mean by Hive Runner? AFAIK Hive provides an SQL-like interface
 to data while execution is handled either by a MapReduce backend or the
 newer Tez backend. Therefore I don't think it makes sense to put Beam on
 Hive.
 
 Cheers,
 Aljoscha
 
> On Tue, 12 Apr 2016 at 15:38 Jean-Baptiste Onofré  
> wrote:
> 
> Hi,
> 
> You are right: for now, we have runners for Spark, Flink, and Google Cloud
> Platform.
>
> Some work is in progress to provide MapReduce and Gearpump runners, and
> we're also preparing a Runner API to simplify writing runners.
> 
> In addition, we will improve the website to provide better visibility into
> what Beam supports (current and upcoming runners, IOs, SDKs/DSLs).
>
> If you are interested in working on a Hive runner, please let me know; we
> love contributions!
> 
> Thanks,
> Regards
> JB
> 
>> On 04/12/2016 03:20 PM, Ly, Kiet wrote:
>> I didn't see a Hive runner in Beam. Is there a plan for a Hive runner
>> component?
>> 
> 
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>>> 
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [jira] [Commented] (BEAM-190) Dead-letter drop for bad BigQuery records

2016-04-13 Thread Dan Halperin
I thought that we were under the impression that rather than losing data
it's likely better to update your pipeline to handle these?

On Wed, Apr 13, 2016 at 10:59 AM, Luke Cwik (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/BEAM-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239701#comment-15239701
> ]
>
> Luke Cwik commented on BEAM-190:
> 
>
> I believe this can easily extend beyond BigQuery to having a dead letter
> feature for failing DoFns of any kind.
>
> > Dead-letter drop for bad BigQuery records
> > -
> >
> > Key: BEAM-190
> > URL: https://issues.apache.org/jira/browse/BEAM-190
> > Project: Beam
> >  Issue Type: Bug
> >  Components: runner-core
> >Reporter: Mark Shields
> >Assignee: Frances Perry
> >
> > If a BigQuery insert fails for data-specific rather than structural
> > reasons (e.g., a date cannot be parsed), then the bundle will be retried
> > indefinitely, first by BigQueryTableInserter.insertAll and then by the
> > overall production retry logic of the underlying runner.
> > It would be better to allow the customer to specify a dead-letter store
> > for such records, so that overall processing can continue while bad
> > records are quarantined.
>
>
>
>
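Mark's proposal, and Luke's generalization of it to failing DoFns of any kind, both amount to routing records that fail for data-specific reasons to a side store instead of retrying them indefinitely. A minimal, runner-independent Java sketch of that idea follows. DeadLetterSketch and its members are hypothetical and not a Beam or BigQuery API; date parsing stands in for the "cannot parse a date" case.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Runner-independent sketch of a dead-letter drop (hypothetical names; this is
 * not a Beam API). Each record goes through a user function; a record whose
 * processing throws is quarantined together with its error instead of causing
 * the whole batch to be retried.
 */
public class DeadLetterSketch {

  /** A quarantined record and the reason it failed. */
  public static final class DeadLetter {
    public final String record;
    public final String error;

    public DeadLetter(String record, String error) {
      this.record = record;
      this.error = error;
    }
  }

  /** Outputs that succeeded, plus the records diverted to the dead-letter store. */
  public static final class Result<O> {
    public final List<O> outputs = new ArrayList<>();
    public final List<DeadLetter> deadLetters = new ArrayList<>();
  }

  public static <O> Result<O> process(List<String> records, Function<String, O> fn) {
    Result<O> result = new Result<>();
    for (String record : records) {
      try {
        result.outputs.add(fn.apply(record));
      } catch (RuntimeException e) {
        // Treated as a data-specific failure: divert the record and keep going.
        result.deadLetters.add(new DeadLetter(record, e.getMessage()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Result<LocalDate> result =
        process(Arrays.asList("2016-04-13", "not-a-date"), LocalDate::parse);
    System.out.println(result.outputs);                    // [2016-04-13]
    System.out.println(result.deadLetters.get(0).record);  // not-a-date
  }
}
```

The failed records are kept with their error messages rather than discarded, so they can be inspected and reprocessed later instead of being lost. Catching every RuntimeException is a simplification; a real implementation would need to tell data-specific failures apart from structural ones.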


Re: A question about windowed values

2016-04-13 Thread Kenneth Knowles
Good thread. Filed as https://issues.apache.org/jira/browse/BEAM-191.


Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time

2016-04-13 Thread Stephan Ewen
Should work for me as well!


Re: A question about windowed values

2016-04-13 Thread Amit Sela
First of all, thanks for the detailed explanation!

I can say that from my point of view (as a runner developer) this is
definitely confusing, especially discovering that an element in an empty
window can be dropped at any time, so +1 to Robert's comment about not having
this in the public API; according to Kenneth's lookup, it doesn't look
entangled too deeply.

So I guess #valueInGlobalWindow should be the "go-to" default window (as
long as no "real" windows are involved). Should we consider making this
clearer in the public API? Maybe WindowedValue#defaultValue(T), which
would probably use the global window... just a thought.
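Amit's suggested WindowedValue#defaultValue(T) is only an idea in this thread, not an existing method. The toy Java model below (WindowedValueSketch and its methods are hypothetical stand-ins, with windows modeled as plain strings) sketches a default factory that places a value in the global window, next to a value in no windows that, per Thomas, a runner is free to drop.

```java
import java.util.Collections;
import java.util.Set;

/**
 * Toy model of a windowed value (NOT Beam's WindowedValue): an element plus
 * the set of windows it belongs to. Windows are plain strings here.
 */
public final class WindowedValueSketch<T> {
  /** Stand-in for the single global window. */
  public static final String GLOBAL_WINDOW = "GlobalWindow";

  private final T value;
  private final Set<String> windows;

  private WindowedValueSketch(T value, Set<String> windows) {
    this.value = value;
    this.windows = windows;
  }

  /** The suggested "go-to" factory: no real windowing means the global window. */
  public static <T> WindowedValueSketch<T> defaultValue(T value) {
    return new WindowedValueSketch<>(value, Collections.singleton(GLOBAL_WINDOW));
  }

  /** A value in no windows; per the thread, a runner may drop it at any time. */
  public static <T> WindowedValueSketch<T> inEmptyWindows(T value) {
    return new WindowedValueSketch<>(value, Collections.<String>emptySet());
  }

  public T getValue() {
    return value;
  }

  public Set<String> getWindows() {
    return windows;
  }

  /** True when the value belongs to no window and so may be dropped. */
  public boolean mayBeDropped() {
    return windows.isEmpty();
  }
}
```

Making the global-window factory the obvious entry point means code that does not care about windowing never creates the "may be dropped" case by accident.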


Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time

2016-04-13 Thread James Malone
Sounds like we have broad consensus on the following:

Date: 5/4/2016
Time: 8:00 - 11:00 AM Pacific time
Location: Virtual

I will submit a PR to update the website (
http://beam.incubator.apache.org/public-meetings/) later today.

Best,

James


Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time

2016-04-13 Thread Robert Bradshaw
Either works for me.


Re: A question about windowed values

2016-04-13 Thread Robert Bradshaw
As Thomas says, the fact that we ever produce values in "no window" is
an implementation quirk that should probably be fixed. (IIRC, it's
used for the output of a GBK before we've done the
group-also-by-windows to figure out what window it really should be
in, so "value in unknown windows" would be a better choice).

If a WindowFn doesn't assign a value to any windows, the system is
free to drop it. There are pros and cons to supporting this degenerate
case vs. making it an error. However, this should almost certainly not
be in the public API...

- Robert


On Wed, Apr 13, 2016 at 9:06 AM, Thomas Groh  wrote:
> Actually, my above claim isn't as strong as it can be.
>
> A value in no windows is considered to not exist. Values that are not
> assigned to any window can be dropped by a runner at *any time*. A WindowFn
> *must* assign all elements to at least one window. All elements that are
> produced by any PTransform (including Sources) must be in a window,
> potentially the GlobalWindow.
>
> On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh  wrote:
>
>> Values should almost always be part of at least one window. WindowFns
>> should place all elements in at least one window, as values that are in no
>> windows will be dropped when they reach a GroupByKey.
>>
>> Elements in no windows, for example those created by
>> WindowedValue.valueInEmptyWindows(T), are generally an implementation
>> detail of a transform; for example, in the InProcessPipelineRunner, the
>> KV<K, Iterable<WindowedValue<V>>> elements output by a GroupByKeyOnly are in
>> empty windows - but by the time the element reaches the boundary of the
>> GroupByKey, the elements are reassigned to the appropriate window(s).
>>
>> On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela  wrote:
>>
>>> My instinct tells me that if a value does not belong to a specific window
>>> (in time), it's part of the global window; but if so, what's the role of
>>> the "empty window"? When should an element be a "value in an empty window"?
>>>
>>
>>
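Thomas's rule that values in no windows disappear at a GroupByKey can be illustrated with a toy, single-process Java model. GroupByWindowSketch is hypothetical and not Beam code: elements are assigned to fixed windows identified by their start timestamp and grouped per key and window, so an element assigned to no window never produces a group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy, single-process model of a window-aware GroupByKey (not Beam code).
 * Each element carries the windows it was assigned to, identified here by the
 * window's start timestamp; grouping is per (key, window).
 */
public class GroupByWindowSketch {

  public static final class Element {
    public final String key;
    public final int value;
    public final List<Long> windows;

    public Element(String key, int value, List<Long> windows) {
      this.key = key;
      this.value = value;
      this.windows = windows;
    }
  }

  /** Fixed windows of the given size: every timestamp lands in exactly one window. */
  public static List<Long> assignFixedWindow(long timestamp, long size) {
    return Collections.singletonList(timestamp - Math.floorMod(timestamp, size));
  }

  /** Groups values per (key, window); an element with no windows contributes nothing. */
  public static Map<String, List<Integer>> groupByKeyAndWindow(List<Element> input) {
    Map<String, List<Integer>> groups = new TreeMap<>();
    for (Element e : input) {
      for (long window : e.windows) { // zero iterations when e is in no windows
        groups.computeIfAbsent(e.key + "@" + window, k -> new ArrayList<>()).add(e.value);
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Element> input = Arrays.asList(
        new Element("a", 1, assignFixedWindow(5, 10)),
        new Element("a", 2, assignFixedWindow(7, 10)),
        new Element("a", 3, assignFixedWindow(12, 10)),
        new Element("a", 4, Collections.<Long>emptyList())); // assigned to no window
    System.out.println(groupByKeyAndWindow(input)); // {a@0=[1, 2], a@10=[3]}
  }
}
```

This models only the observable outcome, not how a runner implements GroupByKey internally. A WindowFn like assignFixedWindow, which always returns exactly one window, never triggers the drop; only an element in no windows (value 4 above) silently vanishes.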


Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time

2016-04-13 Thread Milindu Sanoj Kumarage
Hi,

5/4/2016 works for me

Regards,
Milindu
On 13 Apr 2016 1:43 p.m., "Aljoscha Krettek"  wrote:

> Either works for me.
>
> On Tue, 12 Apr 2016 at 22:29 Kenneth Knowles 
> wrote:
>
> > Either works for me. Thanks James!
> >
> > On Tue, Apr 12, 2016 at 11:31 AM, Amit Sela 
> wrote:
> >
> > > Anytime works for me.
> > >
> > > On Tue, Apr 12, 2016, 21:24 Jean-Baptiste Onofré 
> > wrote:
> > >
> > > > Hi James,
> > > >
> > > > 5/4 works for me !
> > > >
> > > > Thanks,
> > > > Regards
> > > > JB
> > > >
> > > > On 04/12/2016 05:05 PM, James Malone wrote:
> > > > > Hey JB,
> > > > >
> > > > > Sorry for the late reply! That is a good point; apologies I missed
> > > > > noticing that conflict. For everyone in the community, how would one
> > > > > of the following alternatives work?
> > > > >
> > > > > 5/4/2016 - 8:00 - 11:00 AM Pacific time
> > > > > -or-
> > > > > 5/18/2016 - 8:00 - 11:00 AM Pacific time
> > > > >
> > > > > Best,
> > > > >
> > > > > James
> > > > >
> > > > > On Mon, Apr 11, 2016 at 11:17 AM, Lukasz Cwik
> > > > > wrote:
> > > > >
> > > > >> That works for me.
> > > > >> But it would be best if people just posted when they are available,
> > > > >> depending on the goal/scope of the meeting, and then a date is chosen.
> > > > >>
> > > > >> On Sun, Apr 10, 2016 at 9:40 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > > > >> wrote:
> > > > >>
> > > > >>> OK, what about the week before ApacheCon ?
> > > > >>>
> > > > >>> Regards
> > > > >>> JB
> > > > >>>
> > > > >>>
> > > > >>> On 04/11/2016 04:22 AM, Lukasz Cwik wrote:
> > > > >>>
> > > >  I will be gone May 14th - 31st so would prefer a date before that.
> > > > 
> > > >  On Fri, Apr 8, 2016 at 10:23 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > 
> > > >  Hi James,
> > > > >
> > > > > May 11th is during the ApacheCon Vancouver.
> > > > >
> > > > > As some Beam current and potential contributors could be busy at
> > > > > ApacheCon, maybe it's better to postpone to May 18th.
> > > > >
> > > > > WDYT ?
> > > > >
> > > > > Regards
> > > > > JB
> > > > >
> > > > >
> > > > > On 04/08/2016 10:37 PM, James Malone wrote:
> > > > >
> > > > > Hello everyone,
> > > > >>
> > > > >> I'd like to propose holding a meeting in May to discuss a few
> > > > >> Apache Beam topics. This could be a good venue to discuss design
> > > > >> proposals, gather technical feedback, and the state of the Beam
> > > > >> community. My thinking is we will be able to cover two or three
> > > > >> Apache Beam topics in depth over the course of a few hours.
> > > > >>
> > > > >> To make the meeting accessible to the community, I propose a
> > > > >> virtual meeting on:
> > > > >>
> > > > >> Wednesday May 11th (2016/05/11)
> > > > >> 8:00 AM - 11:00 AM Pacific
> > > > >>
> > > > >> Since time may be limited, I propose agenda items recommended by
> > > > >> the PPMC are given preferences. Before the meeting we can finalize
> > > > >> the method used for the virtual meeting (like Google hangouts) and
> > > > >> the finalized agenda. I'm also happy to volunteer myself for taking
> > > > >> notes and coordinating the event.
> > > > >>
> > > > >> Best,
> > > > >>
> > > > >> James
> > > > >>
> > > > >>
> > > > >> --
> > > > > Jean-Baptiste Onofré
> > > > > jbono...@apache.org
> > > > > http://blog.nanthrax.net
> > > > > Talend - http://www.talend.com
> > > > >
> > > > >
> > > > 
> > > > >>> --
> > > > >>> Jean-Baptiste Onofré
> > > > >>> jbono...@apache.org
> > > > >>> http://blog.nanthrax.net
> > > > >>> Talend - http://www.talend.com
> > > > >>>
> > > > >>
> > > > >
> > > >
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>


Re: A question about windowed values

2016-04-13 Thread Kenneth Knowles
It is fine to create a WindowedValue carrying no windows when it is a fully
reified WindowedValue.

It is when it becomes an element in a PCollection that a value must exist
within some window. In a PCollection> you can have
elements that do not *contain* any windows, but exist *within* some window,
probably the global window.

But even though I can explain it like that,
WindowedValue.valueInEmptyWindows might just be a confusing API that we
don't need. It seems there are just 11 files that reference
WindowedValue.valueInEmptyWindows [1] that mostly look like they'd be fine
with the global window.

Kenn

[1]
https://github.com/apache/incubator-beam/search?p=1&q=valueInEmptyWindows&utf8=%E2%9C%93
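A minimal plain-Java sketch of the distinction Kenn draws between a value that *contains* no windows and an element that exists *within* a window. The `Window`, `WindowedValue`, `inEmptyWindows` and `inGlobalWindow` names below are hypothetical stand-ins, not Beam's actual classes or factory methods: the reified payload carries zero windows, while the collection element wrapping it still lives in the global window.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical, simplified model; NOT Beam's WindowedValue API.
public class WindowedValueSketch {

  /** Stand-in for a window; the global window is one shared instance. */
  static final class Window {
    private final String name;

    Window(String name) {
      this.name = name;
    }

    @Override
    public String toString() {
      return name;
    }
  }

  static final Window GLOBAL_WINDOW = new Window("GlobalWindow");

  /** A value paired with the set of windows it belongs to. */
  static final class WindowedValue<T> {
    final T value;
    final Set<Window> windows;

    private WindowedValue(T value, Set<Window> windows) {
      this.value = value;
      this.windows = windows;
    }

    /** A fully reified value that carries no windows of its own. */
    static <T> WindowedValue<T> inEmptyWindows(T value) {
      return new WindowedValue<>(value, Collections.emptySet());
    }

    /** A value assigned to the single global window. */
    static <T> WindowedValue<T> inGlobalWindow(T value) {
      return new WindowedValue<>(value, Collections.singleton(GLOBAL_WINDOW));
    }
  }

  public static void main(String[] args) {
    // The reified payload contains no windows...
    WindowedValue<String> reified = WindowedValue.inEmptyWindows("hello");
    // ...but as an element of a collection it still exists within the global window.
    WindowedValue<WindowedValue<String>> element = WindowedValue.inGlobalWindow(reified);
    System.out.println(reified.windows.size()); // 0
    System.out.println(element.windows); // [GlobalWindow]
  }
}
```

Under this model, replacing empty-window creation with global-window creation (as suggested above for most of the 11 call sites) would only change which factory is called, not the payload.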


On Wed, Apr 13, 2016 at 9:06 AM, Thomas Groh wrote:

> Actually, my above claim isn't as strong as it can be.
>
> A value in no windows is considered to not exist. Values that are not
> assigned to any window can be dropped by a runner at *any time*. A WindowFn
> *must* assign all elements to at least one window. All elements that are
> produced by any PTransform (including Sources) must be in a window,
> potentially the GlobalWindow.
>
> On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh  wrote:
>
> > Values should almost always be part of at least one window. WindowFns
> > should place all elements in at least one window, as values that are in
> > no windows will be dropped when they reach a GroupByKey.
> >
> > Elements in no windows, for example those created by
> > WindowedValue.valueInEmptyWindows(T) are generally an implementation
> > detail of a transform; for example, in the InProcessPipelineRunner, the
> KV > Iterable>> elements output by a GroupByKeyOnly are in
> > empty windows - but by the time the element reaches the boundary of the
> > GroupByKey, the elements are reassigned to the appropriate window(s).
> >
> > On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela wrote:
> >
> >> My instinct tells me that if a value does not belong to a specific window
> >> (in time) it's a part of a global window, but if so, what's the role of
> >> the "empty window". When should an element be a "value in an empty window" ?
> >>
> >
> >
>


Re: A question about windowed values

2016-04-13 Thread Thomas Groh
Actually, my above claim isn't as strong as it can be.

A value in no windows is considered to not exist. Values that are not
assigned to any window can be dropped by a runner at *any time*. A WindowFn
*must* assign all elements to at least one window. All elements that are
produced by any PTransform (including Sources) must be in a window,
potentially the GlobalWindow.
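One way to make the "at least one window" rule fail loudly rather than silently is to check every assignment as it happens. The sketch below is a hypothetical plain-Java wrapper, not Beam's WindowFn API; the minute-index "fixed windows" are a simplified assumption standing in for a real WindowFn.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; NOT Beam's WindowFn API.
public class WindowAssignmentCheck {

  /**
   * Applies a window-assignment function and rejects an element that ends up in
   * zero windows, instead of letting it silently become droppable.
   */
  static <T, W> Collection<W> assignChecked(Function<T, Collection<W>> assign, T element) {
    Collection<W> windows = assign.apply(element);
    if (windows == null || windows.isEmpty()) {
      throw new IllegalStateException(
          "element " + element + " was assigned to no windows; a runner may drop it at any time");
    }
    return windows;
  }

  public static void main(String[] args) {
    // Fixed one-minute windows, identified by the minute index of an epoch-millis timestamp.
    Function<Long, Collection<Long>> fixedMinutes = ts -> List.of(ts / 60_000L);
    System.out.println(assignChecked(fixedMinutes, 125_000L)); // [2]

    // A buggy assignment that puts the element in no window is rejected.
    Function<Long, Collection<Long>> broken = ts -> List.of();
    try {
      assignChecked(broken, 125_000L);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Throwing here is a design choice for the sketch: since a runner is free to drop a windowless value at any time, surfacing the bug at assignment is easier to debug than a value that vanishes later.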

On Wed, Apr 13, 2016 at 8:52 AM, Thomas Groh  wrote:

> Values should almost always be part of at least one window. WindowFns
> should place all elements in at least one window, as values that are in no
> windows will be dropped when they reach a GroupByKey.
>
> Elements in no windows, for example those created by
> WindowedValue.valueInEmptyWindows(T) are generally an implementation
> detail of a transform; for example, in the InProcessPipelineRunner, the KV Iterable>> elements output by a GroupByKeyOnly are in
> empty windows - but by the time the element reaches the boundary of the
> GroupByKey, the elements are reassigned to the appropriate window(s).
>
> On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela  wrote:
>
>> My instinct tells me that if a value does not belong to a specific window
>> (in time) it's a part of a global window, but if so, what's the role of
>> the "empty window". When should an element be a "value in an empty window" ?
>>
>
>


Re: TextIO.Read.Bound vs Create

2016-04-13 Thread Amit Sela
Yep. And I got a good answer for this one as well. Thanks!

On Wed, Apr 13, 2016, 19:00 Kenneth Knowles  wrote:

> This seems wrong. They should both be in the global window. I think your
> trouble is
>
> https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L472
>
> On Tue, Apr 12, 2016 at 9:43 PM, Amit Sela  wrote:
>
> > Why do input values from *TextIO.Read.Bound* belong to an empty window while
> > values from *Create* belong in a global window ?
> >
> > Thanks,
> > Amit
> >
>


Re: TextIO.Read.Bound vs Create

2016-04-13 Thread Kenneth Knowles
This seems wrong. They should both be in the global window. I think your
trouble is
https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L472

On Tue, Apr 12, 2016 at 9:43 PM, Amit Sela  wrote:

> Why do input values from *TextIO.Read.Bound* belong to an empty window while
> values from *Create* belong in a global window ?
>
> Thanks,
> Amit
>


Re: A question about windowed values

2016-04-13 Thread Thomas Groh
Values should almost always be part of at least one window. WindowFns
should place all elements in at least one window, as values that are in no
windows will be dropped when they reach a GroupByKey.

Elements in no windows, for example those created by
WindowedValue.valueInEmptyWindows(T) are generally an implementation detail
of a transform; for example, in the InProcessPipelineRunner, the KV>> elements output by a GroupByKeyOnly are in
empty windows - but by the time the element reaches the boundary of the
GroupByKey, the elements are reassigned to the appropriate window(s).
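A minimal sketch of why a value in no windows disappears at a GroupByKey: if grouping is done per (key, window), an element contributes one copy per window it is in, so zero windows means zero copies, and each output group belongs to exactly one window. This is a hypothetical plain-Java model (the `Element` class and the `"key@window"` group labels are illustrative assumptions), not how the InProcessPipelineRunner is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of a windowed GroupByKey; NOT Beam runner code.
public class GroupByKeyWindowSketch {

  /** A keyed value together with the windows it was assigned to. */
  static final class Element {
    final String key;
    final int value;
    final Set<String> windows;

    Element(String key, int value, Set<String> windows) {
      this.key = key;
      this.value = value;
      this.windows = windows;
    }
  }

  /** Groups values by "key@window"; every output group lives in exactly one window. */
  static Map<String, List<Integer>> groupByKeyAndWindow(List<Element> input) {
    Map<String, List<Integer>> groups = new TreeMap<>();
    for (Element e : input) {
      // One copy per window the element is in; an element in no windows adds nothing.
      for (String window : e.windows) {
        groups.computeIfAbsent(e.key + "@" + window, k -> new ArrayList<>()).add(e.value);
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Element> input = List.of(
        new Element("a", 1, Set.of("w1")),
        new Element("a", 2, Set.of("w1", "w2")),
        new Element("b", 3, Set.of())); // in no windows: silently dropped
    System.out.println(groupByKeyAndWindow(input)); // {a@w1=[1, 2], a@w2=[2]}
  }
}
```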

On Tue, Apr 12, 2016 at 11:44 PM, Amit Sela  wrote:

> My instinct tells me that if a value does not belong to a specific window
> (in time) it's a part of a global window, but if so, what's the role of the
> "empty window". When should an element be a "value in an empty window" ?
>


Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time

2016-04-13 Thread Aljoscha Krettek
Either works for me.

On Tue, 12 Apr 2016 at 22:29 Kenneth Knowles  wrote:

> Either works for me. Thanks James!
>
> On Tue, Apr 12, 2016 at 11:31 AM, Amit Sela  wrote:
>
> > Anytime works for me.
> >
> > On Tue, Apr 12, 2016, 21:24 Jean-Baptiste Onofré wrote:
> >
> > > Hi James,
> > >
> > > 5/4 works for me !
> > >
> > > Thanks,
> > > Regards
> > > JB
> > >
> > > On 04/12/2016 05:05 PM, James Malone wrote:
> > > > Hey JB,
> > > >
> > > > Sorry for the late reply! That is a good point; apologies I missed
> > > > noticing that conflict. For everyone in the community, how would one
> > > > of the following alternatives work?
> > > >
> > > > 5/4/2016 - 8:00 - 11:00 AM Pacific time
> > > > -or-
> > > > 5/18/2016 - 8:00 - 11:00 AM Pacific time
> > > >
> > > > Best,
> > > >
> > > > James
> > > >
> > > > On Mon, Apr 11, 2016 at 11:17 AM, Lukasz Cwik wrote:
> > > >
> > > >> That works for me.
> > > >> But it would be best if people just posted when they are available
> > > >> depending on the goal/scope of the meeting and then a date is chosen.
> > > >>
> > > >> On Sun, Apr 10, 2016 at 9:40 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > >>
> > > >>> OK, what about the week before ApacheCon ?
> > > >>>
> > > >>> Regards
> > > >>> JB
> > > >>>
> > > >>>
> > > >>> On 04/11/2016 04:22 AM, Lukasz Cwik wrote:
> > > >>>
> > >  I will be gone May 14th - 31st so would prefer a date before that.
> > > 
> > >  On Fri, Apr 8, 2016 at 10:23 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > 
> > >  Hi James,
> > > >
> > > > May 11th is during the ApacheCon Vancouver.
> > > >
> > > > As some Beam current and potential contributors could be busy at
> > > > ApacheCon, maybe it's better to postpone to May 18th.
> > > >
> > > > WDYT ?
> > > >
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 04/08/2016 10:37 PM, James Malone wrote:
> > > >
> > > > Hello everyone,
> > > >>
> > > >> I'd like to propose holding a meeting in May to discuss a few
> > > >> Apache Beam topics. This could be a good venue to discuss design
> > > >> proposals, gather technical feedback, and the state of the Beam
> > > >> community. My thinking is we will be able to cover two or three
> > > >> Apache Beam topics in depth over the course of a few hours.
> > > >>
> > > >> To make the meeting accessible to the community, I propose a
> > > >> virtual meeting on:
> > > >>
> > > >> Wednesday May 11th (2016/05/11)
> > > >> 8:00 AM - 11:00 AM Pacific
> > > >>
> > > >> Since time may be limited, I propose agenda items recommended by
> > > >> the PPMC are given preferences. Before the meeting we can finalize
> > > >> the method used for the virtual meeting (like Google hangouts) and
> > > >> the finalized agenda. I'm also happy to volunteer myself for taking
> > > >> notes and coordinating the event.
> > > >>
> > > >> Best,
> > > >>
> > > >> James
> > > >>
> > > >>
> > > >> --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > > >
> > > 
> > > >>> --
> > > >>> Jean-Baptiste Onofré
> > > >>> jbono...@apache.org
> > > >>> http://blog.nanthrax.net
> > > >>> Talend - http://www.talend.com
> > > >>>
> > > >>
> > > >
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>