StatefulDoFnRunner

2017-04-05 Thread Thomas Weise
Hi,

While working on the support for splittable DoFn, I see a few cases where
the DoFn runner classes slightly complicate reuse across elements (or make
it a bit awkward to implement for the runner).

StateInternalsStateCleaner and TimeInternalsCleanupTimer take xxxInternals
instances. But since those are key-specific, the runner writer has to
perform acrobatics to flip the key on these internal instances on a
per-element basis (to avoid having to recreate the other objects that
refer to them).

Would it be possible to instead use the factories and retrieve the
internals by key? The runner would then have the choice to optimize as
needed. In general, I think it would be nice if the processing-context
related classes were designed so that they promote reuse of object
instances across elements and bundles and minimize object creation on a
per-key basis.
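
For illustration, here is a minimal, self-contained sketch of the
factory-based shape suggested above. The interfaces are simplified
stand-ins for the runners-core types (StateInternalsFactory and friends),
not the actual API.

// Simplified stand-ins, for illustration only - not the real runners-core types.
interface StateInternals {
  void clearAllState();
}

interface StateInternalsFactory<K> {
  StateInternals stateInternalsForKey(K key);
}

// The cleaner holds only the factory, so it can be constructed once and
// reused across elements; the key-specific internals are resolved at the
// point of use instead of being baked into the cleaner.
class FactoryBasedStateCleaner<K> {
  private final StateInternalsFactory<K> factory;

  FactoryBasedStateCleaner(StateInternalsFactory<K> factory) {
    this.factory = factory;
  }

  void clearFor(K key) {
    factory.stateInternalsForKey(key).clearAllState();
  }
}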

Thanks,
Thomas


Re: AssignWindowsDoFn

2017-04-05 Thread Thomas Weise
Thanks Kenn, I also think it would be nice to do away with the primitive.


On Wed, Apr 5, 2017 at 2:51 PM, Kenneth Knowles 
wrote:

> Yes, you have it correct. Now it is just done without a DoFn, as a
> primitive.
>
> We have some idea that we might re-add the capability; see
> https://issues.apache.org/jira/browse/BEAM-1287.
>
> Basically, it revolves around whether a window is viewed as an
> extra-special partition of a PCollection that deserves a primitive so it
> can be handled specially, versus just an implicit secondary grouping key
> that provides a GC time. I used to think the former; now I mostly think the
> latter.
>
> On Wed, Apr 5, 2017 at 12:25 PM, Thomas Weise  wrote:
>
> > Hi,
> >
> > As part of removing remaining OldDoFn reliance in ApexRunner I'm looking
> > for the DoFn replacement for
> > org.apache.beam.runners.core.AssignWindowsDoFn, specifically the
> > equivalent
> > of context.outputWindowedValue.
> >
> > https://github.com/apache/beam/blob/master/runners/apex/
> > src/main/java/org/apache/beam/runners/apex/translation/
> > WindowAssignTranslator.java#L52
> >
> > Looking elsewhere suggests that this isn't done using a DoFn (and reuse
> > respective execution operator)? If that's confirmed, then I can also
> create
> > a new operator for this.
> >
> > Thanks
> >
>


Re: How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Kenneth Knowles
Great write up! Unfortunate situation :-(

On Wed, Apr 5, 2017 at 3:20 PM, Stephen Sisk 
wrote:

> Pablo - thanks for your investigation and taking the time to write this up!
>
> I filed https://issues.apache.org/jira/browse/BEAM-1891 for this.
>
> S
>
> On Wed, Apr 5, 2017 at 2:24 PM Ben Chambers 
> wrote:
>
> Correction autovalue coder.
>
> On Wed, Apr 5, 2017, 2:24 PM Ben Chambers  wrote:
>
> > Serializable coder had a separate set of issues - often larger and less
> > efficient. Ideally, we would have an avrocoder.
> >
> > On Wed, Apr 5, 2017, 2:15 PM Pablo Estrada 
> > wrote:
> >
> > As a note, it seems that SerializableCoder does the trick in this case,
> as
> > it does not require a no-arg constructor for the class that is being
> > deserialized - so perhaps we should encourage people to use that in the
> > future.
> > Best
> > -P.
> >
> > On Wed, Apr 5, 2017 at 1:48 PM Pablo Estrada  wrote:
> >
> > > Hi all,
> > > I was encouraged to write about my troubles to use PCollections of
> > > AutoValue classes with AvroCoder; because it seems like currently, this
> > is
> > > not possible.
> > >
> > > As part of the changes to PAssert, I meant to create a SuccessOrFailure
> > > class that could be passed in a PCollection to a `concludeTransform`,
> > which
> > > would be in charge of validating that all the assertions succeeded, and
> > use
> > > AvroCoder for serialization of that class. Consider this dummy example:
> > >
> > > @AutoValue
> > > abstract class FizzBuzz {
> > > ...
> > > }
> > >
> > > class FizzBuzzDoFn extends DoFn {
> > > ...
> > > }
> > >
> > > 1. The first problem was that the abstract class does not have any
> > > attributes, so AvroCoder can not scrape them. For this, (with advice
> from
> > > Kenn Knowles), the Coder would need to take the AutoValue-generated
> > class:
> > >
> > > .apply(ParDo.of(new FizzBuzzDoFn()))
> > > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> > >
> > > 2. This errored out saying that FizzBuzz and AutoValue_FizzBuzz are
> > > incompatible classes, so I just tried bypassing the type system like
> so:
> > >
> > > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> > >
> > > 3. This compiled properly, and encoding worked, but the problem came at
> > > decoding, because Avro specifically requires the class to have a no-arg
> > > constructor [1], and AutoValue-generated classes do not come with one.
> > This
> > > is a problem for several serialization frameworks, and we're not the
> > first
> > > ones to hit this [2], and the AutoValue people don't seem keen on
> adding
> > > this.
> > >
> > > Considering all that, it seems that the AutoValue-AvroCoder pair can
> not
> > > currently work. We'd need a serialization framework that does not
> depend
> > on
> > > calling the no-arg constructor and then filling in the attributes with
> > > reflection. I'm trying to check if SerializableCoder has different
> > > deserialization techniques; but for PAssert, I just decided to use
> > > POJO+AvroCoder.
> > >
> > > I hope my experience may be useful to others, and maybe start a
> > discussion
> > > on how to enable users to have AutoValue classes in their PCollections.
> > >
> > > Best
> > > -P.
> > >
> > > [1] -
> > >
> >
> http://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/
> reflect/package-summary.html?is-external=true
> > > [2] - https://github.com/google/auto/issues/122
> > >
> > >
> >
> >
>


Proposed Splittable DoFn API changes

2017-04-05 Thread Eugene Kirpichov
Hey all,

Based on recent experience continuing the implementation of Splittable
DoFn, I would like to propose a few changes to its API. They get rid of a
bug, make parts of its semantics better defined and easier for a user to
get right, and reduce the assumptions made about the runner implementation.

In short:
- Add c.updateWatermark() and report the watermark continuously via this
method.
- Make SDF.@ProcessElement return void, which is simpler for users, though
it does not allow resuming after a specified time.
- Declare that SDF.@ProcessElement must guarantee that after it returns,
the entire tracker.currentRestriction() has been processed.
- Add a boolean RestrictionTracker.done() method to enforce the point
above.
- For resuming after a specified time, use a regular DoFn with the state
and timers API.

The only downside is the removal (from SDF) of the ability to suspend the
call for a certain amount of time; the suggestion is that, if you need
that, you should use a regular DoFn and the timers API.
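
To make the proposed contract concrete, here is a rough sketch of what a
@ProcessElement body might look like under these changes. The tracker
methods and the read()/timestampOf() helpers are hypothetical names used
only for illustration; the proposal doc below remains authoritative.

@ProcessElement
public void process(ProcessContext c, SomeRestrictionTracker tracker) {
  // Process the *entire* current restriction before returning.
  for (long pos = tracker.currentRestriction().getFrom();
      tracker.tryClaim(pos);
      ++pos) {
    c.output(read(pos));                  // hypothetical read of the claimed position
    c.updateWatermark(timestampOf(pos));  // proposed continuous watermark reporting
  }
  // On return, the proposed boolean tracker.done() should be true, letting
  // the runner verify that the whole restriction was indeed processed.
}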

Please see the full proposal in the following doc and comment there & vote
on this thread.
https://docs.google.com/document/d/1BGc8pM1GOvZhwR9SARSVte-20XEoBUxrGJ5gTWXdv3c/edit?usp=sharing


I am going to concurrently start prototyping some parts of this proposal,
because the current implementation is simply wrong and this proposal is the
only way to fix it that I can think of, but I will adjust my implementation
based on the discussion. I believe this proposal should not affect runner
authors - I can make all the necessary changes myself.

Thanks!


Re: Calcite to Beam

2017-04-05 Thread Julian Hyde
+ dev@beam (please moderate through!)

There’s only been talk so far.

I found https://issues.apache.org/jira/browse/BEAM-301. Does that match
what you were thinking? If so, let’s continue the conversation there.

Julian

> On Apr 5, 2017, at 3:13 PM, Khai Tran  wrote:
> 
> Hi,
> Just want to check if there is any ongoing effort to convert Calcite
> logical plan into Apache Beam APIs for stream processing?
> 
> Thanks,
> Khai



Re: How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Stephen Sisk
Pablo - thanks for your investigation and taking the time to write this up!

I filed https://issues.apache.org/jira/browse/BEAM-1891 for this.

S

On Wed, Apr 5, 2017 at 2:24 PM Ben Chambers 
wrote:

Correction autovalue coder.

On Wed, Apr 5, 2017, 2:24 PM Ben Chambers  wrote:

> Serializable coder had a separate set of issues - often larger and less
> efficient. Ideally, we would have an avrocoder.
>
> On Wed, Apr 5, 2017, 2:15 PM Pablo Estrada 
> wrote:
>
> As a note, it seems that SerializableCoder does the trick in this case, as
> it does not require a no-arg constructor for the class that is being
> deserialized - so perhaps we should encourage people to use that in the
> future.
> Best
> -P.
>
> On Wed, Apr 5, 2017 at 1:48 PM Pablo Estrada  wrote:
>
> > Hi all,
> > I was encouraged to write about my troubles to use PCollections of
> > AutoValue classes with AvroCoder; because it seems like currently, this
> is
> > not possible.
> >
> > As part of the changes to PAssert, I meant to create a SuccessOrFailure
> > class that could be passed in a PCollection to a `concludeTransform`,
> which
> > would be in charge of validating that all the assertions succeeded, and
> use
> > AvroCoder for serialization of that class. Consider this dummy example:
> >
> > @AutoValue
> > abstract class FizzBuzz {
> > ...
> > }
> >
> > class FizzBuzzDoFn extends DoFn {
> > ...
> > }
> >
> > 1. The first problem was that the abstract class does not have any
> > attributes, so AvroCoder can not scrape them. For this, (with advice
from
> > Kenn Knowles), the Coder would need to take the AutoValue-generated
> class:
> >
> > .apply(ParDo.of(new FizzBuzzDoFn()))
> > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> >
> > 2. This errored out saying that FizzBuzz and AutoValue_FizzBuzz are
> > incompatible classes, so I just tried bypassing the type system like so:
> >
> > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> >
> > 3. This compiled properly, and encoding worked, but the problem came at
> > decoding, because Avro specifically requires the class to have a no-arg
> > constructor [1], and AutoValue-generated classes do not come with one.
> This
> > is a problem for several serialization frameworks, and we're not the
> first
> > ones to hit this [2], and the AutoValue people don't seem keen on adding
> > this.
> >
> > Considering all that, it seems that the AutoValue-AvroCoder pair can not
> > currently work. We'd need a serialization framework that does not depend
> on
> > calling the no-arg constructor and then filling in the attributes with
> > reflection. I'm trying to check if SerializableCoder has different
> > deserialization techniques; but for PAssert, I just decided to use
> > POJO+AvroCoder.
> >
> > I hope my experience may be useful to others, and maybe start a
> discussion
> > on how to enable users to have AutoValue classes in their PCollections.
> >
> > Best
> > -P.
> >
> > [1] -
> >
>
http://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/reflect/package-summary.html?is-external=true
> > [2] - https://github.com/google/auto/issues/122
> >
> >
>
>


Re: AssignWindowsDoFn

2017-04-05 Thread Kenneth Knowles
Yes, you have it correct. Now it is just done without a DoFn, as a
primitive.

We have some idea that we might re-add the capability; see
https://issues.apache.org/jira/browse/BEAM-1287.

Basically, it revolves around whether a window is viewed as an
extra-special partition of a PCollection that deserves a primitive so it
can be handled specially, versus just an implicit secondary grouping key
that provides a GC time. I used to think the former; now I mostly think the
latter.

On Wed, Apr 5, 2017 at 12:25 PM, Thomas Weise  wrote:

> Hi,
>
> As part of removing remaining OldDoFn reliance in ApexRunner I'm looking
> for the DoFn replacement for
> org.apache.beam.runners.core.AssignWindowsDoFn, specifically the
> equivalent
> of context.outputWindowedValue.
>
> https://github.com/apache/beam/blob/master/runners/apex/
> src/main/java/org/apache/beam/runners/apex/translation/
> WindowAssignTranslator.java#L52
>
> Looking elsewhere suggests that this isn't done using a DoFn (and reuse
> respective execution operator)? If that's confirmed, then I can also create
> a new operator for this.
>
> Thanks
>


Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-05 Thread Ted Yu
I compiled the BEAM-301 branch with Calcite 1.12 - it passed.

Julian tries to not break existing things, but he will if there's a reason
to do so :-)

On Wed, Apr 5, 2017 at 2:36 PM, Mingmin Xu  wrote:

> @Ted, thanks for the note. I intend to stick with one version, Beam 0.6.0
> and Calcite 1.11 so far, unless impacted by API change. Before it's merged
> back to master, will upgrade to the latest version.
>
> On Wed, Apr 5, 2017 at 2:14 PM, Ted Yu  wrote:
>
> > Working in feature branch is good - you may want to periodically sync up
> > with master.
> >
> > I noticed that you are using 1.11.0 of calcite.
> > 1.12 is out, FYI
> >
> > On Wed, Apr 5, 2017 at 2:05 PM, Mingmin Xu  wrote:
> >
> > > Hi all,
> > >
> > > I'm working on https://issues.apache.org/jira/browse/BEAM-301(Add a
> Beam
> > > SQL DSL). The skeleton is already in
> > > https://github.com/XuMingmin/beam/tree/BEAM-301, using Java SDK in the
> > > back-end. The goal is to provide a SQL interface over Beam, based on
> > > Calcite, including:
> > > 1). a translator to create Beam pipeline from SQL,
> > > (SELECT/INSERT/FILTER/GROUP-BY/JOIN/...);
> > > 2). an interactive client to submit queries;  (All-SQL mode)
> > > 3). a SQL API which reduce the work to create a Pipeline; (Semi-SQL
> mode)
> > >
> > > As we see many folks are interested in this feature, would like to
> > create a
> > > feature branch to have more involvement.
> > > Looking for comments and feedback.
> > >
> > > Thanks!
> > > 
> > > Mingmin
> > >
> >
>
>
>
> --
> 
> Mingmin
>


Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-05 Thread Mingmin Xu
@Ted, thanks for the note. I intend to stick with one version, Beam 0.6.0
and Calcite 1.11, for now, unless impacted by API changes. Before it's
merged back to master, I will upgrade to the latest version.

On Wed, Apr 5, 2017 at 2:14 PM, Ted Yu  wrote:

> Working in feature branch is good - you may want to periodically sync up
> with master.
>
> I noticed that you are using 1.11.0 of calcite.
> 1.12 is out, FYI
>
> On Wed, Apr 5, 2017 at 2:05 PM, Mingmin Xu  wrote:
>
> > Hi all,
> >
> > I'm working on https://issues.apache.org/jira/browse/BEAM-301(Add a Beam
> > SQL DSL). The skeleton is already in
> > https://github.com/XuMingmin/beam/tree/BEAM-301, using Java SDK in the
> > back-end. The goal is to provide a SQL interface over Beam, based on
> > Calcite, including:
> > 1). a translator to create Beam pipeline from SQL,
> > (SELECT/INSERT/FILTER/GROUP-BY/JOIN/...);
> > 2). an interactive client to submit queries;  (All-SQL mode)
> > 3). a SQL API which reduce the work to create a Pipeline; (Semi-SQL mode)
> >
> > As we see many folks are interested in this feature, would like to
> create a
> > feature branch to have more involvement.
> > Looking for comments and feedback.
> >
> > Thanks!
> > 
> > Mingmin
> >
>



-- 

Mingmin


Re: How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Ben Chambers
Correction: AutoValue coder.

On Wed, Apr 5, 2017, 2:24 PM Ben Chambers  wrote:

> Serializable coder had a separate set of issues - often larger and less
> efficient. Ideally, we would have an avrocoder.
>
> On Wed, Apr 5, 2017, 2:15 PM Pablo Estrada 
> wrote:
>
> As a note, it seems that SerializableCoder does the trick in this case, as
> it does not require a no-arg constructor for the class that is being
> deserialized - so perhaps we should encourage people to use that in the
> future.
> Best
> -P.
>
> On Wed, Apr 5, 2017 at 1:48 PM Pablo Estrada  wrote:
>
> > Hi all,
> > I was encouraged to write about my troubles to use PCollections of
> > AutoValue classes with AvroCoder; because it seems like currently, this
> is
> > not possible.
> >
> > As part of the changes to PAssert, I meant to create a SuccessOrFailure
> > class that could be passed in a PCollection to a `concludeTransform`,
> which
> > would be in charge of validating that all the assertions succeeded, and
> use
> > AvroCoder for serialization of that class. Consider this dummy example:
> >
> > @AutoValue
> > abstract class FizzBuzz {
> > ...
> > }
> >
> > class FizzBuzzDoFn extends DoFn {
> > ...
> > }
> >
> > 1. The first problem was that the abstract class does not have any
> > attributes, so AvroCoder can not scrape them. For this, (with advice from
> > Kenn Knowles), the Coder would need to take the AutoValue-generated
> class:
> >
> > .apply(ParDo.of(new FizzBuzzDoFn()))
> > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> >
> > 2. This errored out saying that FizzBuzz and AutoValue_FizzBuzz are
> > incompatible classes, so I just tried bypassing the type system like so:
> >
> > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> >
> > 3. This compiled properly, and encoding worked, but the problem came at
> > decoding, because Avro specifically requires the class to have a no-arg
> > constructor [1], and AutoValue-generated classes do not come with one.
> This
> > is a problem for several serialization frameworks, and we're not the
> first
> > ones to hit this [2], and the AutoValue people don't seem keen on adding
> > this.
> >
> > Considering all that, it seems that the AutoValue-AvroCoder pair can not
> > currently work. We'd need a serialization framework that does not depend
> on
> > calling the no-arg constructor and then filling in the attributes with
> > reflection. I'm trying to check if SerializableCoder has different
> > deserialization techniques; but for PAssert, I just decided to use
> > POJO+AvroCoder.
> >
> > I hope my experience may be useful to others, and maybe start a
> discussion
> > on how to enable users to have AutoValue classes in their PCollections.
> >
> > Best
> > -P.
> >
> > [1] -
> >
> http://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/reflect/package-summary.html?is-external=true
> > [2] - https://github.com/google/auto/issues/122
> >
> >
>
>


Re: How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Ben Chambers
SerializableCoder had a separate set of issues - often larger and less
efficient. Ideally, we would have an AvroCoder.

On Wed, Apr 5, 2017, 2:15 PM Pablo Estrada 
wrote:

> As a note, it seems that SerializableCoder does the trick in this case, as
> it does not require a no-arg constructor for the class that is being
> deserialized - so perhaps we should encourage people to use that in the
> future.
> Best
> -P.
>
> On Wed, Apr 5, 2017 at 1:48 PM Pablo Estrada  wrote:
>
> > Hi all,
> > I was encouraged to write about my troubles to use PCollections of
> > AutoValue classes with AvroCoder; because it seems like currently, this
> is
> > not possible.
> >
> > As part of the changes to PAssert, I meant to create a SuccessOrFailure
> > class that could be passed in a PCollection to a `concludeTransform`,
> which
> > would be in charge of validating that all the assertions succeeded, and
> use
> > AvroCoder for serialization of that class. Consider this dummy example:
> >
> > @AutoValue
> > abstract class FizzBuzz {
> > ...
> > }
> >
> > class FizzBuzzDoFn extends DoFn {
> > ...
> > }
> >
> > 1. The first problem was that the abstract class does not have any
> > attributes, so AvroCoder can not scrape them. For this, (with advice from
> > Kenn Knowles), the Coder would need to take the AutoValue-generated
> class:
> >
> > .apply(ParDo.of(new FizzBuzzDoFn()))
> > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> >
> > 2. This errored out saying that FizzBuzz and AutoValue_FizzBuzz are
> > incompatible classes, so I just tried bypassing the type system like so:
> >
> > .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
> >
> > 3. This compiled properly, and encoding worked, but the problem came at
> > decoding, because Avro specifically requires the class to have a no-arg
> > constructor [1], and AutoValue-generated classes do not come with one.
> This
> > is a problem for several serialization frameworks, and we're not the
> first
> > ones to hit this [2], and the AutoValue people don't seem keen on adding
> > this.
> >
> > Considering all that, it seems that the AutoValue-AvroCoder pair can not
> > currently work. We'd need a serialization framework that does not depend
> on
> > calling the no-arg constructor and then filling in the attributes with
> > reflection. I'm trying to check if SerializableCoder has different
> > deserialization techniques; but for PAssert, I just decided to use
> > POJO+AvroCoder.
> >
> > I hope my experience may be useful to others, and maybe start a
> discussion
> > on how to enable users to have AutoValue classes in their PCollections.
> >
> > Best
> > -P.
> >
> > [1] -
> >
> http://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/reflect/package-summary.html?is-external=true
> > [2] - https://github.com/google/auto/issues/122
> >
> >
>


Re: How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Pablo Estrada
As a note, it seems that SerializableCoder does the trick in this case, as
it does not require a no-arg constructor for the class that is being
deserialized - so perhaps we should encourage people to use that in the
future.
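
For reference, a minimal sketch of what that looks like, building on the
FizzBuzz example from the original message and assuming FizzBuzz
implements Serializable (the input PCollection here is hypothetical):

import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

// Java serialization records the concrete class and does not call a no-arg
// constructor when decoding, so the AutoValue-generated subclass round-trips
// where AvroCoder's Avro reflection fails.
PCollection<FizzBuzz> results =
    input
        .apply(ParDo.of(new FizzBuzzDoFn()))
        .setCoder(SerializableCoder.of(FizzBuzz.class));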
Best
-P.

On Wed, Apr 5, 2017 at 1:48 PM Pablo Estrada  wrote:

> Hi all,
> I was encouraged to write about my troubles to use PCollections of
> AutoValue classes with AvroCoder; because it seems like currently, this is
> not possible.
>
> As part of the changes to PAssert, I meant to create a SuccessOrFailure
> class that could be passed in a PCollection to a `concludeTransform`, which
> would be in charge of validating that all the assertions succeeded, and use
> AvroCoder for serialization of that class. Consider this dummy example:
>
> @AutoValue
> abstract class FizzBuzz {
> ...
> }
>
> class FizzBuzzDoFn extends DoFn {
> ...
> }
>
> 1. The first problem was that the abstract class does not have any
> attributes, so AvroCoder can not scrape them. For this, (with advice from
> Kenn Knowles), the Coder would need to take the AutoValue-generated class:
>
> .apply(ParDo.of(new FizzBuzzDoFn()))
> .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
>
> 2. This errored out saying that FizzBuzz and AutoValue_FizzBuzz are
> incompatible classes, so I just tried bypassing the type system like so:
>
> .setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))
>
> 3. This compiled properly, and encoding worked, but the problem came at
> decoding, because Avro specifically requires the class to have a no-arg
> constructor [1], and AutoValue-generated classes do not come with one. This
> is a problem for several serialization frameworks, and we're not the first
> ones to hit this [2], and the AutoValue people don't seem keen on adding
> this.
>
> Considering all that, it seems that the AutoValue-AvroCoder pair can not
> currently work. We'd need a serialization framework that does not depend on
> calling the no-arg constructor and then filling in the attributes with
> reflection. I'm trying to check if SerializableCoder has different
> deserialization techniques; but for PAssert, I just decided to use
> POJO+AvroCoder.
>
> I hope my experience may be useful to others, and maybe start a discussion
> on how to enable users to have AutoValue classes in their PCollections.
>
> Best
> -P.
>
> [1] -
> http://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/reflect/package-summary.html?is-external=true
> [2] - https://github.com/google/auto/issues/122
>
>


Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-05 Thread Ted Yu
Working in a feature branch is good - you may want to periodically sync up
with master.

I noticed that you are using Calcite 1.11.0; 1.12 is out, FYI.

On Wed, Apr 5, 2017 at 2:05 PM, Mingmin Xu  wrote:

> Hi all,
>
> I'm working on https://issues.apache.org/jira/browse/BEAM-301(Add a Beam
> SQL DSL). The skeleton is already in
> https://github.com/XuMingmin/beam/tree/BEAM-301, using Java SDK in the
> back-end. The goal is to provide a SQL interface over Beam, based on
> Calcite, including:
> 1). a translator to create Beam pipeline from SQL,
> (SELECT/INSERT/FILTER/GROUP-BY/JOIN/...);
> 2). an interactive client to submit queries;  (All-SQL mode)
> 3). a SQL API which reduce the work to create a Pipeline; (Semi-SQL mode)
>
> As we see many folks are interested in this feature, would like to create a
> feature branch to have more involvement.
> Looking for comments and feedback.
>
> Thanks!
> 
> Mingmin
>


Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-05 Thread Jesse Anderson
That will be awesome!

On Wed, Apr 5, 2017, 2:05 PM Mingmin Xu  wrote:

> Hi all,
>
> I'm working on https://issues.apache.org/jira/browse/BEAM-301(Add a Beam
> SQL DSL). The skeleton is already in
> https://github.com/XuMingmin/beam/tree/BEAM-301, using Java SDK in the
> back-end. The goal is to provide a SQL interface over Beam, based on
> Calcite, including:
> 1). a translator to create Beam pipeline from SQL,
> (SELECT/INSERT/FILTER/GROUP-BY/JOIN/...);
> 2). an interactive client to submit queries;  (All-SQL mode)
> 3). a SQL API which reduce the work to create a Pipeline; (Semi-SQL mode)
>
> As we see many folks are interested in this feature, would like to create a
> feature branch to have more involvement.
> Looking for comments and feedback.
>
> Thanks!
> 
> Mingmin
>
-- 
Thanks,

Jesse


[PROPOSAL]: a new feature branch for SQL DSL

2017-04-05 Thread Mingmin Xu
Hi all,

I'm working on https://issues.apache.org/jira/browse/BEAM-301 (Add a Beam
SQL DSL). The skeleton is already in
https://github.com/XuMingmin/beam/tree/BEAM-301, using the Java SDK in the
back-end. The goal is to provide a SQL interface over Beam, based on
Calcite, including:
1). a translator to create a Beam pipeline from SQL
(SELECT/INSERT/FILTER/GROUP-BY/JOIN/...);
2). an interactive client to submit queries; (All-SQL mode)
3). a SQL API which reduces the work to create a Pipeline; (Semi-SQL mode)

As we see many folks are interested in this feature, I would like to
create a feature branch to have more involvement.
Looking for comments and feedback.

Thanks!

Mingmin


How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Pablo Estrada
Hi all,
I was encouraged to write about my troubles using PCollections of
AutoValue classes with AvroCoder, because it seems that this is currently
not possible.

As part of the changes to PAssert, I meant to create a SuccessOrFailure
class that could be passed in a PCollection to a `concludeTransform`, which
would be in charge of validating that all the assertions succeeded, and use
AvroCoder for serialization of that class. Consider this dummy example:

@AutoValue
abstract class FizzBuzz {
...
}

class FizzBuzzDoFn extends DoFn {
...
}

1. The first problem was that the abstract class does not have any
attributes, so AvroCoder cannot scrape them. To address this (with advice
from Kenn Knowles), the Coder would need to take the AutoValue-generated
class:

.apply(ParDo.of(new FizzBuzzDoFn()))
.setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))

2. This errored out saying that FizzBuzz and AutoValue_FizzBuzz are
incompatible classes, so I just tried bypassing the type system like so:

.setCoder(AvroCoder.of((Class) AutoValue_FizzBuzz.class))

3. This compiled properly, and encoding worked, but the problem came at
decoding, because Avro specifically requires the class to have a no-arg
constructor [1], and AutoValue-generated classes do not come with one. This
is a problem for several serialization frameworks; we're not the first
ones to hit it [2], and the AutoValue people don't seem keen on adding
such a constructor.

Considering all that, it seems that the AutoValue-AvroCoder pair cannot
currently work. We'd need a serialization framework that does not depend on
calling the no-arg constructor and then filling in the attributes with
reflection. I'm trying to check whether SerializableCoder uses a different
deserialization technique; but for PAssert, I just decided to use a
POJO + AvroCoder.

I hope my experience may be useful to others, and maybe start a discussion
on how to enable users to have AutoValue classes in their PCollections.

Best
-P.

[1] -
http://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/reflect/package-summary.html?is-external=true
[2] - https://github.com/google/auto/issues/122
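
For completeness, a minimal sketch of the POJO + AvroCoder fallback
mentioned above; the class and field names are made up for illustration.

import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.DefaultCoder;

// A plain POJO gives AvroCoder what it needs: concrete fields to reflect
// over and a no-arg constructor for decoding.
@DefaultCoder(AvroCoder.class)
class FizzBuzzResult {
  int number;
  String label;

  // Required by Avro's reflection-based decoding.
  FizzBuzzResult() {}

  FizzBuzzResult(int number, String label) {
    this.number = number;
    this.label = label;
  }
}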


AssignWindowsDoFn

2017-04-05 Thread Thomas Weise
Hi,

As part of removing remaining OldDoFn reliance in ApexRunner I'm looking
for the DoFn replacement for
org.apache.beam.runners.core.AssignWindowsDoFn, specifically the equivalent
of context.outputWindowedValue.

https://github.com/apache/beam/blob/master/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/WindowAssignTranslator.java#L52

Looking elsewhere suggests that this isn't done using a DoFn (and by
reusing the respective execution operator)? If that's confirmed, then I can
also create a new operator for this.

Thanks


Behaviour of watermarks in the presence of WithTimestamps

2017-04-05 Thread Matthew Jadczak
Hi,

This is a question which goes back to the theoretical model. Normally, as 
defined in the Beam lateness semantics [1], the source is in charge of emitting 
appropriate timestamps and setting its own watermark. This is in general 
configurable by users by providing their own timestamp and watermark 
transformation functions, as implemented in e.g. KafkaIO and PubsubIO. Since 
the Read transform from that source is a root transform, we do not have an 
input watermark to worry about in that case, and the output watermark is under 
the control of the Source.

However, what is the actual and desired behaviour when we wish to materialise 
timestamp / watermark information in the middle of a pipeline? For example, we 
have a source which does not support timestamps and sets them all (along with 
the watermark) to `Long.MIN_VALUE`, or perhaps we need to do some complex 
processing to determine the correct timestamp. Further, the elements could be 
out of order with respect to the generated timestamps.

I tried analysing the code to work out what the behaviour in the
DirectRunner is when using WithTimestamps to assign the timestamps in this
case, but I wasn’t able to figure it out. Is the watermark advanced to the
latest of the timestamps assigned so far? If so, when data is out of order,
is the out-of-order data simply emitted late? (And does the allowed skew
need to be in place there to allow that?)
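
For concreteness, the scenario in question is roughly the following (the
LogLine type and its event-time accessor are hypothetical):

import org.apache.beam.sdk.transforms.WithTimestamps;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Mid-pipeline timestamp assignment: the new timestamps may be earlier than
// the current ones, so an allowed skew is needed, and whether/how the
// watermark advances afterwards is exactly the question above.
PCollection<LogLine> stamped =
    lines.apply(
        WithTimestamps.of((LogLine line) -> Instant.parse(line.getEventTime()))
            .withAllowedTimestampSkew(Duration.standardHours(1)));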

The input watermark in this case is always the minimum time. Advancing the 
watermark in any way past that would break the assumption that output WM <= 
input WM (as written in [1]). However not advancing the watermark would mean we 
never fire triggers which depend on it, etc.

Am I misunderstanding something? Do the invariants in [1] only hold if we are 
not actively messing with timestamps/watermarks by using WithTimestamps or 
similar?

I had a look at [2], but this merely seems to be about adjusting the watermark 
after a “real” watermark is actually established (for example reading log 
files, etc.)

Any clarification would be appreciated.

Thanks,
Matt

[1] 
https://docs.google.com/document/d/12r7frmxNickxB5tbpuEh_n35_IJeVZn1peOrBrhhP6Y/edit#
[2] https://issues.apache.org/jira/browse/BEAM-644 

Re: Travis-CI Build failures

2017-04-05 Thread Ted Yu
Please see BEAM-1809

On Wed, Apr 5, 2017 at 1:01 AM, Amit Sela  wrote:

> Unfortunately Travis is not stable enough right now. I don't think this
> means that there is an issue with your work on the PR, you should notice
> Jenkins after PRing to see if all tests pass and the reviewing committer
> will followup with you further if necessary.
>
> Thanks!
>
> On Wed, Apr 5, 2017 at 9:47 AM Jins George  wrote:
>
> > Hello Team,
> >
> > I am a new contributor and  working on a ticket BEAM-1812. After making
> > code changes and pushing to my forked repo, I noticed that the Travis-CI
> > build fails with a reason
> >
> > 2017-04-05T00:36:07.408 [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default) on
> > project beam-sdks-parent:  Failed during checkstyle execution: Unable to
> > find suppressions file at location: beam/suppressions.xml: Could not
> > find resource 'beam/suppressions.xml'. -> [Help 1]
> >
> > The same error seems to be happening for all the Beam builds/PRs.
> >
> > Can I proceed with the PR ignoring this build failure ?
> >
> >
> > Thanks,
> > Jins George
> >
>


Re: Travis-CI Build failures

2017-04-05 Thread Jins George

Great. Thanks Amit.

On 04/05/2017 01:01 AM, Amit Sela wrote:

Unfortunately Travis is not stable enough right now. I don't think this
means that there is an issue with your work on the PR, you should notice
Jenkins after PRing to see if all tests pass and the reviewing committer
will followup with you further if necessary.

Thanks!

On Wed, Apr 5, 2017 at 9:47 AM Jins George  wrote:


Hello Team,

I am a new contributor and  working on a ticket BEAM-1812. After making
code changes and pushing to my forked repo, I noticed that the Travis-CI
build fails with a reason

2017-04-05T00:36:07.408 [ERROR] Failed to execute goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default) on
project beam-sdks-parent:  Failed during checkstyle execution: Unable to
find suppressions file at location: beam/suppressions.xml: Could not
find resource 'beam/suppressions.xml'. -> [Help 1]

The same error seems to be happening for all the Beam builds/PRs.

Can I proceed with the PR ignoring this build failure ?


Thanks,
Jins George





Re: Travis-CI Build failures

2017-04-05 Thread Amit Sela
Unfortunately, Travis is not stable enough right now. I don't think this
means that there is an issue with your work on the PR; you should watch
Jenkins after opening the PR to see if all tests pass, and the reviewing
committer will follow up with you further if necessary.

Thanks!

On Wed, Apr 5, 2017 at 9:47 AM Jins George  wrote:

> Hello Team,
>
> I am a new contributor and  working on a ticket BEAM-1812. After making
> code changes and pushing to my forked repo, I noticed that the Travis-CI
> build fails with a reason
>
> 2017-04-05T00:36:07.408 [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default) on
> project beam-sdks-parent:  Failed during checkstyle execution: Unable to
> find suppressions file at location: beam/suppressions.xml: Could not
> find resource 'beam/suppressions.xml'. -> [Help 1]
>
> The same error seems to be happening for all the Beam builds/PRs.
>
> Can I proceed with the PR ignoring this build failure ?
>
>
> Thanks,
> Jins George
>