Re: [PROPOSAL] Pipeline Runner API design doc

2016-08-02 Thread Kenneth Knowles
Hi,

Yes, there are a few things still "TODO", including aggregators and triggers.
Triggers can either be an inline syntax tree or be flattened, using
"pointers" like the transforms and PCollections. With coders we've hit
issues with nesting and repetition that lead us to keep them flattened.
The flattened form is essentially unreadable without un-flattening, so I
would keep things un-flattened if we weren't worried about those issues.
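
To make the inline-versus-flattened trade-off concrete, here is a minimal
Python sketch that is not taken from the design doc. The record shape
(`kind`, `components`), the `c0`-style ids, and the helpers `flatten` and
`unflatten` are all assumptions made for illustration; only the idea of
replacing nested coders with "pointers" into a shared table comes from the
discussion above.

```python
import json

# Hypothetical inline (nested) coder tree. StringUtf8Coder appears twice,
# which is the kind of repetition that flattening is meant to avoid.
nested = {
    "kind": "KvCoder",
    "components": [
        {"kind": "StringUtf8Coder", "components": []},
        {
            "kind": "IterableCoder",
            "components": [{"kind": "StringUtf8Coder", "components": []}],
        },
    ],
}


def flatten(node, table):
    """Store `node` once in `table` and return its id, which acts as a pointer."""
    child_ids = [flatten(child, table) for child in node["components"]]
    entry = {"kind": node["kind"], "components": child_ids}
    for node_id, existing in table.items():
        if existing == entry:
            return node_id  # identical subtree already stored: reuse its id
    node_id = "c%d" % len(table)
    table[node_id] = entry
    return node_id


def unflatten(node_id, table):
    """Rebuild the readable nested form by following the ids."""
    entry = table[node_id]
    return {
        "kind": entry["kind"],
        "components": [unflatten(c, table) for c in entry["components"]],
    }


table = {}
root = flatten(nested, table)
print(json.dumps(table, indent=2))                 # compact, but hard to read
print(json.dumps(unflatten(root, table), indent=2))  # readable, but repetitive
```

The flattened table stores each distinct coder once, at the cost of having
to chase ids to understand any single entry.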

Kenn

On Tue, Aug 2, 2016 at 7:22 PM, Aljoscha Krettek 
wrote:

> Hi,
> thanks for putting this together. Now that I'm seeing them side by side I
> think the Avro schema looks a lot nicer than the JSON schema but it's
> probably alright since we don't want to change this often (as you already
> said). The advantage of JSON is that the (intermediate) plans can easily be
> inspected by humans.
>
> I think at this stage there is not much left to discuss on the plan
> representation. To me it seems pretty straightforward what has to be in
> there and that is already more or less in. The only real thing missing is
> triggers but there isn't yet a discussion about how that is going to work
> out, correct?
>
> Cheers,
> Aljoscha
>
> On Thu, 14 Jul 2016 at 21:34 Kenneth Knowles 
> wrote:
>
> > Hi everyone,
> >
> > I wanted to circle back on this thread with another invitation to a
> > discussion. Work on the high-level refactorings to align the Java SDK with
> > the primitives of the proposed model is pretty far along, as is moving out
> > the stuff that we don't want in the user-facing SDK.
> >
> > Since our runners are all Java-based, and we tend to discuss the model in
> > Java first, I think part of the proposal that may have received less
> > attention was the concrete Avro schema towards the bottom of the doc. Since
> > our serialization tech discussion seemed to favor JSON on the front end, I
> > just spent a few minutes to port the Avro schema to a JSON schema and do
> > some project set up to demonstrate where & how it would incorporate into
> > the project structure. I'd done the same for Avro previously, so we can see
> > how they compare.
> >
> > I put the code in a PR, for discussion only at this point, at
> > https://github.com/apache/incubator-beam/pull/662. I'd love if you took a
> > look at the notes on the PR and briefly at the schema; I'll continue to
> > evolve it according to current & future feedback.
> >
> > Kenn
> >
> > On Wed, Mar 23, 2016 at 2:17 PM, Kenneth Knowles  wrote:
> >
> > > Hi everyone,
> > >
> > > Incorporating the feedback from the 1-pager I circulated a week ago, I
> > > have put together a concrete design document for the new API(s).
> > >
> > > https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit?usp=sharing
> > >
> > > I appreciate any and all feedback on the design.
> > >
> > > Kenn
> > >
> >
>


Re: [PROPOSAL] Pipeline Runner API design doc

2016-08-02 Thread Aljoscha Krettek
Hi,
thanks for putting this together. Now that I'm seeing them side by side I
think the Avro schema looks a lot nicer than the JSON schema but it's
probably alright since we don't want to change this often (as you already
said). The advantage of JSON is that the (intermediate) plans can easily be
inspected by humans.

I think at this stage there is not much left to discuss on the plan
representation. To me it seems pretty straightforward what has to be in
there and that is already more or less in. The only real thing missing is
triggers but there isn't yet a discussion about how that is going to work
out, correct?

Cheers,
Aljoscha

On Thu, 14 Jul 2016 at 21:34 Kenneth Knowles  wrote:

> Hi everyone,
>
> I wanted to circle back on this thread with another invitation to a
> discussion. Work on the high-level refactorings to align the Java SDK with
> the primitives of the proposed model is pretty far along, as is moving out
> the stuff that we don't want in the user-facing SDK.
>
> Since our runners are all Java-based, and we tend to discuss the model in
> Java first, I think part of the proposal that may have received less
> attention was the concrete Avro schema towards the bottom of the doc. Since
> our serialization tech discussion seemed to favor JSON on the front end, I
> just spent a few minutes to port the Avro schema to a JSON schema and do
> some project set up to demonstrate where & how it would incorporate into
> the project structure. I'd done the same for Avro previously, so we can see
> how they compare.
>
> I put the code in a PR, for discussion only at this point, at
> https://github.com/apache/incubator-beam/pull/662. I'd love if you took a
> look at the notes on the PR and briefly at the schema; I'll continue to
> evolve it according to current & future feedback.
>
> Kenn
>
> On Wed, Mar 23, 2016 at 2:17 PM, Kenneth Knowles  wrote:
>
> > Hi everyone,
> >
> > Incorporating the feedback from the 1-pager I circulated a week ago, I
> > have put together a concrete design document for the new API(s).
> >
> > https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit?usp=sharing
> >
> > I appreciate any and all feedback on the design.
> >
> > Kenn
> >
>


Podling Report Reminder - August 2016

2016-08-02 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 17 August 2016, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, August 03).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report

----------------------

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.

This should be appended to the Incubator Wiki page at:

http://wiki.apache.org/incubator/August2016

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-08-02 Thread James Malone
Invite sent!

Ideally we'd be able to let people self-join Slack. We're still in the
process of setting that up (ETA 1-2 weeks).

On Tue, Aug 2, 2016 at 1:22 PM, P. Taylor Goetz  wrote:

> Can I get an invite to the slack channel? I’m in the early stages of
> implementing a Beam runner for Apache Storm and have some (probably stupid
> ;) ) questions.
>
> Also, would it make sense to document the process on the Beam website so
> new users/devs can find out about it and how to join?
>
> -Taylor
>
> > On May 24, 2016, at 11:39 AM, Jean-Baptiste Onofré wrote:
> >
> > Hi,
> >
> > We already discussed about that during the latest Beam developer meetup.
> >
> > Basically, as I said: Slack, IRC, Hangout, whatever are very convenient
> > to discuss ideas, fixes, etc.
> >
> > However, all discussions happening there have to be summarized and
> > shared on the mailing list.
> >
> > Good idea to have such reminder in the welcome message (I'm doing it).
> >
> > Thanks,
> > Regards
> > JB
> >
> > On 05/24/2016 05:33 PM, Ganelin, Ilya wrote:
> >> Hi, all - I'm a big fan of Slack and would love an invite to the room
> >> as well.
> >>
> >> With that said, from an early discussion on another Apache project, we
> >> were reminded that a key component of the Apache way is for all
> >> substantive discussion to be publicly visible, archived, and searchable.
> >>
> >> Slack may have the unintended consequence of promoting direct and
> >> invisible interaction, which while useful to the participants, may
> >> hinder the success of the project overall.
> >>
> >> As a relatively new member of Apache, I would defer to those wiser and
> >> more experienced in its ways (e.g our mentors) for further guidance but
> >> I think this is a point worth reminding folks of.
> >>
> >> Perhaps the answer is as simple as ensuring the welcome message on
> >> Slack reminds people of this.
> >>
> >> /my2c
> >>
> >>
> >>
> >> Sent with Good (www.good.com)
> >>
> >> From: Jean-Baptiste Onofré
> >> Sent: Tuesday, May 24, 2016 11:13:17 AM
> >> To: dev@beam.incubator.apache.org
> >> Subject: Re: [PROPOSAL] IRC or slack channel for Apache Beam
> >>
> >> Done
> >>
> >> Regards
> >> JB
> >>
> >> On 05/24/2016 04:46 PM, Jesse Anderson wrote:
> >>> Me too
> >>>
> >>> On Tue, May 24, 2016, 7:37 AM Jean-Baptiste Onofré wrote:
> >>>
> >>>> Done
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 05/24/2016 04:31 PM, Simone Robutti wrote:
> >>>>> I would like to join, if it's possible. Thanks :)
> >>>>>
> >>>>> 2016-05-24 14:55 GMT+02:00 Jean-Baptiste Onofré :
> >>>>>
> >>>>>> Good idea !
> >>>>>>
> >>>>>> Thanks !
> >>>>>> Regards
> >>>>>> JB
> >>>>>>
> >>>>>>
> >>>>>> On 05/24/2016 02:53 PM, Maximilian Michels wrote:
> >>>>>>
> >>>>>>> Thanks. I've also invited Aljoscha Krettek and Kostas Kloudas.
> >>>>>>>
> >>>>>>> On Tue, May 24, 2016 at 2:15 PM, Jean-Baptiste Onofré
> >>>>>>> <j...@nanthrax.net> wrote:
> >>>>>>>
> >>>>>>>> Hi Max,
> >>>>>>>>
> >>>>>>>> I just invited you.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> JB
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 05/24/2016 02:12 PM, Maximilian Michels wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> +1 for Slack.
> >>>>>>>>>
> >>>>>>>>> @James Could you invite me?
> >>>>>>>>>
> >>>>>>>>> On Thu, May 19, 2016 at 9:24 PM, James Malone
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> It sounds like Slack is the clear winner here. So, I am happy to
> >>>>>>>>>> say that we now have our own Slack Team, open to all!
> >>>>>>>>>>
> >>>>>>>>>> http://apachebeam.slack.com
> >>>>>>>>>>
> >>>>>>>>>> Once I created the Slack team, it rejected the large blanket
> >>>>>>>>>> list of "acceptable email domains" I wanted to use (so signup is
> >>>>>>>>>> painless.) Instead, it looks like we'll have to use an invite
> >>>>>>>>>> system. I've already modified the team so anyone can invite
> >>>>>>>>>> anyone else (to make it easy to grow the Beam community.) But,
> >>>>>>>>>> we will need to manually invite some people to get this process
> >>>>>>>>>> started.
> >>>>>>>>>>
> >>>>>>>>>> If you'd like an invite today, can you please email me -
> >>>>>>>>>> jamesmal...@apache.org and I will invite you ASAP.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>>
> >>>>>>>>>> James
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 19, 2016 at 9:36 AM, Milindu Sanoj Kumarage <
> >>>>>>>>>> agentmili...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> +1 for Slack.
> >>>>>>>>>>> On 19 May 2016 5:43 p.m., "GANESH RAJU" wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> +1 on slack
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ganesh Raju
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>>
> >>>>>>>>>>>> On May 18, 2016, at 3:41 AM, Jean-Baptiste Onofré <
> >>>>>>>>>>>> j...@nanthrax.net> wrote:

Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-08-02 Thread P. Taylor Goetz
Can I get an invite to the slack channel? I’m in the early stages of 
implementing a Beam runner for Apache Storm and have some (probably stupid ;) ) 
questions.

Also, would it make sense to document the process on the Beam website so new 
users/devs can find out about it and how to join?

-Taylor

> On May 24, 2016, at 11:39 AM, Jean-Baptiste Onofré  wrote:
> 
> Hi,
> 
> We already discussed about that during the latest Beam developer meetup.
> 
> Basically, as I said: Slack, IRC, Hangout, whatever are very convenient to 
> discuss ideas, fixes, etc.
> 
> However, all discussions happening there have to be summarized and shared on 
> the mailing list.
> 
> Good idea to have such reminder in the welcome message (I'm doing it).
> 
> Thanks,
> Regards
> JB
> 
> On 05/24/2016 05:33 PM, Ganelin, Ilya wrote:
>> Hi, all - I'm a big fan of Slack and would love an invite to the room as 
>> well.
>> 
>> With that said, from an early discussion on another Apache project, we were 
>> reminded that a key component of the Apache way is for all substantive 
>> discussion to be publicly visible, archived, and searchable.
>> 
>> Slack may have the unintended consequence of promoting direct and invisible 
>> interaction, which while useful to the participants, may hinder the success 
>> of the project overall.
>> 
>> As a relatively new member of Apache, I would defer to those wiser and more 
>> experienced in its ways (e.g our mentors) for further guidance but I think 
>> this is a point worth reminding folks of.
>> 
>> Perhaps the answer is as simple as ensuring the welcome message on Slack 
>> reminds people of this.
>> 
>> /my2c
>> 
>> 
>> 
>> Sent with Good (www.good.com)
>> 
>> From: Jean-Baptiste Onofré 
>> Sent: Tuesday, May 24, 2016 11:13:17 AM
>> To: dev@beam.incubator.apache.org
>> Subject: Re: [PROPOSAL] IRC or slack channel for Apache Beam
>> 
>> Done
>> 
>> Regards
>> JB
>> 
>> On 05/24/2016 04:46 PM, Jesse Anderson wrote:
>>> Me too
>>> 
>>> On Tue, May 24, 2016, 7:37 AM Jean-Baptiste Onofré  
>>> wrote:
>>> 
>>>> Done
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 05/24/2016 04:31 PM, Simone Robutti wrote:
>>>>> I would like to join, if it's possible. Thanks :)
>>>>>
>>>>> 2016-05-24 14:55 GMT+02:00 Jean-Baptiste Onofré :
>>>>>
>>>>>> Good idea !
>>>>>>
>>>>>> Thanks !
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>>
>>>>>> On 05/24/2016 02:53 PM, Maximilian Michels wrote:
>>>>>>
>>>>>>> Thanks. I've also invited Aljoscha Krettek and Kostas Kloudas.
>>>>>>>
>>>>>>> On Tue, May 24, 2016 at 2:15 PM, Jean-Baptiste Onofré
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Max,
>>>>>>>>
>>>>>>>> I just invited you.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/24/2016 02:12 PM, Maximilian Michels wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> +1 for Slack.
>>>>>>>>>
>>>>>>>>> @James Could you invite me?
>>>>>>>>>
>>>>>>>>> On Thu, May 19, 2016 at 9:24 PM, James Malone
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> It sounds like Slack is the clear winner here. So, I am happy to
>>>>>>>>>> say that we now have our own Slack Team, open to all!
>>>>>>>>>>
>>>>>>>>>> http://apachebeam.slack.com
>>>>>>>>>>
>>>>>>>>>> Once I created the Slack team, it rejected the large blanket list
>>>>>>>>>> of "acceptable email domains" I wanted to use (so signup is
>>>>>>>>>> painless.) Instead, it looks like we'll have to use an invite
>>>>>>>>>> system. I've already modified the team so anyone can invite anyone
>>>>>>>>>> else (to make it easy to grow the Beam community.) But, we will
>>>>>>>>>> need to manually invite some people to get this process started.
>>>>>>>>>>
>>>>>>>>>> If you'd like an invite today, can you please email me -
>>>>>>>>>> jamesmal...@apache.org and I will invite you ASAP.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> James
>>>>>>>>>>
>>>>>>>>>> On Thu, May 19, 2016 at 9:36 AM, Milindu Sanoj Kumarage <
>>>>>>>>>> agentmili...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for Slack.
>>>>>>>>>>> On 19 May 2016 5:43 p.m., "GANESH RAJU" wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 on slack
>>>>>>>>>>>>
>>>>>>>>>>>> Ganesh Raju
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>
>>>>>>>>>>>> On May 18, 2016, at 3:41 AM, Jean-Baptiste Onofré <
>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Good point Robert.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will be on the channel for sure (I'm already on bunch of
>>>>>>>>>>>>> Apache IRC channels ;)).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> JB
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/18/2016 10:26 AM, Robert Bradshaw wrote:
>>>>>>>>>>>>>>

Re: [MENTOR] August 2016 Podling report on Wiki

2016-08-02 Thread Jean-Baptiste Onofré

Thanks !

Reviewed and signed off.

Regards
JB

On 08/02/2016 05:33 PM, James Malone wrote:

Hello Beam mentors!

The Beam podling report for August 2016 is now on the Apache Wiki and ready
for your review, comments, and sign off:

https://wiki.apache.org/incubator/August2016

Best,

James



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Proposal: Dynamic PipelineOptions

2016-08-02 Thread Robert Bradshaw
Being able to "late-bind" parameters like input paths to a
pre-constructed program would be a very useful feature, and I think it is
worth adding to Beam.

Of the four API proposals, I have a strong preference for (4).
Further, it seems that these need not be bound to the PipelineOptions
object itself (i.e. a named RuntimeValueSupplier could be constructed
off of a pipeline object). The Python API makes less heavy use of
PipelineOptions (encouraging the user to use familiar, standard
libraries for argument parsing), though of course such integration is
useful to provide for convenience.
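
As a rough illustration of late binding, the sketch below separates graph
construction from execution in plain Python. It is not the API from the
proposal documents: the `RuntimeValueSupplier` name is borrowed from the
paragraph above, but its `resolve` method, the `build_graph` and `run_graph`
helpers, and the in-memory `FAKE_FILES` store are hypothetical.

```python
_MISSING = object()

# Hypothetical stand-in for external storage, so the sketch runs anywhere.
FAKE_FILES = {"a.txt": ["x", "y"], "b.txt": ["x", "y", "z"]}


class RuntimeValueSupplier:
    """Named placeholder whose value is only known at execution time."""

    def __init__(self, name, default=_MISSING):
        self.name = name
        self.default = default

    def resolve(self, runtime_args):
        """Look the value up in the arguments supplied for this run."""
        if self.name in runtime_args:
            return runtime_args[self.name]
        if self.default is not _MISSING:
            return self.default
        raise KeyError("no runtime value supplied for %r" % self.name)


def build_graph(input_path):
    """Construction time: store the supplier itself, never its value."""
    return [("Read", input_path), ("Count", None)]


def run_graph(graph, runtime_args):
    """Execution time: resolve suppliers against this run's arguments."""
    data = None
    for step, arg in graph:
        if step == "Read":
            data = FAKE_FILES[arg.resolve(runtime_args)]
        elif step == "Count":
            data = len(data)
    return data


graph = build_graph(RuntimeValueSupplier("input"))  # built once
print(run_graph(graph, {"input": "a.txt"}))  # first run
print(run_graph(graph, {"input": "b.txt"}))  # same graph, new parameter
```

The point of the design is that `graph` is constructed once and reused; only
the arguments passed to `run_graph` change between runs.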

- Robert

On Fri, Jul 29, 2016 at 12:14 PM, Sam McVeety  wrote:
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program.  At execution time, this graph is
> executed, either locally or by a service.  Currently, Beam only supports
> parameterization at graph construction time.  Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
>
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job.  We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
>
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for the specific API proposal.
>
> Cheers,
> Sam


[MENTOR] August 2016 Podling report on Wiki

2016-08-02 Thread James Malone
Hello Beam mentors!

The Beam podling report for August 2016 is now on the Apache Wiki and ready
for your review, comments, and sign off:

https://wiki.apache.org/incubator/August2016

Best,

James


Re: [REFLECT] Beam’s Half Birthday!

2016-08-02 Thread Ismaël Mejía
Hello,

Nice reminder of the work done. I feel quite proud of what this community
has accomplished in this short time (and, of course, of being a recent
member of it).

One missing statistic that is probably hard to measure is how some Beam
ideas have helped to improve other Apache projects. I know this is the case
for Flink, for example, but it is easy to imagine that this continues to
happen with other Apache projects as well.

Another statistic that surprised me is the number of members of the mailing
lists. It is probably normal at this stage of the project to have more
subscribers on the dev list than on the user one, and this clearly reflects
a healthy dev community, but we have to continue with the good work so we
can have a thriving user community too.

Congratulations and Happy Half Birthday Beamers.
Ismaël


On Tue, Aug 2, 2016 at 2:20 AM, Ahmet Altay 
wrote:

> Happy half-birthday!
>
> As one of the newcomers to the Python SDK, it would be great to have it in
> the main branch. We are getting closer to that goal every day.
>
> Thanks,
> Ahmet
>
> On Mon, Aug 1, 2016 at 10:02 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Fully agree with Dan.
> >
> > Regards
> > JB
> >
> >
> > On 08/01/2016 06:56 PM, Dan Halperin wrote:
> >
> >> +1 (binding? ;)
> >>
> >> On this part of the email:
> >>
> >>> This half birthday is also a good chance to take a step back and reflect
> >>> on our goals for this year -- TLP graduation and the first stable
> >>> release. Where are we on this path? What can we do better to accomplish
> >>> these high-level goals?
> >>
> >> I think we really want to finish as many backwards-incompatible changes
> >> as possible. Here's a seed for that list.
> >>
> >>    - DoFn setup/teardown
> >>    - new DoFn proposal
> >>    - Continuing to move google-specific IO from SDK into
> >>      google-cloud-platform IO module
> >>    - Any changes to fundamental style (PTransform.apply rename? Removing
> >>      the .Bound wrappers in various transforms?)
> >>
> >> I'd also really like to see Gearpump runner (maybe also Apex) and Python
> >> SDK in the main branch.
> >>
> >> Thanks,
> >> Dan
> >>
> >>
> >> On Mon, Aug 1, 2016 at 8:36 AM, Aljoscha Krettek
> >> wrote:
> >>
> >>> +1
> >>>
> >>> This sounds very good, I can't come up with anything that you missed.
> >>>
> >>> On Mon, 1 Aug 2016 at 08:00 Jean-Baptiste Onofré
> >>> wrote:
> >>>
> >>>> Happy half birthday ;)
> >>>>
> >>>> Very good idea Frances !!
> >>>>
> >>>> And the numbers are impressive indeed.
> >>>>
> >>>> Maybe, we can add kind of teasing about new incoming PRs like:
> >>>> Cassandra IO (PR submitted), MongoDB IO (PR submitted), MQTT IO,
> >>>> JDBC IO, Socket IO (I'm working on these IOs), XML/JSON DSLs.
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 08/01/2016 04:36 PM, Frances Perry wrote:
> >>>>
> >>>>> Hi Beamers!
> >>>>>
> >>>>> It’s been six months today since Beam was accepted into incubation.
> >>>>> It’s thrilling how far we’ve come since then!
> >>>>>
> >>>>> I’d like to volunteer to put together a post on the Beam blog
> >>>>> summarizing our progress since February. Here’s a starting point...
> >>>>> What am I missing that we should include? What makes you proud?
> >>>>>
> >>>>> By the numbers:
> >>>>>
> >>>>> * 48,238 lines of preexisting code donated by Cloudera, dataArtisans,
> >>>>> and Google.
> >>>>>
> >>>>> * 761 pull requests from 45 contributors.
> >>>>>
> >>>>> * 498 Jira issues opened and 245 resolved.
> >>>>>
> >>>>> * 1 incubating release (and another 1 in progress).
> >>>>>
> >>>>> * 4200 hours of automated tests.
> >>>>>
> >>>>> * 161 subscribers / 606 messages on user@.
> >>>>>
> >>>>> * 217 subscribers / 1205 messages on dev@.
> >>>>>
> >>>>> There’s been a lot of technical progress, including:
> >>>>>
> >>>>> * Refactoring of the entire codebase, examples, and tests to be truly
> >>>>> runner-independent.
> >>>>>
> >>>>> * New functionality in the Apache Flink runner for timestamps/windows
> >>>>> in batch and bounded sources and side inputs in streaming mode.
> >>>>>
> >>>>> * Work in progress to upgrade the Apache Spark runner to use Spark
> >>>>> 2.0.
> >>>>>
> >>>>> * Several new runners from the wider Apache community -- Apache
> >>>>> Gearpump has its own feature branch, Apache Apex has a PR, and
> >>>>> conversations are starting on Apache Storm and others.
> >>>>>
> >>>>> * New SDKs/DSLs -- the Python SDK from Google is in, and there are
> >>>>> plans to add the Scio DSL from Spotify.
> >>>>>
> >>>>> * Support for new IO connectors -- Apache Kafka and JMS are in, with
> >>>>> Amazon Kinesis in PR.
> >>>>>
> >>>>> And community-wise, we’ve:
> >>>>>
> >>>>> * Started building a vibrant developer community,