Re: [PROPOSAL] Pipeline Runner API design doc
Hi,

Yes, there are a few things still marked "TODO", including aggregators and
triggers. Triggers can either be an inline syntax tree or be flattened and
use "pointers", like the transforms and PCollections. With coders we've hit
issues with nesting and repetition that lead us to keep them flattened. The
flattened form is essentially unreadable without un-flattening, so I would
keep things un-flattened if we weren't worried about those issues.

Kenn

On Tue, Aug 2, 2016 at 7:22 PM, Aljoscha Krettek wrote:

> Hi,
> thanks for putting this together. Now that I'm seeing them side by side, I
> think the Avro schema looks a lot nicer than the JSON schema, but it's
> probably alright since we don't want to change this often (as you already
> said). The advantage of JSON is that the (intermediate) plans can easily be
> inspected by humans.
>
> I think at this stage there is not much left to discuss on the plan
> representation. To me it seems pretty straightforward what has to be in
> there, and that is already more or less in. The only real thing missing is
> triggers, but there isn't yet a discussion about how that is going to work
> out, correct?
>
> Cheers,
> Aljoscha
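
The two trigger encodings mentioned in this message (an inline syntax tree versus a flattened table whose entries refer to each other by "pointer") can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the trigger names, the field names, and the `t0`-style ids are hypothetical and are not taken from the proposed Avro or JSON schema.

```python
import json

# Hypothetical inline (nested) trigger tree. Note the repeated AfterCount node.
nested = {
    "kind": "AfterAny",
    "subtriggers": [
        {"kind": "AfterCount", "count": 100},
        {"kind": "AfterAll", "subtriggers": [
            {"kind": "AfterCount", "count": 100},
            {"kind": "AfterProcessingTime", "delay_ms": 60000},
        ]},
    ],
}


def flatten(node, table):
    """Store each distinct node once in `table`; return its id (the "pointer")."""
    flat = {k: v for k, v in node.items() if k != "subtriggers"}
    if "subtriggers" in node:
        # Children are replaced by their ids, so nesting disappears.
        flat["subtriggers"] = [flatten(c, table) for c in node["subtriggers"]]
    key = json.dumps(flat, sort_keys=True)
    for node_id, existing in table.items():
        if json.dumps(existing, sort_keys=True) == key:
            return node_id  # identical node already stored: reuse it
    node_id = "t%d" % len(table)
    table[node_id] = flat
    return node_id


def unflatten(node_id, table):
    """Rebuild the readable nested tree from the flattened table."""
    node = dict(table[node_id])
    if "subtriggers" in node:
        node["subtriggers"] = [unflatten(c, table) for c in node["subtriggers"]]
    return node


table = {}
root = flatten(nested, table)
print(json.dumps(table, indent=2, sort_keys=True))
```

The flattened table stores the repeated node only once, which is the repetition saving, but it has to be un-flattened again before a person can read the tree, which is the readability cost described above.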
Re: [PROPOSAL] Pipeline Runner API design doc
Hi,

thanks for putting this together. Now that I'm seeing them side by side, I
think the Avro schema looks a lot nicer than the JSON schema, but it's
probably alright since we don't want to change this often (as you already
said). The advantage of JSON is that the (intermediate) plans can easily be
inspected by humans.

I think at this stage there is not much left to discuss on the plan
representation. To me it seems pretty straightforward what has to be in
there, and that is already more or less in. The only real thing missing is
triggers, but there isn't yet a discussion about how that is going to work
out, correct?

Cheers,
Aljoscha

On Thu, 14 Jul 2016 at 21:34 Kenneth Knowles wrote:

> Hi everyone,
>
> I wanted to circle back on this thread with another invitation to a
> discussion. Work on the high-level refactorings to align the Java SDK with
> the primitives of the proposed model is pretty far along, as is moving out
> the stuff that we don't want in the user-facing SDK.
>
> Since our runners are all Java-based, and we tend to discuss the model in
> Java first, I think part of the proposal that may have received less
> attention was the concrete Avro schema towards the bottom of the doc. Since
> our serialization tech discussion seemed to favor JSON on the front end, I
> just spent a few minutes to port the Avro schema to a JSON schema and do
> some project setup to demonstrate where & how it would incorporate into
> the project structure. I'd done the same for Avro previously, so we can see
> how they compare.
>
> I put the code in a PR, for discussion only at this point, at
> https://github.com/apache/incubator-beam/pull/662. I'd love it if you took
> a look at the notes on the PR and briefly at the schema; I'll continue to
> evolve it according to current & future feedback.
>
> Kenn
>
> On Wed, Mar 23, 2016 at 2:17 PM, Kenneth Knowles wrote:
>
> > Hi everyone,
> >
> > Incorporating the feedback from the 1-pager I circulated a week ago, I
> > have put together a concrete design document for the new API(s).
> >
> > https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit?usp=sharing
> >
> > I appreciate any and all feedback on the design.
> >
> > Kenn
Podling Report Reminder - August 2016
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 17 August 2016, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, August 03).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members, to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of
  the project or necessarily of its field
* A list of the three most important issues to address in the move
  towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be
  aware of
* How has the community developed since the last report
* How has the project developed since the last report.

This should be appended to the Incubator Wiki page at:

http://wiki.apache.org/incubator/August2016

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC
Re: [PROPOSAL] IRC or slack channel for Apache Beam
Invite sent!

Ideally we'd be able to let people self-join Slack. We're still in the
process of setting that up (ETA 1-2 weeks).

On Tue, Aug 2, 2016 at 1:22 PM, P. Taylor Goetz wrote:

> Can I get an invite to the Slack channel? I’m in the early stages of
> implementing a Beam runner for Apache Storm and have some (probably stupid
> ;) ) questions.
>
> Also, would it make sense to document the process on the Beam website so
> new users/devs can find out about it and how to join?
>
> -Taylor
Re: [PROPOSAL] IRC or slack channel for Apache Beam
Can I get an invite to the Slack channel? I’m in the early stages of
implementing a Beam runner for Apache Storm and have some (probably stupid
;) ) questions.

Also, would it make sense to document the process on the Beam website so
new users/devs can find out about it and how to join?

-Taylor

> On May 24, 2016, at 11:39 AM, Jean-Baptiste Onofré wrote:
>
> Hi,
>
> We already discussed that during the latest Beam developer meetup.
>
> Basically, as I said: Slack, IRC, Hangout, whatever are very convenient to
> discuss ideas, fixes, etc.
>
> However, all discussions happening there have to be summarized and shared
> on the mailing list.
>
> Good idea to have such a reminder in the welcome message (I'm doing it).
>
> Thanks,
> Regards
> JB
>
> On 05/24/2016 05:33 PM, Ganelin, Ilya wrote:
>> Hi, all - I'm a big fan of Slack and would love an invite to the room as
>> well.
>>
>> With that said, from an early discussion on another Apache project, we
>> were reminded that a key component of the Apache way is for all
>> substantive discussion to be publicly visible, archived, and searchable.
>>
>> Slack may have the unintended consequence of promoting direct and
>> invisible interaction, which, while useful to the participants, may
>> hinder the success of the project overall.
>>
>> As a relatively new member of Apache, I would defer to those wiser and
>> more experienced in its ways (e.g. our mentors) for further guidance, but
>> I think this is a point worth reminding folks of.
>>
>> Perhaps the answer is as simple as ensuring the welcome message on Slack
>> reminds people of this.
>>
>> /my2c
>>
>> Sent with Good (www.good.com)
>>
>> From: Jean-Baptiste Onofré
>> Sent: Tuesday, May 24, 2016 11:13:17 AM
>> To: dev@beam.incubator.apache.org
>> Subject: Re: [PROPOSAL] IRC or slack channel for Apache Beam
>>
>> Done
>>
>> Regards
>> JB
>>
>> On 05/24/2016 04:46 PM, Jesse Anderson wrote:
>>> Me too
>>>
>>> On Tue, May 24, 2016, 7:37 AM Jean-Baptiste Onofré wrote:
>>>
>>>> Done
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 05/24/2016 04:31 PM, Simone Robutti wrote:
>>>>> I would like to join, if it's possible. Thanks :)
>>>>>
>>>>> 2016-05-24 14:55 GMT+02:00 Jean-Baptiste Onofré:
>>>>>
>>>>>> Good idea!
>>>>>>
>>>>>> Thanks!
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 05/24/2016 02:53 PM, Maximilian Michels wrote:
>>>>>>
>>>>>>> Thanks. I've also invited Aljoscha Krettek and Kostas Kloudas.
>>>>>>>
>>>>>>> On Tue, May 24, 2016 at 2:15 PM, Jean-Baptiste Onofré wrote:
>>>>>>>
>>>>>>>> Hi Max,
>>>>>>>>
>>>>>>>> I just invited you.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 05/24/2016 02:12 PM, Maximilian Michels wrote:
>>>>>>>>>
>>>>>>>>> +1 for Slack.
>>>>>>>>>
>>>>>>>>> @James Could you invite me?
>>>>>>>>>
>>>>>>>>> On Thu, May 19, 2016 at 9:24 PM, James Malone wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> It sounds like Slack is the clear winner here. So, I am happy to
>>>>>>>>>> say that we now have our own Slack Team, open to all!
>>>>>>>>>>
>>>>>>>>>> http://apachebeam.slack.com
>>>>>>>>>>
>>>>>>>>>> Once I created the Slack team, it rejected the large blanket list
>>>>>>>>>> of "acceptable email domains" I wanted to use (so signup is
>>>>>>>>>> painless.) Instead, it looks like we'll have to use an invite
>>>>>>>>>> system. I've already modified the team so anyone can invite
>>>>>>>>>> anyone else (to make it easy to grow the Beam community.) But, we
>>>>>>>>>> will need to manually invite some people to get this process
>>>>>>>>>> started.
>>>>>>>>>>
>>>>>>>>>> If you'd like an invite today, can you please email me -
>>>>>>>>>> jamesmal...@apache.org and I will invite you ASAP.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> James
>>>>>>>>>>
>>>>>>>>>> On Thu, May 19, 2016 at 9:36 AM, Milindu Sanoj Kumarage <
>>>>>>>>>> agentmili...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for Slack.
>>>>>>>>>>> On 19 May 2016 5:43 p.m., "GANESH RAJU" wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 on slack
>>>>>>>>>>>>
>>>>>>>>>>>> Ganesh Raju
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>
>>>>>>>>>>>> On May 18, 2016, at 3:41 AM, Jean-Baptiste Onofré <
>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Good point Robert.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will be on the channel for sure (I'm already on a bunch of
>>>>>>>>>>>>> Apache IRC channels ;)).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> JB
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/18/2016 10:26 AM, Robert Bradshaw wrote:
>>>>>>>>>>>>>>
Re: [MENTOR] August 2016 Podling report on Wiki
Thanks! Reviewed and signed off.

Regards
JB

On 08/02/2016 05:33 PM, James Malone wrote:
> Hello Beam mentors!
>
> The Beam podling report for August 2016 is now on the Apache Wiki and
> ready for your review, comments, and sign off:
>
> https://wiki.apache.org/incubator/August2016
>
> Best,
>
> James

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Proposal: Dynamic PipelineOptions
Being able to "late-bind" parameters such as input paths to a
pre-constructed program would be a very useful feature, and I think it is
worth adding to Beam.

Of the four API proposals, I have a strong preference for (4). Further, it
seems that these values need not be bound to the PipelineOptions object
itself (i.e. a named RuntimeValueSupplier could be constructed off of a
pipeline object). The Python API makes less heavy use of PipelineOptions
(encouraging the user to use familiar, standard libraries for argument
parsing), though of course such integration is useful to provide for
convenience.

- Robert

On Fri, Jul 29, 2016 at 12:14 PM, Sam McVeety wrote:
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program. At execution time, this graph is
> executed, either locally or by a service. Currently, Beam only supports
> parameterization at graph construction time. Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
>
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job. We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
>
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for the specific API proposal.
>
> Cheers,
> Sam
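
The difference between construction-time and execution-time parameters discussed in this thread can be sketched in a few lines. This is only an illustration under stated assumptions: the name RuntimeValueSupplier comes from the message above, but the class shape, the `build_graph` and `run_graph` helpers, and the in-memory `storage` dict are all hypothetical and are not the API from the linked proposal.

```python
class RuntimeValueSupplier:
    """Hypothetical placeholder for a value that is only known at execution time."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def get(self, runtime_params):
        # Resolve the value late, from parameters supplied when the job runs.
        if self.name in runtime_params:
            return runtime_params[self.name]
        if self.default is not None:
            return self.default
        raise KeyError("no runtime value supplied for %r" % self.name)


def build_graph(input_path):
    """'Construction time': record the steps without reading the parameter."""
    return [("read", input_path), ("count_lines", None)]


def run_graph(graph, runtime_params, storage):
    """'Execution time': resolve suppliers, then execute each step in order."""
    result = None
    for op, arg in graph:
        if op == "read":
            result = storage[arg.get(runtime_params)]
        elif op == "count_lines":
            result = len(result)
    return result


# Hypothetical stand-in for a file system: path -> lines.
storage = {"/data/a.txt": ["x", "y"], "/data/b.txt": ["x", "y", "z"]}

# The graph is built once and then run twice with different inputs.
graph = build_graph(RuntimeValueSupplier("input"))
count_a = run_graph(graph, {"input": "/data/a.txt"}, storage)
count_b = run_graph(graph, {"input": "/data/b.txt"}, storage)
print(count_a, count_b)
```

In this sketch the supplier is created on its own rather than through an options object, which mirrors the point that such values need not be tied to PipelineOptions.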
[MENTOR] August 2016 Podling report on Wiki
Hello Beam mentors!

The Beam podling report for August 2016 is now on the Apache Wiki and ready
for your review, comments, and sign off:

https://wiki.apache.org/incubator/August2016

Best,

James
Re: [REFLECT] Beam’s Half Birthday!
Hello,

Nice reminder of the work done. I feel quite proud of what this community
has accomplished in this short time (and of course of being a recent member
of it).

One missing statistic that is probably hard to measure is how some Beam
ideas have helped to improve other Apache projects. I know this is the case
for Flink, for example, and it is easy to imagine that this continues to
happen with other Apache projects as well.

Another statistic that surprised me is the number of members of the mailing
lists. It is probably normal at this stage of the project to have more
subscribers on the dev list than on the user one, and this clearly reflects
a healthy dev community, but we have to continue with the good work so that
we can have a thriving user community too.

Congratulations and Happy Half Birthday, Beamers.

Ismaël

On Tue, Aug 2, 2016 at 2:20 AM, Ahmet Altay wrote:

> Happy half-birthday!
>
> As one of the newcomers to the Python SDK, it would be great to have it in
> the main branch. We are getting closer to that goal every day.
>
> Thanks,
> Ahmet
>
> On Mon, Aug 1, 2016 at 10:02 AM, Jean-Baptiste Onofré wrote:
>
>> Fully agree with Dan.
>>
>> Regards
>> JB
>>
>> On 08/01/2016 06:56 PM, Dan Halperin wrote:
>>
>>> +1 (binding? ;)
>>>
>>> On this part of the email:
>>>
>>>> This half birthday is also a good chance to take a step back and
>>>> reflect on our goals for this year -- TLP graduation and the first
>>>> stable release. Where are we on this path? What can we do better to
>>>> accomplish these high-level goals?
>>>
>>> I think we really want to finish as many backwards-incompatible changes
>>> as possible. Here's a seed for that list.
>>>
>>>    - DoFn setup/teardown
>>>    - new DoFn proposal
>>>    - Continuing to move google-specific IO from SDK into
>>>      google-cloud-platform IO module
>>>    - Any changes to fundamental style (PTransform.apply rename? Removing
>>>      the .Bound wrappers in various transforms?)
>>>
>>> I'd also really like to see Gearpump runner (maybe also Apex) and Python
>>> SDK in the main branch.
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Mon, Aug 1, 2016 at 8:36 AM, Aljoscha Krettek wrote:
>>>
>>>> +1
>>>>
>>>> This sounds very good, I can't come up with anything that you missed.
>>>>
>>>> On Mon, 1 Aug 2016 at 08:00 Jean-Baptiste Onofré wrote:
>>>>
>>>>> Happy half birthday ;)
>>>>>
>>>>> Very good idea Frances!!
>>>>>
>>>>> And the numbers are impressive indeed.
>>>>>
>>>>> Maybe we can add a kind of teaser about new incoming PRs like:
>>>>> Cassandra IO (PR submitted), MongoDB IO (PR submitted), MQTT IO, JDBC
>>>>> IO, Socket IO (I'm working on these IOs), XML/JSON DSLs.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 08/01/2016 04:36 PM, Frances Perry wrote:
>>>>>
>>>>>> Hi Beamers!
>>>>>>
>>>>>> It’s been six months today since Beam was accepted into incubation.
>>>>>> It’s thrilling how far we’ve come since then!
>>>>>>
>>>>>> I’d like to volunteer to put together a post on the Beam blog
>>>>>> summarizing our progress since February. Here’s a starting point...
>>>>>> What am I missing that we should include? What makes you proud?
>>>>>>
>>>>>> By the numbers:
>>>>>>
>>>>>> * 48,238 lines of preexisting code donated by Cloudera, dataArtisans,
>>>>>> and Google.
>>>>>>
>>>>>> * 761 pull requests from 45 contributors.
>>>>>>
>>>>>> * 498 Jira issues opened and 245 resolved.
>>>>>>
>>>>>> * 1 incubating release (and another 1 in progress).
>>>>>>
>>>>>> * 4200 hours of automated tests.
>>>>>>
>>>>>> * 161 subscribers / 606 messages on user@.
>>>>>>
>>>>>> * 217 subscribers / 1205 messages on dev@.
>>>>>>
>>>>>> There’s been a lot of technical progress, including:
>>>>>>
>>>>>> * Refactoring of the entire codebase, examples, and tests to be truly
>>>>>> runner-independent.
>>>>>>
>>>>>> * New functionality in the Apache Flink runner for timestamps/windows
>>>>>> in batch and bounded sources and side inputs in streaming mode.
>>>>>>
>>>>>> * Work in progress to upgrade the Apache Spark runner to use Spark
>>>>>> 2.0.
>>>>>>
>>>>>> * Several new runners from the wider Apache community -- Apache
>>>>>> Gearpump has its own feature branch, Apache Apex has a PR, and
>>>>>> conversations are starting on Apache Storm and others.
>>>>>>
>>>>>> * New SDKs/DSLs -- the Python SDK from Google is in, and there are
>>>>>> plans to add the Scio DSL from Spotify.
>>>>>>
>>>>>> * Support for new IO connectors -- Apache Kafka and JMS are in, with
>>>>>> Amazon Kinesis in PR.
>>>>>>
>>>>>> And community-wise, we’ve:
>>>>>>
>>>>>> * Started building a vibrant developer community,