Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-15 Thread Pei HE
Congrats, Pablo!

On Tue, May 14, 2019 at 11:41 PM Tanay Tummalapalli
 wrote:
>
> Congratulations Pablo!
>
> On Wed, May 15, 2019, 12:08 Michael Luckey  wrote:
>>
>> Congrats, Pablo!
>>
>> On Wed, May 15, 2019 at 8:21 AM Connell O'Callaghan  
>> wrote:
>>>
>>> Awesome well done Pablo!!!
>>>
>>> Kenn thank you for sharing this great news with us!!!
>>>
>>> On Tue, May 14, 2019 at 11:01 PM Ahmet Altay  wrote:

 Congratulations!

 On Tue, May 14, 2019 at 9:11 PM Robert Burke  wrote:
>
> Woohoo! Well deserved.
>
> On Tue, May 14, 2019, 8:34 PM Reuven Lax  wrote:
>>
>> Congratulations!
>>
>> From: Mikhail Gryzykhin 
>> Date: Tue, May 14, 2019 at 8:32 PM
>> To: 
>>
>>> Congratulations Pablo!
>>>
>>> On Tue, May 14, 2019, 20:25 Kenneth Knowles  wrote:

 Hi all,

 Please join me and the rest of the Beam PMC in welcoming Pablo Estrada 
 to join the PMC.

 Pablo first picked up BEAM-722 in October of 2016 and has been a 
 steady part of the Beam community since then. In addition to technical 
 work on Beam Python & Java & runners, I would highlight how Pablo 
 grows Beam's community by helping users, working on GSoC, giving talks 
 at Beam Summits and other OSS conferences including Flink Forward, and 
 holding training workshops. I cannot do justice to Pablo's 
 contributions in a single paragraph.

 Thanks Pablo, for being a part of Beam.

 Kenn


Re: [ANNOUNCE] New committer announcement: Mark Liu

2019-05-09 Thread Pei HE
Congratulations Mark!

On Thu, May 9, 2019 at 1:54 PM Mikhail Gryzykhin  wrote:
>
> Congratulations Mark!
>
> From: Kenneth Knowles 
> Date: Sun, Mar 24, 2019 at 9:40 PM
> To: dev
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new committer: 
>> Mark Liu.
>>
>> Mark has been contributing to Beam since late 2016! He has proposed 100+ 
>> pull requests. Mark was instrumental in expanding test and infrastructure 
>> coverage, especially for Python. In consideration of Mark's contributions, 
>> the Beam PMC trusts Mark with the responsibilities of a Beam committer [1].
>>
>> Thank you, Mark, for your contributions.
>>
>> Kenn
>>
>> [1] 
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer


Re: Going on leave for a bit

2018-06-26 Thread Pei HE
(Belated) congrats on the newborn!
--
Pei

On Tue, Jun 26, 2018 at 1:42 PM, Kenneth Knowles  wrote:
> Hi friends,
>
> I think I did not mention on dev@ at the time, but my child #2 arrived March
> 14 (Pi day!) and I took some weeks off. Starting ~July 4 I will be taking a
> more significant absence, until ~October 1, trying my best to be totally
> offline.
>
> JFYI so that you know why JIRAs and PRs are not being addressed. I am also
> unassigning my JIRAs so that I am not holding any mutexes, and I will close
> PRs so they don't get stale.
>
> Any questions or pressing issues, I will be online this week and a little
> bit next week.
>
> Kenn


Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-06-01 Thread Pei HE
Congrats!

On Fri, Jun 1, 2018 at 2:12 PM, Charles Chen  wrote:
> Congratulations everyone!
>
>
> On Thu, May 31, 2018, 10:14 PM Pablo Estrada  wrote:
>>
>> Thanks to the PMC! Very humbled and excited to keep taking part in this
>> great community.
>> :)
>> -P.
>>
>>
>> On Thu, May 31, 2018, 10:10 PM Tim  wrote:
>>>
>>> Congratulations!
>>>
>>>
>>> Tim
>>>
>>> On 1 Jun 2018, at 07:05, Andrew Psaltis  wrote:
>>>
>>> Congrats!
>>>
>>> On Fri, Jun 1, 2018 at 12:26 AM, Thomas Weise  wrote:

 Congrats!


 On Thu, May 31, 2018 at 9:25 PM, Alan Myrvold 
 wrote:
>
> Congrats Gris+Pablo+Jason. Well deserved.
>
> On Thu, May 31, 2018 at 9:15 PM Jason Kuster 
> wrote:
>>
>> Thank you to Davor and the PMC; I'm excited to be able to help Beam in
>> this new capacity. Bring on the PRs. :D
>>
>> On Thu, May 31, 2018 at 8:55 PM Xin Wang 
>> wrote:
>>>
>>> Congrats!
>>>
>>> - Xin Wang
>>>
>>> 2018-06-01 11:52 GMT+08:00 Rui Wang :

 Congrats!

 -Rui

 On Thu, May 31, 2018 at 8:23 PM Jean-Baptiste Onofré
  wrote:
>
> Congrats !
>
> Regards
> JB
>
> On 01/06/2018 04:08, Davor Bonaci wrote:
> > Please join me and the rest of Beam PMC in welcoming the
> > following
> > contributors as our newest committers. They have significantly
> > contributed to the project in different ways, and we look forward
> > to
> > many more contributions in the future.
> >
> > * Griselda Cuevas
> > * Pablo Estrada
> > * Jason Kuster
> >
> > (Apologies for a delayed announcement, and the lack of the usual
> > paragraph summarizing individual contributions.)
> >
> > Congratulations to all three! Welcome!
>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Xin
>>
>>
>>
>> --
>> ---
>> Jason Kuster
>> Apache Beam / Google Cloud Dataflow
>>
>> See something? Say something. go/jasonkuster-feedback


>>>
>> --
>> Got feedback? go/pabloem-feedback


Re: A personal update

2017-12-12 Thread Pei HE
Great to have you back, and congrats!

On Wed, Dec 13, 2017 at 3:20 PM, Robert Bradshaw 
wrote:

> Great to hear from you again, and really happy you're sticking around!
>
> - Robert
>
>
> On Tue, Dec 12, 2017 at 10:47 PM, Ahmet Altay  wrote:
> > Welcome back! Looking forward to your contributions.
> >
> > Ahmet
> >
> > On Tue, Dec 12, 2017 at 10:05 PM, Jesse Anderson <
> je...@bigdatainstitute.io>
> > wrote:
> >>
> >> Congrats!
> >>
> >>
> >> On Wed, Dec 13, 2017, 5:54 AM Jean-Baptiste Onofré 
> >> wrote:
> >>>
> >>> Hi Davor,
> >>>
> >>> welcome back !!
> >>>
> >>> It's really great to see you back active in the Beam community. We
> really
> >>> need you !
> >>>
> >>> I'm so happy !
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 12/13/2017 05:51 AM, Davor Bonaci wrote:
> >>> > My dear friends,
> >>> > As many of you have noticed, I’ve been visibly absent from the
> project
> >>> > for a
> >>> > little while. During this time, a great number of you kept reaching
> >>> > out, and for
> >>> > that I’m deeply humbled and grateful to each and every one of you.
> >>> >
> >>> > I needed some time for personal reflection, which led to a transition
> >>> > in my
> >>> > professional life. As things have settled, I’m happy to again be
> >>> > working among
> >>> > all of you, as we propel this project forward. I plan to be active in
> >>> > the
> >>> > future, but perhaps not quite full-time as I was before.
> >>> >
> >>> > In the near term, I’m working on getting the report to the Board
> >>> > completed, as
> >>> > well as framing the discussion about the project state and vision
> going
> >>> > forwards. Additionally, I’ll make sure that we foster healthy
> community
> >>> > culture
> >>> > and operate in the Apache Way.
> >>> >
> >>> > For those who are curious, I’m happy to say that I’m starting a
> company
> >>> > building
> >>> > products related to Beam, along with several other members of this
> >>> > community and
> >>> > authors of this technology. I’ll share more on this next year, but
> >>> > until then if
> >>> > you have a data processing problem or an Apache Beam question, I’d
> love
> >>> > to hear
> >>> > from you ;-).
> >>> >
> >>> > Thanks -- and so happy to be back!
> >>> >
> >>> > Davor
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >
> >
>


Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-29 Thread Pei HE
+1

On Thu, Nov 30, 2017 at 8:48 AM, tarush grover 
wrote:

> +1 (binding).
>
> On Thu, 30 Nov 2017 at 2:06 AM, Eugene Kirpichov 
> wrote:
>
>> +1 (binding).
>>
>> I also think that the process here was handled in an acceptable fashion.
>> Due to the way our infrastructure works, merging to master was required in
>> order to gather essential information for a vote. Though I suppose we
>> probably could have had an additional vote about whether or not we should
>> even gather the information for the main vote.
>>
>> Regarding consensus - indeed the consensus on this issue is not
>> unanimous, but from my observation the concerns of all sides have been
>> heard and addressed by due diligence, even though the disagreement persists
>> - which is really all one can reasonably ask for in a large community: I
>> think the discussion did reach a point where a vote is the right next step
>> to make a decision.
>>
>> On Wed, Nov 29, 2017 at 10:19 AM Lukasz Cwik  wrote:
>>
>>> I have to disagree about comments about process since in order:
>>> * there was a discussion thread before any POCs were created where
>>> Gradle and Bazel were brought up
>>> * a PR was created that was brought up on dev@ and available to anyone
>>> for comment
>>> * on the discussion thread it was specifically brought up that empirical
>>> evidence was needed by Ken and Romain before a meaningful vote could be had
>>> * PR was merged on to master because testing infrastructure is heavily
>>> tied to the master branch because of
>>> https://issues.apache.org/jira/browse/BEAM-3047 and
>>> https://issues.apache.org/jira/browse/BEAM-3120. Also the PR specifically
>>> said it was to compare Maven/Gradle as described in the PR and using
>>> Jenkins was valuable since the information would be public and reproducible.
>>> * and now there is this vote thread
>>>
>>> On Wed, Nov 29, 2017 at 10:02 AM, Robert Bradshaw 
>>> wrote:
>>>
 +1 (binding)

 I agree with what both JB and Reuven had to say about process.

 On Wed, Nov 29, 2017 at 7:45 AM, Jean-Baptiste Onofré 
 wrote:
 > Hi Reuven,
 >
 > I know that the merge was not malicious. No problem at all ;)
 >
 > It's just about the community and consensus.
 >
 > For instance, I did discussion + vote to have a consensus about Spark
 2
 > support & upgrade.
 > It's not a big deal, but we have to be careful with our communities
 (here
 > the dev community, for the release schedule/cycle it's more our user
 > community ;)).
 >
 > Thanks,
 > Regards
 > JB
 >
 > On 11/29/2017 04:33 PM, Reuven Lax wrote:
 >>
 >> Thanks for bringing this up JB.
 >>
 >> I don't think the merge to master was done maliciously. We don't usually
 >> vote before merging PRs, and that PR did not affect the normal build
 >> process. Since it was done to gather data about how well Gradle works, not
 >> to force any one outcome (one possible result of the data could have been
 >> that Gradle was slower), I can see how it wasn't obvious that we needed to
 >> vote before merging.
 >>
 >> However I also see how merging Gradle to master created the perception
 >> that some people were trying to force the issue forward without a vote, and
 >> perceptions like that can be damaging to community (regardless of good
 >> intentions). It's good we're voting now, and let's be more careful about
 >> such things in the future.
 >>
 >> Reuven
 >>
 >> On Wed, Nov 29, 2017 at 12:44 AM, Jean-Baptiste Onofré <
 j...@nanthrax.net
 >> > wrote:
 >>
 >> -0
 >>
 >> It's not for the change itself (gradle build is interesting) but
 for
 >> the
 >> process used here, which, IMHO, is not clean.
 >>
 >> My major concern is the fact that gradle build has been merged on master
 >> before the vote. I would have started the vote based on the discussion and
 >> PR branch (acting as a PoC).
 >>
 >> I have the feeling that part of the dev community already
 decided to
 >> change
 >> to gradle and pushed without waiting for the whole consensus.
 >>
 >> I don't want to "block" this change, but I wanted to raise my
 concern
 >> from a
 >> community standpoint.
 >>
 >> Regards
 >> JB
 >>
 >>
 >> On 11/28/2017 06:55 PM, Lukasz Cwik wrote:
 >>
 >> This is a procedural vote for migrating to use Gradle for
 all our
 >> development related processes (building, testing, and
 releasing).
 >> A
 >> majority vote will signal that:
 >> * Gradle build files will be supported and maintained
 

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Pei HE
+1

On Thu, Nov 23, 2017 at 8:43 AM, Holden Karau  wrote:

> +1 (non-binding)
>
> On Wed, Nov 22, 2017 at 4:06 PM Kenneth Knowles 
> wrote:
>
> > +1
> >
> > On Wed, Nov 22, 2017 at 3:43 PM, Lukasz Cwik 
> > wrote:
> >
> > > +1
> > >
> > > On Wed, Nov 22, 2017 at 3:35 PM, Reuven Lax 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On Nov 22, 2017 3:29 PM, "Ben Sidhom" 
> > wrote:
> > > >
> > > > > I'm not a PMC member, but this would be especially valuable if it
> > > > > propagated DKIM signatures properly.
> > > > >
> > > > > On Wed, Nov 22, 2017 at 3:25 PM, Lukasz Cwik
> >  > > >
> > > > > wrote:
> > > > >
> > > > > > > I have noticed that some e-mail addresses (notably @google.com) get
> > > > > > > .INVALID suffixed onto them, so per...@yyy.com becomes
> > > > > > > per...@yyy.com.INVALID in the From: header.
> > > > > > >
> > > > > > > I have figured out that this is an issue with the way that our mail
> > > > > > > server is configured and opened
> > > > > > > https://issues.apache.org/jira/browse/INFRA-15529.
> > > > > > >
> > > > > > > For those of us that are impacted, it makes it more difficult for
> > > > > > > users to reply directly to the originator.
> > > > > > >
> > > > > > > Infra has asked to get consensus from PMC members before making the
> > > > > > > change, which I figured would be easiest with a vote.
> > > > > >
> > > > > > Please vote:
> > > > > > +1 Update mail server to stop suffixing .INVALID
> > > > > > -1 Don't change mail server settings.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Ben
> > > > >
> > > >
> > >
> >
> --
> Twitter: https://twitter.com/holdenkarau
>
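
The .INVALID suffix described in this thread can be worked around on the
reader's side until the mail server is reconfigured. Below is a minimal
sketch, assuming the only change made to the address is a trailing
".INVALID" on the domain, as the original message describes. The function
names strip_invalid_suffix and reply_address are hypothetical and are not
part of any Apache or Beam tooling; parseaddr and formataddr come from the
Python standard library.

```python
from email.utils import formataddr, parseaddr

# Suffix described in the thread; assumed to be appended to the domain only.
INVALID_SUFFIX = ".INVALID"


def strip_invalid_suffix(address):
    """Return the address without a trailing .INVALID marker, if present."""
    # Compare case-insensitively, since domain names are not case-sensitive.
    if address.upper().endswith(INVALID_SUFFIX):
        return address[: -len(INVALID_SUFFIX)]
    return address


def reply_address(from_header):
    """Rebuild a From: header value that can be replied to directly."""
    # parseaddr splits 'Name <addr>' into its display name and address parts.
    name, address = parseaddr(from_header)
    return formataddr((name, strip_invalid_suffix(address)))


if __name__ == "__main__":
    print(reply_address("Jane Doe <jane@yyy.com.INVALID>"))
    print(reply_address("someone@apache.org"))
```

This only helps an individual reader who wants to reply to the originator.
It does not replace the server-side change being voted on here.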


Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-08 Thread Pei HE
+1 on moving forward with Spark 2.x only.
Spark 1 users can still use already released Spark runners, and we can
support them with minor version releases for future bug fixes.

I don't see how important it is to make future Beam releases available to
Spark 1 users. If they choose not to upgrade Spark clusters, maybe they
don't need the newest Beam releases as well.

I think it is more important to 1) be able to leverage new features in
Spark 2.x, and 2) extend the user base to Spark 2.
--
Pei


On Thu, Nov 9, 2017 at 1:45 PM, Holden Karau  wrote:

> That's a good point about Oozie only supporting Spark 1 or 2 at a
> time on a cluster -- but do we know people using Oozie and Spark 1 that
> would still be using Spark 1 by the time of the next BEAM release? The last
> Spark 1 release was a year ago (and last non-maintenance release almost 20
> months ago).
>
> On Wed, Nov 8, 2017 at 9:30 PM, NerdyNick  wrote:
>
> > I don't know if ditching Spark 1 outright right now would be a great move
> > given that a lot of the main support applications around Spark haven't
> > fully moved to Spark 2 yet, let alone added support for having a cluster
> > with both. Oozie for example is still pre stable release for their Spark 1
> > and can't support a cluster with mixed Spark versions. I think maybe doing
> > as suggested above with the common, spark1, spark2 packaging might be best
> > during this carry over phase. Maybe even just flagging Spark 1 as
> > deprecated and maintenance-only might be enough.
> >
> > On Wed, Nov 8, 2017 at 10:25 PM, Holden Karau 
> > wrote:
> >
> > > Also, upgrading Spark 1 to 2 is generally easier than changing JVM
> > > versions. For folks using YARN or the hosted environments it's pretty
> > > much trivial since you can effectively have distinct Spark clusters for
> > > each job.
> > >
> > > On Wed, Nov 8, 2017 at 9:19 PM, Holden Karau 
> > wrote:
> > >
> > > > I'm +1 on dropping Spark 1. There are a lot of exciting improvements in
> > > > Spark 2, and trying to write efficient code that runs between Spark 1
> > > > and Spark 2 is super painful in the long term. It would be one thing if
> > > > there were a lot of people available to work on the Spark runners, but
> > > > it seems like our energy would be better spent focusing on the future.
> > > >
> > > > I don't know a lot of folks who are stuck on Spark 1, and the few that I
> > > > know are planning to migrate in the next few months anyways.
> > > >
> > > > Note: this is a non-binding vote as I'm not a committer or PMC member.
> > > >
> > > > On Wed, Nov 8, 2017 at 3:43 AM, Ted Yu  wrote:
> > > >
> > > >> Having both Spark1 and Spark2 modules would benefit wider user base.
> > > >>
> > > >> I would vote for that.
> > > >>
> > > >> Cheers
> > > >>
> > > >> On Wed, Nov 8, 2017 at 12:51 AM, Jean-Baptiste Onofré <
> > j...@nanthrax.net>
> > > >> wrote:
> > > >>
> > > >> > Hi Robert,
> > > >> >
> > > >> > Thanks for your feedback !
> > > >> >
> > > > >> > From a user perspective, with the current state of the PR, the same
> > > > >> > pipelines can run on both Spark 1.x and 2.x: the only difference is
> > > > >> > the set of dependencies.
> > > > >> >
> > > > >> > I'm calling the vote to get this kind of feedback: if we consider
> > > > >> > that Spark 1.x still needs to be supported, no problem, I will
> > > > >> > improve the PR to have three modules (common, spark1, spark2) and
> > > > >> > let users pick the desired version.
> > > > >> >
> > > > >> > Let's wait a bit for other feedback, I will update the PR accordingly.
> > > >> >
> > > >> > Regards
> > > >> > JB
> > > >> >
> > > >> >
> > > >> > On 11/08/2017 09:47 AM, Robert Bradshaw wrote:
> > > >> >
> > > >> >> I'm generally a -0.5 on this change, or at least doing so
> hastily.
> > > >> >>
> > > >> >> As with dropping Java 7 support, I think this should at least be
> > > >> >> announced in release notes that we're considering dropping
> support
> > in
> > > >> >> the subsequent release, as this dev list likely does not reach a
> > > >> >> substantial portion of the userbase.
> > > >> >>
> > > >> >> How much work is it to move from a Spark 1.x cluster to a Spark
> 2.x
> > > >> >> cluster? I get the feeling it's not nearly as transparent as
> > > upgrading
> > > >> >> Java versions. Can Spark 1.x pipelines be run on Spark 2.x
> > clusters,
> > > >> >> or is a new cluster (and/or upgrading all pipelines) required
> (e.g.
> > > >> >> for those who operate spark clusters shared among their many
> > users)?
> > > >> >>
> > > >> >> Looks like the latest release of Spark 1.x was about a year ago,
> > > >> >> overlapping a bit with the 2.x series which is coming up on 1.5
> > years
> > > >> >> old, so I could see a lot of people still using 1.x even if 2.x
> is
> > > >> >> clearly the future. But it sure doesn't seem very backwards
> > > >> >> 

Re: [DISCUSSION] using NexMark for Beam

2017-09-14 Thread Pei HE
Could any Googlers help to run NexMark on Dataflow streaming and share the
numbers with the community?
--
Pei

On Fri, Aug 25, 2017 at 11:28 PM, Lukasz Cwik 
wrote:

> Etienne, cut some JIRAs for improvements like ValidatesRunner for the
> Nexmark suite that you think are worthy. Some of them might be good
> 'starter' tasks as well.
>
> On Fri, Aug 25, 2017 at 1:43 AM, Etienne Chauchot 
> wrote:
>
> > Hi guys,
> >
> > There are also some points to discuss:
> >
> > - I think some of the tests in this test suite should be generalized as
> > validatesRunner tests like it was done for example for custom window
> > merging
> > (https://github.com/apache/beam/blob/5181e619f17e1f69fabe8d5bdfc7a3a6a2142cde/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java#L591)
> >
> > - We have run almost no tests on Dataflow, so if someone could run the
> > test suite on Dataflow, they are very welcome. All the needed information
> > is still in the README, but I'll move it to the website.
> >
> > - other points?
> >
> > WDYT?
> >
> > Best,
> >
> > Etienne
> >
> >
> >
> > Le 24/08/2017 à 18:35, Lukasz Cwik a écrit :
> >
> >> Yeah, was looking forward to this.
> >>
> >> On Thu, Aug 24, 2017 at 9:20 AM, Tyler Akidau
>  >> >
> >> wrote:
> >>
> >> Awesome news, thank you! :-D
> >>>
> >>> On Thu, Aug 24, 2017 at 12:40 AM Etienne Chauchot  >
> >>> wrote:
> >>>
> >>> Hi all,
> 
>  I wanted to let you know that the Nexmark PR is merged into master.
> Feel
>  free to use it (e.g. performance testing, release testing ...).
> 
>  Etienne
> 
>  Le 12/05/2017 à 10:55, Etienne Chauchot a écrit :
> 
> > Hi guys,
> >
> > I wanted to let you know that I have just submitted a PR around
> > NexMark. This is a port of the NexMark queries to Beam, to be used as
> > integration tests.
> > This can also be used as A-B testing (no-regression or performance
> > comparison between 2 versions of the same engine or of the same
> runner)
> >
> > This is a continuation of the previous PR (#99) from Mark Shields.
> > The code has changed quite a bit: some queries have changed to use new
> > Beam APIs and there were some big refactorings. More importantly, we
> > can now run all the queries in all the runners.
> >
> > Nevertheless, there are still some open issues in Nexmark
> > (https://github.com/iemejia/beam/issues) and in Beam upstream (see
> > issue links in https://issues.apache.org/jira/browse/BEAM-160)
> >
> > I wanted to submit the PR before our (Ismaël and I) NexMark talk at
> > the ApacheCon. The PR is not perfect but it is in a good shape to
> > share it.
> >
> > Best,
> >
> > Etienne
> >
> >
> >
> > Le 22/03/2017 à 04:51, Kenneth Knowles a écrit :
> >
> >> This is great! Having a variety of realistic-ish pipelines running on all
> >> runners complements the validation suite and IO IT work.
> >>
> >> If I recall, some of these involve heavy and esoteric uses of state, so
> >> definitely give me a ping if you hit any trouble.
> >>
> >> Kenn
> >>
> >> On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot <echauc...@gmail.com>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Ismael and I are working on upgrading the Nexmark implementation for
> >>> Beam. See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
> >>> https://issues.apache.org/jira/browse/BEAM-160. We are continuing the
> >>> work done by Mark Shields. See https://github.com/apache/beam/pull/366
> >>> for the original PR.
> >>>
> >>> The PR contains queries that have a wide coverage of the Beam model and
> >>> that represent a realistic end user use case (some come from client
> >>> experience on Google Cloud Dataflow).
> >>>
> >>> So far, we have upgraded the implementation to the latest Beam
> >>> snapshot. And we are able to execute a good subset of the queries in the
> >>> different runners. We upgraded the nexmark drivers to do so: direct driver
> >>> (upgraded from inProcessDriver) and flink driver and we added a new one
> >>> for spark.
> >>>
> >>> There is still a good amount of work to do and we would like to know if
> >>> you think that this contribution can have its place into Beam eventually.
> >>>
> >>> The interests of having Nexmark on Beam that we have seen so far
> are:
> >>>
> >>> - Rich batch/streaming test
> >>>
> >>> - A-B testing of runners or runtimes (non-regression, performance
> >>> comparison between versions ...)
> >>>
> >>> - 

Re: Merge branch DSL_SQL to master

2017-09-07 Thread Pei HE
+1

On Thu, Sep 7, 2017 at 4:03 PM, tarush grover 
wrote:

> Thank you all, it was a great learning experience!
>
> Regards,
> Tarush
>
> On Thu, 7 Sep 2017 at 1:05 PM, Jean-Baptiste Onofré 
> wrote:
>
> > +1
> >
> > Great work guys !
> > Ready to help for the merge and maintain !
> >
> > Regards
> > JB
> >
> > On 09/07/2017 08:48 AM, Mingmin Xu wrote:
> > > Hi all,
> > >
> > > > On behalf of the virtual Beam SQL team[1], I'd like to propose to merge
> > > > DSL_SQL branch into master (PR #3782 [2]) and include it in release
> > > > version 2.2.0, which will give it more visibility to other contributors
> > > > and users. The SQL feature satisfies the following criteria outlined in
> > > > the contribution guide[3].
> > > >
> > > > 1. Have at least 2 contributors interested in maintaining it, and 1
> > > > committer interested in supporting it
> > > >
> > > > * James and I will continue to work on new features and maintain it;
> > > >
> > > >   Tyler, James and I will support it as committers;
> > > >
> > > > 2. Provide both end-user and developer-facing documentation
> > > >
> > > > * A web page[4] is added to describe the usage of SQL DSL and how it
> > > > works;
> > > >
> > > > 3. Have at least a basic level of unit test coverage
> > > >
> > > > * 230 unit/integration tests in total, with code coverage of 83.4%;
> > > >
> > > > 4. Run all existing applicable integration tests with other Beam
> > > > components and create additional tests as appropriate
> > > >
> > > > * Besides the integration tests in package
> > > > org.apache.beam.sdk.extensions.sql, there's another example in
> > > > org.apache.beam.sdk.extensions.sql.example.BeamSqlExample.
> > >
> > > [1]. Special thanks to all contributors/reviewers:
> > >
> > >   Tyler Akidau
> > >
> > >   Davor Bonaci
> > >
> > >   Robert Bradshaw
> > >
> > >   Lukasz Cwik
> > >
> > >   Tarush Grover
> > >
> > >   Kai Jiang
> > >
> > >   Kenneth Knowles
> > >
> > >   Jingsong Lee
> > >
> > >   Ismaël Mejía
> > >
> > >   Jean-Baptiste Onofré
> > >
> > >   James Xu
> > >
> > >   Mingmin Xu
> > >
> > > [2]. https://github.com/apache/beam/pull/3782
> > >
> > > > [3]. https://beam.apache.org/contribute/contribution-guide/#merging-into-master
> > >
> > > [4]. https://beam.apache.org/documentation/dsls/sql/
> > >
> > > Thanks!
> > > 
> > > Mingmin
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


MapReduce Runner needs contributors

2017-09-05 Thread Pei HE
Hi all,
I am in the process of merging the MapReduce Runner into its feature branch,
https://github.com/apache/beam/pull/3705

I would like to call for contributors' help in making it more mature.
Here are the areas that need help:
1. Feature completion
Currently, a few ValidatesRunner tests are excluded, such as
gauge/distribution metrics, stateful/splittable ParDo, user timers.
2. Performance improvement
For example, https://issues.apache.org/jira/browse/BEAM-2835
3. Production readiness
Try running it in a Hadoop cluster

Thanks and looking forward to the collaboration on MapReduce Runner.
--
Pei


Re: Beam spark 2.x runner status

2017-08-21 Thread Pei HE
Any updates on upgrading to Spark 2.x?

I tried to replace the dependency and found a compile error from
implementing a scala trait:
org.apache.beam.runners.spark.io.SourceRDD.SourcePartition is not abstract
and does not override abstract method
org$apache$spark$Partition$$super$equals(java.lang.Object) in
org.apache.spark.Partition

(The spark side change was introduced in
https://github.com/apache/spark/pull/12157.)

Does anyone have ideas about this compile error?


On Wed, May 3, 2017 at 1:32 PM, Jean-Baptiste Onofré 
wrote:

> Hi Ted,
>
> My branch used Spark 2.1.0 and I just updated to 2.1.1.
>
> As discussed with Aviem, I should be able to create the pull request later
> today.
>
> Regards
> JB
>
>
> On 05/03/2017 02:50 AM, Ted Yu wrote:
>
>> Spark 2.1.1 has been released.
>>
>> Consider using the new release in this work.
>>
>> Thanks
>>
>> On Wed, Mar 29, 2017 at 5:43 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>> Cool for the PR merge, I will rebase my branch on it.
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>>
>>> On 03/29/2017 01:58 PM, Amit Sela wrote:
>>>
>>> @Ted definitely makes sense.
 @JB I'm merging https://github.com/apache/beam/pull/2354 soon so any
 deprecated Spark API issues should be resolved.

 On Wed, Mar 29, 2017 at 2:46 PM Ted Yu  wrote:

 This is what I did over HBASE-16179:

>
> -f.call((asJavaIterator(it), conn)).iterator()
> +// the return type is different in spark 1.x & 2.x, we handle
> both
> cases
> +f.call(asJavaIterator(it), conn) match {
> +  // spark 1.x
> +  case iterable: Iterable[R] => iterable.iterator()
> +  // spark 2.x
> +  case iterator: Iterator[R] => iterator
> +}
>)
>
> FYI
>
> On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela 
> wrote:
>
>> Just tried to replace dependencies and see what happens:
>>
>> Most required changes are about the runner using deprecated Spark APIs,
>> and after fixing them the only real issue is with the Java API for
>> Pair/FlatMapFunction that changed return value to Iterator (in 1.6 it's
>> Iterable).
>>
>> So I'm not sure that a profile that simply sets dependency on 1.6.3/2.1.0
>> is feasible.
>>
>> On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant 
>> wrote:
>>
>> So, if everything is in place in Spark 2.X and we use provided
>>
>>>
>>> dependencies
>>
>> for Spark in Beam.
>>> Theoretically, you can run the same code in 2.X without any need for
>>> a
>>> branch?
>>>
>>> 2017-03-23 9:47 GMT+02:00 Amit Sela :
>>>
>>> If StreamingContext is valid and we don't have to use SparkSession,
>>>

 and
>>>
>>
> Accumulators are valid as well and we don't need AccumulatorsV2, I
>>
>>>
 don't
>>>
>>
>> see a reason this shouldn't work (which means there are still tons of
>>>
 reasons this could break, but I can't think of them off the top of
 my

 head
>>>
>>> right now).

 @JB simply add a profile for the Spark dependencies and run the

 tests -
>>>
>>
> you'll have a very definitive answer ;-) .
>>
>>> If this passes, try on a cluster running Spark 2 as well.

 Let me know if I can assist.

 On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <

 j...@nanthrax.net>
>>>
>>
> wrote:
>>
>>>
 Hi guys,

>
> Ismaël summarized well what I have in mind.
>
> I'm a bit late on the PoC around that (I started a branch already).
> I will move forward over the week end.
>
> Regards
> JB
>
> On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
>
> Amit, I suppose JB is talking about the RDD based version, so no
>>
>> need
>

>> to worry about SparkSession or different incompatible APIs.
>>>

>> Remember the idea we are discussing is to have in master both the
>> spark 1 and spark 2 runners using the RDD based translation. At
>>
>> the
>

> same time we can have a feature branch to evolve the DataSet
>>
>>>
>> based
>

> translator (this one will replace the RDD based translator for
>>
>>>
>> spark
>

>> 2
>>>
>>> once it is mature).

>
>> The advantages have been already discussed as well as the
>>
>> possible
>

> issues so I think we have to see now if JB's idea is feasible 

Re: [ANNOUNCEMENT] New PMC members, August 2017 edition!

2017-08-12 Thread Pei HE
Congratulations!
--
Pei

On Sat, Aug 12, 2017 at 3:10 PM, Aljoscha Krettek 
wrote:

> Congratulations! :-)
>
> Best,
> Aljoscha
>
> > On 12. Aug 2017, at 06:39, Robert Bradshaw 
> wrote:
> >
> > Congratulations!
> >
> > On Fri, Aug 11, 2017 at 2:23 PM, Jean-Baptiste Onofré 
> wrote:
> >> Congrats !
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 08/11/2017 07:40 PM, Davor Bonaci wrote:
> >>>
> >>> Please join me and the rest of Beam PMC in welcoming the following
> >>> committers as our newest PMC members. They have significantly
> contributed
> >>> to the project in different ways, and we look forward to many more
> >>> contributions in the future.
> >>>
> >>> * Ahmet Altay
> >>> Beyond significant work to drive the Python SDK to the master branch,
> >>> Ahmet
> >>> has worked project-wide, driving releases, improving processes and
> >>> testing,
> >>> and growing the community.
> >>>
> >>> * Aviem Zur
> >>> Beyond significant work in the Spark runner, Aviem has worked to
> improve
> >>> how the project operates, leading discussions on inclusiveness and
> >>> openness.
> >>>
> >>> Congratulations to both! Welcome!
> >>>
> >>> Davor
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
>
>


Re: [ANNOUNCEMENT] New committers, August 2017 edition!

2017-08-12 Thread Pei HE
Congratulations to all!
--
Pei

On Sat, Aug 12, 2017 at 10:50 AM, James  wrote:

> Thank you guys, glad to contribute to this great project, congratulations to
> all the new committers!
>
> On Sat, Aug 12, 2017 at 8:36 AM Manu Zhang 
> wrote:
>
> > Thanks everyone !!! It's a great journey.
> > Congrats to other new committers !
> >
> > Thanks,
> > Manu
> >
> > On Sat, Aug 12, 2017 at 5:23 AM Jean-Baptiste Onofré 
> > wrote:
> >
> > > Congrats and welcome !
> > >
> > > Regards
> > > JB
> > >
> > > On 08/11/2017 07:40 PM, Davor Bonaci wrote:
> > > > Please join me and the rest of Beam PMC in welcoming the following
> > > > contributors as our newest committers. They have significantly
> > > contributed
> > > > to the project in different ways, and we look forward to many more
> > > > contributions in the future.
> > > >
> > > > * Reuven Lax
> > > > Reuven has been with the project since the very beginning,
> contributing
> > > > mostly to the core SDK and the GCP IO connectors. He accumulated 52
> > > commits
> > > > (19,824 ++ / 12,039 --). Most recently, Reuven re-wrote several IO
> > > > connectors that significantly expanded their functionality.
> > Additionally,
> > > > Reuven authored important new design documents relating to update and
> > > > snapshot functionality.
> > > >
> > > > * Jingsong Lee
> > > > Jingsong has been contributing to Apache Beam since the beginning of
> > the
> > > > year, particularly to the Flink runner. He has accumulated 34 commits
> > > > (11,214 ++ / 6,314 --) of deep, fundamental changes that
> significantly
> > > > improved the quality of the runner. Additionally, Jingsong has
> > > contributed
> > > > to the project in other ways too -- reviewing contributions, and
> > > > participating in discussions on the mailing list, design documents,
> and
> > > > JIRA issue tracker.
> > > >
> > > > * Mingmin Xu
> > > > Mingmin started the SQL DSL effort, and has driven it to the point of
> > > > merging to the master branch. In this effort, he extended the project
> > to
> > > > the significant new user community.
> > > >
> > > > * Mingming (James) Xu
> > > > James joined the SQL DSL effort, contributing some of the trickier
> > parts,
> > > > such as the Join functionality. Additionally, he's consistently shown
> > > > himself to be an insightful code reviewer, significantly impacting
> the
> > > > project’s code quality and ensuring the success of the new major
> > > component.
> > > >
> > > > * Manu Zhang
> > > > Manu initiated and developed a runner for the Apache Gearpump
> > > (incubating)
> > > > engine, and has driven it to the point of merging to the master
> branch.
> > > In
> > > > this effort, he accumulated 65 commits (7,812 ++ / 4,882 --) and
> > extended
> > > > the project to the new user community.
> > > >
> > > > Congratulations to all five! Welcome!
> > > >
> > > > Davor
> > > >
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>


[DISCUSS] Beam pipeline logical and physical DAGs visualization.

2017-08-03 Thread Pei HE
Hi all,
While working on JStorm and MapReduce runners, I found that it is very
helpful to understand Beam pipelines by visualizing them.

Logical graph:
https://drive.google.com/file/d/0B6iZ7iRh-LOYc0dUS0Rwb2tvWGM/view?usp=sharing

Physical graph:
https://drive.google.com/file/d/0B6iZ7iRh-LOYbDFWeDlCcDhnQmc/view?usp=sharing

I think we can visualize the Beam logical DAG in runner-core. It should also be
easy to visualize the physical DAG in each runner. (Maybe we can define
some shared data structures to make it more automatic, and even support
visualizing them in the Apex/Flink/Spark/Gearpump UIs.)

I have a commit for the MapReduce runner here (<200 lines). This commit
generates dotfiles for the logical and physical DAGs.

https://github.com/peihe/incubator-beam/commit/bb3349e10c0cfacd81b610880ddfec030fedf34d
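
For readers unfamiliar with dotfiles: a dotfile is a plain-text description of
a graph in Graphviz's DOT language. The sketch below is not the code from the
commit above. It is a minimal, hypothetical illustration of how a DAG of named
steps could be written out as DOT text. The class name DagDotWriter, its
methods, and the step names are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Beam): renders a DAG of named steps as DOT text. */
public class DagDotWriter {
    // Adjacency list; LinkedHashMap keeps insertion order so the output is stable.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    /** Registers a node even if it has no outgoing edges. */
    public DagDotWriter addNode(String name) {
        edges.computeIfAbsent(name, k -> new ArrayList<>());
        return this;
    }

    /** Adds a directed edge from one step to another, registering both nodes. */
    public DagDotWriter addEdge(String from, String to) {
        addNode(from);
        addNode(to);
        edges.get(from).add(to);
        return this;
    }

    /** Returns the graph in DOT syntax: all nodes first, then all edges. */
    public String toDot(String graphName) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(quote(graphName)).append(" {\n");
        for (String node : edges.keySet()) {
            sb.append("  ").append(quote(node)).append(";\n");
        }
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            for (String to : e.getValue()) {
                sb.append("  ").append(quote(e.getKey()))
                  .append(" -> ").append(quote(to)).append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    // DOT identifiers with spaces or punctuation must be quoted and escaped.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Step names are illustrative only.
        String dot = new DagDotWriter()
            .addEdge("Read", "ParDo")
            .addEdge("ParDo", "Combine")
            .toDot("logical");
        System.out.print(dot);
    }
}
```

The printed text can be saved to a file and rendered with the Graphviz dot
tool. A physical DAG could presumably reuse the same writer with
runner-specific node names.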

Looking forward to ideas and feedback.
--
Pei


Re: [DISCUSS] Beam MapReduce Runner One-Pager

2017-08-01 Thread Pei HE
I prototyped a simple MapReduce runner that can execute
Read.Bounded+ParDo+Combine:
https://github.com/peihe/incubator-beam/tree/mr-runner

I am still working on View support, and will give another update once I can
get WordCount running.

On Sat, Jul 15, 2017 at 4:45 AM, Vikas RK <vikky...@gmail.com> wrote:

> Thanks Pei, left a few comments, but this looks exciting!
>
> -Vikas
>
> On 12 July 2017 at 21:52, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> > Hi,
> >
> > I will push my branch with the current state of the mapreduce runner.
> >
> > Regards
> > JB
> >
> >
> > On 07/13/2017 04:47 AM, Pei HE wrote:
> >
> >> Thanks guys!
> >>
> >> I replied to Kenn's comments, and I am looking forward to more feedback and
> >> suggestions.
> >>
> >> Also, could we add a mapreduce-runner branch?
> >>
> >> Thanks
> >> --
> >> Pei
> >>
> >>
> >> On Sat, Jul 8, 2017 at 12:42 AM, Kenneth Knowles <k...@google.com.invalid
> >
> >> wrote:
> >>
> >> Very cool to see this. Commenting a little on the doc.
> >>>
> >>> On Fri, Jul 7, 2017 at 8:41 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >>> wrote:
> >>>
> >>> Hi Pei,
> >>>>
> >>>> I also pumped some ideas and part of code from Crunch for the
> MapReduce
> >>>> runner.
> >>>>
> >>>> I will push my changes on my github branch and share with you.
> >>>>
> >>>> Let me take a look on your doc.
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>>
> >>>> On 07/07/2017 03:11 PM, Pei HE wrote:
> >>>>
> >>>> Hi all,
> >>>>> While JB is working on MapReduce Runner BEAM-165
> >>>>> <https://issues.apache.org/jira/browse/BEAM-165>, I have spent time
> >>>>> reading
> >>>>> Apache Crunch code and drafted Beam MapReduce Runner One-Pager
> >>>>> <https://docs.google.com/document/d/10jJ8pBTZ10rNr_IO5YnggmZ
> >>>>> ZG1MU-F47sWg8N6xkBM0/edit#heading=h.bewnehqnt4zd>
> >>>>> (mostly
> >>>>> around ParDo/Flatten fusion support, and with many missing details).
> >>>>>
> >>>>> I would like to start the discussion, and get people's attention of
> >>>>> supporting MapReduce in Beam.
> >>>>>
> >>>>> Feel free to make comments and suggestions on that doc.
> >>>>>
> >>>>> Thanks
> >>>>> --
> >>>>> Pei
> >>>>>
> >>>>>
> >>>>> --
> >>>> Jean-Baptiste Onofré
> >>>> jbono...@apache.org
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>>
> >>>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [DISCUSS] Beam MapReduce Runner One-Pager

2017-07-12 Thread Pei HE
Thanks guys!

I replied to Kenn's comments, and I am looking forward to more feedback and
suggestions.

Also, could we add a mapreduce-runner branch?

Thanks
--
Pei


On Sat, Jul 8, 2017 at 12:42 AM, Kenneth Knowles <k...@google.com.invalid>
wrote:

> Very cool to see this. Commenting a little on the doc.
>
> On Fri, Jul 7, 2017 at 8:41 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Pei,
> >
> > I also pumped some ideas and part of code from Crunch for the MapReduce
> > runner.
> >
> > I will push my changes on my github branch and share with you.
> >
> > Let me take a look on your doc.
> >
> > Regards
> > JB
> >
> >
> > On 07/07/2017 03:11 PM, Pei HE wrote:
> >
> >> Hi all,
> >> While JB is working on MapReduce Runner BEAM-165
> >> <https://issues.apache.org/jira/browse/BEAM-165>, I have spent time
> >> reading
> >> Apache Crunch code and drafted Beam MapReduce Runner One-Pager
> >> <https://docs.google.com/document/d/10jJ8pBTZ10rNr_IO5YnggmZ
> >> ZG1MU-F47sWg8N6xkBM0/edit#heading=h.bewnehqnt4zd>
> >> (mostly
> >> around ParDo/Flatten fusion support, and with many missing details).
> >>
> >> I would like to start the discussion, and get people's attention of
> >> supporting MapReduce in Beam.
> >>
> >> Feel free to make comments and suggestions on that doc.
> >>
> >> Thanks
> >> --
> >> Pei
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


[DISCUSS] Beam MapReduce Runner One-Pager

2017-07-07 Thread Pei HE
Hi all,
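While JB is working on MapReduce Runner BEAM-165
<https://issues.apache.org/jira/browse/BEAM-165>, I have spent time reading
Apache Crunch code and drafted Beam MapReduce Runner One-Pager
<https://docs.google.com/document/d/10jJ8pBTZ10rNr_IO5YnggmZZG1MU-F47sWg8N6xkBM0/edit#heading=h.bewnehqnt4zd>
(mostly around ParDo/Flatten fusion support, and with many missing details).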

I would like to start the discussion and draw people's attention to
supporting MapReduce in Beam.

Feel free to make comments and suggestions on that doc.

Thanks
--
Pei


[Proposal] Fine-grained Resource Configuration in Beam

2017-06-28 Thread Pei HE
Hi guys,
We are using Blink runner (fork of Flink) and an internal MapReduce-ish
runner.

We want to configure resources (parallelism, CPU, memory, etc.) in a
unified way across runners.

Haozhi and I have drafted a proposal, and would like to have feedback from
the community.

https://docs.google.com/document/d/1N0y64dbzmukLLEy6M9CygdI_H88pIS3NtcOAkL5-oVw/edit#

Thanks
--
Pei


Re: First stable release completed!

2017-05-17 Thread Pei HE
Congratulations everyone!

--
Pei


On Wed, May 17, 2017 at 9:00 PM, Ted Yu  wrote:

> This is great news.
>
> Congratulations.
>
> On Wed, May 17, 2017 at 5:22 AM, James  wrote:
>
> > Congratulations to all!
> > On Wed, 17 May 2017 at 8:03 PM Prabeesh K.  wrote:
> >
> > > awesome. 
> > >
> > > On 17 May 2017 at 15:28, Davor Bonaci  wrote:
> > >
> > > > The first stable release is now complete!
> > > >
> > > > Release artifacts are available through various repositories,
> including
> > > > dist.apache.org, Maven Central, and PyPI. The website is updated,
> and
> > > > announcements are published.
> > > >
> > > > Apache Software Foundation press release:
> > > > http://globenewswire.com/news-release/2017/05/17/986839/0/
> > > > en/The-Apache-Software-Foundation-Announces-Apache-Beam-v2-0-0.html
> > > >
> > > > Beam blog:
> > > > https://beam.apache.org/blog/2017/05/17/beam-first-stable-
> release.html
> > > >
> > > > Congratulations to everyone -- this is a really big milestone for the
> > > > project, and I'm proud to be a part of this great community.
> > > >
> > > > Davor
> > > >
> > >
> >
>


Re: First stable release: version designation?

2017-05-08 Thread Pei HE
I vote for 2.0.

On Sun, May 7, 2017 at 7:50 PM, Prabeesh K.  wrote:

> I also vote for 2.0.
>
> On 5 May 2017 at 21:33, Hadar Hod  wrote:
>
> > I also vote for 2.0, for the same reasons Dan stated.
> > As Cham mentioned, we can clarify any confusion in the documentation.
> >
> > On Fri, May 5, 2017 at 9:50 AM, Ahmet Altay 
> > wrote:
> >
> > > I would also like to vote for strong 2.0 with the same reasons as Dan
> > > mentioned. It will be less confusing for the users overall.
> > >
> > > Ahmet
> > >
> > > On Fri, May 5, 2017 at 9:33 AM, Davor Bonaci  wrote:
> > >
> > > > Strongly for 2.0.0:
> > > > * Aljoscha
> > > > * Cham
> > > > * Dan
> > > > * Luke
> > > >
> > > > Slight preference toward 2.0.0, but fine with 1.0.0:
> > > > * Davor
> > > > * Ismael
> > > > * Kenn
> > > >
> > > > Strongly for 1.0.0: none.
> > > >
> > > > Slight preference toward 1.0.0, but fine with 2.0.0:
> > > > * Amit
> > > > * Jesse
> > > > * JB
> > > > * Manu
> > > > * Mingmin
> > > > * Ted
> > > > * Thomas W.
> > > >
> > > > Unbelievably, the tally is 7 : 7. However, the 2.0 camp tends to feel
> > > more
> > > > strongly, and we have nobody who feels strongly for 1.0. Thus, it
> seems
> > > > going with 2.0.0 is the path of least resistance.
> > > >
> > > > With that, I'll start building the 2.0.0 RCs, and we'll formally
> > > > ratify/reject this decision in an RC vote.
> > > >
> > > > On Thu, May 4, 2017 at 6:30 PM, María García Herrero <
> > > > mari...@google.com.invalid> wrote:
> > > >
> > > > > The bigger letters aimed to indicate "strongly in favor of" as
> > opposed
> > > to
> > > > > "weakly in favor of." I'm OK with not using a doc, just responding
> to
> > > > Ted's
> > > > > question.
> > > > >
> > > > > On Thu, May 4, 2017 at 3:39 PM, Ted Yu 
> wrote:
> > > > >
> > > > > > What's the difference between first and second, third and fourth
> > > > columns
> > > > > ?
> > > > > >
> > > > > > On Thu, May 4, 2017 at 3:36 PM, María García Herrero <
> > > > > > mari...@google.com.invalid> wrote:
> > > > > >
> > > > > > > Thanks for the suggestion, Ted. Get your vote in here
> > > > > > >  > 1ABx3U8ojcfUkFig3hG53lOYl73tdk
> > > > > > > Wqz5B6eQ40TEgk/edit?usp=sharing>
> > > > > > > .
> > > > > > > I have already added all the votes that Davor compiled 3 hours
> > ago
> > > > and
> > > > > > the
> > > > > > > responses afterwards.
> > > > > > >
> > > > > > > On Thu, May 4, 2017 at 12:49 PM, Ted Yu 
> > > wrote:
> > > > > > >
> > > > > > > > Maybe create a google doc with columns as the camps.
> > > > > > > >
> > > > > > > > Each person can put his/her name under the camp in his/her
> > favor.
> > > > > > > >
> > > > > > > > On Thu, May 4, 2017 at 12:32 PM, Thomas Weise <
> t...@apache.org>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > I'm in the relaxed 1.0.0 camp.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > sent from mobile
> > > > > > > > > On May 4, 2017 12:29 PM, "Mingmin Xu" 
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > > I slightly prefer 1.0.0 for the *first* stable release,
> but
> > > fine
> > > > > > with
> > > > > > > > > 2.0.0.
> > > > > > > > > >
> > > > > > > > > > On Thu, May 4, 2017 at 12:25 PM, Lukasz Cwik
> > > > > > >  > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Put me under Strongly for 2.0.0
> > > > > > > > > > >
> > > > > > > > > > > On Thu, May 4, 2017 at 12:24 PM, Kenneth Knowles
> > > > > > > > >  > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I'll join Davor's group.
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, May 4, 2017 at 12:07 PM, Davor Bonaci <
> > > > > > da...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I don't think we have reached a consensus here yet.
> > > Let's
> > > > > > > > > re-examine
> > > > > > > > > > > this
> > > > > > > > > > > > > after some time has passed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If I understand everyone's opinion correctly, this
> is
> > > the
> > > > > > > > summary:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Strongly for 2.0.0:
> > > > > > > > > > > > > * Aljoscha
> > > > > > > > > > > > > * Dan
> > > > > > > > > > > > >
> > > > > > > > > > > > > Slight preference toward 2.0.0, but fine with
> 1.0.0:
> > > > > > > > > > > > > * Davor
> > > > > > > > > > > > >
> > > > > > > > > > > > > Strongly for 1.0.0: none.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Slight preference toward 1.0.0, but fine with
> 2.0.0:
> > > > > > > > > > > > > * Amit
> > > > > > > > > > > > > * Jesse
> > > > > > > > > > > > > * JB
> > > > > > > > > > > > > * Ted
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any additional opinions?
> > > > > 

Re: [PROPOSAL] Remove KeyedCombineFn

2017-04-22 Thread Pei HE
+1

On Sat, Apr 22, 2017 at 12:16 PM, Jean-Baptiste Onofré 
wrote:

> +1
>
> Regards
> JB
>
>
> On 04/21/2017 07:24 PM, Kenneth Knowles wrote:
>
>> Hi all,
>>
>> I propose that we remove KeyedCombineFn before the first stable release.
>>
>> I don't think it adds enough value for the complexity it adds to e.g.
>> CombineWithContext [1] and state [2, 3], and it doesn't seem to me that
>> users really use it when we might expect. I am happy to be demonstrated
>> wrong.
>>
>> It is very likely that you have never written [4, 5] or thought about
>> KeyedCombineFn. So for context, here are excerpts from signatures just to
>> show the difference from CombineFn:
>>
>> CombineFn<InputT, AccumT, OutputT> {
>>   AccumT createAccumulator();
>>   AccumT addInput(AccumT accum, InputT input);
>>   AccumT mergeAccumulators(Iterable<AccumT> accums);
>>   OutputT extractOutput(AccumT accum);
>> }
>>
>> KeyedCombineFn<K, InputT, AccumT, OutputT> {
>>   AccumT createAccumulator(K key);
>>   AccumT addInput(K key, AccumT accum, InputT input);
>>   AccumT mergeAccumulators(K key, Iterable<AccumT> accums);
>>   OutputT extractOutput(K key, AccumT accum);
>> }
>>
>> So what are the particular reasons for this, versus a CombineFn that has
>> KVs as its input and accumulator types?
>>
>>  - There are some performance improvements potentially from not passing
>> keys around, based on the assumption they are always available.
>>
>>  - There is also a spec difference because it only has to be associative
>> and commutative per key, cannot be applied in a global combine, and
>> addInput is automatically key preserving.
>>
>> But in fact, in all of my code crawling the class is almost never used
>> (even over the course of its history at Google) and even the few uses I
>> found were often mistakes where the key is totally ignored, probably
>> because a user thinks "I am doing a keyed combine so I need a keyed
>> combine
>> function". So the number of users actually affected is about zero.
>>
>> I would be curious if anyone has a compelling case for keeping
>> KeyedCombineFn.
>>
>> Kenn
>>
>> [1]
>> https://github.com/yafengguo/Apache-beam/blob/master/sdks/ja
>> va/core/src/main/java/org/apache/beam/sdk/transforms/Combine
>> WithContext.java
>> [2] https://issues.apache.org/jira/browse/BEAM-1336
>> [3] https://github.com/apache/beam/pull/2627
>> [4]
>> https://github.com/search?l=Java=KeyedCombineFn=advsea
>> rch=Code=%E2%9C%93
>> [5] https://www.google.com/search?q=KeyedCombineFn
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Apache Storm/JStorm Runner(s) for Apache Beam

2017-04-11 Thread Pei HE
Hi Taylor,
I am very glad to see the interest in pushing the Beam Storm runner forward.

However, I cannot convince myself of the benefits of having one runner to
support all.

Beam has three types of users: pipeline writers, library writers, and
runner implementers.

I see the pros and cons as follows:
Pros:
1. For pipeline writers and library writers, I don't see any benefits
because they are using the Beam API directly.
2. For runner implementers: (I am not that familiar with the current
similarities and differences of Storm and JStorm, maybe you can help me
fill it in.)

Cons:
For pipeline writers and library writers:
1. It means a delay in delivery. We already have a working prototype, and
there are lots of JStorm users who eagerly want a JStorm API.
2. "One runner to support all" may increase the complexity and
compromise the quality of the runner.

From my point of view, the cons clearly outweigh the pros unless I am missing
something.

Let me know what you think.
Thanks
--
Pei


On Tue, Apr 11, 2017 at 1:47 AM, P. Taylor Goetz  wrote:

> Note: cross-posting to dev@beam and dev@storm
>
> I’ve seen at least two threads on the dev@ list discussing the JStorm
> runner and my hope is we can expand on that discussion and cross-pollinate
> with the Storm/JStorm/Beam communities as well.
>
> A while back I created a very preliminary proof of concept of getting a
> Storm Beam runner working [1]. That was mainly an exercise for me to
> familiarize myself with the Beam API and discover what it would take to
> develop a Beam runner on top of Storm. That code is way out of date (I was
> targeting Beam’s HEAD before the 0.2.0 release, and a lot of changes have
> since taken place) and didn’t really work as Jian Liu pointed out. It was a
> start, that perhaps could be further built upon, or parts harvested, etc. I
> don’t have any particular attachment to that code and wouldn’t be upset if
> it were completely discarded in favor of a better or more extensible
> implementation.
>
> What I would like to see, and I think this is a great opportunity to do
> so, is a closer collaboration between the Apache Storm and JStorm
> communities. For those who aren’t familiar with those projects’
> relationship, I’ll start with a little history…
>
> JStorm began at Alibaba as a fork of Storm (pre-Apache?) with Storm’s
> Clojure code reimplemented in Java. The rationale behind that move was that
> Alibaba had a large number of Java developers but very few who were
> proficient with Clojure. Moving to pure Java made sense as it would expand
> the base of potential contributors.
>
> In late 2015 Alibaba donated the JStorm codebase to the Apache Storm
> project, and the Apache Storm PMC committed to converting its Clojure code
> to Java in order to incorporate the code donation. At the time there was
> one catch — Apache Storm had implemented comprehensive security features
> such as Kerberos authentication/authorization and multi-tenancy in its
> Clojure code, which greatly complicated the move to Java and incorporation
> of the JStorm code. JStorm did not have the same security features. A
> number of JStorm developers have also become Storm PMC members.
>
> Fast forward to today. The Storm community has completed the bulk of the
> move to Java and the next major release (presumably 2.0, which is currently
> under discussion) will be largely Java-based. We are now in a much better
> position to begin incorporating JStorm’s features, as well as implementing
> new features necessary to support the Beam API (such as support for bounded
> pipelines, among other features).
>
> Having separate Apache Storm and JStorm beam runner implementations
> doesn’t feel appropriate in my personal opinion, especially since both
> projects have expressed an ongoing commitment to bringing JStorm’s
> additional features, and just as important, community, to Apache Storm.
>
> One final note, when the Storm community initially discussed developing a
> Beam runner, the general consensus was do so within the Storm repository.
> My current thinking is that such an effort should take place within the
> Beam community, not only since that is the development pattern followed by
> other runner implementations (Flink, Apex, etc.), but also because it would
> serve to increase collaboration between Apache projects (always a good
> thing!).
>
> I would love to hear opinions from others in the Storm/JStorm/Beam
> communities.
>
> -Taylor


Re: JStorm runner

2017-04-06 Thread Pei HE
Hi Kenn,
Yes, I have a PR, but I am not able to see the jstorm-runner branch yet.

https://github.com/apache/beam/compare/jstorm-runner...peihe:jstorm-runner-pr-1

Vikas,
I am glad to hear that you are interested in helping. :)

--
Pei


On Fri, Apr 7, 2017 at 10:45 AM, Vikas RK <vikky...@gmail.com> wrote:

> I am interested in getting involved in this effort. I have experience
> working with Storm API on Heron, so could help with design/code reviews
> etc.
>
> -Vikas.
>
> On 6 April 2017 at 19:10, Kenneth Knowles <k...@google.com.invalid> wrote:
>
> > Pei,
> >
> > Are you ready to get started on the first PR? I have created the branch
> > jstorm-runner for you to issue a PR against.
> >
> > Kenn
> >
> >
> > On Thu, Apr 6, 2017 at 1:39 AM, Pei HE <p...@apache.org> wrote:
> >
> > > Hi all,
> > > I have created jira https://issues.apache.org/jira/browse/BEAM-1899.
> > >
> > > LIU Jian (basti...@alibaba-inc.com) and I are starting draft PRs.
> > >
> > > To get started, we need help to:
> > > 1. add runner-jstorm JIRA component
> > > 2. add jstorm-runner feature branch in github
> > > (I don't have access for creating them.)
> > >
> > > Thanks
> > > --
> > > Pei
> > >
> >
>


JStorm runner

2017-04-06 Thread Pei HE
Hi all,
I have created jira https://issues.apache.org/jira/browse/BEAM-1899.

LIU Jian (basti...@alibaba-inc.com) and I are starting draft PRs.

To get started, we need help to:
1. add runner-jstorm JIRA component
2. add jstorm-runner feature branch in github
(I don't have access for creating them.)

Thanks
--
Pei


Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-27 Thread Pei He
Thanks all! And, I am very excited to join and look forward.

On Fri, Jan 27, 2017 at 9:40 AM, Thomas Groh <tg...@google.com.invalid>
wrote:

> Congratulations all!
>
> On Fri, Jan 27, 2017 at 9:34 AM, Chamikara Jayalath <chamik...@apache.org>
> wrote:
>
> > Congrats all !! :)
> >
> > - Cham
> >
> > On Fri, Jan 27, 2017 at 4:13 AM Stas Levin <stasle...@gmail.com> wrote:
> >
> > > Thanks all, glad to be joining!
> > >
> > > On Fri, Jan 27, 2017, 13:07 Aljoscha Krettek <aljos...@apache.org>
> > wrote:
> > >
> > > > Welcome aboard! :-)
> > > >
> > > > On Fri, 27 Jan 2017 at 11:27 Ismaël Mejía <ieme...@gmail.com> wrote:
> > > >
> > > > > Congratulations, well deserved guys !
> > > > >
> > > > >
> > > > > On Fri, Jan 27, 2017 at 9:28 AM, Amit Sela <amitsel...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Welcome and congratulations to all!
> > > > > >
> > > > > > On Fri, Jan 27, 2017, 10:12 Ahmet Altay <al...@google.com.invalid
> >
> > > > > wrote:
> > > > > >
> > > > > > > Thank you all! And congratulations to other new committers.
> > > > > > >
> > > > > > > Ahmet
> > > > > > >
> > > > > > > On Thu, Jan 26, 2017 at 9:45 PM, Kobi Salant <
> > > kobi.sal...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congrats! Well deserved Stas
> > > > > > > >
> > > > > > > > בתאריך 27 בינו' 2017 7:26,‏ "Frances Perry" <
> > fran...@apache.org>
> > > > > כתב:
> > > > > > > >
> > > > > > > > > Woohoo! Congrats ;-)
> > > > > > > > >
> > > > > > > > > On Thu, Jan 26, 2017 at 9:05 PM, Jean-Baptiste Onofré <
> > > > > > j...@nanthrax.net
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Welcome aboard !⁣
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > JB
> > > > > > > > > >
> > > > > > > > > > On Jan 27, 2017, 01:27, at 01:27, Davor Bonaci <
> > > > da...@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >Please join me and the rest of Beam PMC in welcoming the
> > > > > following
> > > > > > > > > > >contributors as our newest committers. They have
> > > significantly
> > > > > > > > > > >contributed
> > > > > > > > > > >to the project in different ways, and we look forward to
> > > many
> > > > > more
> > > > > > > > > > >contributions in the future.
> > > > > > > > > > >
> > > > > > > > > > >* Stas Levin
> > > > > > > > > > >Stas has contributed across the breadth of the project,
> > from
> > > > the
> > > > > > > Spark
> > > > > > > > > > >runner to the core pieces and Java SDK. Looking at code
> > > > > > > contributions
> > > > > > > > > > >alone, he authored 43 commits and reported 25 issues.
> Stas
> > > is
> > > > > very
> > > > > > > > > > >active
> > > > > > > > > > >on the mailing lists too, contributing to good
> discussions
> > > and
> > > > > > > > > > >proposing
> > > > > > > > > > >improvements to the Beam model.
> > > > > > > > > > >
> > > > > > > > > > >* Ahmet Altay
> > > > > > > > > > >Ahmet is a major contributor to the Python SDK, both in
> > > terms
> > > > of
> > > > > > > > design
> > > > > > > > > > >and
> > > > > > > > > > >code contribution. Looking at code contributions alone,
> he
> > > > > > authored
> > > > > > > 98
> > > > > > > > > > >commits and reviewed dozens of pull requests. With
> Python
> > > > SDK’s
> > > > > > > > > > >imminent
> > > > > > > > > > >merge to the master branch, Ahmet contributed towards
> > > > > > establishing a
> > > > > > > > > > >new
> > > > > > > > > > >major component in Beam.
> > > > > > > > > > >
> > > > > > > > > > >* Pei He
> > > > > > > > > > >Pei has been contributing to Beam since its inception,
> > > > > > accumulating
> > > > > > > a
> > > > > > > > > > >total
> > > > > > > > > > >of 118 commits since February. He has made several major
> > > > > > > > contributions,
> > > > > > > > > > >most recently by redesigning IOChannelFactory /
> FileSystem
> > > > APIs
> > > > > > (in
> > > > > > > > > > >progress), which would extend Beam’s portability to many
> > > > > > additional
> > > > > > > > > > >file
> > > > > > > > > > >systems and cloud providers.
> > > > > > > > > > >
> > > > > > > > > > >Congratulations to all three! Welcome!
> > > > > > > > > > >
> > > > > > > > > > >Davor
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>