Re: [DISCUSS] Graduation to a top-level project

2016-11-24 Thread Maximilian Michels
+1

I see a healthy project which deserves to graduate.

On Wed, Nov 23, 2016 at 6:03 PM, Davor Bonaci  wrote:
> Thanks everyone for the enthusiastic support!
>
> Please keep the thread going, as we kick off the process on private@.
> Please don’t forget to bring up any data points that might help strengthen
> our case.
>
> Thanks!
>
> On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner 
> wrote:
>
>> +1 (beaming)
>>
>> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw
>> 
>> wrote:
>>
>> +1
>>
>> On Wed, Nov 23, 2016 at 7:36 AM, Lukasz Cwik 
>> wrote:
>> > +1
>> >
>> > On Wed, Nov 23, 2016 at 9:48 AM, Stephan Ewen  wrote:
>> >
>> >> +1
>> >> The community is doing very well and behaving very Apache
>> >>
>> >> On Wed, Nov 23, 2016 at 9:54 AM, Etienne Chauchot 
>> >> wrote:
>> >>
>> >> > A big +1 of course, very excited to go forward
>> >> >
>> >> > Etienne
>> >> >
>> >> >
>> >> >
> On 22/11/2016 at 19:19, Davor Bonaci wrote:
>> >> >
>> >> >> Hi everyone,
>> >> >> With all the progress we’ve had recently in Apache Beam, I think it is
>> >> >> time we start the discussion about graduation as a new top-level
>> >> >> project at the Apache Software Foundation.
>> >> >>
>> >> >> Graduation means we are a self-sustaining and self-governing
>> >> >> community, and ready to be a full participant in the Apache Software
>> >> >> Foundation. It does not imply that our community growth is complete or
>> >> >> that a particular level of technical maturity has been reached, rather
>> >> >> that we are on a solid trajectory in those areas. After graduation, we
>> >> >> will still periodically report to, and be overseen by, the ASF Board
>> >> >> to ensure continued growth of a healthy community.
>> >> >>
>> >> >> Graduation is an important milestone for the project. It is also key
>> >> >> to further grow the user community: many users (incorrectly) see
>> >> >> incubation as a sign of instability and are much less likely to
>> >> >> consider us for production use.
>> >> >>
>> >> >> A way to think about graduation readiness is through the Apache
>> >> >> Maturity Model [1]. I think we clearly satisfy all the requirements
>> >> >> [2]. It is probably worth emphasizing the recent community growth:
>> >> >> over each of the past three months, no single organization
>> >> >> contributing to Beam has had more than ~50% of the unique
>> >> >> contributors per month [2, see assumptions]. That’s a great statistic
>> >> >> that shows how much we’ve grown our diversity!
>> >> >>
>> >> >> Process-wise, graduation consists of drafting a board resolution,
>> >> >> which needs to identify the full Project Management Committee, and
>> >> >> getting it approved by the community, the Incubator, and the Board.
>> >> >> Within the Beam community, most of these discussions and votes have
>> >> >> to be on the private@ mailing list, but, as usual, we’ll try to keep
>> >> >> dev@ updated as much as possible.
>> >> >>
>> >> >> With that in mind, let’s use this discussion on dev@ for two things:
>> >> >> * Collect additional data points on our progress that we may want to
>> >> >> present to the Incubator as a part of the proposal to accept our
>> >> >> graduation.
>> >> >> * Determine whether the community supports graduation. Please reply
>> >> >> +1/-1 with any additional comments, as appropriate. I’d encourage
>> >> >> everyone to participate -- regardless of whether you are an
>> >> >> occasional visitor or have a specific role in the project -- we’d
>> >> >> love to hear your perspective.
>> >> >>
>> >> >> Data points so far:
>> >> >> * Project’s maturity self-assessment [2].
>> >> >> * 1500 pull requests in incubation, which makes us one of the most
>> >> >> active projects across all of ASF on this metric.
>> >> >> * 3 releases, each driven by a different release manager.
>> >> >> * 120+ individual contributors.
>> >> >> * 3 new committers added, 2 of which aren’t from the largest
>> >> >> organization.
>> >> >> * 1027 issues created, 515 resolved.
>> >> >> * 442 dev@ emails in October alone, sent by 51 individuals.
>> >> >> * 50 user@ emails in the last 30 days, sent by 22 individuals.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> Davor
>> >> >>
>> >> >> [1] http://community.apache.org/apache-way/apache-project-maturity-model.html
>> >> >> [2] http://beam.incubator.apache.org/contribute/maturity-model/
>> >> >>
>> >> >>
>> >> >
>> >>
>>


Re: Start of release 0.3.0-incubating

2016-10-26 Thread Maximilian Michels
For releases, legal matters have top priority, e.g. licensing issues
can really get a project into trouble. Apart from that, what about
testing various functionality of Beam with different runners before an
actual release? Also, should we have a look at the list of open issues
and decide whether we want to fix some of those for the upcoming
release?

For example, it would have been nice to update the Flink version of
the Flink Runner to 1.1.3. Perhaps we can do that for the first minor
release :)

-Max


On Mon, Oct 24, 2016 at 4:28 PM, Dan Halperin
 wrote:
> Thanks JB! (et al.) Excellent suggestions.
>
> Thanks,
> Dan
>
> On Thu, Oct 20, 2016 at 9:32 PM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Dan,
>>
>> No problem, MQTT and other IOs will be in the next release..
>>
>> IMHO, it would be great to have:
>> 1. A release reminder a couple of days before a release, just to ask
>> everyone if there are any objections (something like this:
>> https://lists.apache.org/thread.html/80de75df0115940ca402132338b221e5dd5f669fd1bf915cd95e15c3@%3Cdev.karaf.apache.org%3E)
>> 2. A rough release schedule on the website (something like this:
>> http://karaf.apache.org/download.html#container-schedule for instance).
>>
>> Just my $0.01 ;)
>>
>> Regards
>> JB
>>
>>
>> On 10/20/2016 06:30 PM, Dan Halperin wrote:
>>
>>> Hi JB,
>>>
>>> This is a great discussion to have! IMO, there's no special functionality
>>> requirements for these pre-TLP releases. It's more important to make sure
>>> we keep the process going. (I think we should start the release as soon as
>>> possible, because it's been 2 months since the last one.)
>>>
>>> If we hold a release a week for MQTT, we'll hold it another week for some
>>> other new feature, and then hold it again for some other new feature.
>>>
>>> Can you make a strong argument for why MQTT in particular should be
>>> release
>>> blocking?
>>>
>>> Dan
>>>
>>> On Thu, Oct 20, 2016 at 9:26 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> +1

 Thanks Aljoscha !!

 Do you mind waiting until the weekend or Monday to start the release ? I would
 like to include MqttIO if possible.

 Thanks !
 Regards
 JB

 ⁣

 On Oct 20, 2016, 18:07, at 18:07, Dan Halperin
 
 wrote:

> On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek
> 
> wrote:
>
> Hi,
>> thanks for taking the time and writing this extensive doc!
>>
>> If no-one is against this I would like to be the release manager for
>>
> the
>
>> next (0.3.0-incubating) release. I would work with the guide and
>>
> update it
>
>> with anything that I learn along the way. Should I open a new thread
>>
> for
>
>> this or is it ok if nobody objects here?
>>
>> Cheers,
>> Aljoscha
>>
>>
> Spinning this out as a separate thread.
>
> +1 -- Sounds great to me!
>
> Dan
>
>> On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré 
>>
> wrote:
>
>>
>> Hi,
>>>
>>> well done.
>>>
>>> As already discussed, it looks good to me ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 10/20/2016 01:24 AM, Davor Bonaci wrote:
>>>
 Hi everybody,
 As a project, I think we should have a Release Guide to document

>>> the
>
>> process, have consistent releases, on-board additional release

>>> managers,
>>
>>> and generally share knowledge. It is also one of the project

>>> graduation
>
>> guidelines.

 Dan and I wrote a draft version, documenting the process we did

>>> for the
>
>> first two releases. It is currently in a pull request [1]. I'd

>>> invite
>
>> everyone interested to take a peek and comment, either on the

>>> pull
>
>> request
>>>
 itself or here on mailing list, as appropriate.

 Thanks,
 Davor

 [1] https://github.com/apache/incubator-beam-site/pull/49


>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>

>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>


Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-26 Thread Maximilian Michels
+1 (binding)

Thanks for managing the release, Aljoscha!

-Max


On Wed, Oct 26, 2016 at 6:46 AM, Jean-Baptiste Onofré  wrote:
> Agree. We already discussed that on the mailing list. I mentioned this
> some weeks ago.
>
> Regards
> JB
>
> ⁣
>
> On Oct 26, 2016, 02:26, at 02:26, Dan Halperin  
> wrote:
>>My reading of the LEGAL threads is that since we are not including
>>(shading
>>or bundling) the ASL-licensed code we are fine to distribute kinesis-io
>>module. This was the original conclusion that LEGAL-198 got to, and
>>that
>>thread has not been resolved differently (even if Spark went ahead and
>>broke the assembly). The beam-sdks-java-io-kinesis module is an
>>optional
>>part (Beam materially works just fine without it).
>>
>>So I think we're fine to keep this vote open.
>>
>>+1 (binding) on the release
>>
>>Thanks Aljoscha!
>>
>>
>>On Tue, Oct 25, 2016 at 12:07 PM, Aljoscha Krettek
>>
>>wrote:
>>
>>> Yep, I was looking at those same threads when I was reviewing the
>>artefacts.
>>> The release was already close to being finished so I went through
>>with it
>>> but if we think it's not good to have them in we should quickly
>>cancel in
>>> favour of a new RC without a published Kinesis connector.
>>>
>>> On Tue, 25 Oct 2016 at 20:46 Dan Halperin
>>
>>> wrote:
>>>
>>> > I can't tell whether it is a problem that we are distributing the
>>> > beam-sdks-java-io-kinesis module [0].
>>> >
>>> > Here is the dev@ discussion thread [1] and the (unanswered)
>>relevant
>>> LEGAL
>>> > thread [2].
>>> > We linked through to a Spark-related discussion [3], and here is
>>how to
>>> > disable distribution of the KinesisIO module [4].
>>> >
>>> > [0]
>>> >
>>> > https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-kinesis/
>>> > [1]
>>> >
>>> > https://lists.apache.org/thread.html/6784bc005f329d93fd59d0f8759ed4745e72f105e39d869e094d9645@%3Cdev.beam.apache.org%3E
>>> > [2]
>>> >
>>> > https://issues.apache.org/jira/browse/LEGAL-198?focusedCommentId=15471529&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15471529
>>> > [3] https://issues.apache.org/jira/browse/SPARK-17418
>>> > [4] https://github.com/apache/spark/pull/15167/files
>>> >
>>> > Dan
>>> >
>>> > On Tue, Oct 25, 2016 at 11:01 AM, Seetharam Venkatesh <
>>> > venkat...@innerzeal.com> wrote:
>>> >
>>> > > +1
>>> > >
>>> > > Thanks!
>>> > >
>>> > > On Mon, Oct 24, 2016 at 2:30 PM Aljoscha Krettek
>>
>>> > > wrote:
>>> > >
>>> > > > Hi Team!
>>> > > >
>>> > > > Please review and vote at your leisure on release candidate #1
>>for
>>> > > version
>>> > > > 0.3.0-incubating, as follows:
>>> > > > [ ] +1, Approve the release
>>> > > > [ ] -1, Do not approve the release (please provide specific
>>comments)
>>> > > >
>>> > > > The complete staging area is available for your review, which
>>> includes:
>>> > > > * JIRA release notes [1],
>>> > > > * the official Apache source release to be deployed to
>>> dist.apache.org
>>> > > > [2],
>>> > > > * all artifacts to be deployed to the Maven Central Repository
>>[3],
>>> > > > * source code tag "v0.3.0-incubating-RC1" [4],
>>> > > > * website pull request listing the release and publishing the
>>API
>>> > > reference
>>> > > > manual [5].
>>> > > >
>>> > > > Please keep in mind that this release is not focused on
>>providing new
>>> > > > functionality. We want to refine the release process and make
>>stable
>>> > > source
>>> > > > and binary artefacts available to our users.
>>> > > >
>>> > > > The vote will be open for at least 72 hours. It is adopted by
>>> majority
>>> > > > approval, with at least 3 PPMC affirmative votes.
>>> > > >
>>> > > > Cheers,
>>> > > > Aljoscha
>>> > > >
>>> > > > [1]
>>> > > >
>>> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12338051
>>> > > > [2]
>>> > > >
>>> >
>>https://dist.apache.org/repos/dist/dev/incubator/beam/0.3.0-incubating/
>>> > > > [3]
>>> > > > https://repository.apache.org/content/repositories/staging/org/apache/beam/
>>> > > > [4]
>>> > > >
>>> > > > https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git;a=tag;h=5d86ff7f04862444c266142b0d5acecb5a6b7144
>>> > > > [5] https://github.com/apache/incubator-beam-site/pull/52
>>> > > >
>>> > >
>>> >
>>>


Re: [ANNOUNCEMENT] New committers!

2016-10-24 Thread Maximilian Michels
Congrats and a warm welcome!

-Max


On Sun, Oct 23, 2016 at 6:02 AM, Robert Bradshaw
 wrote:
> Congrats and welcome to all three of you!
>
> On Sat, Oct 22, 2016 at 9:02 AM, Thomas Weise  wrote:
>> Thanks everyone!
>>
>>
>> On Sat, Oct 22, 2016 at 12:59 AM, Aljoscha Krettek 
>> wrote:
>>
>>> Welcome everyone! +3 :-)
>>>
>>> On Sat, 22 Oct 2016 at 06:43 Jean-Baptiste Onofré  wrote:
>>>
>>> > Just a small thing.
>>> >
>>> > If it's not already done, don't forget to sign an ICLA and let us know
>>> > your apache ID.
>>> >
>>> > Thanks,
>>> > Regards
>>> > JB
>>> >
>>> > On 10/22/2016 12:18 AM, Davor Bonaci wrote:
>>> > > Hi everyone,
>>> > > Please join me and the rest of Beam PPMC in welcoming the following
>>> > > contributors as our newest committers. They have significantly
>>> > contributed
>>> > > to the project in different ways, and we look forward to many more
>>> > > contributions in the future.
>>> > >
>>> > > * Thomas Weise
>>> > > Thomas authored the Apache Apex runner for Beam [1]. This is an
>>> exciting
>>> > > new runner that opens a new user base. It is a large contribution,
>>> which
>>> > > starts a whole new component with great potential.
>>> > >
>>> > > * Jesse Anderson
>>> > > Jesse has contributed significantly by promoting Beam. He has
>>> > co-developed
>>> > > a Beam tutorial and delivered it at a top big data conference. He
>>> > published
>>> > > several blog posts positioning Beam, Q&A with the Apache Beam team,
>>> and a
>>> > > demo video on how to run Beam on multiple runners [2]. On the side, he has
>>> > > authored 7 pull requests and reported 6 JIRA issues.
>>> > >
>>> > > * Thomas Groh
>>> > > Since starting incubation, Thomas has contributed the most commits to
>>> the
>>> > > project [3], a total of 226 commits, which is more than anybody else.
>>> He
>>> > > has contributed broadly to the project, most significantly by
>>> developing
>>> > > from scratch the DirectRunner that supports the full model semantics.
>>> > > Additionally, he has contributed a new set of APIs for testing
>>> unbounded
>>> > > pipelines. He published a blog highlighting this work.
>>> > >
>>> > > Congratulations to all three! Welcome!
>>> > >
>>> > > Davor
>>> > >
>>> > > [1] https://github.com/apache/incubator-beam/tree/apex-runner
>>> > > [2] http://www.smokinghand.com/
>>> > > [3] https://github.com/apache/incubator-beam/graphs/contributors?from=2016-02-01&to=2016-10-14&type=c
>>> > >
>>> >
>>> > --
>>> > Jean-Baptiste Onofré
>>> > jbono...@apache.org
>>> > http://blog.nanthrax.net
>>> > Talend - http://www.talend.com
>>> >
>>>


Re: Start of release 0.3.0-incubating

2016-10-21 Thread Maximilian Michels
+1 for the release. We have plenty of fixes in and users have already
asked for a new release.

-Max


On Fri, Oct 21, 2016 at 10:22 AM, Jean-Baptiste Onofré  
wrote:
> Hi Aljoscha,
>
> OK for me, you can go ahead ;)
>
> Thanks again for tackling this release !
>
> Regards
> JB
>
>
> On 10/21/2016 08:51 AM, Aljoscha Krettek wrote:
>>
>> +1 @JB
>>
>> We should definitely keep that in mind for the next releases. I think this
>> one is now sufficiently announced so I'll get started on the process.
>> (Which will take me a while since I have to do all the initial setup.)
>>
>>
>>
>> On Fri, 21 Oct 2016 at 06:32 Jean-Baptiste Onofré  wrote:
>>
>>> Hi Dan,
>>>
>>> No problem, MQTT and other IOs will be in the next release..
>>>
>>> IMHO, it would be great to have:
>>> 1. A release reminder a couple of days before a release, just to ask
>>> everyone if there are any objections (something like this:
>>>
>>>
>>> https://lists.apache.org/thread.html/80de75df0115940ca402132338b221e5dd5f669fd1bf915cd95e15c3@%3Cdev.karaf.apache.org%3E
>>> )
>>> 2. A rough release schedule on the website (something like this:
>>> http://karaf.apache.org/download.html#container-schedule for instance).
>>>
>>> Just my $0.01 ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 10/20/2016 06:30 PM, Dan Halperin wrote:

 Hi JB,

 This is a great discussion to have! IMO, there's no special
 functionality
 requirements for these pre-TLP releases. It's more important to make
 sure
 we keep the process going. (I think we should start the release as soon
>>>
>>> as

 possible, because it's been 2 months since the last one.)

 If we hold a release a week for MQTT, we'll hold it another week for
 some
 other new feature, and then hold it again for some other new feature.

 Can you make a strong argument for why MQTT in particular should be
>>>
>>> release

 blocking?

 Dan

 On Thu, Oct 20, 2016 at 9:26 AM, Jean-Baptiste Onofré 
 wrote:

> +1
>
> Thanks Aljoscha !!
>
> Do you mind waiting until the weekend or Monday to start the release ? I
> would like to include MqttIO if possible.
>
> Thanks !
> Regards
> JB
>
> ⁣
>
> On Oct 20, 2016, 18:07, at 18:07, Dan Halperin
>>>
>>> 
>
> wrote:
>>
>> On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek
>> 
>> wrote:
>>
>>> Hi,
>>> thanks for taking the time and writing this extensive doc!
>>>
>>> If no-one is against this I would like to be the release manager for
>>
>> the
>>>
>>> next (0.3.0-incubating) release. I would work with the guide and
>>
>> update it
>>>
>>> with anything that I learn along the way. Should I open a new thread
>>
>> for
>>>
>>> this or is it ok if nobody objects here?
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>
>> Spinning this out as a separate thread.
>>
>> +1 -- Sounds great to me!
>>
>> Dan
>>
>>> On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré 
>>
>> wrote:
>>>
>>>
 Hi,

 well done.

 As already discussed, it looks good to me ;)

 Regards
 JB

 On 10/20/2016 01:24 AM, Davor Bonaci wrote:
>
> Hi everybody,
> As a project, I think we should have a Release Guide to document
>>
>> the
>
> process, have consistent releases, on-board additional release
>>>
>>> managers,
>
> and generally share knowledge. It is also one of the project
>>
>> graduation
>
> guidelines.
>
> Dan and I wrote a draft version, documenting the process we did
>>
>> for the
>
> first two releases. It is currently in a pull request [1]. I'd
>>
>> invite
>
> everyone interested to take a peek and comment, either on the
>>
>> pull

 request
>
> itself or here on mailing list, as appropriate.
>
> Thanks,
> Davor
>
> [1] https://github.com/apache/incubator-beam-site/pull/49
>

 --
 Jean-Baptiste Onofré
 jbono...@apache.org
 http://blog.nanthrax.net
 Talend - http://www.talend.com


Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-18 Thread Maximilian Michels
Great to have another Runner on board! Congrats!

-Max


On Tue, Oct 18, 2016 at 8:10 AM, Jean-Baptiste Onofré  wrote:
> Awesome !
>
> Great job guys !
>
> Thanks to Thomas, Vlad, Gaurav and Ken for this.
>
> Regards
> JB
>
>
> On 10/17/2016 06:51 PM, Kenneth Knowles wrote:
>>
>> Hi all,
>>
>> I would like to, once again, call attention to a great addition to Beam: a
>> runner for Apache Apex.
>>
>> After lots of review and much thoughtful revision, pull request #540 has
>> been merged to the apex-runner feature branch today. Please do take a
>> look,
>> and help us put the finishing touches on it to get it ready for the master
>> branch.
>>
>> And please also congratulate and thank Thomas Weise for this large
>> endeavor, Vlad Rosov who helped get the integration tests working, and
>> Gaurav Gupta, who contributed review comments.
>>
>> Kenn
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-10-13 Thread Maximilian Michels
The Flink runner currently only supports blocking execution. I'll open
a pull request to at least fix waitUntilFinish().

-Max


On Thu, Oct 13, 2016 at 11:10 AM, Amit Sela  wrote:
> Hi Pei,
>
> I have someone on my team who started to work on this; I'll follow up,
> thanks for the bump ;-)
>
> Amit
>
> On Thu, Oct 13, 2016 at 8:38 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Pei,
>>
>> good one !
>>
>> We now have to update the 'other' runners.
>>
>> Thanks.
>>
>> Regards
>> JB
>>
>> On 10/12/2016 10:48 PM, Pei He wrote:
>> > Hi,
>> > I just want to bump this thread and bring it to attention.
>> >
>> > PipelineResult now have cancel() and waitUntilFinish(). However,
>> currently
>> > only DataflowRunner supports it in DataflowPipelineJob.
>> >
>> > We agreed that users should do "p.run().waitUntilFinish()" if they want
>> to
>> > block. But, if they do it now, direct, flink, spark runners will throw
>> > exceptions.
>> >
>> > I have the following JIRA issues opened; I am wondering whether anyone could help
>> > on them?
>> >
>> > https://issues.apache.org/jira/browse/BEAM-596
>> > https://issues.apache.org/jira/browse/BEAM-595
>> > https://issues.apache.org/jira/browse/BEAM-593
>> >
>> > Thanks
>> > --
>> > Pei
>> >
>> >
>> >
>> >
>> > On Tue, Jul 26, 2016 at 10:54 AM, Amit Sela 
>> wrote:
>> >
>> >> +1 and Thanks!
>> >>
>> >> On Tue, Jul 26, 2016 at 2:01 AM Robert Bradshaw
>> >> 
>> >> wrote:
>> >>
>> >>> +1, sounds great. Thanks Pei.
>> >>>
>> >>> On Mon, Jul 25, 2016 at 3:28 PM, Lukasz Cwik > >
>> >>> wrote:
>>  +1 for your proposal Pei
>> 
>>  On Mon, Jul 25, 2016 at 5:54 PM, Pei He 
>> >>> wrote:
>> 
>> > Looks to me that followings are agreed:
>> > (1). adding cancel() and waitUntilFinish() to PipelineResult.
>> > (In streaming mode, "all data watermarks reach infinity" is
>> > considered finished.)
>> > (2). PipelineRunner.run() should return relatively quick as soon as
>> > the pipeline/job is started/running. The blocking logic should be
>> left
>> > to users' code to handle with PipelineResult.waitUntilFinish(). (Test
>> > runners that finish quickly can block run() until the execution is
>> > done. So, it is cleaner to verify test results after run())
>> >
>> > I will send out PR for (1), and create jira issues to improve runners
>> >>> for
>> > (2).
>> >
>> > waitToRunning() is controversial, and we have several halfway-agreed
>> > proposals.
>> > I will pull them out from this thread, so we can close this proposal
>> > with cancel() and waitUntilFinish(). And, i will create a jira issue
>> > to track how to support "waiting until other states".
>> >
>> > Does that sound good with anyone?
>> >
>> > Thanks
>> > --
>> > Pei
>> >
>> > On Thu, Jul 21, 2016 at 4:32 PM, Robert Bradshaw
>> >  wrote:
>> >> On Thu, Jul 21, 2016 at 4:18 PM, Ben Chambers > >>>
>> > wrote:
>> >>> This health check seems redundant with just waiting a while and
>> >> then
>> >>> checking on the status, other than returning earlier in the case of
>> >>> reaching a terminal state. What about adding:
>> >>>
>> >>> /**
>> >>>  * Returns the state after waiting the specified duration. Will
>> >>> return
>> >>> earlier if the pipeline
>> >>>  * reaches a terminal state.
>> >>>  */
>> >>> State getStateAfter(Duration duration);
>> >>>
>> >>> This seems to be a useful building block, both for the user's
>> >>> pipeline
>> > (in
>> >>> case they wanted to build something like wait and then check
>> >> health)
>> >>> and
>> >>> also for the SDK (to implement waitUntilFinished, etc.)
>> >>
>> >> A generic waitFor(Duration) which may return early if a terminal
>> >> state
>> >> is entered seems useful. I don't know that we need a return value
>> >> here, given that we can then query the PipelineResult however we want
>> >> once this returns. waitUntilFinished is simply
>> >> waitFor(InfiniteDuration).
>> >>
>> >>> On Thu, Jul 21, 2016 at 4:11 PM Pei He 
>> > wrote:
>> >>>
>>  I am not in favor of supporting wait for every states or
>>  waitUntilState(...).
>>  One reason is PipelineResult.State is not well defined and is not
>>  agreed upon runners.
>>  Another reason is users might not want to wait for a particular
>> >>> state.
>>  For example,
>>  waitUntilFinish() is to wait for a terminal state.
>>  So, even runners have different states, we still can define shared
>>  properties, such as finished/terminal.
>> >>
>> >> +1. Running is an intermediate state that doesn't have an obvious
>> >> mapping onto all runners, which is another reason it's odd to wait
>> >> until then. All runners have terminal states.
>> >>
>>  I think when users call waitUntilRunning(), they want to make sure
>> >>> the
>>  pi
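
The contract that emerges from this thread -- run() returns quickly, blocking is opt-in via waitUntilFinish(), cancel() forces a terminal state, and a bounded waitFor(duration) returns early once a terminal state is reached -- can be sketched as a toy model. The following Python sketch is purely illustrative: ToyPipelineResult, its method names, and the State values are stand-ins, not the Beam API.

```python
import threading
import time

class State:
    RUNNING = "RUNNING"
    DONE = "DONE"
    CANCELLED = "CANCELLED"

TERMINAL_STATES = {State.DONE, State.CANCELLED}

class ToyPipelineResult:
    """Toy stand-in for a pipeline result handle: run() would hand this back
    immediately, and all blocking is done explicitly by the caller."""

    def __init__(self):
        self._state = State.RUNNING
        self._cond = threading.Condition()

    def get_state(self):
        with self._cond:
            return self._state

    def _transition(self, state):
        # Called by the "runner" when the job reaches a terminal state.
        with self._cond:
            if self._state not in TERMINAL_STATES:
                self._state = state
            self._cond.notify_all()

    def cancel(self):
        self._transition(State.CANCELLED)
        return self.get_state()

    def wait_until_finish(self, timeout_secs=None):
        # waitFor(duration): may return early once a terminal state is
        # reached; wait_until_finish() is waitFor(infinite duration).
        deadline = None if timeout_secs is None else time.monotonic() + timeout_secs
        with self._cond:
            while self._state not in TERMINAL_STATES:
                remaining = None if deadline is None else deadline - time.monotonic()
                if remaining is not None and remaining <= 0:
                    break
                self._cond.wait(remaining)
            return self._state

# run() would return this result immediately; a background "job" finishes later.
result = ToyPipelineResult()
threading.Timer(0.05, result._transition, args=(State.DONE,)).start()
assert result.wait_until_finish() == State.DONE  # blocks until terminal
```

Note how a timed wait that expires simply returns the current (non-terminal) state, which is what lets callers implement their own "wait a while, then check health" logic on top of it.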

Re: [DISCUSS] UnboundedSource and the KafkaIO.

2016-10-12 Thread Maximilian Michels
Thanks for the explanation, Amit!

What you described doesn't sound so different from how the Flink
Runner interfaces with the UnboundedSource interface. Taken aside the
mini batches and the discretization of the stream that you need to
apply therefore, the checkpointing logic is pretty similar. The Flink
wrapper doesn't use an id to identify the checkpointed state because
the state is kept per operator and restored to each instance in case
of a failure. In Spark, the state is directly scoped by key. That
actually makes a lot of sense when you want to rescale a job and
that's the direction in which Flink is currently improving its state
interface.


-Max


On Mon, Oct 10, 2016 at 3:35 PM, Amit Sela  wrote:
> Thanks Max!
>
> I'll try to explain Spark's stateful operators and how/why I used them with
> UnboundedSource.
>
> Spark has two stateful operators: *updateStateByKey* and *mapWithState*.
> Since updateStateByKey is bound to output the (updated) state itself - the
> CheckpointMark in our case - we're left with mapWithState.
> mapWithState provides a persistent, distributed "map-like", that is
> partitioned according to the stream. This is indeed how I manage state
> between micro-batches.
> However, mapWithState (like any map) will give you a value (state)
> corresponding to a specific key, so I use a running-id from the initial
> splitting to identify the appropriate state.
> I took a look at Flink's implementation ( I do that sometimes ;-) ) and I
> could do the same and save the split source with the CheckpointMark but
> it'll still have to correspond to the same id, and since I had to wrap the
> split Source to perform a sort of "BoundedReadFromUnboundedSource" I simply
> added an id field and I'm hashing by that id.
> I'll also add that the stateful operator can only be applied to a
> (Pair)Stream and not to input operators so I'm actually generating a stream
> of splits (the same ones for every micro-batch) and reading from within the
> mappingFunction of the mapWithState.
>
> It's not the simplest design, but given how Spark's persistent state and
> InputDStream are designed comparing to the Beam model, I don't see another
> way - though I'd be happy to hear one!
>
> Pretty sure I've added this here but no harm in adding the link again: design
> doc
> <https://docs.google.com/document/d/12BzHbETDt7ICIF7vc8zzCeLllmIpvvaVDIdBlcIwE1M/edit?usp=sharing>
> and
> a work-in-progress branch
> <https://github.com/amitsela/incubator-beam/tree/BEAM-658-WIP> all
> mentioned in the ticket <https://issues.apache.org/jira/browse/BEAM-658> as
> well.
> The design doc also relates to how "pure" Spark works with Kafka, which I
> think is interesting and very different from Flink/Dataflow.
>
> Hope this helped clear things up a little, please keep on asking if
> something is not clear yet.
>
> Thanks,
> Amit.
>
> On Mon, Oct 10, 2016 at 4:02 PM Maximilian Michels  wrote:
>
>> Just to add a comment from the Flink side and its
>> UnboundedSourceWrapper. We experienced the only way to guarantee
>> deterministic splitting of the source, was to generate the splits upon
>> creation of the source and then checkpoint the assignment during
>> runtime. When restoring from a checkpoint, the same reader
>> configuration is restored. It's not possible to change the splitting
>> after the initial splitting has taken place. However, Flink will soon
>> be able to repartition the operator state upon restart/rescaling of a
>> job.
>>
>> Does Spark have a way to pass state of a previous mini batch to the
>> current mini batch? If so, you could restore the last configuration
>> and continue reading from the checkpointed offset. You just have to
>> checkpoint before the mini batch ends.
>>
>> -Max
>>
>> On Mon, Oct 10, 2016 at 10:38 AM, Jean-Baptiste Onofré 
>> wrote:
>> > Hi Amit,
>> >
>> > thanks for the explanation.
>> >
>> > For 4, you are right, it's slightly different from DataXchange (related to
>> > the elements in the PCollection). I think storing the "starting point" for a
>> > reader makes sense.
>> >
>> > Regards
>> > JB
>> >
>> > On 10/10/2016 10:33 AM, Amit Sela wrote:
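
Amit's scheme -- wrapping each split with a running id assigned at the initial splitting, then using that id to look up the split's CheckpointMark in mapWithState-style persistent state between micro-batches -- can be sketched roughly as follows. All names here are illustrative stand-ins, not Spark or Beam APIs, and a plain dict plays the role of the partitioned state store.

```python
class CheckpointMark:
    """Toy stand-in for an UnboundedSource checkpoint: just a resume offset."""
    def __init__(self, offset=0):
        self.offset = offset

class IdentifiedSplit:
    """A source split wrapped with the running id assigned at initial splitting."""
    def __init__(self, split_id, records):
        self.split_id = split_id
        self.records = records

def read_micro_batch(split, state, max_records):
    """One mapWithState-style step: look up the split's CheckpointMark by its
    id, read up to max_records from where it left off, and store the updated
    mark under the same id for the next micro-batch."""
    mark = state.get(split.split_id, CheckpointMark())
    batch = split.records[mark.offset:mark.offset + max_records]
    state[split.split_id] = CheckpointMark(mark.offset + len(batch))
    return batch

# The persistent "map-like" state survives across micro-batches, keyed by id;
# the same split (same id) is re-presented to every micro-batch.
state = {}
split = IdentifiedSplit(split_id=0, records=["a", "b", "c", "d", "e"])
assert read_micro_batch(split, state, 2) == ["a", "b"]
assert read_micro_batch(split, state, 2) == ["c", "d"]  # resumed from the mark
assert state[0].offset == 4
```

The key point the sketch illustrates is that the mark is stored even when a batch reads nothing, so the "starting point" of a reader is persisted regardless of whether records were produced.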

Re: [DISCUSS] UnboundedSource and the KafkaIO.

2016-10-10 Thread Maximilian Michels
Just to add a comment from the Flink side and its
UnboundedSourceWrapper. We experienced the only way to guarantee
deterministic splitting of the source, was to generate the splits upon
creation of the source and then checkpoint the assignment during
runtime. When restoring from a checkpoint, the same reader
configuration is restored. It's not possible to change the splitting
after the initial splitting has taken place. However, Flink will soon
be able to repartition the operator state upon restart/rescaling of a
job.

Does Spark have a way to pass state of a previous mini batch to the
current mini batch? If so, you could restore the last configuration
and continue reading from the checkpointed offset. You just have to
checkpoint before the mini batch ends.

-Max

On Mon, Oct 10, 2016 at 10:38 AM, Jean-Baptiste Onofré  
wrote:
> Hi Amit,
>
> thanks for the explanation.
>
> For 4, you are right, it's slightly different from DataXchange (related to
> the elements in the PCollection). I think storing the "starting point" for a
> reader makes sense.
>
> Regards
> JB
>
>
> On 10/10/2016 10:33 AM, Amit Sela wrote:
>>
>> Inline, thanks JB!
>>
>> On Mon, Oct 10, 2016 at 9:01 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi Amit,
>>>
>>> For 1., the runner is responsible for the checkpoint storage (associated
>>> with the source). It's the way for the runner to retry and know the
>>> failed bundles.
>>>
>> True, this was a recap/summary of another, not-so-clear, thread.
>>
>>> For 4, are you proposing that KafkaRecord store additional metadata for
>>> that? It sounds like what I proposed in the "Technical Vision" appendix
>>> document: there I proposed to introduce a DataXchange object that stores
>>> some additional metadata (like offset) used by the runner. It would be
>>> the same with SDF, as the tracker state should be persistent as well.
>>>
>> I think I was more focused on persisting the "starting point" for a reader,
>> even if no records were read (yet), so that the next time the reader
>> attempts to read it will pick up there. This has more to do with how the
>> CheckpointMark handles this.
>> I have to say that I'm not familiar with your DataXchange proposal; I will
>> take a look, though.
>>
>>> Regards
>>> JB
>>>
>>> On 10/08/2016 01:55 AM, Amit Sela wrote:
>>>> I started a thread about (suggesting) UnboundedSource splitId's and it
>>>> turned into an UnboundedSource/KafkaIO discussion, and I think it's best
>>>> to start over in a clear [DISCUSS] thread.
>>>>
>>>> When working on UnboundedSource support for the Spark runner, I've raised
>>>> some questions; some were general-UnboundedSource, and others
>>>> Kafka-specific.
>>>>
>>>> I'd like to recap them here, and maybe have a more productive and
>>>> well-documented discussion for everyone.
>>>>
>>>> 1. UnboundedSource id's - I assume any runner persists the
>>>> UnboundedSource's CheckpointMark for fault-tolerance, but I wonder how it
>>>> matches the appropriate split (of the UnboundedSource) to its previously
>>>> persisted CheckpointMark in any specific worker?
>>>> *Thomas Groh* mentioned that Source splits have to have an associated
>>>> identifier, and so the runner gets to tag splits however it pleases, so
>>>> long as those tags don't allow splits to bleed into each other.
>>>> 2. Consistent splitting - an UnboundedSource splitting seems to require
>>>> consistent splitting if it were to "pick up where it left off", correct?
>>>> This is not mentioned as a requirement or a recommendation in
>>>> UnboundedSource#generateInitialSplits(), so is this a Kafka-only issue?
>>>> *Raghu Angadi* mentioned that Kafka already does so by applying
>>>> partitions to readers in a round-robin manner.
>>>> *Thomas Groh* also added that while the UnboundedSource API doesn't
>>>> require deterministic splitting (although it's recommended), a
>>>> PipelineRunner should keep track of the initially generated splits.
>>>> 3. Support reading of Kafka partitions that were added to topic/s while
>>>> a Pipeline reads from them - BEAM-727 was filed.
>>>> 4. Reading/persisting Kafka start offsets - since Spark works in
>>>> micro-batches, if "latest" was applied on a fairly sparse topic, each
>>>> worker would actually begin reading only after it saw a message during
>>>> the time window it had to read messages. This is because fetching the
>>>> offsets is done by the worker running the Reader. This means that each
>>>> Reader sees a
>>>

Re: [PROPOSAL] Introduce review mailing list and provide update on open discussion

2016-10-07 Thread Maximilian Michels
Hi JB!

> 1. We create a new mailing list: rev...@beam.incubator.apache.org.
> 2. We configure github integration to send all pull request comments on 
> review mailing list. It would allow to track and simplify the way to read the 
> comments and to keep up to date.

I already have it organized that way through filters but having a
dedicated mailing list is a much better idea.

> 3. A technical discussion should be sent on the dev mailing list with the 
> [DISCUSS] keyword in the subject.
> 4. Once a discussion is open, the author should periodically send an update 
> on the discussion (once a week) >containing a summary of the last exchanges 
> happened on the Jira or github (quick and direct summary).

We can try that on a best-effort basis. Enforcing this seems to be
difficult and could also introduce verbosity on the mailing list.

> 5. Once we consider the discussion closed (no update in the last two weeks), 
> the author send a [CLOSE] e-mail on the thread.

I think it is hard to decide when a discussion is closed. Two weeks
seems too short.

In general, +1 for an open development process.

-Max

On Fri, Oct 7, 2016 at 4:05 AM, Jungtaek Lim  wrote:
> +1 except [4] for me, too. [4] may be replaced with linking DISCUSSION mail
> thread archive to JIRA.
> Yes it doesn't update news on discussion to JIRA and/or Github, but at
> least someone needed to see can find out manually.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Oct 7, 2016 at 11:00 AM, Satish Duggana wrote:
>
>> +1 for proposal except for [4]. Agree with Raghu on [4] as it may be
>> burdensome to update with summaries and folks may start replying comments
>> on those summaries etc and conclusions are updated on respective design
>> docs. We may want to start without [4].
>>
>> Thanks,
>> Satish.
>>
>> On Fri, Oct 7, 2016 at 12:00 AM, Raghu Angadi 
>> wrote:
>>
>> > +1 for rev...@beam.incubator.apache.org. Open lists are critically
>> > important.
>> >
>> > My comment earlier was mainly about (4). Sorry about not being clear.
>> >
>> > On Thu, Oct 6, 2016 at 11:00 AM, Lukasz Cwik 
>> > wrote:
>> >
>> > > +1 for supporting different working styles.
>> > >
>> > > On Thu, Oct 6, 2016 at 10:58 AM, Kenneth Knowles
>> > > >
>> > > wrote:
>> > >
>> > > > +1 to rev...@beam.incubator.apache.org if it is turnkey for infra to
>> > set
>> > > > up, aka points 1 and 2.
>> > > >
>> > > > Even though I would not personally read it via email, getting the
>> > > > information in yet another format and infrastructure (and
>> stewardship)
>> > is
>> > > > valuable for search, archival, and supporting diverse work styles.
>> The
>> > > > benefit might not be huge, but I think it will be enough to justify
>> the
>> > > > (hopefully negligible) cost.
>> > > >
>> > > > Kenn
>> > > >
>> > > > On Thu, Oct 6, 2016 at 4:54 AM Jean-Baptiste Onofré > >
>> > > > wrote:
>> > > >
>> > > > Hi team,
>> > > >
>> > > > following the discussion we had about technical discussion that
>> should
>> > > > happen on the mailing list, I would like to propose the following:
>> > > >
>> > > > 1. We create a new mailing list: rev...@beam.incubator.apache.org.
>> > > > 2. We configure github integration to send all pull request comments
>> > > > to the review mailing list. It would allow us to track and simplify
>> > > > the way we read the comments and keep up to date.
>> > > > 3. A technical discussion should be sent on the dev mailing list with
>> > > > the [DISCUSS] keyword in the subject.
>> > > > 4. Once a discussion is open, the author should periodically send an
>> > > > update on the discussion (once a week) containing a summary of the
>> > > > last exchanges that happened on the Jira or github (a quick and
>> > > > direct summary).
>> > > > 5. Once we consider the discussion closed (no update in the last two
>> > > > weeks), the author sends a [CLOSE] e-mail on the thread.
>> > > >
>> > > > WDYT ?
>> > > >
>> > > > Regards
>> > > > JB
>> > > > --
>> > > > Jean-Baptiste Onofré
>> > > > jbono...@apache.org
>> > > > http://blog.nanthrax.net
>> > > > Talend - http://www.talend.com
>> > > >
>> > >
>> >
>>


Re: We've hit 1000 PRs!

2016-09-27 Thread Maximilian Michels
Indeed, that number is pretty impressive! For one, it shows Beam is a
very active project, but it also reflects Beam's rigorous PR policy.
+1 to more of that!

On Tue, Sep 27, 2016 at 10:35 AM, Aljoscha Krettek  wrote:
> Sweet! :-)
>
> On Mon, 26 Sep 2016 at 23:47 Dan Halperin 
> wrote:
>
>> Hey folks!
>>
>> Just wanted to send out a note -- we've hit 1000 PRs in GitHub as of
>> Saturday! That's a tremendous amount of work for the 7 months since PR#1.
>>
>> I bet we hit 2000 in much fewer than 7 months ;)
>>
>> Dan
>>


Re: Support for Flink 1.1.0 in release-0.2.0-incubating

2016-09-20 Thread Maximilian Michels
That is totally understandable. However, upgrading to a new version of
Flink is also a big change that could require additional work which is
out of scope for a Beam minor release.

My advice, if you want to use the latest version but prevent changes coming
in constantly, you could use a fixed snapshot release. For example:


<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink_2.10</artifactId>
  <version>0.3.0-incubating-20160920.071715-50</version>
</dependency>


You can derive this version number from the maven-metadata.xml in the
snapshot repository:
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-runners-flink_2.10/0.3.0-incubating-SNAPSHOT/maven-metadata.xml
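For reference, the timestamped version is just the -SNAPSHOT suffix replaced by the timestamp and buildNumber from that metadata file. A rough sketch of the mapping in plain Java (the XML sample used for testing is illustrative, not the live file):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: derive a pinned snapshot version from maven-metadata.xml by
// replacing the SNAPSHOT suffix with <timestamp>-<buildNumber>.
public class SnapshotVersion {

  static String resolve(String baseVersion, String metadataXml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(metadataXml.getBytes("UTF-8")));
    String timestamp = doc.getElementsByTagName("timestamp").item(0).getTextContent();
    String buildNumber = doc.getElementsByTagName("buildNumber").item(0).getTextContent();
    // e.g. 0.3.0-incubating-SNAPSHOT -> 0.3.0-incubating-20160920.071715-50
    return baseVersion.replace("SNAPSHOT", timestamp + "-" + buildNumber);
  }
}
```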

It would be worthwhile to discuss how we handle version upgrades of
backends in the Beam release cycle. Unfortunately, for the Runner to be
compatible across Flink versions, we need a bit more than just API
stability because some parts integrate also with the Flink runtime. I can
see this becoming an issue once more runners are part of Beam.

Best,
Max

On Sun, Sep 18, 2016 at 11:04 PM, Chawla,Sumit 
wrote:

> Hi Max
>
> Thanks for the information. I agree with you that 0.3.0 is the way ahead,
> but i am hesitant to use 0.3.0-SNAPSHOT due to its changing nature.
>
> Regards
> Sumit Chawla
>
>
> On Fri, Sep 16, 2016 at 5:51 AM, Maximilian Michels 
> wrote:
>
>> Hi Sumit,
>>
>> Thanks for the PR. Your changes look good. I think there are
>> currently no plans for a minor release 0.2.1-incubating. A lot of
>> issues were fixed on the latest master which should give you a better
>> experience than the 0.2.0-incubating release.
>>
>> These are the current issues which will be fixed in 0.3.0-incubating:
>> https://issues.apache.org/jira/browse/BEAM-102?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%200.3.0-incubating%20AND%20component%20%3D%20runner-flink
>>
>> 1. Most notably, it fixes an issue with comparing keys unencoded in
>> streaming mode which was an issue if you had not implemented
>> equals/hashCode for your objects.
>>
>> 2. Further, we support side inputs in streaming now. In the course, we
>> have also unified execution paths in streaming mode.
>>
>> 3. There was an issue with checkpointed sources that has been resolved.
>>
>> If you could try out the latest version, that would be great. If not,
>> we can probably merge your PR and think about a minor release.
>>
>> Best,
>> Max
>>
>> On Fri, Sep 16, 2016 at 6:10 AM, Chawla,Sumit 
>> wrote:
>> > Hi Max
>> >
>> > I have opened a PR - https://github.com/apache/incubator-beam/pull/963
>> for
>> > adding support of Flink 1.1.2 in Beam 0.2.0 release.
>> >
>> > Regards
>> > Sumit Chawla
>> >
>> >
>> > On Wed, Sep 14, 2016 at 1:32 PM, Chawla,Sumit 
>> > wrote:
>> >
>> >> Hi Max
>> >>
>> >> I was able to compile 0.2.0 with Flink 1.1.0 with small modification,
>> and
>> >> run a simple pipeline.
>> >>
>> >>@Override
>> >> -  public void restoreState(StreamTaskState taskState, long
>> >> recoveryTimestamp) throws Exception {
>> >> -super.restoreState(taskState, recoveryTimestamp);
>> >> +  public void restoreState(StreamTaskState taskState) throws
>> Exception {
>> >> +super.restoreState(taskState);
>> >>
>> >>
>> >> Can i get a sense of the changes that have happened in 0.3.0 for
>> Flink?  I
>> >> observed some classes completely reworked.  It will be crucial for me
>> to
>> >> understand the scope of change and impact before making a move to 0.3.0
>> >>
>> >>
>> >>
>> >> Regards
>> >> Sumit Chawla
>> >>
>> >>
>> >> On Wed, Sep 14, 2016 at 3:03 AM, Maximilian Michels 
>> >> wrote:
>> >>
>> >>> We support Flink 1.1.2 on the latest snapshot version
>> >>> 0.3.0-incubating-SNAPSHOT. Would it be possible for you to work with
>> >>> this version?
>> >>>
>> >>> On Tue, Sep 13, 2016 at 11:55 PM, Chawla,Sumit <
>> sumitkcha...@gmail.com>
>> >>> wrote:
>> >>> > When trying to use Beam 0.2.0 with Flink 1.1.0 jar, i am seeing
>> >>> following
>> >>> > error:
>> >>> >
>> >>> > java.lang.NoSuchMethodError:
>> >>> > org.apache.flink.streaming.api.operators.StreamingRuntimeCon
>> >>> text.registerTimer(J

Re: BEAM-635 - Support Flink Release Version 1.1.2 in release-0.2.0-incubating

2016-09-16 Thread Maximilian Michels
At the moment we have both a 0.2.0-incubating branch and a tag. As far
as I understand we would push all changes for a minor release in the
0.2.0-incubating branch which would then be used to create a
0.2.1-incubating release and tag. If not, what is the purpose of the
0.2.0-incubating branch?

Perhaps we could rename 0.2.0-incubating to 0.2-incubating? I think
that is a popular scheme for release branches.

That said, a proper minor release should probably include more changes
than just this one fix. At this point in time we're focusing on
development and typically do not backport changes. This will
definitely change in the future. So I'm not really sure if a minor
release is feasible.


On Fri, Sep 16, 2016 at 8:36 AM, Jean-Baptiste Onofré  wrote:
> Hi Sumit,
>
> The only possible way would be to create 0.2.x branch based on the 0.2.0
> release tag and then apply your PR.
>
> 0.3.0 release is planned for in a couple of months max. So, if you can wait,
> I would strongly recommend to wait this release that will include Flink
> upgrade.
>
> Regards
> JB
>
>
> On 09/16/2016 06:14 AM, Chawla,Sumit  wrote:
>>
>> Hi All
>>
>> I have created this PR to upgrade 0.2.0-release to use Flink 1.1.2.  There
>> are some nice features added in Flink 1.1.2 which we cannot consume until
>> Beam supports it.  I know that Flink 1.1.2 will be supported in 0.3.0.
>> But
>> for time being, we don't want to add a dependency on a snapshot version.
>>
>> Would love to hear your feedback and views on this one..
>>
>> Regards
>> Sumit Chawla
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Support for Flink 1.1.0 in release-0.2.0-incubating

2016-09-16 Thread Maximilian Michels
Hi Sumit,

Thanks for the PR. Your changes look good. I think there are
currently no plans for a minor release 0.2.1-incubating. A lot of
issues were fixed on the latest master which should give you a better
experience than the 0.2.0-incubating release.

These are the current issues which will be fixed in 0.3.0-incubating:
https://issues.apache.org/jira/browse/BEAM-102?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%200.3.0-incubating%20AND%20component%20%3D%20runner-flink

1. Most notably, it fixes an issue with comparing keys unencoded in
streaming mode which was an issue if you had not implemented
equals/hashCode for your objects.

2. Further, we support side inputs in streaming now. In the course, we
have also unified execution paths in streaming mode.

3. There was an issue with checkpointed sources that has been resolved.

If you could try out the latest version, that would be great. If not,
we can probably merge your PR and think about a minor release.

Best,
Max

On Fri, Sep 16, 2016 at 6:10 AM, Chawla,Sumit  wrote:
> Hi Max
>
> I have opened a PR - https://github.com/apache/incubator-beam/pull/963 for
> adding support of Flink 1.1.2 in Beam 0.2.0 release.
>
> Regards
> Sumit Chawla
>
>
> On Wed, Sep 14, 2016 at 1:32 PM, Chawla,Sumit 
> wrote:
>
>> Hi Max
>>
>> I was able to compile 0.2.0 with Flink 1.1.0 with small modification, and
>> run a simple pipeline.
>>
>>@Override
>> -  public void restoreState(StreamTaskState taskState, long
>> recoveryTimestamp) throws Exception {
>> -super.restoreState(taskState, recoveryTimestamp);
>> +  public void restoreState(StreamTaskState taskState) throws Exception {
>> +super.restoreState(taskState);
>>
>>
>> Can i get a sense of the changes that have happened in 0.3.0 for Flink?  I
>> observed some classes completely reworked.  It will be crucial for me to
>> understand the scope of change and impact before making a move to 0.3.0
>>
>>
>>
>> Regards
>> Sumit Chawla
>>
>>
>> On Wed, Sep 14, 2016 at 3:03 AM, Maximilian Michels 
>> wrote:
>>
>>> We support Flink 1.1.2 on the latest snapshot version
>>> 0.3.0-incubating-SNAPSHOT. Would it be possible for you to work with
>>> this version?
>>>
>>> On Tue, Sep 13, 2016 at 11:55 PM, Chawla,Sumit 
>>> wrote:
>>> > When trying to use Beam 0.2.0 with Flink 1.1.0 jar, i am seeing
>>> following
>>> > error:
>>> >
>>> > java.lang.NoSuchMethodError:
>>> > org.apache.flink.streaming.api.operators.StreamingRuntimeCon
>>> text.registerTimer(JLorg/apache/flink/streaming/
>>> runtime/operators/Triggerable;)V
>>> > at org.apache.beam.runners.flink.translation.wrappers.streaming
>>> .io.UnboundedSourceWrapper.setNextWatermarkTimer(UnboundedSo
>>> urceWrapper.java:381)
>>> > at org.apache.beam.runners.flink.translation.wrappers.streaming
>>> .io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:233)
>>> > at org.apache.flink.streaming.api.operators.StreamSource.run(
>>> StreamSource.java:80)
>>> > at org.apache.flink.streaming.api.operators.StreamSource.run(
>>> StreamSource.java:53)
>>> > at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.
>>> run(SourceStreamTask.java:56)
>>> > at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
>>> StreamTask.java:266)
>>> > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> >
>>> >
>>> > Regards
>>> > Sumit Chawla
>>> >
>>> >
>>> > On Tue, Sep 13, 2016 at 2:20 PM, Chawla,Sumit 
>>> > wrote:
>>> >
>>> >> Hi All
>>> >>
>>> >> The release-0.2.0-incubating supports Flink 1.0.3. With Flink 1.1.0
>>> out,
>>> >> is there a plan to support it with any 0.2.0 patch? I tried compiling
>>> 0.2.0
>>> >> with Flink 1.1.0,
>>> >> and got a couple of compilation errors in
>>> >> FlinkGroupAlsoByWindowWrapper.java.
>>> >> Going back to master i see lots of change in Flink translation
>>> wrappers,
>>> >> and
>>> >> FlinkGroupAlsoByWindowWrapper.java being removed.
>>> >>
>>> >> Just want to get a sense of things here, on what would it take to
>>> support Flink
>>> >> 1.1.0 with release-0.2.0. Would appreciate views of people who are
>>> already
>>> >> working on upgrading it to Flink 1.1.0
>>> >>
>>> >> Regards
>>> >> Sumit Chawla
>>> >>
>>> >>
>>>
>>
>>


Re: Support for Flink 1.1.0 in release-0.2.0-incubating

2016-09-14 Thread Maximilian Michels
We support Flink 1.1.2 on the latest snapshot version
0.3.0-incubating-SNAPSHOT. Would it be possible for you to work with
this version?

On Tue, Sep 13, 2016 at 11:55 PM, Chawla,Sumit  wrote:
> When trying to use Beam 0.2.0 with Flink 1.1.0 jar, i am seeing following
> error:
>
> java.lang.NoSuchMethodError:
> org.apache.flink.streaming.api.operators.StreamingRuntimeContext.registerTimer(JLorg/apache/flink/streaming/runtime/operators/Triggerable;)V
> at 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.setNextWatermarkTimer(UnboundedSourceWrapper.java:381)
> at 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:233)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:80)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:53)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Regards
> Sumit Chawla
>
>
> On Tue, Sep 13, 2016 at 2:20 PM, Chawla,Sumit 
> wrote:
>
>> Hi All
>>
>> The release-0.2.0-incubating supports Flink 1.0.3. With Flink 1.1.0 out,
>> is there a plan to support it with any 0.2.0 patch? I tried compiling 0.2.0
>> with Flink 1.1.0,
>> and got a couple of compilation errors in FlinkGroupAlsoByWindowWrapper.java.
>> Going back to master i see lots of change in Flink translation wrappers,
>> and
>> FlinkGroupAlsoByWindowWrapper.java being removed.
>>
>> Just want to get a sense of things here, on what would it take to support 
>> Flink
>> 1.1.0 with release-0.2.0. Would appreciate views of people who are already
>> working on upgrading it to Flink 1.1.0
>>
>> Regards
>> Sumit Chawla
>>
>>


Re: [MENTOR] Resigning as a Beam mentor

2016-09-08 Thread Maximilian Michels
Hi Bertrand,

Thanks for your work and hope to see you again some time here or elsewhere!

Best,
Max

On Thu, Sep 8, 2016 at 11:03 AM, Bertrand Delacretaz
 wrote:
> On Thu, Sep 8, 2016 at 10:59 AM, Jean-Baptiste Onofré  
> wrote:
>> ...thanks a lot for all the commitment and feedback you gave to
>> the Beam podling...
>
> You're welcome! I have removed my name from
> http://incubator.apache.org/projects/beam.html and will inform
> private@incubator.a.o
>
> -Bertrand


Re: NullPointerException in beam stream runner

2016-09-02 Thread Maximilian Michels
I didn't know it was already fixed :) I pulled in changes from the
latest master before I tested. Thus, it worked for me.
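The condition discussed in this thread boils down to a guard that never fires: `translator == null && ...` short-circuits exactly when the translator is null, so the NPE only surfaces later in `applyStreamingTransform`. A minimal stand-alone illustration of the corrected `||` form, with simplified stand-in names rather than the actual runner code:

```java
// Simplified stand-in for FlinkStreamingPipelineTranslator's visit logic; the
// fixed guard rejects untranslatable nodes up front instead of NPE-ing later.
public class TranslatorGuard {

  interface Translator {
    boolean canTranslate(String node);
    void apply(String node);
  }

  static String visit(Translator translator, String node) {
    // Fixed guard: `== null ||`. The buggy `== null &&` never threw for a
    // null translator, so the null was dereferenced in apply(...) below.
    if (translator == null || !translator.canTranslate(node)) {
      return "unsupported: " + node;
    }
    translator.apply(node);
    return "translated: " + node;
  }
}
```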


On Thu, Sep 1, 2016 at 10:38 AM, Maximilian Michels  wrote:
> Hi Alexey,
>
> You don't have to set the streaming mode. The Flink Runner will
> automatically choose to use streaming mode when it discovers
> UnboundedSources like Kafka. I'm wondering why that didn't work in
> your case. I just ran your example and it chose streaming mode and
> didn't return an error during pipeline translation.
>
> So I'm curious, which version of Beam are you working with?
>
> Best,
> Max
>
>
> On Wed, Aug 31, 2016 at 12:34 PM, Demin Alexey  wrote:
>> Thanks
>>
>> with if (translator == null || !applyCanTranslate(transform, node,
>> translator)) everything is working as expected
>>
>>
>> Regards,
>> Alexey Diomin
>>
>>
>> 2016-08-31 14:12 GMT+04:00 Aljoscha Krettek :
>>
>>> Ah I see, an unbounded source, such as the Kafka source does not work in
>>> batch mode (which streamStreaming(false) enables). The code should work in
>>> streaming mode if you apply some window that is compatible with the
>>> side-input window to the main input.
>>>
>>> I think the code in streaming still works because there cannot be cases
>>> where the translator is null right now. The correct check should be this,
>>> though:
>>> if (translator == null || !applyCanTranslate(transform, node, translator))
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Wed, 31 Aug 2016 at 12:07 Demin Alexey  wrote:
>>>
>>> > Program for reproduce
>>> >
>>> > https://gist.github.com/xhumanoid/d784a4463a45e68acb124709a521156e
>>> >
>>> > 1) options.setStreaming(false);  - we have NPE and i can't understand how
>>> > code work
>>> > 2) options.setStreaming(true);  - pipeline can compile (he still have
>>> > error, but it's my incorrect work with window)
>>> >
>>> >
>>> > 2016-08-31 13:53 GMT+04:00 Demin Alexey :
>>> >
>>> > > Hi
>>> > >
>>> > > If we can change code on translator != null then next line (
>>> > > applyStreamingTransform(transform, node, translator); ) will cause NPE
>>> > >
>>> > > It's main problem why I don't understand code:
>>> > >
>>> > > x = null;
>>> > > if (x == null && f1_null_value_forbid(x)) { ..}
>>> > > f2_null_value_forbid(x);
>>> > >
>>> > > change (x == null) => (x !=null) simple change point of NPE
>>> > >
>>> > >
>>> > > 2016-08-31 13:43 GMT+04:00 Aljoscha Krettek :
>>> > >
>>> > >> Hi,
>>> > >> I think this is more suited for the Beam dev list. Nevertheless, I
>>> think
>>> > >> this is a coding error and the condition should be
>>> > >> if (translator != null && !applyCanTranslate(transform, node,
>>> > translator))
>>> > >>
>>> > >> With what program did you encounter an NPE, it seems to me that this
>>> > >> should
>>> > >> rarely happen, at least it doesn't happen in all the Beam runner
>>> tests.
>>> > >>
>>> > >> Cheers,
>>> > >> Aljoscha
>>> > >>
>>> > >> On Wed, 31 Aug 2016 at 11:27 Demin Alexey  wrote:
>>> > >>
>>> > >> > Hi
>>> > >> >
>>> > >> > Sorry if i mistake with mailing list.
>>> > >> >
>>> > >> > After  BEAM-102 was solved in FlinkStreamingPipelineTranslator we
>>> have
>>> > >> code
>>> > >> > in visitPrimitiveTransform:
>>> > >> >
>>> > >> >
>>> > >> > if (translator == null && applyCanTranslate(transform, node,
>>> > >> translator)) {
>>> > >> >   LOG.info(node.getTransform().getClass().toString());
>>> > >> >   throw new UnsupportedOperationException(
>>> > >> >   "The transform " + transform + " is currently not
>>> supported.");
>>> > >> > }
>>> > >> > applyStreamingTransform(transform, node, translator);
>>> > >> >
>>> > >> >
>>> > >> > but applyCanTranslate and applyStreamingTransform always require
>>> > NotNull
>>> > >> > translator
>>> > >> > as result if you try use side input in your code then you will cause
>>> > NPE
>>> > >> >
>>> > >> > Maybe Aljoscha Krettek could describe how this code must work?
>>> > >> >
>>> > >> >
>>> > >> > Regards,
>>> > >> > Alexey Diomin
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> >
>>>


Re: Suggestion for Writing Sink Implementation

2016-08-17 Thread Maximilian Michels
Hi Kenneth,

The problem is that the Write transform is not supported in the streaming
execution of the Flink Runner, because streaming execution doesn't
currently support side inputs. A PR is open to fix that.

Cheers,
Max
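Kenn's suggestion below of writing via plain ParDo's commonly takes the shape of buffering elements per bundle and flushing at bundle boundaries. A plain-Java sketch of that pattern, with illustrative names rather than the Beam or Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative ParDo-style writer pattern: buffer elements, flush them to the
// external system in batches, and always flush the remainder at bundle end.
public class BatchingWriter {
  private final List<String> buffer = new ArrayList<>();
  final List<List<String>> flushed = new ArrayList<>(); // stands in for the sink

  void processElement(String element) {
    buffer.add(element);
    if (buffer.size() >= 3) flush(); // flush in fixed-size batches
  }

  void finishBundle() { // flush whatever is left when the bundle ends
    if (!buffer.isEmpty()) flush();
  }

  private void flush() {
    flushed.add(new ArrayList<>(buffer));
    buffer.clear();
  }
}
```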

On Thu, Jul 28, 2016 at 8:56 PM, Kenneth Knowles  
wrote:
> Hi Sumit,
>
> I see what has happened here, from that snippet you pasted from the Flink
> runner's code [1]. Thanks for looking into it!
>
> The Flink runner today appears to reject Write.Bounded transforms in
> streaming mode if the sink is not an instance of UnboundedFlinkSink. The
> intent of that code, I believe, was to special case UnboundedFlinkSink to
> make it easy to use an existing Flink sink, not to disable all other Write
> transforms. What do you think, Max?
>
> Until we fix this issue, you should use ParDo transforms to do the writing.
> If you can share a little about your sink, we may be able to suggest
> patterns for implementing it. Like Eugene said, the Write.of(Sink)
> transform is just a specialized pattern of ParDo's, not a Beam primitive.
>
> Kenn
>
> [1]
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingTransformTranslators.java#L203
>
>
> On Wed, Jul 27, 2016 at 5:57 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
>> Thanks Sumit. Looks like your question is, indeed, specific to the Flink
>> runner, and I'll then defer to somebody familiar with it.
>>
>> On Wed, Jul 27, 2016 at 5:25 PM Chawla,Sumit 
>> wrote:
>>
>> > Thanks a lot Eugene.
>> >
>> > >>>My immediate requirement is to run this Sink on FlinkRunner. Which
>> > mandates that my implementation must also implement SinkFunction<>.  In
>> > that >>>case, none of the Sink<> methods get called anyway.
>> >
>> > I am using FlinkRunner. The Sink implementation that i was writing by
>> > extending Sink<> class had to implement Flink Specific SinkFunction for
>> the
>> > correct translation.
>> >
>> > private static class WriteSinkStreamingTranslator<T> implements
>> > FlinkStreamingPipelineTranslator.StreamTransformTranslator<Write.Bound<T>> {
>> >
>> >   @Override
>> >   public void translateNode(Write.Bound<T> transform,
>> >       FlinkStreamingTranslationContext context) {
>> >     String name = transform.getName();
>> >     PValue input = context.getInput(transform);
>> >
>> >     Sink<T> sink = transform.getSink();
>> >     if (!(sink instanceof UnboundedFlinkSink)) {
>> >       throw new UnsupportedOperationException(
>> >           "At the time, only unbounded Flink sinks are supported.");
>> >     }
>> >
>> >     DataStream<WindowedValue<T>> inputDataSet =
>> >         context.getInputDataStream(input);
>> >
>> >     inputDataSet.flatMap(new FlatMapFunction<WindowedValue<T>, Object>() {
>> >       @Override
>> >       public void flatMap(WindowedValue<T> value, Collector<Object> out)
>> >           throws Exception {
>> >         out.collect(value.getValue());
>> >       }
>> >     }).addSink(((UnboundedFlinkSink) sink).getFlinkSource()).name(name);
>> >   }
>> > }
>> >
>> >
>> >
>> >
>> > Regards
>> > Sumit Chawla
>> >
>> >
>> > On Wed, Jul 27, 2016 at 4:53 PM, Eugene Kirpichov <
>> > kirpic...@google.com.invalid> wrote:
>> >
>> > > Hi Sumit,
>> > >
>> > > All reusable parts of a pipeline, including connectors to storage
>> > systems,
>> > > should be packaged as PTransform's.
>> > >
>> > > Sink is an advanced API that you can use under the hood to implement
>> the
>> > > transform, if this particular connector benefits from this API - but
>> you
>> > > don't have to, and many connectors indeed don't need it, and are
>> simpler
>> > to
>> > > implement just as wrappers around a couple of ParDo's writing the data.
>> > >
>> > > Even if the connector is implemented using a Sink, packaging the
>> > connector
>> > > as a PTransform is important because it's easier to apply in a pipeline
>> > and
>> > > because it's more future-proof (the author of the connector may later
>> > > change it to use something else rather than Sink under the hood without
>> > > breaking existing users).
>> > >
>> > > Sink is, currently, useful in the following case:
>> > > - You're writing a bounded amount of data (we do not yet have an
>> > unbounded
>> > > Sink analogue)
>> > > - The location you're writing to is known at pipeline construction
>> time,
>> > > and does not depend on the data itself (support for "data-dependent"
>> > sinks
>> > > is on the radar https://issues.apache.org/jira/browse/BEAM-92)
>> > > - The storage system you're writing to has a distinct "initialization"
>> > and
>> > > "finalization" step, allowing the write operation to appear atomic
>> > (either
>> > > all data is written or none). This mostly applies to files (where
>> writing
>> > > is done by first writing to a temporary directory, and then renaming
>> all
>> > > files to their final location), but there can be other cases too.
>> > >
>> > > Here's an example GCP connector using the Sink API under the hood:
>> > >
>> > >
>> >
>> https://github.com/apache/incub
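The "distinct initialization and finalization step" Eugene describes for file-based sinks can be sketched with plain java.nio: write to a temporary path first, then atomically move it into place so the write appears all-or-nothing. This is a simplified sketch of the pattern, not the Beam Sink machinery:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Temp-then-rename pattern used by file sinks: readers never observe a
// partially written target file.
public class AtomicFileWrite {

  static void write(Path target, byte[] contents) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    Files.write(tmp, contents);                              // "initialization"
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE); // "finalization"
  }
}
```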

Re: [KUDOS] Contributed runner: Gearpump!

2016-07-21 Thread Maximilian Michels
Big news. Congrats!

On Thu, Jul 21, 2016 at 9:11 AM, Ismaël Mejía  wrote:
> Congratulations Manu!
> Great work! Looking forward to test it with our current pipelines.
>
> Ismael.
>
>
> On Thu, Jul 21, 2016 at 8:23 AM, Amit Sela  wrote:
>
>> Congrats Manu!
>>
>> On Thu, Jul 21, 2016, 06:35 Frances Perry  wrote:
>>
>> > Awesome!
>> >
>> > On Wed, Jul 20, 2016 at 6:42 PM, Manu Zhang 
>> > wrote:
>> >
>> > > Thanks Kenn and others for the review and help along the way.  Feel
>> free
>> > to
>> > > ping me on slack if you want to know more about gearpump-runner or
>> > > gearpump.
>> > >
>> > > Thanks everyone :)
>> > > Manu Zhang
>> > >
>> > >
>> > >
>> > > On Thu, Jul 21, 2016 at 8:31 AM Jesse Anderson 
>> > > wrote:
>> > >
>> > > > Thanks and congrats!
>> > > >
>> > > > On Wed, Jul 20, 2016, 7:27 PM Kenneth Knowles > >
>> > > > wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > I would like to call attention to a huge contribution to Beam: a
>> > runner
>> > > > for
>> > > > > Apache Gearpump (incubating).
>> > > > >
>> > > > > The runner landed on the gearpump-runner feature branch today.
>> Check
>> > it
>> > > > > out! And contribute towards getting it ready for the master branch
>> > :-)
>> > > > >
>> > > > > Please join me in special congratulations and thanks to Manu Zhang
>> > who
>> > > > > worked through everything on PR #323.
>> > > > >
>> > > > > Kenn
>> > > > >
>> > > >
>> > >
>> >
>>


Re: Adding DoFn Setup and Teardown methods

2016-07-18 Thread Maximilian Michels
Hoping it becomes usual as soon as we have this useful addition :)

On Mon, Jul 18, 2016 at 1:53 PM, Aljoscha Krettek 
wrote:

> Did you mean "usual" or "useful"? ;-)
>
> On Mon, 18 Jul 2016 at 12:42 Maximilian Michels  wrote:
>
> > +1 for setup() and teardown() methods. Very usual for proper
> initialization
> > and cleanup of DoFn related data structures.
> >
> > On Wed, Jun 29, 2016 at 9:34 PM, Aljoscha Krettek 
> > wrote:
> >
> > > +1 I think some people might already mistake the
> > > startBundle()/finishBundle() methods for what the new methods are
> > supposed
> > > to be
> > >
> > > On Tue, 28 Jun 2016 at 19:38 Raghu Angadi 
> > > wrote:
> > >
> > > > This is terrific!
> > > > Thanks for the proposal.
> > > >
> > > > On Tue, Jun 28, 2016 at 9:06 AM, Thomas Groh
>  > >
> > > > wrote:
> > > >
> > > > > Hey Everyone:
> > > > >
> > > > > We've recently started to be permitted to reuse DoFn instances in
> > > > Beam[1].
> > > > > Beyond the efficiency gains from not having to deserialize new DoFn
> > > > > instances for every bundle, DoFn reuse also provides the ability to
> > > > > minimize expensive setup work done per-bundle, which hasn't
> formerly
> > > been
> > > > > possible. Additionally, it has also enabled more failure cases,
> where
> > > > > element-based state leaks improperly across bundles.
> > > > >
> > > > > I've written a document proposing that two methods are added to the
> > API
> > > > of
> > > > > DoFn, setup and teardown, which both provides hooks for users to
> > write
> > > > > efficient DoFns, as well as signals that DoFns will be reused.
> > > > >
> > > > > The document is located at
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
> > > > > and committers have edit access
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Thomas
> > > > >
> > > > > [1] https://github.com/apache/incubator-beam/pull/419
> > > > >
> > > >
> > >
> >
>
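The lifecycle split proposed above can be illustrated with a toy harness. This is not Beam's actual API (the real methods were later exposed via annotations on DoFn); the class and method names here are assumptions for the sketch only. The point is the amortization: setup/teardown run once per reused instance, while startBundle/finishBundle run once per bundle.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {
    // Toy stand-in for a DoFn with the proposed lifecycle; names are illustrative.
    static class CountingFn {
        int setups = 0, bundles = 0;
        List<String> out = new ArrayList<>();

        void setup()        { setups++; }   // expensive work, once per instance
        void startBundle()  { bundles++; }  // cheap per-bundle initialization
        void processElement(String e) { out.add(e.toUpperCase()); }
        void finishBundle() { }
        void teardown()     { }             // release resources acquired in setup
    }

    public static void main(String[] args) {
        CountingFn fn = new CountingFn();
        fn.setup();                          // instance reused across bundles
        for (int b = 0; b < 3; b++) {
            fn.startBundle();
            fn.processElement("x" + b);
            fn.finishBundle();
        }
        fn.teardown();
        assert fn.setups == 1 && fn.bundles == 3; // setup amortized over 3 bundles
        System.out.println(fn.out);               // [X0, X1, X2]
    }
}
```

With DoFn reuse, the expensive work in setup() is paid once rather than per bundle, which is exactly the efficiency gain the proposal describes.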


Re: Adding DoFn Setup and Teardown methods

2016-07-18 Thread Maximilian Michels
+1 for setup() and teardown() methods. Very usual for proper initialization
and cleanup of DoFn related data structures.

On Wed, Jun 29, 2016 at 9:34 PM, Aljoscha Krettek 
wrote:

> +1 I think some people might already mistake the
> startBundle()/finishBundle() methods for what the new methods are supposed
> to be
>
> On Tue, 28 Jun 2016 at 19:38 Raghu Angadi 
> wrote:
>
> > This is terrific!
> > Thanks for the proposal.
> >
> > On Tue, Jun 28, 2016 at 9:06 AM, Thomas Groh 
> > wrote:
> >
> > > Hey Everyone:
> > >
> > > We've recently started to be permitted to reuse DoFn instances in
> > Beam[1].
> > > Beyond the efficiency gains from not having to deserialize new DoFn
> > > instances for every bundle, DoFn reuse also provides the ability to
> > > minimize expensive setup work done per-bundle, which hasn't formerly
> been
> > > possible. Additionally, it has also enabled more failure cases, where
> > > element-based state leaks improperly across bundles.
> > >
> > > I've written a document proposing that two methods are added to the API
> > of
> > > DoFn, setup and teardown, which both provides hooks for users to write
> > > efficient DoFns, as well as signals that DoFns will be reused.
> > >
> > > The document is located at
> > >
> > >
> >
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
> > > and committers have edit access
> > >
> > > Thanks,
> > >
> > > Thomas
> > >
> > > [1] https://github.com/apache/incubator-beam/pull/419
> > >
> >
>


Re: Improvements to issue/version tracking

2016-06-30 Thread Maximilian Michels
+1

>For us normally resolved issues will always have a development version as
>"Fix Versions" field, so the issue will only be closed when the version
>that includes that issue (bug, feature or whatever) actually gets released.

I think it should be optional as Davor suggested because you don't
always want to fix all open issues in the next release.

On Wed, Jun 29, 2016 at 10:58 PM, Amit Sela  wrote:
> +1
>
> On Wed, Jun 29, 2016 at 12:04 AM Lukasz Cwik 
> wrote:
>
>> +1
>>
>> On Tue, Jun 28, 2016 at 12:15 PM, Kenneth Knowles 
>> wrote:
>>
>> > +1
>> >
>> > On Tue, Jun 28, 2016 at 12:06 AM, Jean-Baptiste Onofré 
>> > wrote:
>> >
>> > > +1
>> > >
>> > > Regards
>> > > JB
>> > >
>> > >
>> > > On 06/28/2016 01:01 AM, Davor Bonaci wrote:
>> > >
>> > >> Hi everyone,
>> > >> I'd like to propose a simple change in Beam JIRA that will hopefully
>> > >> improve our issue and version tracking -- to actually use the "Fix
>> > >> Versions" field as intended [1].
>> > >>
>> > >> The goal would be to simplify issue tracking, streamline generation of
>> > >> release notes, add a view of outstanding work towards a release, and
>> > >> clearly communicate which Beam version contains fixes for each issue.
>> > >>
>> > >> The standard usage of the field is:
>> > >> * For open (or in-progress/re-opened) issues, "Fix Versions" field is
>> > >> optional and indicates an unreleased version that this issue is
>> > targeting.
>> > >> The release is not expected to proceed unless this issue is fixed, or
>> > the
>> > >> field is changed.
>> > >> * For closed (or resolved) issues, "Fix Versions" field indicates a
>> > >> released or unreleased version that has the fix.
>> > >>
>> > >> I think the field should be mandatory once the issue is
>> resolved/closed
>> > >> [4], so we make a deliberate choice about this. I propose we use "Not
>> > >> applicable" for all those issues that aren't being resolved as Fixed
>> > >> (e.g.,
>> > >> duplicates, working as intended, invalid, etc.) and those that aren't
>> > >> released (e.g., website, build system, etc.).
>> > >>
>> > >> We can then trivially view outstanding work for the next release [2],
>> or
>> > >> generate release notes [3].
>> > >>
>> > >> I'd love to hear if there are any comments! I know that at least JB
>> > >> agrees,
>> > >> as he was convincing me on this -- thanks ;).
>> > >>
>> > >> Thanks,
>> > >> Davor
>> > >>
>> > >> [1]
>> > >>
>> > >>
>> >
>> https://confluence.atlassian.com/adminjiraserver071/managing-versions-802592484.html
>> > >> [2]
>> > >>
>> > >>
>> >
>> https://issues.apache.org/jira/browse/BEAM/fixforversion/12335766/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
>> > >> [3]
>> > >>
>> > >>
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12335764
>> > >> [4] https://issues.apache.org/jira/browse/INFRA-12120
>> > >>
>> > >>
>> > > --
>> > > Jean-Baptiste Onofré
>> > > jbono...@apache.org
>> > > http://blog.nanthrax.net
>> > > Talend - http://www.talend.com
>> > >
>> >
>>


Re: DoFn Reuse

2016-06-13 Thread Maximilian Michels
Thanks for the clarification, Thomas. I think we have to improve the
bundle execution of the Flink Runner. It is also not uniform between
batch and streaming execution, or across different operators.

On Wed, Jun 8, 2016 at 7:43 PM, Raghu Angadi  wrote:
> On Wed, Jun 8, 2016 at 10:39 AM, Ben Chambers 
> wrote:
>
>>
>> To clarify -- this case is actually not allowed by the beam model. The
>> guarantee is that either a bundle is successfully completed (startBundle,
>> processElement*, finishBundle, commit) or not. If it isn't, then the bundle
>> is reprocessed. So, if a `DoFn` instance builds up any state while
>> processing a bundle and a failure happens at any point prior to the commit,
>> it will be retried. Even though the actual state in the first `DoFn` was
>> lost, the second attempt will build up the same state.
>
>
> Makes sense. I missed the fact that finshBundle(Context) could emit more
> records, which affects the pipeline state.
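The guarantee Ben describes — that a failed bundle is reprocessed and rebuilds the same state — can be modeled with a toy sketch. This is plain Java with illustrative names, not the Beam API: as long as all per-bundle state is reset in startBundle, retrying the same input bundle deterministically reproduces the state that was lost.

```java
import java.util.List;

public class BundleRetrySketch {
    // DoFn-like class whose only state is per-bundle, reset on startBundle.
    static class SumFn {
        private int sum;                    // state built up within a bundle
        void startBundle() { sum = 0; }     // a retry starts from a clean slate
        void processElement(int e) { sum += e; }
        int finishBundle() { return sum; }  // the "committed" result of the bundle
    }

    static int runBundle(SumFn fn, List<Integer> bundle) {
        fn.startBundle();
        for (int e : bundle) fn.processElement(e);
        return fn.finishBundle();
    }

    public static void main(String[] args) {
        SumFn fn = new SumFn();
        List<Integer> bundle = List.of(1, 2, 3);
        int first = runBundle(fn, bundle); // pretend this attempt failed pre-commit
        int retry = runBundle(fn, bundle); // reprocessing rebuilds identical state
        assert first == retry && retry == 6;
        System.out.println(retry);         // 6
    }
}
```

If state instead leaked across bundles (e.g. sum were never reset), the retry would diverge from the first attempt — the failure mode the thread warns about.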


Re: 0.1.0-incubating release

2016-06-08 Thread Maximilian Michels
I like the compromise on the Maven naming scheme. Thanks for
incorporating all the feedback!

On Wed, Jun 8, 2016 at 6:49 AM, Jean-Baptiste Onofré  wrote:
> Hi Taylor,
>
> Just to be clear, in most other projects, we stage the distributions on
> repository. We upload the distro and signatures to dist.apache.org only when
> the vote passed.
>
> Basically, the release process I talked with Davor (and that I will
> document) is:
> - Tag and stage using mvn release:prepare release:perform
> - Close repo
> - Start vote
> - If passed, forward vote to incubator
> - If passed, close repo
> - Upload distro to dist
> - Announce the release (mailing lists, website)
>
> It's based on what I do in Karaf, ServiceMix, etc.
>
> Regards
> JB
>
>
> On 06/08/2016 02:39 AM, P. Taylor Goetz wrote:
>>
>> Out of curiosity, is there a reason for distributing the release on
>> repository.a.o vs. dist.a.o?
>>
>> In my experience repository.a.o has traditionally been used for maven
>> artifacts, and dist.a.o has been for release artifacts (source archives and
>> convenience binaries).
>>
>> I'd be happy to help with documenting the process.
>>
>> I ask because this might come up during an IPMC release vote.
>>
>> -Taylor
>>
>>> On Jun 1, 2016, at 9:46 PM, Davor Bonaci 
>>> wrote:
>>>
>>> Hi everyone!
>>> We've started the release process for our first release,
>>> 0.1.0-incubating.
>>>
>>> To recap previous discussions, we don't have particular functional goals
>>> for this release. Instead, we'd like to make available what's currently
>>> in
>>> the repository, as well as work through the release process.
>>>
>>> With this in mind, we've:
>>> * branched off the release branch [1] at master's commit 8485272,
>>> * updated master to prepare for the second release, 0.2.0-incubating,
>>> * built the first release candidate, RC1, and deployed it to a staging
>>> repository [2].
>>>
>>> We are not ready to start a vote just yet -- we've already identified a
>>> few
>>> issues worth fixing. That said, I'd like to invite everybody to take a
>>> peek
>>> and comment. I'm hoping we can address as many issues as possible before
>>> we
>>> start the voting process.
>>>
>>> Please let us know if you see any issues.
>>>
>>> Thanks,
>>> Davor
>>>
>>> [1]
>>> https://github.com/apache/incubator-beam/tree/release-0.1.0-incubating
>>> [2]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1000/
>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: 0.1.0-incubating release

2016-06-03 Thread Maximilian Michels
Thanks for getting us ready for the first release, Davor! We would
like to fix BEAM-315 next week. Is there already a timeline for the
first release? If so, we could also address this in a minor release.
Releasing often will give us some experience with our release process
:)

I would like everyone to think about the artifact names and group ids
again. "parent" and "flink" are not very suitable names for the Beam
parent or the Flink Runner artifact (same goes for the Spark Runner).
I'd prefer "beam-parent", "flink-runner", and "spark-runner" as
artifact ids.

One might think of Maven GroupIds as a sort of hierarchy but they're
not. They're just an identifier. Renaming the parent pom to
"apache-beam" or "beam-parent" would give us the old naming scheme
which used flat group ids (before [1]).

In the end, I guess it doesn't matter too much if we document the
naming schemes accordingly. What matters is that we use a consistent
naming scheme.

Cheers,
Max

[1] https://issues.apache.org/jira/browse/BEAM-287


On Thu, Jun 2, 2016 at 4:00 PM, Jean-Baptiste Onofré  wrote:
> Actually, I think we can fix both issue in one commit.
>
> What do you think about renaming the main parent POM with:
> groupId: org.apache.beam
> artifactId: apache-beam
>
> ?
>
> Thanks to that, the source distribution will be named
> apache-beam-xxx-sources.zip and it would be clearer to dev.
>
> Thoughts ?
>
> Regards
> JB
>
>
> On 06/02/2016 03:10 PM, Jean-Baptiste Onofré wrote:
>>
>> Another annoying thing is the main parent POM artifactId.
>>
>> Now, it's just "parent". What do you think about renaming to
>> "beam-parent" ?
>>
>> Regarding the source distribution name, I would cancel this staging to
>> fix that (I will have a PR ready soon).
>>
>> Thoughts ?
>>
>> Regards
>> JB
>>
>> On 06/02/2016 03:46 AM, Davor Bonaci wrote:
>>>
>>> Hi everyone!
>>> We've started the release process for our first release,
>>> 0.1.0-incubating.
>>>
>>> To recap previous discussions, we don't have particular functional goals
>>> for this release. Instead, we'd like to make available what's
>>> currently in
>>> the repository, as well as work through the release process.
>>>
>>> With this in mind, we've:
>>> * branched off the release branch [1] at master's commit 8485272,
>>> * updated master to prepare for the second release, 0.2.0-incubating,
>>> * built the first release candidate, RC1, and deployed it to a staging
>>> repository [2].
>>>
>>> We are not ready to start a vote just yet -- we've already identified
>>> a few
>>> issues worth fixing. That said, I'd like to invite everybody to take a
>>> peek
>>> and comment. I'm hoping we can address as many issues as possible
>>> before we
>>> start the voting process.
>>>
>>> Please let us know if you see any issues.
>>>
>>> Thanks,
>>> Davor
>>>
>>> [1]
>>> https://github.com/apache/incubator-beam/tree/release-0.1.0-incubating
>>> [2]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1000/
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-24 Thread Maximilian Michels
Thanks. I've also invited Aljoscha Krettek and Kostas Kloudas.

On Tue, May 24, 2016 at 2:15 PM, Jean-Baptiste Onofré  wrote:
> Hi Max,
>
> I just invited you.
>
> Regards
> JB
>
>
> On 05/24/2016 02:12 PM, Maximilian Michels wrote:
>>
>> +1 for Slack.
>>
>> @James Could you invite me?
>>
>> On Thu, May 19, 2016 at 9:24 PM, James Malone
>>  wrote:
>>>
>>> Hi all,
>>>
>>> It sounds like Slack is the clear winner here. So, I am happy to say that
>>> we now have our own Slack Team, open to all!
>>>
>>> http://apachebeam.slack.com
>>>
>>> Once I created the Slack team, it rejected the large blanket list of
>>> "acceptable email domains" I wanted to use (so signup is painless.)
>>> Instead, it looks like we'll have to use an invite system. I've already
>>> modified the team so anyone can invite anyone else (to make it easy to
>>> grow
>>> the Beam community.) But, we will need to manually invite some people to
>>> get this process started.
>>>
>>> If you'd like an invite today, can you please email me -
>>> jamesmal...@apache.org and I will invite you ASAP.
>>>
>>> Best,
>>>
>>> James
>>>
>>> On Thu, May 19, 2016 at 9:36 AM, Milindu Sanoj Kumarage <
>>> agentmili...@gmail.com> wrote:
>>>
>>>> +1 for Slack.
>>>> On 19 May 2016 5:43 p.m., "GANESH RAJU"  wrote:
>>>>
>>>>> +1 on slack
>>>>>
>>>>> Ganesh Raju
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On May 18, 2016, at 3:41 AM, Jean-Baptiste Onofré 
>>>>>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Good point Robert.
>>>>>>
>>>>>> I will be on the channel for sure (I'm already on bunch of Apache IRC
>>>>>
>>>>> channels ;)).
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>>> On 05/18/2016 10:26 AM, Robert Bradshaw wrote:
>>>>>>> The value in such a channel is highly dependent on people regularly
>>>>>>> being there--do we have a critical mass of developers that would hang
>>>>>>> out there? If so, I'd say go for it.
>>>>>>>
>>>>>>>> On Wed, May 18, 2016 at 12:51 AM, Amit Sela 
>>>>>
>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 for Slack
>>>>>>>>
>>>>>>>> On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré <
>>>>
>>>> j...@nanthrax.net
>>>>>>
>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> What do you think about creating a #apache-beam IRC channel on
>>>>>
>>>>> freenode
>>>>>>>>>
>>>>>>>>> ? Or if it's more convenient a channel on Slack ?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbono...@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jean-Baptiste Onofré
>>>>>> jbono...@apache.org
>>>>>> http://blog.nanthrax.net
>>>>>> Talend - http://www.talend.com
>>>>>
>>>>>
>>>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-24 Thread Maximilian Michels
+1 for Slack.

@James Could you invite me?

On Thu, May 19, 2016 at 9:24 PM, James Malone
 wrote:
> Hi all,
>
> It sounds like Slack is the clear winner here. So, I am happy to say that
> we now have our own Slack Team, open to all!
>
> http://apachebeam.slack.com
>
> Once I created the Slack team, it rejected the large blanket list of
> "acceptable email domains" I wanted to use (so signup is painless.)
> Instead, it looks like we'll have to use an invite system. I've already
> modified the team so anyone can invite anyone else (to make it easy to grow
> the Beam community.) But, we will need to manually invite some people to
> get this process started.
>
> If you'd like an invite today, can you please email me -
> jamesmal...@apache.org and I will invite you ASAP.
>
> Best,
>
> James
>
> On Thu, May 19, 2016 at 9:36 AM, Milindu Sanoj Kumarage <
> agentmili...@gmail.com> wrote:
>
>> +1 for Slack.
>> On 19 May 2016 5:43 p.m., "GANESH RAJU"  wrote:
>>
>> > +1 on slack
>> >
>> > Ganesh Raju
>> >
>> > Sent from my iPhone
>> >
>> > > On May 18, 2016, at 3:41 AM, Jean-Baptiste Onofré 
>> > wrote:
>> > >
>> > > Good point Robert.
>> > >
>> > > I will be on the channel for sure (I'm already on bunch of Apache IRC
>> > channels ;)).
>> > >
>> > > Regards
>> > > JB
>> > >
>> > >> On 05/18/2016 10:26 AM, Robert Bradshaw wrote:
>> > >> The value in such a channel is highly dependent on people regularly
>> > >> being there--do we have a critical mass of developers that would hang
>> > >> out there? If so, I'd say go for it.
>> > >>
>> > >>> On Wed, May 18, 2016 at 12:51 AM, Amit Sela 
>> > wrote:
>> > >>> +1 for Slack
>> > >>>
>> > >>> On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré <
>> j...@nanthrax.net
>> > >
>> > >>> wrote:
>> > >>>
>> >  Hi all,
>> > 
>> >  What do you think about creating a #apache-beam IRC channel on
>> > freenode
>> >  ? Or if it's more convenient a channel on Slack ?
>> > 
>> >  Regards
>> >  JB
>> >  --
>> >  Jean-Baptiste Onofré
>> >  jbono...@apache.org
>> >  http://blog.nanthrax.net
>> >  Talend - http://www.talend.com
>> > >
>> > > --
>> > > Jean-Baptiste Onofré
>> > > jbono...@apache.org
>> > > http://blog.nanthrax.net
>> > > Talend - http://www.talend.com
>> >
>>


Re: Using Side Inputs to Join with Static Data Sets

2016-05-13 Thread Maximilian Michels
Hi Stephan,

As far as I understand side inputs, by definition, always need to be
"ready" before processing of any kind can start. What is considered
ready depends on the type of side input. If you use View.asList() or
View.asSingleton() then the whole side input needs to be materialized.
On the other hand, if you use View.asIterable(), processing can start
once the first element arrives.

If the side input itself is windowed, then the notion of "ready" only
applies to the individual windows. Side Input itself can also be
scoped by key if you use the View.asMap() or View.asMultimap() side
inputs views.

From a quick look at the InProcessRunner it appears that processing
does not start until the side input of the window is ready. Beam
experts, please correct me if I got this wrong.

Cheers,
Max

On Fri, May 13, 2016 at 1:12 PM, Stephan Ewen  wrote:
> Hi!
>
> Aljoscha and me have been going through the side inputs quite a bit, and we
> were wondering about the following:
>
> How does one properly join a static data set with a stream?.
>
> This sounds like a job for a side input, but would require that the side
> input materializes the initial static data before the main input can begin
> processing.
>
> Given that the static data set is in a global window, and the Beam side
> inputs only wait for the first element in the window to be available, the
> main input would start joining against the side input prematurely.
>
> Is this simply considered an uncommon use case, or is there a way to
> realize this that we overlooked?
>
> Greetings,
> Stephan
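The join Stephan describes — a stream enriched against a fully materialized static data set — can be modeled with a toy example. This is plain Java standing in for Beam's View.asMap-style side input, with illustrative names only: the lookup map must be complete before any main-input element is processed, otherwise early elements would join against a partial view, which is exactly the premature-join hazard raised above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SideInputJoinSketch {
    // Enrich each stream element against side data that is already fully built.
    static List<String> join(List<String> stream, Map<String, String> sideInput) {
        return stream.stream()
                .map(k -> k + "=" + sideInput.getOrDefault(k, "?"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> staticData = new HashMap<>();
        staticData.put("a", "1");  // materialize the whole static data set first...
        staticData.put("b", "2");
        // ...and only then start consuming the main input.
        List<String> joined = join(List.of("a", "b", "a"), staticData);
        assert joined.equals(List.of("a=1", "b=2", "a=1"));
        System.out.println(joined); // [a=1, b=2, a=1]
    }
}
```

If processing began after only the first side-input element arrived, "b" would resolve to "?" — the premature result the thread is asking how to avoid.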


Re: add component tag to pull request title / commit comment

2016-05-12 Thread Maximilian Michels
@Aljoscha: Yes, less space for the commit message if it has to fit
into 70 chars. I'm not sure how important a 70 character limit is
nowadays.

@Davor I think your observation is correct. In Flink we also like to
do tagged commits but enforcing it is almost impossible. Automatic
tagging would be super cool.

On Wed, May 11, 2016 at 4:11 PM, Davor Bonaci  wrote:
> In general, I think this is fine to try, but I think its reliability will
> likely end up being low. There's a human factor involved -- people will
> forget, pull request will evolve in unexpected ways -- all leading to a low
> reliability of this indicator.
>
> I think it would be really awesome to have automatic tagging. Not really on
> the priority list right now, but I'm hopeful we'll have something like that
> in the not-so-distant future.
>
> On Wed, May 11, 2016 at 3:00 AM, Aljoscha Krettek 
> wrote:
>
>> This will, however, also take precious space in the Commit Title. And some
>> commits might not be about only one clear-cut component.
>>
>> On Wed, 11 May 2016 at 11:43 Maximilian Michels  wrote:
>>
>> > +1 I think it makes it easier to see at a glance to which part of Beam
>> > a commit belongs. We could use the Jira components as tags.
>> >
>> > On Wed, May 11, 2016 at 11:09 AM, Jean-Baptiste Onofré 
>> > wrote:
>> > > Hi Manu,
>> > >
>> > > good idea. Theoretically the component in the corresponding Jira should
>> > give
>> > > the information, but for convenience, we could add a "tag" in the
>> commit
>> > > comment.
>> > >
>> > > Regards
>> > > JB
>> > >
>> > >
>> > > On 05/10/2016 06:27 AM, Manu Zhang wrote:
>> > >>
>> > >> Guys,
>> > >>
>> > >> As I've been developing Gearpump runner for Beam, I've closely watched
>> > the
>> > >> pull requests updates for API changes. Sometimes, it turns out changes
>> > are
>> > >> only to be applied to a specific runner after I go through the codes.
>> > >> Could
>> > >> we add a tag in the pull request title / commit comment to mark
>> whether
>> > >> it's API change or Runner specific change, or something else, like,
>> > >> [BEAM-XXX] [Direct-Java] Comments... ?
>> > >>
>> > >> Thanks,
>> > >> Manu Zhang
>> > >>
>> > >
>> > > --
>> > > Jean-Baptiste Onofré
>> > > jbono...@apache.org
>> > > http://blog.nanthrax.net
>> > > Talend - http://www.talend.com
>> >
>>


Re: add component tag to pull request title / commit comment

2016-05-11 Thread Maximilian Michels
+1 I think it makes it easier to see at a glance to which part of Beam
a commit belongs. We could use the Jira components as tags.

On Wed, May 11, 2016 at 11:09 AM, Jean-Baptiste Onofré  
wrote:
> Hi Manu,
>
> good idea. Theoretically the component in the corresponding Jira should give
> the information, but for convenience, we could add a "tag" in the commit
> comment.
>
> Regards
> JB
>
>
> On 05/10/2016 06:27 AM, Manu Zhang wrote:
>>
>> Guys,
>>
>> As I've been developing Gearpump runner for Beam, I've closely watched the
>> pull requests updates for API changes. Sometimes, it turns out changes are
>> only to be applied to a specific runner after I go through the codes.
>> Could
>> we add a tag in the pull request title / commit comment to mark whether
>> it's API change or Runner specific change, or something else, like,
>> [BEAM-XXX] [Direct-Java] Comments... ?
>>
>> Thanks,
>> Manu Zhang
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: (Virtual) Beam Developers Meetup - 5/4 @ 8am PDT

2016-05-03 Thread Maximilian Michels
Hi Frances,

Thanks for setting up the next Dev meeting. I'd like to join and see
how everyone is doing. I'll add some content to the slides.

Cheers,
Max

On Mon, May 2, 2016 at 3:06 AM, Frances Perry  wrote:
> Looks like we can do up to 25 participants (since we can start it via a
> Google Apps for Work account) and I expect a bunch of us will join from the
> same room. So that might still be enough unless there's a lot of interest.
>
> On Fri, Apr 29, 2016 at 10:13 PM, Seetharam Venkatesh <
> venkat...@innerzeal.com> wrote:
>
>> In my experience, G+Hangouts has a limitation of 10 attendees which is
>> lame. You could explore zoom or some free conferencing service.
>>
>> Thanks!
>>
>> On Fri, Apr 29, 2016 at 9:49 PM Frances Perry 
>> wrote:
>>
>> > Yes -- will definitely aim to present it in a way that newbies can learn
>> > and participate!
>> >
>> > On Fri, Apr 29, 2016 at 9:42 PM, Yash Sharma  wrote:
>> >
>> > > +1 for google hangout.
>> > > Will the discussion be useful for newbie contributors?
>> > >
>> > > - Thanks, via mobile,  excuse brevity.
>> > > On Apr 30, 2016 2:39 PM, "Frances Perry" 
>> wrote:
>> > >
>> > > > As discussed earlier this month, we're going to try a virtual Beam
>> > > meeting
>> > > > for anyone who is interested in joining.
>> > > >
>> > > > *When:* Wednesday 5/4 at 8am PDT
>> > > >
>> > > > *Where: *Google Hangouts ok by folks? Alternative suggestions?
>> > > >
>> > > > *Agenda: *Here's a list to get us started. Please suggest other
>> things
>> > > > you'd like to discuss!
>> > > >
>> > > >- *Overview*: introductions, project vision and priorities
>> > > >- *Project Structure Updates*: infrastructure, module
>> organization,
>> > > >inter-component APIs
>> > > >- *Component Updates*: Java SDK, Spark/Flink/CloudDataflow runners
>> > > >- *First Release*: timing, contents
>> > > >- *Documentation: *for users, for developers, samples & examples
>> > > >
>> > > > If you'd like to create any slides, feel free to add them here
>> > > > <
>> > > >
>> > >
>> >
>> https://docs.google.com/presentation/d/17i7SHViboWtLEZw27iabdMisPl987WWxvapJaXg_dEE/edit?usp=sharing
>> > > > >
>> > > > so
>> > > > we have things in a single place. (Just add a comment if you need
>> > > access.)
>> > > >
>> > > > Frances
>> > > >
>> > >
>> >
>>


Re: [DISCUSS] Beam IO &runners native IO

2016-05-03 Thread Maximilian Michels
Correct, Kafka doesn't support rollbacks of the producer. In Flink
there is the RollingSink which supports transactional rolling files.
Admittedly, that is the only one. Still, checkpointing sinks in Beam
could be useful for users who are concerned about exactly once
semantics. I'm not sure whether we can implement something similar
with the bundle mechanism.

On Mon, May 2, 2016 at 11:50 PM, Raghu Angadi
 wrote:
> What are good examples of streaming sinks that support checkpointing (or
> transactions/rollbacks)? I don't think Kafka supports a rollback.
>
> On Mon, May 2, 2016 at 2:54 AM, Maximilian Michels  wrote:
>
>> Yes, I would expect sinks to provide similar additional interfaces
>> like sources, e.g. checkpointing. We could also use the
>> startBundle/processElement/finishBundle lifecycle methods to implement
>> checkpointing. I just wonder, if we want to make it more explicit.
>> Also, does it make sense that sinks can return a PCollection? You can
>> return PDone but you don't have to.
>>
>> Since sinks are fundamental in streaming pipelines, it just seemed odd
>> to me that there is not dedicated interface. I understand a bit
>> clearer now that it is not viewed as crucial because we can use
>> existing primitives to create sinks. In a way, that might be elegant
>> but also less explicit.
>>
>> On Fri, Apr 29, 2016 at 11:00 PM, Frances Perry 
>> wrote:
>> >>
>> >> @Frances Sources are not simple DoFns. They add additional
>> >> functionality, e.g. checkpointing, watermark generation, creating
>> >> splits. If we want sinks to be portable, we should think about a
>> >> dedicated interface. At least for the checkpointing.
>> >>
>> >
>> > We might be mixing sources and sinks in this conversation. ;-) Sources
>> > definitely provide additional functionality as you mentioned. But at
>> least
>> > currently, sinks don't provide any new primitive functionality. Are you
>> > suggestion there needs to be a checkpointing interface for sinks beyond
>> > DoFn's bundle finalization? (Note that the existing Write for batch is
>> just
>> > a PTransform based around ParDo.)
>>
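One way to approximate exactly-once sinks using only the bundle lifecycle, as the thread discusses, is to stage writes within a bundle and make them visible atomically at finishBundle — a failed bundle then leaves no partial output, and the retry commits exactly one copy. A toy sketch with illustrative names (not a real Beam interface):

```java
import java.util.ArrayList;
import java.util.List;

public class BufferingSinkSketch {
    // Sink that stages writes per bundle and commits them atomically.
    static class Sink {
        final List<String> committed = new ArrayList<>(); // stands in for durable output
        private List<String> staged = new ArrayList<>();

        void startBundle()        { staged = new ArrayList<>(); }
        void write(String record) { staged.add(record); }           // buffered, not visible
        void finishBundle()       { committed.addAll(staged); }     // atomic commit point
        void failBundle()         { staged = new ArrayList<>(); }   // discard partial writes
    }

    public static void main(String[] args) {
        Sink sink = new Sink();
        sink.startBundle();
        sink.write("r1");
        sink.failBundle();                 // attempt 1 fails: nothing became visible
        assert sink.committed.isEmpty();

        sink.startBundle();                // retry of the same bundle
        sink.write("r1");
        sink.finishBundle();
        assert sink.committed.equals(List.of("r1")); // exactly one copy committed
        System.out.println(sink.committed);          // [r1]
    }
}
```

This mirrors the rolling-file pattern mentioned for Flink: the staged/committed split does the work that a transactional rollback would do in a system like Kafka, which lacks one.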


Re: IO timelines (Was: How to read/write avro data using FlinkKafka Consumer/Producer)

2016-05-03 Thread Maximilian Michels
Hi Davor, hi Dan,

The discussion was partly held in JB's thread about "useNative()" for
Beam transforms. I think we reached consensus that we prefer portable
out of the box Beam IO over Runner specific IO.

On Thu, Apr 28, 2016 at 10:03 AM, Davor Bonaci 
wrote:
>
> Of course, there's a long way to go, but there should *not* be any users
> that are blocked or scenarios that are impossible.


Exactly, while we are transitioning, let's not make any scenarios
impossible. For example, if users want to use Kafka 8, they should be
able to use the Flink Consumer/Producer as long as there is no support
yet. Exchanging sources/sinks in Beam programs is relatively easy to
do; users still get to keep all the nice semantics. Once we have a
decent support for IO, these wrappers should go away.

>  - Complete conversion of existing IOs to the Source / Sink API. ETA: a
>   week or two for full completion.


Which ones are there which have not been converted? From a first
glance, I see AvroIO, BigQueryIO, and PubsubIO. Only sources should be
affected because they have a dedicated interface; sinks are ParDos.

>   - Make sure Spark & Flink runners fully support Source / Sink API, and
>   that ties into the new Runner / Fn API discussion.


Yes, it's not hard to fix those. We will fix those as soon as possible.

Cheers,
Max


On Tue, May 3, 2016 at 1:06 AM, Dan Halperin
 wrote:
>
> Coming back from vacation, sorry for delay.
>
> I agree with Davor. While it's nice to have a `UnboundedFlinkSource`
> <https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedFlinkSource.java>
> wrapper that can be used to convert a Flink-specific source into one that
> can be used in Beam, I'd hope this is a temporary stop-gap until we have
> Beam-API versions of these for all important connectors. If a
> Flink-specific source has major advantages over one that can be implemented
> in Beam, we should explore why and see whether we have gaps in our APIs.
>
> (And similar analogs for Spark and for Sinks, Transforms libraries, and
> everything else).
>
> Thanks!
> Dan
>
> On Thu, Apr 28, 2016 at 10:03 AM, Davor Bonaci 
> wrote:
>
> > [ Moving over to the dev@ list ]
> >
> > I think we should be aiming a little higher than "trying out Beam" ;)
> >
> > Beam SDK currently has built-in IOs for Kafka, as well as for all important
> > Google Cloud Platform services. Additionally, there are pull requests for
> > Firebase and Cassandra. This is not bad, particularly talking into account
> > that we have APIs for user to develop their own IO connectors. Of course,
> > there's a long way to go, but there should *not* be any users that are
> > blocked or scenarios that are impossible.
> >
> > In terms of the runner support, Cloud Dataflow runner supports all IOs,
> > including any user-written ones. Other runners don't as extensively, but
> > this is a high priority item to address.
> >
> > In my mind, we should strive to address the following:
> >
> >- Complete conversion of existing IOs to the Source / Sink API. ETA: a
> >week or two for full completion.
> >- Make sure Spark & Flink runners fully support Source / Sink API, and
> >that ties into the new Runner / Fn API discussion.
> >- Increase the set of built-in IOs. No ETA; iterative process over time.
> >There are 2 pending pull requests, others in development.
> >
> > I'm hopeful we can address all of these items in a relatively short period
> > of time -- in a few months or so -- and likely before we can call any
> > release "stable". (This is why the new Runner / Fn API discussions are so
> > important.)
> >
> > In summary, in my mind, "long run" here means "< few months".
> >
> > -- Forwarded message --
> > From: Maximilian Michels 
> > Date: Thu, Apr 28, 2016 at 3:20 AM
> > Subject: Re: How to read/write avro data using FlinkKafka Consumer/Producer
> > (Beam Pipeline) ?
> > To: u...@beam.incubator.apache.org
> >
> > On Wed, Apr 27, 2016 at 11:12 PM, Jean-Baptiste Onofré 
> > wrote:
> > > generally speaking, we have to check that all runners work fine with the
> > provided IO. I don't think it's a good idea that the runners themselves
> > implement any IO: they should use "out of the box" IO.
> >
In the long run, a big yes, and I'd like to help make it possible!
> > However, there is still a gap between what Beam and its Runners
> > provide and what users want to do. For the time being, I think the
> > solution we have is fine. It gives users the option to try out Beam
> > with sources and sinks that they expect to be available in streaming
> > systems.
> >


Re: [DISCUSS] Beam IO & runners native IO

2016-05-02 Thread Maximilian Michels
Yes, I would expect sinks to provide similar additional interfaces
like sources, e.g. checkpointing. We could also use the
startBundle/processElement/finishBundle lifecycle methods to implement
checkpointing. I just wonder if we want to make it more explicit.
Also, does it make sense that sinks can return a PCollection? You can
return PDone but you don't have to.

Since sinks are fundamental in streaming pipelines, it just seemed odd
to me that there is no dedicated interface. I understand a bit
clearer now that it is not viewed as crucial because we can use
existing primitives to create sinks. In a way, that might be elegant
but also less explicit.
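The "sink is just a DoFn with bundle lifecycle methods" idea discussed above can be sketched in a self-contained way. `MiniDoFn` and `BufferingSink` below are simplified stand-ins invented for illustration, not the real Beam SDK classes; the point is only how `startBundle`/`processElement`/`finishBundle` can serve as the per-bundle commit point:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the DoFn lifecycle; not the real Beam API.
abstract class MiniDoFn<T> {
    void startBundle() {}
    abstract void processElement(T element);
    void finishBundle() {}
}

// A sink built on the lifecycle: buffer in processElement, flush in
// finishBundle (the natural per-bundle "checkpoint" boundary).
class BufferingSink extends MiniDoFn<String> {
    private final List<String> buffer = new ArrayList<>();
    final List<String> committed = new ArrayList<>(); // stands in for the external system
    int flushes = 0;

    @Override void startBundle() { buffer.clear(); }
    @Override void processElement(String element) { buffer.add(element); }
    @Override void finishBundle() {
        committed.addAll(buffer); // flush the whole bundle at once
        buffer.clear();
        flushes++;
    }
}

public class SinkAsDoFnDemo {
    public static void main(String[] args) {
        BufferingSink sink = new BufferingSink();
        // One "bundle" of three elements, as a runner would drive it.
        sink.startBundle();
        for (String s : List.of("a", "b", "c")) sink.processElement(s);
        sink.finishBundle();
        System.out.println(sink.committed.size() + " elements in " + sink.flushes + " flush");
    }
}
```

This makes the trade-off in the thread concrete: the lifecycle primitives are sufficient to build a flushing sink, but the checkpointing contract stays implicit rather than being expressed in a dedicated sink interface.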

On Fri, Apr 29, 2016 at 11:00 PM, Frances Perry  wrote:
>>
>> @Frances Sources are not simple DoFns. They add additional
>> functionality, e.g. checkpointing, watermark generation, creating
>> splits. If we want sinks to be portable, we should think about a
>> dedicated interface. At least for the checkpointing.
>>
>
> We might be mixing sources and sinks in this conversation. ;-) Sources
> definitely provide additional functionality as you mentioned. But at least
> currently, sinks don't provide any new primitive functionality. Are you
> suggesting there needs to be a checkpointing interface for sinks beyond
> DoFn's bundle finalization? (Note that the existing Write for batch is just
> a PTransform based around ParDo.)


Re: [DISCUSS] Beam IO & runners native IO

2016-04-29 Thread Maximilian Michels
@Raghu Thanks for the explanation. I had already realized after
Aljoscha's comment that Kafka enforces this data model. I was too much
in the Flink land, where we usually only use the value part (although
it is also possible to set the key).
Good to hear you agree on factoring out the watermark/timestamp
extraction API methods into a separate interface so that it can be
reused across sources.

@Frances Sources are not simple DoFns. They add additional
functionality, e.g. checkpointing, watermark generation, creating
splits. If we want sinks to be portable, we should think about a
dedicated interface. At least for the checkpointing.

To come back to the original question, I think we reached a consensus
that we don't want a 'useNative()' method on Beam sources :)
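The shared timestamp/watermark extraction interface proposed above might look roughly like the following. All names here (`TimestampPolicy`, `MaxLagPolicy`) are hypothetical, invented for this sketch; they are not the Beam API of the time, only an illustration of factoring the two methods out of individual connectors:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical reusable interface that any unbounded connector
// (Kafka, JMS, ...) could share instead of redefining both methods.
interface TimestampPolicy<T> {
    Instant timestampOf(T record);  // event-time timestamp of a record
    Instant currentWatermark();     // watermark given the records seen so far
}

// Example policy: the watermark trails the max observed timestamp by a fixed lag.
class MaxLagPolicy implements TimestampPolicy<Long> {
    private Instant maxSeen = Instant.EPOCH;
    private final long lagMillis;

    MaxLagPolicy(long lagMillis) { this.lagMillis = lagMillis; }

    @Override public Instant timestampOf(Long epochMillis) {
        Instant ts = Instant.ofEpochMilli(epochMillis);
        if (ts.isAfter(maxSeen)) maxSeen = ts;
        return ts;
    }

    @Override public Instant currentWatermark() {
        return maxSeen.minusMillis(lagMillis);
    }
}

public class TimestampPolicyDemo {
    public static void main(String[] args) {
        TimestampPolicy<Long> policy = new MaxLagPolicy(1000);
        for (long t : List.of(5_000L, 7_000L, 6_000L)) {
            policy.timestampOf(t);
        }
        // watermark = max observed (7000 ms) minus lag (1000 ms)
        System.out.println(policy.currentWatermark().toEpochMilli());
    }
}
```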

On Fri, Apr 29, 2016 at 5:14 PM, Raghu Angadi
 wrote:
> On Fri, Apr 29, 2016 at 2:11 AM, Maximilian Michels  wrote:
>
>> Further, the KafkaIO enforces a key/value data model which AFAIK is
>> not enforced by the Beam model. I don't know the details for this
>> design decision but I would like this to be communicated before it is
>> merged into the master.
>>
>
> The key/value structure comes from Kafka. It is not related to Beam or any
> runner. I am not sure if I understood the concern here correctly. Every Kafka
> record
> <https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html#ConsumerRecord(java.lang.String,%20int,%20long,%20K,%20V)>
> has key, value, and some metadata like offset and partition. KafkaIO
> directly reflects that.
>
> As most users don't need Kafka metadata, we added '.withoutMetadata()
> <https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L130>'
> that returns just the key/value pair.
>
> One can argue many Kafka consumers don't need the key either. We debated adding
> another helper method '.dropKey()', but left it out. Users can easily drop
> keys in Beam:
>
>    pipeline.apply(KafkaIO.read().withoutMetadata())
>    *.apply(Values.create())*
>
>
>> Otherwise, it will be harder to achieve
>> portability among the Runners. In Flink and Spark we are already
>> experienced with all kinds of connectors and user's needs. It would be
>> nice to feed that back in the course of adding new connectors to the
>> Beam API.
>>
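The layering sketched in this message (full Kafka record, then `withoutMetadata()` down to the key/value pair, then dropping keys) can be shown with a self-contained illustration. `KafkaRecord` and `KV` below are simplified local stand-ins, not the real Beam classes; in an actual pipeline the equivalent is the quoted `KafkaIO.read().withoutMetadata()` followed by `Values.create()`:

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-ins: a Kafka record carries topic/partition/offset
// metadata plus a key and a value; KV is just the key/value pair.
record KafkaRecord<K, V>(String topic, int partition, long offset, K key, V value) {
    KV<K, V> withoutMetadata() { return new KV<>(key, value); }
}
record KV<K, V>(K key, V value) {}

public class KafkaModelDemo {
    public static void main(String[] args) {
        List<KafkaRecord<String, String>> records = List.of(
            new KafkaRecord<>("topic", 0, 1L, "k1", "v1"),
            new KafkaRecord<>("topic", 0, 2L, "k2", "v2"));

        // Step 1: drop the metadata, keeping only the key/value pair.
        // Step 2: drop the key, keeping only the value (what Values.create()
        // does per element in a real Beam pipeline).
        List<String> values = records.stream()
            .map(KafkaRecord::withoutMetadata)
            .map(KV::value)
            .collect(Collectors.toList());
        System.out.println(values);
    }
}
```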


Re: [DISCUSS] Beam IO & runners native IO

2016-04-29 Thread Maximilian Michels
@Aljoscha I didn't know that Kafka always stores key/value pairs, but I see
that we also have support for setting Kafka keys in Flink.

@JB I get your point that a sink is simply a DoFn, but a ParDo is not
a good match for a sink. A Sink doesn't produce a PCollection but
represents the end of a pipeline. Like an UnboundedSource, an
UnboundedSink also has special needs, i.e. it needs to provide a
checkpointing mechanism. I think we need something along the lines of
the existing Write transform for batch.

On Fri, Apr 29, 2016 at 12:27 PM, Jean-Baptiste Onofré  
wrote:
> Hi,
>
> KafkaIO uses KafkaRecord which is basically key,value + some metadata
> (topic, partition, offset).
>
> Can you describe the behavior of an UnboundedSink ?
>
> UnboundedSource is obvious: it's still consuming data, creating a PCollection
> sent into the pipeline.
>
> But UnboundedSink ? Do you mean that the UnboundedSink will write each
> record in the PCollection ? When does it stop ?
>
> Regards
> JB
>
>
> On 04/29/2016 12:07 PM, Aljoscha Krettek wrote:
>>
>> Hi,
>> I think the fact that KafkaIO has a key/value model comes from Kafka
>> having a key/value model. I imagine most sources will emit the type of
>> values appropriate for them.
>>
>> I agree with Max that the lack of an UnboundedSink seems strange. Do we
>> have any "sinks" implemented as a ParDo already?
>>
>> Cheers,
>> Aljoscha
>>
>> On Fri, 29 Apr 2016 at 11:22 Jean-Baptiste Onofré  wrote:
>>
>>> Hi Max,
>>>
>>> Your four points are valid and we already discussed about that.
>>>
>>> 1. +1, the runner API should bring utils around that
>>> 2. UnboundedSink has been discussed (I don't see a strong use case for
>>> now, as it takes a PCollection).
>>> 3. +1, Dan talked about improving the hierarchy.
>>> 4. +1, I'm working on new IO (JMS, MQTT, JDBC, Cassandra, Camel, ...).
>>>
>>> I would add:
>>>
>>> 5. Add a page on the website listing the IO, their usage and
>>> configuration. Something like we have in Camel:
>>> http://camel.apache.org/components.html
>>> 6. Refactor the FileIO to avoid usage of IOChannelFactory and use a
>>> filesystem plugin (file, hdfs, s3, aws, etc).
>>>
>>> Dan planned to create "util" to deal with watermark and timestamps.
>>>
>>> Regards
>>> JB
>>>
>>> On 04/29/2016 11:11 AM, Maximilian Michels wrote:
>>>>
>>>> @Amir: This is the Developer mailing list. Please post your questions
>>>> regarding Beam on the user mailing list.
>>>>
>>>> +1 for portability in general. However, I see some crucial TODOs coming
>>>
>>> up:
>>>>
>>>>
>>>> 1) Improving the integration of Runners with the Beam sink/source API
>>>> 2) Providing interfaces to implement new connectors (i.e. still no
>>>> existing UnboundedSink)
>>>> 3) Extending existing interfaces to ease implementation of connectors
>>>> and provide a uniform API (i.e. on top of UnboundedSource)
>>>> 4) Working on a more complete set of connectors in Beam
>>>>
>>>> Looking at the KafkaIO implementation, I wonder whether we shouldn't
>>>> extract the custom watermark and timestamp functions into an extra
>>>> interface? All connectors are going to have these methods. It would be
>>>> nice to have a uniform API among the connectors.
>>>>
>>>> Further, the KafkaIO enforces a key/value data model which AFAIK is
>>>> not enforced by the Beam model. I don't know the details for this
>>>> design decision but I would like this to be communicated before it is
>>>> merged into the master. Otherwise, it will be harder to achieve
>>>> portability among the Runners. In Flink and Spark we are already
>>>> experienced with all kinds of connectors and user's needs. It would be
>>>> nice to feed that back in the course of adding new connectors to the
>>>> Beam API.
>>>>
>>>> I would expect an active discussion on the Dev mailing list before any
>>>> new connector API gets merged. Furthermore, let us provide better
>>>> interfaces for connector needs. Finally, let us introduce unbounded
>>>> sinks :)
>>>>
>>>> On Fri, Apr 29, 2016 at 7:54 AM, amir bahmanyari
>>>>  wrote:
>>>>>
>>>>> This may help trace it: Exception in thread "main"

Re: [DISCUSS] Beam IO & runners native IO

2016-04-29 Thread Maximilian Michels
@Amir: This is the Developer mailing list. Please post your questions
regarding Beam on the user mailing list.

+1 for portability in general. However, I see some crucial TODOs coming up:

1) Improving the integration of Runners with the Beam sink/source API
2) Providing interfaces to implement new connectors (i.e. still no
existing UnboundedSink)
3) Extending existing interfaces to ease implementation of connectors
and provide a uniform API (i.e. on top of UnboundedSource)
4) Working on a more complete set of connectors in Beam

Looking at the KafkaIO implementation, I wonder whether we shouldn't
extract the custom watermark and timestamp functions into an extra
interface? All connectors are going to have these methods. It would be
nice to have a uniform API among the connectors.

Further, the KafkaIO enforces a key/value data model which AFAIK is
not enforced by the Beam model. I don't know the details for this
design decision but I would like this to be communicated before it is
merged into the master. Otherwise, it will be harder to achieve
portability among the Runners. In Flink and Spark we are already
experienced with all kinds of connectors and user's needs. It would be
nice to feed that back in the course of adding new connectors to the
Beam API.

I would expect an active discussion on the Dev mailing list before any
new connector API gets merged. Furthermore, let us provide better
interfaces for connector needs. Finally, let us introduce unbounded
sinks :)

On Fri, Apr 29, 2016 at 7:54 AM, amir bahmanyari
 wrote:
> This may help trace it:Exception in thread "main" 
> java.lang.IllegalStateException: no evaluator registered for 
> Read(UnboundedKafkaSource) at 
> org.apache.beam.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:898)
>  at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:221)
>  at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
>  at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:104)
>  at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:261) at 
> org.apache.beam.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:860)
>  at 
> org.apache.beam.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:572)
>  at 
> org.apache.beam.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:106)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:182) at 
> benchmark.flinkspark.flink.ReadFromKafka2.main(ReadFromKafka2.java:286)
>
>   From: Jean-Baptiste Onofré 
>  To: dev@beam.incubator.apache.org
>  Sent: Thursday, April 28, 2016 10:08 PM
>  Subject: Re: [DISCUSS] Beam IO & runners native IO
>
> I'm gonna take a look. The DirectPipelineRunner didn't support
> unbounded collections (it was fixed last night, AFAIR). It could be
> related.
>
> Regards
> JB
>
> On 04/29/2016 07:00 AM, amir bahmanyari wrote:
>> Hi JB, I used the sample KafkaIO usage below.
>> p.apply(KafkaIO.read().withBootstrapServers("kafkahost:9092").withTopics(topics));
>> p.run();
>>
>> It threw the following at p.run():
>> Exception in thread "main" java.lang.IllegalStateException: no evaluator
>> registered for Read(UnboundedKafkaSource) at
>> org.apache.beam.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:898)
>> I am sure I am missing something. Would be great if I could see a sample
>> code. I appreciate it, sir. Cheers
>>
>>From: Jean-Baptiste Onofré 
>>  To: dev@beam.incubator.apache.org
>>  Sent: Thursday, April 28, 2016 5:30 AM
>>  Subject: [DISCUSS] Beam IO & runners native IO
>>
>> Hi all,
>>
>> regarding the recent threads on the mailing list, I would like to start
>> a formal discussion around the IO.
>> As we can expect the first contributions on this area (I already have
>> some work in progress around this ;)), I think it's a fair discussion to
>> have.
>>
>> Now, we have two kinds of IO: the one "generic" to Beam, the one "local"
>> to the runners.
>>
>> For example, let's take Kafka: we have the KafkaIO (in IO), and for
>> instance, we have the spark-streaming kafka connector (in Spark Runner).
>>
>> Right now, we have two approaches for the user:
>> 1. In the pipeline, we use KafkaIO from Beam: it's the preferred
>> approach for sure. However, the user may want to use the runner specific
>> IO for two reasons:
>>  * Beam doesn't provide the IO yet (for instance, the Spark Cassandra
>> connector is available whereas we don't yet have any CassandraIO (I'm
>> working on it anyway ;))
>>  * The runner native IO is optimized or contains more features than the
>> Beam native IO
>> 2. So, for the previous reasons, the user could want to use the native
>> runner IO. The drawback of this approach is that the pipeline will be
>> tied to a specific runner, which is completely against the Beam design.

Re: [PROPOSAL] Nightly builds by Jenkins

2016-04-05 Thread Maximilian Michels
Hey JB,

I would also propose three Jenkins jobs (apart from the Cloud Dataflow tests):

- Test coverage of pull requests (beam_PreCommit)
- Test coverage of the master and all other branches (beam_MavenVerify)
- A daily job that deploys artifacts to the snapshot repository (beam_Nightly)

Keeping the last two separate makes it easier to change the deployment
of snapshots. Daily deployment should be enough. If we ever want to
deploy more frequently, we can easily adjust the schedule of the
build. In addition, we can identify deployment problems more easily
and prevent them from breaking test execution on the development
branches.

Best,
Max

On Tue, Apr 5, 2016 at 9:57 AM, Jason Kuster
 wrote:
> Hey JB,
>
> Just want to clarify - do you mean that beam_nightly would continue to run
> on the schedule it currently has (SCM poll/hourly), plus one run at
> midnight?
>
> I think Dan's question centers around whether beam_nightly build would just
> run once every 24h. We want our postsubmit coverage to run more often than
> that is my impression. Doing a deploy every time the SCM poll returns some
> changes seems like an aggressive schedule to me, but I welcome your
> thoughts there. Otherwise we could keep beam_mavenverify running on the scm
> poll/hourly schedule and add the beam_nightly target which just does a
> single deploy every 24h.
>
> Thanks,
>
> Jason
>
> On Tue, Apr 5, 2016 at 12:41 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Dan,
>>
>> you can have both mvn clean deploy and mvn clean verify, but IMHO, the
>> first already covers the second ;)
>>
>> So, I think mvn clean deploy is fine instead of clean verify.
>>
>> WDYT ?
>>
>> Regards
>> JB
>>
>>
>> On 04/05/2016 08:05 AM, Dan Halperin wrote:
>>
>>> I am completely behind producing nightly jars.
>>>
>>> But, I don't think that `beam_MavenVerify` is completely redundant -- I
>>> was
>>> under the impression it was our main post-submit coverage. Is that wrong?
>>>
>>> If I'm not wrong, then I think this should simply be a fourth Jenkins target
>>> that runs every 24h.
>>>
>>> On Mon, Apr 4, 2016 at 10:50 PM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> Hi beamers,

 Now, on Jenkins, we have three jobs:

 - beam_PreCommit does a mvn clean verify for each opened PR
 - beam_MavenVerify does a mvn clean verify on master branch
 - beam_RunnableOnService_GoogleCloudDataflow does a mvn clean verify
 -PDataflowPipelineTests on master branch

 As discussed last week, Davor and I are working on renaming (especially
 package).

 Once this renaming is done (it should take a week or so), I propose to
 change beam_MavenVerify to beam_Nightly: it will do a mvn clean deploy, deploying
 SNAPSHOTs on the Apache SNAPSHOT repo (deploy phase includes verify and
 test of course) with a schedule every night and SCM change.

 It will allow people to test and try beam without building.

 Thoughts ?

 Regards
 JB
 --
 Jean-Baptiste Onofré
 jbono...@apache.org
 http://blog.nanthrax.net
 Talend - http://www.talend.com


>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>
>
> --
> ---
> Jason Kuster
> Apache Beam (Incubating) / Google Cloud Dataflow


Re: Draft Contribution Guide

2016-03-29 Thread Maximilian Michels
@Ben: Fixing a subtle bug that was introduced and that users are affected
by. Fixing unstable builds. Minor cosmetic changes. Changes to JavaDoc.
However, PRs are an excellent tool for code reviews most of the time. I
see that we probably can't stress that enough.

I'd actually think that my paragraph gave more reasons to use pull requests
than the current draft does. I'd be happy if we incorporated some of my
suggestions. I have removed the "Whenever possible" but kept my other
points about communication and follow-up discussions:

"Committers should never commit anything without going through a pull
request, since that would bypass test coverage and potentially cause the
build to fail due to checkstyle, etc. *In addition, pull requests ensure
that changes are communicated properly and potential flaws or improvements
can be spotted*. Always go through the pull request, even if you won’t wait
for the code review. *Even then, comments can be provided in the pull
request after it has been merged, to work on follow-ups.*"

Best,
Max

On Wed, Mar 23, 2016 at 8:19 PM, Jean-Baptiste Onofré 
wrote:
> Hi Max,
>
> I would keep a "stronger statement", something like:
>
> "Committers always provide a pull request. This pull request has to be
> merged cleanly, and doesn't break the build (including test, checkstyle,
> documentation). The pull request has to be reviewed and only pushed on the
> upstream when the reviewer gives the LGTM keyword as comment."
>
> All other situations where the committer doesn't/can't provide a PR should
> be approved on the dev mailing list.
>
> My $0.01
>
> Regards
> JB
>
>
> On 03/23/2016 07:22 PM, Maximilian Michels wrote:
>>
>> I didn't see this paragraph before:
>>
>> "Committers should never commit anything without going through a pull
>> request, since that would bypass test coverage and potentially cause
>> the build to fail due to checkstyle, etc. Always go through the pull
>> request, even if you won’t wait for the code review."
>>
>> How about:
>>
>> "Whenever possible, commits should be reviewed in a pull request. Pull
>> requests ensure that changes can be communicated properly with the
>> community and potential flaws or improvements can be spotted. In
>> addition, pull requests ensure proper test coverage and verification
>> of the build. Whenever possible, go through the pull request, even if
>> you won’t wait for the code review."
>>
>> - Max
>>
>>
>> On Wed, Mar 23, 2016 at 5:33 PM, Jean-Baptiste Onofré 
>> wrote:
>>>
>>> +1
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 03/23/2016 05:30 PM, Davor Bonaci wrote:
>>>>
>>>>
>>>> Thanks everyone for commenting!
>>>>
>>>> There were no new comments in the last several days, so we'll start
>>>> moving
>>>> the doc over to the Beam website.
>>>>
>>>> Of course, there's nothing here set in stone -- please reopen the
>>>> discussion about any particular point at any time in the future.
>>>>
>>>> On Fri, Mar 18, 2016 at 4:44 AM, Maximilian Michels 
>>>> wrote:
>>>>
>>>>> Hi Frances,
>>>>>
>>>>> Very nice comprehensive guide. I'll leave some comments in the doc.
>>>>>
>>>>> Cheers,
>>>>> Max
>>>>>
>>>>> On Fri, Mar 18, 2016 at 11:51 AM, Sandeep Deshmukh
>>>>>  wrote:
>>>>>>
>>>>>>
>>>>>> The document captures the process very well and has the right amount of
>>>>>
>>>>>
>>>>> details
>>>>>>
>>>>>>
>>>>>> for newbies too.
>>>>>>
>>>>>> Great work!!!
>>>>>>
>>>>>> Regards,
>>>>>> Sandeep
>>>>>>
>>>>>> On Fri, Mar 18, 2016 at 10:46 AM, Siva Kalagarla <
>>>>>
>>>>>
>>>>> siva.kalaga...@gmail.com>
>>>>>>
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Frances,  This document is helpful for newbies like myself.
>>>>>>> Will
>>>>>>> follow these steps over this weekend.
>>>>>>>
>>>>>>> On Thu, Mar 17, 2016 at 2:19 PM, Frances Perry
>>>>>>> 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Beamers!
>>>>>>>>
>>>>>>>> We've started a draft
>>>>>>>> <
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
https://docs.google.com/document/d/1syFyfqIsGOYDE_Hn3ZkRd8a6ylcc64Kud9YtrGHgU0E/comment
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> for the Beam contribution guide. Please take a look and provide
>>>>>
>>>>>
>>>>> feedback.
>>>>>>>>
>>>>>>>>
>>>>>>>> Once things settle, we'll get this moved over on to the Beam
>>>>>>>> website.
>>>>>>>>
>>>>>>>> Frances
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Siva Kalagarla
>>>>>>> @SivaKalagarla <https://twitter.com/SivaKalagarla>
>>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: GitHub mirroring broken

2016-03-24 Thread Maximilian Michels
Yep, happens quite regularly actually. Usually resolved within a
couple of hours.

On Thu, Mar 24, 2016 at 7:43 AM, Jean-Baptiste Onofré  wrote:
> Yes, same issue with other Apache projects (including TLP).
>
> It's not yet completely fixed.
>
> Regards
> JB
>
>
> On 03/24/2016 06:32 AM, Davor Bonaci wrote:
>>
>> Mirroring of our Apache git repository to GitHub seems to be broken for
>> the
>> last 6 hours or so. There has been some flakiness last night, but now it
>> seems to be completely broken.
>>
>> It is not just us; other projects are reporting this too. For now, I've
>> just commented on the Zeppelin's INFRA issue [1].
>>
>> Impact:
>> * Latest commits not visible in GitHub.
>> * Pull requests not getting closed.
>> * Jenkins project not picking up any changes, and not running any builds.
>>
>> Sorry about this! Hopefully, we can get it resolved quickly.
>>
>> [1] https://issues.apache.org/jira/browse/INFRA-11532
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Draft Contribution Guide

2016-03-23 Thread Maximilian Michels
I didn't see this paragraph before:

"Committers should never commit anything without going through a pull
request, since that would bypass test coverage and potentially cause
the build to fail due to checkstyle, etc. Always go through the pull
request, even if you won’t wait for the code review."

How about:

"Whenever possible, commits should be reviewed in a pull request. Pull
requests ensure that changes can be communicated properly with the
community and potential flaws or improvements can be spotted. In
addition, pull requests ensure proper test coverage and verification
of the build. Whenever possible, go through the pull request, even if
you won’t wait for the code review."

- Max


On Wed, Mar 23, 2016 at 5:33 PM, Jean-Baptiste Onofré  wrote:
> +1
>
> Regards
> JB
>
>
> On 03/23/2016 05:30 PM, Davor Bonaci wrote:
>>
>> Thanks everyone for commenting!
>>
>> There were no new comments in the last several days, so we'll start moving
>> the doc over to the Beam website.
>>
>> Of course, there's nothing here set in stone -- please reopen the
>> discussion about any particular point at any time in the future.
>>
>> On Fri, Mar 18, 2016 at 4:44 AM, Maximilian Michels 
>> wrote:
>>
>>> Hi Frances,
>>>
>>> Very nice comprehensive guide. I'll leave some comments in the doc.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Fri, Mar 18, 2016 at 11:51 AM, Sandeep Deshmukh
>>>  wrote:
>>>>
>>>> The document captures the process very well and has the right amount of
>>>
>>> details
>>>>
>>>> for newbies too.
>>>>
>>>> Great work!!!
>>>>
>>>> Regards,
>>>> Sandeep
>>>>
>>>> On Fri, Mar 18, 2016 at 10:46 AM, Siva Kalagarla <
>>>
>>> siva.kalaga...@gmail.com>
>>>>
>>>> wrote:
>>>>
>>>>> Thanks Frances,  This document is helpful for newbies like myself.
>>>>> Will
>>>>> follow these steps over this weekend.
>>>>>
>>>>> On Thu, Mar 17, 2016 at 2:19 PM, Frances Perry 
>>>>> wrote:
>>>>>
>>>>>> Hi Beamers!
>>>>>>
>>>>>> We've started a draft
>>>>>> <
>>>>>>
>>>>>
>>>
>>> https://docs.google.com/document/d/1syFyfqIsGOYDE_Hn3ZkRd8a6ylcc64Kud9YtrGHgU0E/comment
>>>>>>>
>>>>>>>
>>>>>> for the Beam contribution guide. Please take a look and provide
>>>
>>> feedback.
>>>>>>
>>>>>> Once things settle, we'll get this moved over on to the Beam website.
>>>>>>
>>>>>> Frances
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Regards,
>>>>> Siva Kalagarla
>>>>> @SivaKalagarla <https://twitter.com/SivaKalagarla>
>>>>>
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Renaming process: first step Maven coordonates

2016-03-21 Thread Maximilian Michels
I would be in favor of one group id. For the developer, hierarchies
are really important. They are visible in the directory layout of the
Maven project and in the dependency tree. For the user, it shouldn't
matter how the project is structured. He pulls in artifacts simply
from the "org.apache.beam" group. I think it makes outside-interaction
easier when we have a fixed group id.

Cheers,
Max

On Mon, Mar 21, 2016 at 5:02 PM, Jean-Baptiste Onofré  wrote:
> Hi Lukasz,
>
> both are possible.
>
> Some projects use different groupId. It's the case for Karaf or Camel for
> instance:
>
> http://repo.maven.apache.org/maven2/org/apache/karaf/
>
> There you can see the different groupIds, containing the different artifacts.
>
> On the other hand, other projects use an unique groupId and multiple
> artifactId. It's the case in Spark or Flink for instance.
>
> At first glance, I had a preference for different groupIds for "global" Beam
> kinds of artifacts (like io, runner, etc.). But it would make sense to work
> more on the artifactId.
>
> Regards
> JB
>
>
> On 03/21/2016 04:50 PM, Lukasz Cwik wrote:
>>
>> I like the single groupId since it makes it simpler to find all related
>> components for a project.
>>
>> Is there a common practice in maven for multi-module vs inheritance
>> projects for choosing the groupId?
>>
>> On Mon, Mar 21, 2016 at 7:32 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi beamers,
>>>
>>> I updated the PR according to your comments.
>>>
>>> I have couple of points I want to discuss:
>>>
>>> 1. All modules use the same groupId (org.apache.beam). In order to have a
>>> cleaner structure on the Maven repo, I wonder if it's not better to have
>>> different groupId depending of the artifacts. For instance,
>>> org.apache.beam.sdk, containing a module with java as artifactId (it will
>>> contain new artifacts with id python, scala, ...),
>>> org.apache.beam.runners
>>> containing modules with flink and spark as artifactId, etc. Thoughts ?
>>> 2. The version has been set to 0.1.0-incubating-SNAPSHOT for all
>>> artifacts, including the runners. It doesn't mean that the runners will
>>> have to use the same version as parent (they can have their own release
>>> cycle). However, as we "bootstrap" the project, I used the same version
>>> in
>>> all modules.
>>>
>>> Now, I'm starting two new commits:
>>> - renaming of the packages
>>> - folders re-organization
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>>
>>> On 03/21/2016 01:56 PM, Jean-Baptiste Onofré wrote:
>>>
 Hi Davor,

 thank you so much for your comments. I'm updating the PR according to
 your PR (and will provide explanation to some changes).

 Thanks dude !

 Regards
 JB

 On 03/21/2016 06:29 AM, Davor Bonaci wrote:

> I left a few comments on PR #46.
>
> Thanks JB for doing this; a clear improvement.
>
> On Mon, Mar 14, 2016 at 6:04 PM, Jean-Baptiste Onofré 
> wrote:
>
> Hi all,
>>
>>
>> I started the renaming process from Dataflow to Beam.
>>
>> I submitted a first PR about the Maven coordinates:
>>
>> https://github.com/apache/incubator-beam/pull/46
>>
>> I will start the packages renaming (updating the same PR). For the
>> directories structure, I would like to talk with Frances, Dan, Tyler,
>> and
>> Davor first.
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>

>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [HEADS UP] Renaming/polishing

2016-03-21 Thread Maximilian Michels
If we can leave out the "incubating" qualifier for development, I
would very much appreciate that. I like Davor's proposal to append it
only once we release. Apart from the improved Maven version semantics,
it would incorporate the fact that incubating projects are only
required to include the "incubating" qualifier for releases.

+1 for 0.1-SNAPSHOT for development
+1 for 0.1-incubating or 0.1.0-incubating for the first release
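The lexicographic-sorting pitfall that the quoted proposal below warns about (why "-incubating-SNAPSHOT" would not get Maven's special handling) is easy to demonstrate: when version qualifiers fall back to string comparison, "alpha-10" sorts before "alpha-2" because '1' < '2' character by character. A minimal sketch of the underlying string behavior:

```java
// Demonstrates the qualifier-sorting pitfall discussed in this thread:
// plain lexicographic comparison says "1.2.3-alpha-10" < "1.2.3-alpha-2",
// even though build 10 is newer than build 2.
public class VersionSortDemo {
    public static void main(String[] args) {
        String a = "1.2.3-alpha-10";
        String b = "1.2.3-alpha-2";
        // Common prefix "1.2.3-alpha-" is equal; then '1' (49) < '2' (50).
        System.out.println(a.compareTo(b) < 0);
    }
}
```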

On Sun, Mar 20, 2016 at 10:22 PM, Davor Bonaci  wrote:
> I believe we'll put ourselves into a corner with "0.1-incubating-SNAPSHOT".
>
> The format has to be: <major>.<minor>.<incremental>-<qualifier>, as per
> [1], i.e., no two dashes. If it is not, Maven resolution will get things
> wrong by comparing strings instead of numbers: 10 becomes less than 2, etc.
> Maven handles the "-SNAPSHOT" qualifier specially; the qualifier
> "-incubating-SNAPSHOT" will not get that benefit.
>
> Here's a very specific example from [1]:
>
> Take the version release numbers “1.2.3-alpha-2” and “1.2.3-alpha-10,”
> where the “alpha-2” build corresponds to the 2nd alpha build, and the
> “alpha-10” build corresponds to the 10th alpha build. Even though
> “alpha-10” should be considered more recent than “alpha-2,” Maven is going
> to sort “alpha-10” before “alpha-2”.
>
>
> There are several orthogonal decisions here:
>
> 1. How many version numbers do we need for now? I argue we don't need the
> incremental part before the first stable release -- two numbers should be
> sufficient. So, the format, before the first stable release, can be
> <major>.<minor>-<qualifier>.
>
> 2. I don't think we need "incubating-SNAPSHOT" ever. For the most part,
> both qualifiers communicate the same thing -- that this is not really ready
> for primetime yet. For example, we can use -SNAPSHOT for the nightly build,
> and "-incubating" for the actual releases while we are in the incubation
> phase. Snapshots will not get released anywhere -- no reason for them to
> carry "incubating" too; we'll just mess up resolution handling.
>
> 3. I found many projects in the Incubator that don't actually have
> "incubating" in the version part. Some put it in the artifact id; others
> put it in the name only; a few don't have it at all. I dislike the artifact
> approach, and I'm neutral between name & version. Name is easier, however.
>
> 4. When we release the first stable version, I propose that it is marked as
> 2.0.0. Before that, we'll likely push several pre-release versions. We have
> released 1.5.0 in Dataflow recently, and might release a few more. It might
> be smarter to leave a few numbers for any such versions of Dataflow. So, we
> could start with something like 1.9.0. On the other hand, I think 0.1
> communicates more clearly that this is a pre-release version.
>
> To summarize, I think a good proposal is as follows:
>
> Start with 0.1-SNAPSHOT. This goes into Beam's parent pom.xml. When we
> release 0.1, we override it to 0.1-incubating. At that time, the pom goes
> to 0.2-SNAPSHOT, and we release it as 0.2-incubating. Sometime before the
> first stable release post incubation, we change it to 2.0.0-SNAPSHOT, and
> release as 2.0.0.
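The proposed progression, expressed as the parent pom's version element (illustrative fragment only, values taken from the summary above):

```xml
<!-- Illustrative: version progression from the proposal above. -->
<version>0.1-SNAPSHOT</version>    <!-- development towards the first release -->
<!-- released as 0.1-incubating; the pom then moves to: -->
<version>0.2-SNAPSHOT</version>    <!-- released as 0.2-incubating -->
<!-- before the first stable post-incubation release: -->
<version>2.0.0-SNAPSHOT</version>  <!-- released as 2.0.0 -->
```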
>
> [1]
> https://books.sonatype.com/mvnref-book/reference/pom-relationships-sect-pom-syntax.html
>
> On Sun, Mar 20, 2016 at 12:31 PM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi beamers,
>>
>> as the project is more and more visible, and we begin to see incoming
>> contributions, I think we really have to move forward on the code cleanup
>> and polishing.
>>
>> So, I'm updating PR #46 about renaming the packages and re-organizing the
>> folders. I will update the PR by tomorrow.
>>
>> In the meantime, I sent an e-mail about the version. Right now, I
>> proposed 1.5.0-incubating-SNAPSHOT. Some expressed a preference to
>> start with 0.1-incubating-SNAPSHOT.
>>
>> I think 0.1-incubating-SNAPSHOT makes sense. Please, if you disagree, let
>> me know, else I will update the version in PR #46.
>>
>> Thanks
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>


Re: Committer workflow

2016-03-21 Thread Maximilian Michels
Hi JB,

If I remember correctly from the past discussions, we agreed that we
want a PR-review process for all code changes which are
important/major or break the API in some way. I wholeheartedly agree
with this process.

In addition, committers reserve the right to provide small fixes
which do not change important logic of the system. Ideally, this
should only be changes that are part of the component which the
committer is responsible for. In this way, we can focus on important
pull requests and still provide meaningful fixes. It is the
committer's responsibility to make sure those fixes don't hinder other
committers' work. If necessary, committers should sync with each other
before pushing fixes. In rare cases, commits can be reverted.

Pull requests are a very important tool in the open source development
process. Whenever possible, let's use them, but please let's not
enforce them. Community is also about trusting each other. Direct
pushes should be used with care and shouldn't be the normal
development process. On the other hand, pull requests which get merged
immediately without review are not meaningful either.

So far I think we are doing a good job. Please correct me if you feel
differently.

Best,
Max


On Sun, Mar 20, 2016 at 8:27 PM, Jean-Baptiste Onofré  wrote:
> Hi all,
>
> As a reminder, we agreed that everyone in the project should use the same
> workflow: prepare a PR, submit the PR, give some time to review, apply the
> PR.
>
> Right now, we had some pushes directly without a PR.
>
> It would be great if we *all* used the same workflow.
>
> Thanks !
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Unable to serialize exception running KafkaWindowedWordCountExample

2016-03-19 Thread Maximilian Michels
@Dan: You're right that the PipelineOptions shouldn't be cached like
this. In this particular wrapper, it was not even necessary.

@Jiankang: I've pushed a fix to the repository with a few
improvements. Could you please try again? You will have to recompile.

Thanks,
Max

On Thu, Mar 17, 2016 at 8:44 AM, Dan Halperin  wrote:
> +Max for the Flink Runner, and +Luke who wrote most of the initial code
> around PipelineOptions.
>
> The UnboundedFlinkSource is caching the `PipelineOptions` object, here:
> https://github.com/apache/incubator-beam/blob/071e4dd67021346b0cab2aafa0900ec7e34c4ef8/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedFlinkSource.java#L36
>
> I think this is a mismatch with how we intended them to be used. For
> example, the PipelineOptions may be changed by a Runner between graph
> construction time (when the UnboundedFlinkSource is created) and actual
> pipeline execution time. This is partially why, for example, PipelineOptions
> are provided by the Runner as an argument to functions like
> DoFn.startBundle, .processElement, and .finishBundle.
>
> PipelineOptions itself does not extend Serializable, and per the
> PipelineOptions documentation it looks like we intend for it to be
> serialized through Jackson rather than through Java serialization. I bet the
> Flink runner does this, and we probably just need to remove this cached
> PipelineOptions from the unbounded source.
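Dan's point can be sketched with a pickling analog (hypothetical class name; Beam's actual serialization is Java serialization plus Jackson): caching a non-serializable handle on the source makes the whole object unserializable, while serializing the options as plain data works fine.

```python
import json
import pickle

class UnboundedSourceSketch:
    """Hypothetical stand-in for a source that caches its options object."""
    def __init__(self, options):
        # Caching a non-serializable handle makes the *whole* source
        # unserializable -- the analog of the cached ProxyInvocationHandler.
        self.options = options

# A lambda plays the role of the non-serializable options proxy.
src = UnboundedSourceSketch(options=lambda: "proxy")

try:
    pickle.dumps(src)
    ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    ok = False
print(ok)  # False: Java serialization fails the same way (NotSerializableException)

# The Jackson-style alternative: serialize the options as plain data instead
# of dragging the live object through object serialization.
options_json = json.dumps({"runner": "FlinkPipelineRunner", "filesToStage": None})
restored = json.loads(options_json)
print(restored["runner"])  # FlinkPipelineRunner
```

The fix discussed in the thread follows the second pattern: drop the cached field and let the runner hand options to the source at execution time.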
>
> I'll let Luke and Max correct me on any or all of the above :)
>
> Thanks,
> Dan
>
> On Wed, Mar 16, 2016 at 10:57 PM, 刘见康  wrote:
>>
>> Hi guys,
>>
>> Failed to run KafkaWindowedWordCountExample with Unable to serialize
>> exception, the stack exception as below:
>>
>> 16/03/17 13:49:09 INFO flink.FlinkPipelineRunner:
>> PipelineOptions.filesToStage was not specified. Defaulting to files from
>> the classpath: will stage 160 files. Enable logging at DEBUG level to see
>> which files will be staged.
>> 16/03/17 13:49:09 INFO flink.FlinkPipelineExecutionEnvironment: Creating
>> the required Streaming Environment.
>> 16/03/17 13:49:09 INFO kafka.FlinkKafkaConsumer08: Trying to get topic
>> metadata from broker localhost:9092 in try 0/3
>> 16/03/17 13:49:09 INFO kafka.FlinkKafkaConsumerBase: Consumer is going to
>> read the following topics (with number of partitions):
>> Exception in thread "main" java.lang.IllegalArgumentException: unable to
>> serialize
>>
>> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedFlinkSource@2d29b4ee
>> at
>>
>> com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:54)
>> at
>>
>> com.google.cloud.dataflow.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:84)
>> at com.google.cloud.dataflow.sdk.io.Read$Unbounded.(Read.java:194)
>> at com.google.cloud.dataflow.sdk.io.Read$Unbounded.(Read.java:189)
>> at com.google.cloud.dataflow.sdk.io.Read.from(Read.java:69)
>> at
>>
>> org.apache.beam.runners.flink.examples.streaming.KafkaWindowedWordCountExample.main(KafkaWindowedWordCountExample.java:129)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>> Caused by: java.io.NotSerializableException:
>> com.google.cloud.dataflow.sdk.options.ProxyInvocationHandler
>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
>> at
>>
>> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>> at
>> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>> at
>>
>> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>> at
>>
>> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>> at
>> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>> at
>>
>> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>> at
>>
>> com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:50)
>> ... 10 more
>>
>> I found there is a similar issue in flink-dataflow
>> https://github.com/dataArtisans/flink-dataflow/issues/8.
>>
>> Do you have an idea about this error?
>>
>> Thanks
>> Jiankang
>
>


Re: Unable to serialize exception running KafkaWindowedWordCountExample

2016-03-19 Thread Maximilian Michels
Hi Jiankang,

Thanks for reporting again. I'm sorry that you ran into another
problem. This example had been working, but it has some small problems
with the new code base we just migrated to.

I've fixed and tested the example and would invite you to try again.

Thanks,
Max

On Thu, Mar 17, 2016 at 1:25 PM, 刘见康  wrote:
> @Max:
> Thanks for your quick fix, this serializable exception has been solved.
> However, it reported another one:
> 16/03/17 20:14:23 INFO flink.FlinkPipelineRunner:
> PipelineOptions.filesToStage was not specified. Defaulting to files from
> the classpath: will stage 158 files. Enable logging at DEBUG level to see
> which files will be staged.
> 16/03/17 20:14:23 INFO flink.FlinkPipelineExecutionEnvironment: Creating
> the required Streaming Environment.
> 16/03/17 20:14:23 INFO kafka.FlinkKafkaConsumer08: Trying to get topic
> metadata from broker localhost:9092 in try 0/3
> 16/03/17 20:14:23 INFO kafka.FlinkKafkaConsumerBase: Consumer is going to
> read the following topics (with number of partitions):
> Exception in thread "main" java.lang.RuntimeException: Flink Sources are
> supported only when running with the FlinkPipelineRunner.
> at
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedFlinkSource.getDefaultOutputCoder(UnboundedFlinkSource.java:71)
> at
> com.google.cloud.dataflow.sdk.io.Read$Unbounded.getDefaultOutputCoder(Read.java:230)
> at
> com.google.cloud.dataflow.sdk.transforms.PTransform.getDefaultOutputCoder(PTransform.java:294)
> at
> com.google.cloud.dataflow.sdk.transforms.PTransform.getDefaultOutputCoder(PTransform.java:309)
> at
> com.google.cloud.dataflow.sdk.values.TypedPValue.inferCoderOrFail(TypedPValue.java:167)
> at
> com.google.cloud.dataflow.sdk.values.TypedPValue.getCoder(TypedPValue.java:48)
> at
> com.google.cloud.dataflow.sdk.values.PCollection.getCoder(PCollection.java:137)
> at
> com.google.cloud.dataflow.sdk.values.TypedPValue.finishSpecifying(TypedPValue.java:88)
> at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:331)
> at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:274)
> at
> com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:161)
> at
> org.apache.beam.runners.flink.examples.streaming.KafkaWindowedWordCountExample.main(KafkaWindowedWordCountExample.java:127)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>
> Diving into the UnboundedFlinkSource class, it looks like a simple class
> implementing the UnboundedSource interface that just throws a RuntimeException.
> I just wonder if this Kafka Streaming example is runnable?
>
> Thanks
> Jiankang
>
>
> On Thu, Mar 17, 2016 at 7:35 PM, Maximilian Michels  wrote:
>
>> @Dan: You're right that the PipelineOptions shouldn't be cached like
>> this. In this particular wrapper, it was not even necessary.
>>
>> @Jiankang: I've pushed a fix to the repository with a few
>> improvements. Could you please try again? You will have to recompile.
>>
>> Thanks,
>> Max
>>
>> On Thu, Mar 17, 2016 at 8:44 AM, Dan Halperin  wrote:
>> > +Max for the Flink Runner, and +Luke who wrote most of the initial code
>> > around PipelineOptions.
>> >
>> > The UnboundedFlinkSource is caching the `PipelineOptions` object, here:
>> >
>> https://github.com/apache/incubator-beam/blob/071e4dd67021346b0cab2aafa0900ec7e34c4ef8/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedFlinkSource.java#L36
>> >
>> > I think this is a mismatch with how we intended them to be used. For
>> > example, the PipelineOptions may be changed by a Runner between graph
>> > construction time (when the UnboundedFlinkSource is created) and actual
>> > pipeline execution time. This is partially why, for example,
>> PipelineOptions
>> > are provided by the Runner as an argument to functions like
>> > DoFn.startBundle, .processElement, and .finishBundle.
>> >
>> > PipelineOptions itself does not extend Serializable, and per the
>> > PipelineOptions documentation it looks like we intend for it to be
>> > serialized through Jackson rather than through Java serialization. I bet
>> the
>> > Flink runner does this, and we probably just need to remove this cached
>> > PipelineOptions from t

Re: Capability Matrix

2016-03-19 Thread Maximilian Michels
Well done. The matrix provides a good basis for improving the existing
runners. Moreover, new backends can use it to evaluate capabilities
for creating a runner.

On Fri, Mar 18, 2016 at 1:15 AM, Jean-Baptiste Onofré  wrote:
> Gotcha, thanks !
>
> Regards
> JB
>
>
> On 03/18/2016 12:51 AM, Frances Perry wrote:
>>
>> That's "partially". Check out the full matrix for complete details:
>> http://beam.incubator.apache.org/capability-matrix/
>>
>> On Thu, Mar 17, 2016 at 4:50 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Great job !
>>>
>>> By the way, when you use ~ in the matrix, does it mean that it works only
>>> in some cases (depending on the pipeline or transform) or it doesn't work
>>> as expected ? Just curious for the Aggregators and the meaning in the
>>> Beam
>>> Model.
>>>
>>> Thanks,
>>> Regards
>>> JB
>>>
>>>
>>> On 03/18/2016 12:45 AM, Tyler Akidau wrote:
>>>
 Just pushed the capability matrix and an attendant blog post to the
 site:

  - Blog post:


 http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html
  - Matrix: http://beam.incubator.apache.org/capability-matrix/

 For those of you that want to keep the matrix up to date as your runner
 evolves, you'll want to make updates in the _data/capability-matrix.yml
 file:


 https://github.com/apache/incubator-beam-site/blob/asf-site/_data/capability-matrix.yml

 Thanks to everyone for helping fill out the initial set of capabilities!
 Looking forward to updates as things progress. :-)

 And thanks also to Max for moving all the website stuff to git!

 -Tyler


 On Sat, Mar 12, 2016 at 9:37 AM Tyler Akidau  wrote:

> Thanks all! At this point, it looks like most all of the fields have been
> filled out. I'm in the process of migrating the spreadsheet contents to
> YAML within the website source, so I've revoked edit access from the doc
> to keep things from changing while I'm doing that. If you have further
> edits to make, feel free to leave a comment, and I'll incorporate it into
> the YAML.
>
> -Tyler
>
>
> On Thu, Mar 10, 2016 at 12:43 AM Jean-Baptiste Onofré 
> wrote:
>
> Hi Tyler,
>>
>>
>> good idea !
>>
>> I like it !
>>
>> Regards
>> JB
>>
>> On 03/09/2016 11:14 PM, Tyler Akidau wrote:
>>
>>> I just filed BEAM-104
>>> 
>>> regarding publishing a capability matrix on the Beam website. We've
>>>
>> seeded
>>
>>> the spreadsheet linked there (
>>>
>>>
>>
>> https://docs.google.com/spreadsheets/d/1OM077lZBARrtUi6g0X0O0PHaIbFKCD6v0djRefQRE1I/edit
>>
>>> )
>>> with an initial proposed set of capabilities, as well as descriptions
>>>
>> for
>>
>>> the model and Cloud Dataflow. If folks for other runners (currently
>>>
>> Flink
>>
>>> and Spark) could please make sure their columns are filled out as
>>> well,
>>> it'd be much appreciated. Also let us know if there are capabilities
>>> you
>>> think we've missed.
>>>
>>> Our hope is to get this up and published soon, since we've been
>>> getting
>>>
>> a
>>
>>> lot of questions regarding runner capabilities, portability, etc.
>>>
>>> -Tyler
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>

>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Draft Contribution Guide

2016-03-19 Thread Maximilian Michels
Hi Frances,

Very nice comprehensive guide. I'll leave some comments in the doc.

Cheers,
Max

On Fri, Mar 18, 2016 at 11:51 AM, Sandeep Deshmukh
 wrote:
> The document captures the process very well and has the right amount of detail
> for newbies too.
>
> Great work!!!
>
> Regards,
> Sandeep
>
> On Fri, Mar 18, 2016 at 10:46 AM, Siva Kalagarla 
> wrote:
>
>> Thanks Frances,  This document is helpful for newbies like myself.  Will
>> follow these steps over this weekend.
>>
>> On Thu, Mar 17, 2016 at 2:19 PM, Frances Perry 
>> wrote:
>>
>> > Hi Beamers!
>> >
>> > We've started a draft
>> > <
>> >
>> https://docs.google.com/document/d/1syFyfqIsGOYDE_Hn3ZkRd8a6ylcc64Kud9YtrGHgU0E/comment
>> > >
>> > for the Beam contribution guide. Please take a look and provide feedback.
>> > Once things settle, we'll get this moved over on to the Beam website.
>> >
>> > Frances
>> >
>>
>>
>>
>> --
>>
>>
>> Regards,
>> Siva Kalagarla
>> @SivaKalagarla 
>>


Re: Sorry for un-fixed-up PR merge

2016-03-11 Thread Maximilian Michels
Hi Kenneth,

Thanks for the notice. It happens, we're all human :)

Cheers,
Max

On Fri, Mar 11, 2016 at 12:13 AM, Kenneth Knowles
 wrote:
> I want to apologize for leaving fixup commits in a PR merge I just
> performed. I'm leaving as-is rather than mess about with `git push -f` to
> rewrite a prettier history. Just don't want anyone to think that I would
> normally go about like that.
>
> Kenn


Re: Travis for pull requests

2016-03-10 Thread Maximilian Michels
Well done :)

About the Flink tests in Jenkins: I wondered why they wouldn't execute.
I just had a look at the Jenkins job, and they seem to run fine:
https://builds.apache.org/job/beam_MavenVerify/35/org.apache.beam$flink-runner/console

On Thu, Mar 10, 2016 at 7:40 AM, Jean-Baptiste Onofré  wrote:
> Awesome ! Thanks Davor.
>
> Regards
> JB
>
>
> On 03/10/2016 01:10 AM, Davor Bonaci wrote:
>>
>> I'm happy to announce that we now have both Travis and Jenkins set up in
>> Beam.
>>
>> Both systems are building our master branch. The most recent status is
>> incorporated into the top-level README.md file. Clicking the badge will
>> take you to the specific build results. Additionally, we have automatic
>> coverage for each pull request, with results integrated into the GitHub
>> pull request UI.
>>
>> Exciting!
>>
>> Low-level details:
>> The systems aren't exactly equal. Travis will run on any branch, while
>> Jenkins will run on master only. Travis will run multi-OS, multi-JDK
>> version, while Jenkins does just one combination. Notifications to Travis
>> are pushed, Jenkins periodically polls for changes. Flink tests may not be
>> running in Jenkins right now -- we need to investigate why.
>>
>> On Wed, Mar 9, 2016 at 8:57 AM, Davor Bonaci  wrote:
>>
>>> Sounds like we are all in agreement. Great!
>>>
>>> On Wed, Mar 9, 2016 at 8:49 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> I agree, and it's what I mean (assuming the signing is OK).
>>>>
>>>> Basically, a release requires the following action:
>>>>
>>>> - mvn release:prepare && mvn release:perform (with pgp signing, etc): it
>>>> can be done by Jenkins, BUT it requires some credentials in
>>>> .m2/settings.xml (for signing and upload on nexus), etc. In lot of
>>>> Apache
>>>> projects, you have some guys dedicated for the releases, and a release
>>>> is
>>>> simply an unique command line to execute (or a procedure to follow)
>>>> - check the release content (human)
>>>> - close the staging repository on nexus (human)
>>>> - send the vote e-mail (human)
>>>> - once the vote passed:
>>>> -- promote the staging repo (human)
>>>> -- update Jira (human)
>>>> -- publish artifacts on dist.apache.org (human)
>>>> -- update reporter.apache.org (human)
>>>> -- send announcement e-mail on the mailing lists (human)
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 03/09/2016 05:38 PM, Davor Bonaci wrote:
>>>>
>>>>> I think a release manager (a person) should be driving it, but his/her
>>>>> actions can still be automated through Jenkins. For example, a Jenkins
>>>>> job that the release manager manually triggers is often better than a
>>>>> set of manual command-line actions. Reasons: less error prone,
>>>>> repeatable, and a log of actions is kept and is visible to everyone, etc.
>>>>>
>>>>> On Wed, Mar 9, 2016 at 1:25 AM, Jean-Baptiste Onofré 
>>>>> wrote:
>>>>>
>>>>> Hi Max,
>>>>>>
>>>>>>
>>>>>> I agree to use Jenkins for snapshots, but I don't think it's a good
>>>>>> idea
>>>>>> for release (it's better that a release manager does it IMHO).
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>>
>>>>>> On 03/09/2016 10:12 AM, Maximilian Michels wrote:
>>>>>>
>>>>>> I'm in favor of Travis too. We use it very extensively at Flink. It is
>>>>>>>
>>>>>>> true that Jenkins can provide a much more sophisticated workflow.
>>>>>>> However, its UI is outdated and it is not as nicely integrated with
>>>>>>> GitHub. For outside contributions, IMHO Travis is the best CI system.
>>>>>>>
>>>>>>> We might actually use Jenkins for releases or snapshot deployment.
>>>>>>> Jenkins is very flexible and nicely integrated with the ASF
>>>>>>> infrastructure which makes some things like providing credentials a
>>>>>>> piece of cake.
>>>>>>>
>>>>>>> Thanks for getting us started @Davor.
>>>

Re: Travis for pull requests

2016-03-09 Thread Maximilian Michels
I'm in favor of Travis too. We use it very extensively at Flink. It is
true that Jenkins can provide a much more sophisticated workflow.
However, its UI is outdated and it is not as nicely integrated with
GitHub. For outside contributions, IMHO Travis is the best CI system.

We might actually use Jenkins for releases or snapshot deployment.
Jenkins is very flexible and nicely integrated with the ASF
infrastructure which makes some things like providing credentials a
piece of cake.

Thanks for getting us started @Davor.

On Tue, Mar 8, 2016 at 6:35 PM, Davor Bonaci  wrote:
> We absolutely could -- that's why we forked over Dataflow's Travis
> configuration to start with. With Max's recent fixes to the Flink runner,
> this is very viable.
>
> Travis vs. Jenkins is often a contentious discussion. Common arguments
> against Travis are: scalability / capacity, hard to schedule periodic runs,
> and inability to automate the release process. There are many pros too;
> e.g., automatic coverage on forked repositories.
>
> We are generally in favor of doing this through Jenkins for the pull
> requests, since that is our "official" CI. Many projects do this -- Apache
> Thrift is one example [1]. Work on this is in-progress on our side.
>
> Maintaining both systems is an extra burden, but I feel we'll end up there
> sooner or later. Thus, I'm also in favor of enabling the coverage that we
> already have. Let's have both for now, and we can always adjust later.
>
> I'll go ahead and file ticket(s) with INFRA.
>
> [1] https://github.com/apache/thrift/pull/932
>
> On Tue, Mar 8, 2016 at 6:31 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Max,
>>
>> +1 good idea !
>>
>> Regards
>> JB
>>
>>
>> On 03/08/2016 03:22 PM, Maximilian Michels wrote:
>>
>>> Hi Beamers,
>>>
>>> Quick suggestion: Could we enable Travis for the pull requests of the
>>> GitHub mirror? At the moment we only have Travis for our forks.
>>>
>>> This would provide contributors with some feedback and also help us to
>>> identify problems with the pull requests. I think we only need to tell
>>> Infra to enable it for the apache/incubator-beam GitHub project.
>>>
>>> Best,
>>> Max
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>


Travis for pull requests

2016-03-08 Thread Maximilian Michels
Hi Beamers,

Quick suggestion: Could we enable Travis for the pull requests of the
GitHub mirror? At the moment we only have Travis for our forks.

This would provide contributors with some feedback and also help us to
identify problems with the pull requests. I think we only need to tell
Infra to enable it for the apache/incubator-beam GitHub project.

Best,
Max


Re: New beam website!

2016-03-08 Thread Maximilian Michels
@Davor: Sure. I missed James' commit r1733208 on March 2. Updated.

@Kenneth: Yes, a bit annoying but Infra currently requires that the
website resides in the asf-site branch.

On Mon, Mar 7, 2016 at 11:35 PM, Davor Bonaci  wrote:
> In the transition process, it seems like we have lost a few website fixes
> that have been committed to the old repository. For example, JB's email
> address is incorrect again ;).
>
> Max, would you be willing to integrate those few last changes that were
> (likely) committed after your fork?
>
> On Mon, Mar 7, 2016 at 1:22 PM, Kenneth Knowles 
> wrote:
>
>> Thanks!
>>
>> Note that when someone clones for the first time, it will fail to checkout
>> the default `master` branch. You must manually `git checkout asf-site`. I
>> had a moment of confusion. If there is a technical reason not to use the
>> master branch, the default can be set like so:
>>
>> $ git symbolic-ref HEAD refs/heads/asf-site
>>
>> It might make things easier for new clones.
>>
>> Kenn
>>
>> On Mon, Mar 7, 2016 at 9:51 AM, Maximilian Michels  wrote:
>>
>> > Hi Beamers,
>> >
>> > The Git repo at
>> > https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git is now
>> > populated and syncs with the website.
>> >
>> > Cheers,
>> > Max
>> >
>> > On Mon, Mar 7, 2016 at 12:40 PM, Jean-Baptiste Onofré 
>> > wrote:
>> > > Hi Max,
>> > >
>> > > thanks for the update.
>> > >
>> > > I'm on the renaming and legal PR on my side.
>> > >
>> > > Regards
>> > > JB
>> > >
>> > >
>> > > On 03/07/2016 11:43 AM, Maximilian Michels wrote:
>> > >>
>> > >> Hi JB,
>> > >>
>> > >> I've pushed the web site to the empty repository and I'll tell Infra
>> > >> to switch to the new repository.
>> > >>
>> > >> Cheers,
>> > >> Max
>> > >>
>> > >>
>> > >> On Fri, Mar 4, 2016 at 5:00 PM, Jean-Baptiste Onofré > >
>> > >> wrote:
>> > >>>
>> > >>> Hi Max,
>> > >>>
>> > >>> fair enough.
>> > >>>
>> > >>> Regards
>> > >>> JB
>> > >>>
>> > >>>
>> > >>> On 03/04/2016 03:52 PM, Maximilian Michels wrote:
>> > >>>>
>> > >>>>
>> > >>>> Hi JB,
>> > >>>>
>> > >>>> Good question. I didn't create the page but one problem about relative
>> > >>>> URLs is changing the root directory or switching to https. So instead
>> > >>>> of relative URLs we should use {{ site.baseurl }}/resource for all
>> > >>>> links. That way, we can simply change baseurl in the _config.yml and
>> > >>>> we're good to go. For local testing, we set jekyll --baseurl "". That
>> > >>>> approach has worked well for the Flink website which is also built
>> > >>>> with Jekyll.
>> > >>>>
>> > >>>> Cheers,
>> > >>>> Max
>> > >>>>
>> > >>>> On Fri, Mar 4, 2016 at 1:48 PM, Jean-Baptiste Onofré <
>> j...@nanthrax.net
>> > >
>> > >>>> wrote:
>> > >>>>>
>> > >>>>>
>> > >>>>> Hi Max,
>> > >>>>>
>> > >>>>> I just cloned your repo and built using jekyll.
>> > >>>>>
>> > >>>>> I just wonder why not using relative URL (for images and js
>> location)
>> > >>>>> instead of absolute ? It would allow us to directly open the
>> website
>> > in
>> > >>>>> a
>> > >>>>> browser. WDYT ?
>> > >>>>>
>> > >>>>> Regards
>> > >>>>> JB
>> > >>>>>
>> > >>>>> On 03/01/2016 01:59 PM, Maximilian Michels wrote:
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> As a summary from
>> > >>>>>>
>> > >>>>>>
>> > https://is

Re: New beam website!

2016-03-07 Thread Maximilian Michels
Hi Beamers,

The Git repo at
https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git is now
populated and syncs with the website.

Cheers,
Max

On Mon, Mar 7, 2016 at 12:40 PM, Jean-Baptiste Onofré  wrote:
> Hi Max,
>
> thanks for the update.
>
> I'm on the renaming and legal PR on my side.
>
> Regards
> JB
>
>
> On 03/07/2016 11:43 AM, Maximilian Michels wrote:
>>
>> Hi JB,
>>
>> I've pushed the web site to the empty repository and I'll tell Infra
>> to switch to the new repository.
>>
>> Cheers,
>> Max
>>
>>
>> On Fri, Mar 4, 2016 at 5:00 PM, Jean-Baptiste Onofré 
>> wrote:
>>>
>>> Hi Max,
>>>
>>> fair enough.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 03/04/2016 03:52 PM, Maximilian Michels wrote:
>>>>
>>>>
>>>> Hi JB,
>>>>
>>>> Good question. I didn't create the page but one problem about relative
>>>> URLs is changing the root directory or switching to https. So instead
>>>> of relative URLs we should use {{ site.baseurl }}/resource for all
>>>> links. That way, we can simply change baseurl in the _config.yml and
>>>> we're good to go. For local testing, we set jekyll --baseurl "". That
>>>> approach has worked well for the Flink website which is also built
>>>> with Jekyll.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Fri, Mar 4, 2016 at 1:48 PM, Jean-Baptiste Onofré 
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Max,
>>>>>
>>>>> I just cloned your repo and built using jekyll.
>>>>>
>>>>> I just wonder why not using relative URL (for images and js location)
>>>>> instead of absolute ? It would allow us to directly open the website in
>>>>> a
>>>>> browser. WDYT ?
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 03/01/2016 01:59 PM, Maximilian Michels wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> As a summary from
>>>>>>
>>>>>> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-11318
>>>>>> it would work as follows:
>>>>>>
>>>>>> We use the 'asf-site' branch of the repository. When we change the
>>>>>> website, we execute "jekyll build" followed by "jekyll serve" to
>>>>>> preview the site. Everything is generated in the 'content' directory.
>>>>>> We then push the changes and they are deployed.
>>>>>>
>>>>>> I've prepared everything in my fork:
>>>>>> https://github.com/mxm/beam-site/tree/asf-site
>>>>>>
>>>>>> Unfortunately, I couldn't push the changes to the new repository. The
>>>>>> permissions don't seem to be set up correctly.
>>>>>>
>>>>>> On Tue, Mar 1, 2016 at 10:24 AM, Maximilian Michels 
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quick update. The Git repository is ready under
>>>>>>> https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git
>>>>>>>
>>>>>>> I'm sorting out the last things and will push the website thereafter.
>>>>>>> Infra can then proceed to do the pubsub switch to deploy the website
>>>>>>> from there.
>>>>>>>
>>>>>>> On Thu, Feb 25, 2016 at 11:31 PM, Maximilian Michels 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi JB,
>>>>>>>>
>>>>>>>> Greetings to Mexico! I was using Infra's "SVN to Git migration"
>>>>>>>> service desk. That seems like a standard way for migration to me. It
>>>>>>>> has also worked fine in the past.
>>>>>>>>
>>>>>>>> Could you explain the role of the SCM-Publish Maven plugin? What
>>>>>>>> would
>>>>>>>> be different from just committing the changes of the website to the
>>>>>>>> Git/SVN reposito

Re: New beam website!

2016-03-07 Thread Maximilian Michels
Hi JB,

I've pushed the web site to the empty repository and I'll tell Infra
to switch to the new repository.

Cheers,
Max


On Fri, Mar 4, 2016 at 5:00 PM, Jean-Baptiste Onofré  wrote:
> Hi Max,
>
> fair enough.
>
> Regards
> JB
>
>
> On 03/04/2016 03:52 PM, Maximilian Michels wrote:
>>
>> Hi JB,
>>
>> Good question. I didn't create the page but one problem about relative
>> URLs is changing the root directory or switching to https. So instead
>> of relative URLs we should use {{ site.baseurl }}/resource for all
>> links. That way, we can simply change baseurl in the _config.yml and
>> we're good to go. For local testing, we set jekyll --baseurl "". That
>> approach has worked well for the Flink website which is also built
>> with Jekyll.
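The {{ site.baseurl }} approach described above would look roughly like this in a page or layout (illustrative paths, not actual site files):

```html
<!-- Illustrative: prefix site-root links with baseurl instead of
     hardcoding "/" or using relative paths. -->
<link rel="stylesheet" href="{{ site.baseurl }}/css/site.css">
<a href="{{ site.baseurl }}/capability-matrix/">Capability Matrix</a>
<img src="{{ site.baseurl }}/images/logo.png" alt="Beam logo">
```

With `baseurl` set once in `_config.yml`, moving the site or switching to https is a one-line change, and `jekyll serve --baseurl ""` previews it locally from the server root.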
>>
>> Cheers,
>> Max
>>
>> On Fri, Mar 4, 2016 at 1:48 PM, Jean-Baptiste Onofré 
>> wrote:
>>>
>>> Hi Max,
>>>
>>> I just cloned your repo and built using jekyll.
>>>
>>> I just wonder why not using relative URL (for images and js location)
>>> instead of absolute ? It would allow us to directly open the website in a
>>> browser. WDYT ?
>>>
>>> Regards
>>> JB
>>>
>>> On 03/01/2016 01:59 PM, Maximilian Michels wrote:
>>>>
>>>>
>>>> As a summary from
>>>> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-11318
>>>> it would work as follows:
>>>>
>>>> We use the 'asf-site' branch of the repository. When we change the
>>>> website, we execute "jekyll build" followed by "jekyll serve" to
>>>> preview the site. Everything is generated in the 'content' directory.
>>>> We then push the changes and they are deployed.
>>>>
>>>> I've prepared everything in my fork:
>>>> https://github.com/mxm/beam-site/tree/asf-site
>>>>
>>>> Unfortunately, I couldn't push the changes to the new repository. The
>>>> permissions don't seem to be set up correctly.
>>>>
>>>> On Tue, Mar 1, 2016 at 10:24 AM, Maximilian Michels 
>>>> wrote:
>>>>>
>>>>>
>>>>> Quick update. The Git repository is ready under
>>>>> https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git
>>>>>
>>>>> I'm sorting out the last things and will push the website thereafter.
>>>>> Infra can then proceed to do the pubsub switch to deploy the website
>>>>> from there.
>>>>>
>>>>> On Thu, Feb 25, 2016 at 11:31 PM, Maximilian Michels 
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi JB,
>>>>>>
>>>>>> Greetings to Mexico! I was using Infra's "SVN to Git migration"
>>>>>> service desk. That seems like a standard way for migration to me. It
>>>>>> has also worked fine in the past.
>>>>>>
>>>>>> Could you explain the role of the SCM-Publish Maven plugin? What would
>>>>>> be different from just committing the changes of the website to the
>>>>>> Git/SVN repository? Is it necessary that people use another wrapper
>>>>>> around a version control system?
>>>>>>
>>>>>> After all, what counts is that we can use a Git repository to check in
>>>>>> website changes and use the GitHub mirror. That is not only much
>>>>>> faster (pushing in SVN takes a long time and there is no local
>>>>>> repository) but also more convenient for most people that use Git on a
>>>>>> daily basis. How the actual website is served shouldn't matter to the
>>>>>> developer.
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On Thu, Feb 25, 2016 at 11:23 PM, Jean-Baptiste Onofré
>>>>>> 
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I agree and it's what we have: the website sources are on my github
>>>>>>> and
>>>>>>> we will move it on apache git. Anyway the source is not where you
>>>>>>> promote
>>>>>>> with the scm-publish. The PR or patches are not created based on the
>>>>>>> "resulting" location (on svn now) but on the sources (on my github
>>>>>>> now).

Re: Permission problems

2016-03-07 Thread Maximilian Michels
Hi,

Thanks for the quick replies and for adding me to the incubator group.
We're getting there :)

Still, this doesn't completely solve all permission problems. For
instance, I can't assign myself to a JIRA issue nor resolve issues,
e.g. https://issues.apache.org/jira/browse/BEAM-5

Best,
Max



On Sat, Mar 5, 2016 at 9:02 AM, Jean-Baptiste Onofré  wrote:
> Hi Max,
>
> I added you in the incubator group, it should be OK now.
>
> Regards
> JB
>
> On 03/04/2016 06:56 PM, Maximilian Michels wrote:
>>
>> Hi Beamers,
>>
>> While working on migrating the Beam web site to Git, I came across
>> problems with the LDAP permissions. According to Infra, I'm not part
>> of the incubator group [1].
>>
>> Now that we want to merge the Flink Runner [2], I think I'll be unable
>> to merge any valuable changes to the repository. If we want to move
>> the main development of the runner to the Beam repository, we need to
>> fix that.
>>
>> I know we're just getting started but it would be great to set up
>> permissions properly.
>>
>> Thanks,
>> Max
>>
>> [1]
>> https://issues.apache.org/jira/browse/INFRA-11318?focusedCommentId=15174230&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15174230
>>
>> [2] https://github.com/apache/incubator-beam/pull/12
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Permission problems

2016-03-04 Thread Maximilian Michels
Hi Beamers,

While working on migrating the Beam web site to Git, I came across
problems with the LDAP permissions. According to Infra, I'm not part
of the incubator group [1].

Now that we want to merge the Flink Runner [2], I think I'll be unable
to merge any valuable changes to the repository. If we want to move
the main development of the runner to the Beam repository, we need to
fix that.

I know we're just getting started but it would be great to set up
permissions properly.

Thanks,
Max

[1] 
https://issues.apache.org/jira/browse/INFRA-11318?focusedCommentId=15174230&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15174230

[2] https://github.com/apache/incubator-beam/pull/12


Re: New beam website!

2016-03-04 Thread Maximilian Michels
Hi JB,

Good question. I didn't create the page, but one problem with relative
URLs is changing the root directory or switching to https. So instead
of relative URLs, we should use {{ site.baseurl }}/resource for all
links. That way, we can simply change baseurl in _config.yml and
we're good to go. For local testing, we run jekyll with --baseurl "". That
approach has worked well for the Flink website which is also built
with Jekyll.
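
As a concrete illustration of the approach described above — a sketch only:
the _config.yml file name and the {{ site.baseurl }} prefix come from this
thread, while the specific baseurl values, the head.html file, and the
stylesheet path are assumptions for demonstration:

```shell
# Sketch of the {{ site.baseurl }} approach. The file name (_config.yml)
# and the Liquid prefix come from the thread; the concrete baseurl values
# and the stylesheet path are assumed for illustration.
cat > _config.yml <<'EOF'
baseurl: /incubator-beam-site
EOF

# Templates prefix every link with the configurable value:
cat > head.html <<'EOF'
<link rel="stylesheet" href="{{ site.baseurl }}/css/site.css">
EOF

# Moving the site root (or switching to https on another host) is then a
# one-line configuration change rather than a sweep over all pages:
sed -i 's|^baseurl:.*|baseurl: /beam|' _config.yml
grep '^baseurl' _config.yml        # prints: baseurl: /beam

# For local testing, override the value on the command line instead:
#   jekyll serve --baseurl ""
```

With plain relative URLs, by contrast, every link would break as soon as the
site is served from a different root directory.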

Cheers,
Max

On Fri, Mar 4, 2016 at 1:48 PM, Jean-Baptiste Onofré  wrote:
> Hi Max,
>
> I just cloned your repo and built using jekyll.
>
> I just wonder why not using relative URL (for images and js location)
> instead of absolute ? It would allow us to directly open the website in a
> browser. WDYT ?
>
> Regards
> JB
>
> On 03/01/2016 01:59 PM, Maximilian Michels wrote:
>>
>> As a summary from
>> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-11318
>> it would work as follows:
>>
>> We use the 'asf-site' branch of the repository. When we change the
>> website, we execute "jekyll build" followed by "jekyll serve" to
>> preview the site. Everything is generated in the 'content' directory.
>> We then push the changes and they are deployed.
>>
>> I've prepared everything in my fork:
>> https://github.com/mxm/beam-site/tree/asf-site
>>
>> Unfortunately, I couldn't push the changes to the new repository. The
>> permissions don't seem to be set up correctly.
>>
>> On Tue, Mar 1, 2016 at 10:24 AM, Maximilian Michels 
>> wrote:
>>>
>>> Quick update. The Git repository is ready under
>>> https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git
>>>
>>> I'm sorting out the last things and will push the website thereafter.
>>> Infra can then proceed to do the pubsub switch to deploy the website
>>> from there.
>>>
>>> On Thu, Feb 25, 2016 at 11:31 PM, Maximilian Michels 
>>> wrote:
>>>>
>>>> Hi JB,
>>>>
>>>> Greetings to Mexico! I was using Infra's "SVN to Git migration"
>>>> service desk. That seems like a standard way for migration to me. It
>>>> has also worked fine in the past.
>>>>
>>>> Could you explain the role of the SCM-Publish Maven plugin? What would
>>>> be different from just committing the changes of the website to the
>>>> Git/SVN repository? Is it necessary that people use another wrapper
>>>> around a version control system?
>>>>
>>>> After all, what counts is that we can use a Git repository to check in
>>>> website changes and use the GitHub mirror. That is not only much
>>>> faster (pushing in SVN takes a long time and there is no local
>>>> repository) but also more convenient for most people that use Git on a
>>>> daily basis. How the actual website is served shouldn't matter to the
>>>> developer.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Thu, Feb 25, 2016 at 11:23 PM, Jean-Baptiste Onofré 
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> I agree and it's what we have: the website sources are on my github and
>>>>> we will move it on apache git. Anyway the source is not where you promote
>>>>> with the scm-publish. The PR or patches are not created based on the
>>>>> "resulting" location (on svn now) but on the sources (on my github now).
>>>>> That's why I didn't understand the jira.
>>>>> I don't mind to move the "resulting/promotion" location from svn to
>>>>> git, but I don't see how it changes for the devs. It would be exactly the
>>>>> same workflow  (using the scm-publish plugin).
>>>>> Regards JB
>>>>>
>>>>>
>>>>> Sent from my Samsung device
>>>>>
>>>>>  Original message 
>>>>> From: James Malone 
>>>>> Date: 25/02/2016  21:38  (GMT+01:00)
>>>>> To: dev@beam.incubator.apache.org
>>>>> Subject: Re: New beam website!
>>>>>
>>>>> I'll chime in here - I'd greatly prefer a git-based repo for the
>>>>> website.
>>>>> If nothing else, it's what many of the committers on our side are
>>>>> familiar
>>>>> with.
>>>>>
>>>>> Cheers!

Re: New beam website!

2016-03-01 Thread Maximilian Michels
As a summary from
https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-11318
it would work as follows:

We use the 'asf-site' branch of the repository. When we change the
website, we execute "jekyll build" followed by "jekyll serve" to
preview the site. Everything is generated in the 'content' directory.
We then push the changes and they are deployed.
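
Sketched as commands, the cycle above looks roughly like this. To keep the
sketch runnable offline, a local bare repository stands in for the real
remote (https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git),
and a stub page stands in for the actual `jekyll build` output; the
'asf-site' branch and 'content' directory are from the thread:

```shell
# Sketch of the edit/build/deploy cycle. A local bare repository stands in
# for the real incubator-beam-site remote so this runs offline; the stub
# page replaces the generated Jekyll output.
git init --bare /tmp/beam-site-remote.git
git clone /tmp/beam-site-remote.git /tmp/beam-site
cd /tmp/beam-site
git checkout -b asf-site

# In the real workflow: jekyll build && jekyll serve, then inspect the
# generated 'content' directory. Stubbed here:
mkdir -p content
echo '<html>generated site stub</html>' > content/index.html

git add content
git -c user.name=dev -c user.email=dev@example.org \
    commit -m "Regenerate website"
git push origin asf-site   # gitpubsub deploys what lands on this branch
```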

I've prepared everything in my fork:
https://github.com/mxm/beam-site/tree/asf-site

Unfortunately, I couldn't push the changes to the new repository. The
permissions don't seem to be set up correctly.

On Tue, Mar 1, 2016 at 10:24 AM, Maximilian Michels  wrote:
> Quick update. The Git repository is ready under
> https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git
>
> I'm sorting out the last things and will push the website thereafter.
> Infra can then proceed to do the pubsub switch to deploy the website
> from there.
>
> On Thu, Feb 25, 2016 at 11:31 PM, Maximilian Michels  wrote:
>> Hi JB,
>>
>> Greetings to Mexico! I was using Infra's "SVN to Git migration"
>> service desk. That seems like a standard way for migration to me. It
>> has also worked fine in the past.
>>
>> Could you explain the role of the SCM-Publish Maven plugin? What would
>> be different from just committing the changes of the website to the
>> Git/SVN repository? Is it necessary that people use another wrapper
>> around a version control system?
>>
>> After all, what counts is that we can use a Git repository to check in
>> website changes and use the GitHub mirror. That is not only much
>> faster (pushing in SVN takes a long time and there is no local
>> repository) but also more convenient for most people that use Git on a
>> daily basis. How the actual website is served shouldn't matter to the
>> developer.
>>
>> Cheers,
>> Max
>>
>> On Thu, Feb 25, 2016 at 11:23 PM, Jean-Baptiste Onofré  
>> wrote:
>>>
>>>
>>> I agree and it's what we have: the website sources are on my github and we 
>>> will move it on apache git. Anyway the source is not where you promote with 
>>> the scm-publish. The PR or patches are not created based on the "resulting" 
>>> location (on svn now) but on the sources (on my github now). That's why I 
>>> didn't understand the jira.
>>> I don't mind to move the "resulting/promotion" location from svn to git, 
>>> but I don't see how it changes for the devs. It would be exactly the same 
>>> workflow  (using the scm-publish plugin).
>>> Regards JB
>>>
>>>
>>> Sent from my Samsung device
>>>
>>>  Original message 
>>> From: James Malone 
>>> Date: 25/02/2016  21:38  (GMT+01:00)
>>> To: dev@beam.incubator.apache.org
>>> Subject: Re: New beam website!
>>>
>>> I'll chime in here - I'd greatly prefer a git-based repo for the website.
>>> If nothing else, it's what many of the committers on our side are familiar
>>> with.
>>>
>>> Cheers!
>>>
>>> James
>>>
>>> On Thu, Feb 25, 2016 at 10:49 AM, Henry Saputra 
>>> wrote:
>>>
>>>> Hi JB,
>>>>
>>>> ASF infra now offers Git pubsub to support website source code [1]
>>>>
>>>> I think this is what Max was trying to enable: instead of using SVN for
>>>> the website, it will use Git, which also allows pubsub to publish it.
>>>>
>>>> - Henry
>>>>
>>>> [1] https://blogs.apache.org/infra/entry/git_based_websites_available
>>>>
>>>> On Thu, Feb 25, 2016 at 4:38 AM, Jean-Baptiste Onofré 
>>>> wrote:
>>>>
>>>> >
>>>> >
>>>> > Hi guys
>>>> > Max your jira doesn't make sense imho: the svn for the website is for
>>>> > pubsub, it's not where the website source has to be located. The website
>>>> > can be on git or where ever you want but the scm-publish has to be on
>>>> svn.
>>>> > Please close your jira and wait when I'm back to deal with that.
>>>> > Regards JB from Mexico ;)
>>>> >
>>>> >
>>>> > Sent from my Samsung device
>>>> >
>>>> >  Original message 
>>>> > From: Maximilian Michels 
>>>> > Date: 25/02/2016  10:43  (GMT+01:00)
>>>> > To: dev@beam.incubator.apache

Re: New beam website!

2016-03-01 Thread Maximilian Michels
Quick update. The Git repository is ready under
https://git-wip-us.apache.org/repos/asf/incubator-beam-site.git

I'm sorting out the last things and will push the website thereafter.
Infra can then proceed to do the pubsub switch to deploy the website
from there.

On Thu, Feb 25, 2016 at 11:31 PM, Maximilian Michels  wrote:
> Hi JB,
>
> Greetings to Mexico! I was using Infra's "SVN to Git migration"
> service desk. That seems like a standard way for migration to me. It
> has also worked fine in the past.
>
> Could you explain the role of the SCM-Publish Maven plugin? What would
> be different from just committing the changes of the website to the
> Git/SVN repository? Is it necessary that people use another wrapper
> around a version control system?
>
> After all, what counts is that we can use a Git repository to check in
> website changes and use the GitHub mirror. That is not only much
> faster (pushing in SVN takes a long time and there is no local
> repository) but also more convenient for most people that use Git on a
> daily basis. How the actual website is served shouldn't matter to the
> developer.
>
> Cheers,
> Max
>
> On Thu, Feb 25, 2016 at 11:23 PM, Jean-Baptiste Onofré  
> wrote:
>>
>>
>> I agree and it's what we have: the website sources are on my github and we 
>> will move it on apache git. Anyway the source is not where you promote with 
>> the scm-publish. The PR or patches are not created based on the "resulting" 
>> location (on svn now) but on the sources (on my github now). That's why I 
>> didn't understand the jira.
>> I don't mind to move the "resulting/promotion" location from svn to git, but 
>> I don't see how it changes for the devs. It would be exactly the same 
>> workflow  (using the scm-publish plugin).
>> Regards JB
>>
>>
>> Sent from my Samsung device
>>
>>  Original message 
>> From: James Malone 
>> Date: 25/02/2016  21:38  (GMT+01:00)
>> To: dev@beam.incubator.apache.org
>> Subject: Re: New beam website!
>>
>> I'll chime in here - I'd greatly prefer a git-based repo for the website.
>> If nothing else, it's what many of the committers on our side are familiar
>> with.
>>
>> Cheers!
>>
>> James
>>
>> On Thu, Feb 25, 2016 at 10:49 AM, Henry Saputra 
>> wrote:
>>
>>> Hi JB,
>>>
>>> ASF infra now offers Git pubsub to support website source code [1]
>>>
>>> I think this is what Max was trying to enable: instead of using SVN for
>>> the website, it will use Git, which also allows pubsub to publish it.
>>>
>>> - Henry
>>>
>>> [1] https://blogs.apache.org/infra/entry/git_based_websites_available
>>>
>>> On Thu, Feb 25, 2016 at 4:38 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> >
>>> >
>>> > Hi guys
>>> > Max your jira doesn't make sense imho: the svn for the website is for
>>> > pubsub, it's not where the website source has to be located. The website
>>> > can be on git or where ever you want but the scm-publish has to be on
>>> svn.
>>> > Please close your jira and wait when I'm back to deal with that.
>>> > Regards JB from Mexico ;)
>>> >
>>> >
>>> > Sent from my Samsung device
>>> >
>>> >  Original message 
>>> > From: Maximilian Michels 
>>> > Date: 25/02/2016  10:43  (GMT+01:00)
>>> > To: dev@beam.incubator.apache.org
>>> > Subject: Re: New beam website!
>>> >
>>> > Alright, I've asked the Infra team to migrate the repository to Git
>>> > and setup a GitHub sync.
>>> >
>>> > Git migration:
>>> > https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-11318
>>> > GitHub sync:
>>> > https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-11319
>>> >
>>> > Cheers,
>>> > Max
>>> >
>>> > On Wed, Feb 24, 2016 at 9:00 PM, James Malone
>>> >  wrote:
>>> > > That would be awesome. Full disclosure - I
>>> > > Happy to help however I can and any experience anyone has for making
>>> that
>>> > > happen is welcomed. :)
>>> > >
>>> > > On Wed, Feb 24, 2016 at 9:05 AM, Maximilian Michels 
>>> > wrote:
>>> > >
>>> > >> Hi James,
>

Re: New beam website!

2016-02-25 Thread Maximilian Michels
Hi JB,

Greetings to Mexico! I was using Infra's "SVN to Git migration"
service desk. That seems like a standard way for migration to me. It
has also worked fine in the past.

Could you explain the role of the SCM-Publish Maven plugin? What would
be different from just committing the changes of the website to the
Git/SVN repository? Is it necessary that people use another wrapper
around a version control system?

After all, what counts is that we can use a Git repository to check in
website changes and use the GitHub mirror. That is not only much
faster (pushing in SVN takes a long time and there is no local
repository) but also more convenient for most people that use Git on a
daily basis. How the actual website is served shouldn't matter to the
developer.

Cheers,
Max

On Thu, Feb 25, 2016 at 11:23 PM, Jean-Baptiste Onofré  
wrote:
>
>
> I agree and it's what we have: the website sources are on my github and we 
> will move it on apache git. Anyway the source is not where you promote with 
> the scm-publish. The PR or patches are not created based on the "resulting" 
> location (on svn now) but on the sources (on my github now). That's why I 
> didn't understand the jira.
> I don't mind to move the "resulting/promotion" location from svn to git, but 
> I don't see how it changes for the devs. It would be exactly the same 
> workflow  (using the scm-publish plugin).
> Regards JB
>
>
> Sent from my Samsung device
>
>  Original message 
> From: James Malone 
> Date: 25/02/2016  21:38  (GMT+01:00)
> To: dev@beam.incubator.apache.org
> Subject: Re: New beam website!
>
> I'll chime in here - I'd greatly prefer a git-based repo for the website.
> If nothing else, it's what many of the committers on our side are familiar
> with.
>
> Cheers!
>
> James
>
> On Thu, Feb 25, 2016 at 10:49 AM, Henry Saputra 
> wrote:
>
>> Hi JB,
>>
>> ASF infra now offers Git pubsub to support website source code [1]
>>
>> I think this is what Max was trying to enable: instead of using SVN for
>> the website, it will use Git, which also allows pubsub to publish it.
>>
>> - Henry
>>
>> [1] https://blogs.apache.org/infra/entry/git_based_websites_available
>>
>> On Thu, Feb 25, 2016 at 4:38 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>> >
>> >
>> > Hi guys
>> > Max your jira doesn't make sense imho: the svn for the website is for
>> > pubsub, it's not where the website source has to be located. The website
>> > can be on git or where ever you want but the scm-publish has to be on
>> svn.
>> > Please close your jira and wait when I'm back to deal with that.
>> > Regards JB from Mexico ;)
>> >
>> >
>> > Sent from my Samsung device
>> >
>> >  Original message 
>> > From: Maximilian Michels 
>> > Date: 25/02/2016  10:43  (GMT+01:00)
>> > To: dev@beam.incubator.apache.org
>> > Subject: Re: New beam website!
>> >
>> > Alright, I've asked the Infra team to migrate the repository to Git
>> > and setup a GitHub sync.
>> >
>> > Git migration:
>> > https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-11318
>> > GitHub sync:
>> > https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-11319
>> >
>> > Cheers,
>> > Max
>> >
>> > On Wed, Feb 24, 2016 at 9:00 PM, James Malone
>> >  wrote:
>> > > That would be awesome. Full disclosure - I
>> > > Happy to help however I can and any experience anyone has for making
>> that
>> > > happen is welcomed. :)
>> > >
>> > > On Wed, Feb 24, 2016 at 9:05 AM, Maximilian Michels 
>> > wrote:
>> > >
>> > >> Hi James,
>> > >>
>> > >> The updated website looks good to me. I agree that it will simplify
>> > >> work for a lot of people who are used to Jekyll.
>> > >>
>> > >> The website is already in the SVN repository. After we have sorted out
>> > >> the CCLAs, the only thing that is left is a GitHub sync to make it
>> > >> more accessible for contributors. The prerequisite for that is that we
>> > >> change it to Git. Apache Infra can do that (we have done likewise for
>> > >> the Flink website repository and it works great).
>> > >>
>> > >> Cheers,
>> > >> Max
>> > >>
>> > >> On Wed, Feb 24, 2

Re: New beam website!

2016-02-25 Thread Maximilian Michels
Alright, I've asked the Infra team to migrate the repository to Git
and setup a GitHub sync.

Git migration: 
https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-11318
GitHub sync: 
https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-11319

Cheers,
Max

On Wed, Feb 24, 2016 at 9:00 PM, James Malone
 wrote:
> That would be awesome. Full disclosure - I 
> Happy to help however I can and any experience anyone has for making that
> happen is welcomed. :)
>
> On Wed, Feb 24, 2016 at 9:05 AM, Maximilian Michels  wrote:
>
>> Hi James,
>>
>> The updated website looks good to me. I agree that it will simplify
>> work for a lot of people who are used to Jekyll.
>>
>> The website is already in the SVN repository. After we have sorted out
>> the CCLAs, the only thing that is left is a GitHub sync to make it
>> more accessible for contributors. The prerequisite for that is that we
>> change it to Git. Apache Infra can do that (we have done likewise for
>> the Flink website repository and it works great).
>>
>> Cheers,
>> Max
>>
>> On Wed, Feb 24, 2016 at 7:04 AM, James Malone
>>  wrote:
>> > Hello everyone,
>> >
>> > Since we're in the process of getting bootstrapped, I wanted to improve
>> on
>> > our website a bit, especially since we have a new logo.
>> >
>> > The first site was built using Maven sites. To be honest, it presented a
>> > bit of a problem because many of the Beam committers have used GitHub
>> docs
>> > to date and wanted a system which was somewhat similar. Since GH docs is
>> > built on Jekyll (as far as I know) I figured that might actually be an
>> > easier way to build the website anyway.
>> >
>> > So, I just updated the incubator website with an exact copy of the old
>> > site but using Jekyll + Bootstrap 3. That should hopefully make it easier
>> > for anyone and everyone to work against (and to move existing docs.)
>> >
>> > The repository for the current site is here:
>> >
>> > https://github.com/evilsoapbox/beam-site
>> >
>> > I also have a tgz of the old site in case there's any concern or
>> > disagreement. Needless to say, when we sort out the CCLAs (which should
>> be
>> > very soon) I'd like to get this in a project repo somewhere so we can
>> have
>> > better version control and review.
>> >
>> > Finally, the theme is basic bootstrap with a few changes (like using
>> Roboto
>> > as the font to match the logo.) I figure if/when there's interest in
>> > detailed design, we can cover that as a separate discussion.
>> >
>> > Cheers!
>> >
>> > James
>>


Re: New beam website!

2016-02-24 Thread Maximilian Michels
Hi James,

The updated website looks good to me. I agree that it will simplify
work for a lot of people who are used to Jekyll.

The website is already in the SVN repository. After we have sorted out
the CCLAs, the only thing that is left is a GitHub sync to make it
more accessible for contributors. The prerequisite for that is that we
change it to Git. Apache Infra can do that (we have done likewise for
the Flink website repository and it works great).

Cheers,
Max

On Wed, Feb 24, 2016 at 7:04 AM, James Malone
 wrote:
> Hello everyone,
>
> Since we're in the process of getting bootstrapped, I wanted to improve on
> our website a bit, especially since we have a new logo.
>
> The first site was built using Maven sites. To be honest, it presented a
> bit of a problem because many of the Beam committers have used GitHub docs
> to date and wanted a system which was somewhat similar. Since GH docs is
> built on Jekyll (as far as I know) I figured that might actually be an
> easier way to build the website anyway.
>
> So, I just updated the incubator website with an exact copy of the old
> site but using Jekyll + Bootstrap 3. That should hopefully make it easier
> for anyone and everyone to work against (and to move existing docs.)
>
> The repository for the current site is here:
>
> https://github.com/evilsoapbox/beam-site
>
> I also have a tgz of the old site in case there's any concern or
> disagreement. Needless to say, when we sort out the CCLAs (which should be
> very soon) I'd like to get this in a project repo somewhere so we can have
> better version control and review.
>
> Finally, the theme is basic bootstrap with a few changes (like using Roboto
> as the font to match the logo.) I figure if/when there's interest in
> detailed design, we can cover that as a separate discussion.
>
> Cheers!
>
> James


Re: Apache Beam logo proposal

2016-02-19 Thread Maximilian Michels
The colorful logo looks really nice. The condensed logo is OK in my
opinion. I wonder if we could use a small variant of the colorful logo
instead?

On Thu, Feb 18, 2016 at 9:47 PM, Ufuk Celebi  wrote:
> Amazing job!
>
> On Thu, Feb 18, 2016 at 9:39 PM, Stephan Ewen  wrote:
>> I like it, too :-)
>>
>> On Thu, Feb 18, 2016 at 8:45 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi James,
>>>
>>> and thanks,
>>>
>>> I like it !
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 02/18/2016 04:38 PM, James Malone wrote:
>>>
 Absolutely!

 Here is a link to the full proposed logo:
 https://drive.google.com/file/d/0B-IhJZh9Ab52MmlkWHk2bWQ3bW8/view?usp=sharing
 Here is the proposed condensed (icon) logo:
 https://drive.google.com/file/d/0B-IhJZh9Ab52cGEycG16T1UwS2M/view?usp=sharing

 On Thu, Feb 18, 2016 at 12:19 AM, Jean-Baptiste Onofré 
 wrote:

 Hi James,
>
> it's not possible to send attachment on the Apache mailing list.
>
> Can you share the logo proposal on gist, google drive, or a HTTP server ?
>
> Thanks,
> Regards
> JB
>
> On 02/18/2016 08:46 AM, James Malone wrote:
>
> Hello everyone,
>>
>> We're at the point where I think Beam needs a logo.
>>
>> Originally, the name Beam was proposed to mix Batch + strEAM into one
>> name (hence Beam). A designer, Stephanie Smythies, graciously dedicated
>> some of her time to create a logo concept for the project. I wanted to
>> share that design and propose we adopt it as the logo (and design) for
>> the project.
>>
>>
>>
>> This logo is a 'B' which ties closely to the project name, the concept
>> of light beams, and the merge of batch and streaming (via the lights in
>> the 'B'). Moreover, it is colorful, bright, and fresh (imho, of course.)
>>
>> For a smaller logo (site icons and such) we can use the following:
>>
>>
>>
>> What does everyone think? If this logo looks good, I also propose we
>> redesign the current Beam site and materials to match the design,
>> colors, and typeface.
>>
>> Cheers!
>>
>> James
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>

>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>


Re: Flink Runner - Current State & Roadmap

2016-02-15 Thread Maximilian Michels
@Bakey Pan: Sorry, just saw your email. The best way to get started is
to run some example programs and subsequently implement your own
Beam/Dataflow programs. Then, please have a look at the open issues.
In streaming, there is no generic Sink support yet and it is a
relatively easy task to fix that. For further discussion, please open
a new thread on the mailing list.

Cheers,
Max


Re: Flink Runner - Current State & Roadmap

2016-02-15 Thread Maximilian Michels
Just saw there is already a JIRA for including the Flink Runner code:
https://issues.apache.org/jira/browse/BEAM-5

On Mon, Feb 15, 2016 at 11:42 AM, Maximilian Michels  wrote:
> Hi,
>
> Thank you all for the positive feedback!
>
> @Mark: Yes, the current GitHub version relies on 1.0.0 of the
> DataflowSDK. Naturally, some things have changed since that release
> but we figured that we freeze to this release until more users
> requested a newer version. Thanks a lot for the offer. I think it
> would be great if you tried to adapt the Flink Runner while doing the
> refactoring of the final to-be-contributed Beam code. As long as the
> code is not out yet, this is totally fine. Afterwards, we'll continue
> the development to the Beam community.
>
> We deliberately chose the low-level approach of translating the API
> because we think that semantics and correctness are the top priority.
> An optimization in terms of a more direct translation (and possibly
> improved performance), would be the next step. The question is, if
> Beam Runners have to implement more of their own functionality (like windows or
> triggers) or if we leave this as an optional optimization that the
> Runner chooses to do? That'll be one of the things we will have to
> figure out.
>
> Best,
> Max
>
>
>
> On Sat, Feb 13, 2016 at 5:28 PM, bakey pan  wrote:
>> Hi,Max:
>> I am reading the source code of Beam and Flink.
>> I am also interested in contributing code to the Flink runner. Maybe we
>> can talk more about which features are most suitable for me.
>>
>> 2016-02-13 15:17 GMT+08:00 Jean-Baptiste Onofré :
>>
>>> Hi Max,
>>>
>>> it sounds good to me !
>>>
>>> Thanks,
>>> Regards
>>> JB
>>>
>>> On 02/12/2016 08:06 PM, Maximilian Michels wrote:
>>>
>>>> Hi Beamers,
>>>>
>>>> Now that things are getting started and we discuss the technical
>>>> vision of Beam, we would like to contribute the Flink runner and start
>>>> by sharing some details about the status and the roadmap.
>>>>
>>>> The Flink Runner integrates deeply and naturally with the Dataflow SDK
>>>> (the Beam precursor), because the Flink DataStream API shares many
>>>> concepts with the Dataflow model.
>>>> Based on whether the program input is bounded or unbounded, the
>>>> program runs against Flink's DataStream or DataSet API.
>>>>
>>>> A quick preview at some of the nice features of the runner:
>>>>
>>>>- Support for stream transformations, event time, watermarks
>>>>- The full Dataflow windowing semantics, including fixed/sliding
>>>> time windows, and session windows
>>>>- Integration with Flink's streaming sources (Kafka, RabbitMQ, ...)
>>>>
>>>>- Batch (bounded sources) integrates fully with Flink's managed
>>>> memory techniques and out-of-core algorithms, supporting huge joins
>>>> and aggregations.
>>>>- Integration with Flink's batch API sources (plain text, CSV, Avro,
>>>> JDBC, HBase, ...)
>>>>
>>>>- Integration with Flink's fault tolerance - both batch and
>>>> streaming program recover from failures
>>>>- After upgrading the dependency to Flink 1.0, one could even use
>>>> the Flink Savepoints feature (save streaming state for later resuming)
>>>> with the Dataflow programs.
>>>>
>>>> Attached you can find the document we drew up with more information
>>>> about the current state of the Runner and the roadmap for its upcoming
>>>> features:
>>>>
>>>>
>>>> https://docs.google.com/document/d/1QM_X70VvxWksAQ5C114MoAKb1d9Vzl2dLxEZM4WYogo/edit?usp=sharing
>>>>
>>>> The Runner executes the quasi-complete Beam streaming model (well,
>>>> Dataflow, actually, because Beam is not there, yet).
>>>>
>>>> Given the current excitement and buzz around Beam, we could add this
>>>> runner to the Beam repository and link it as a "preview" for the
>>>> people that want to get a feeling of what it will be like to write and
>>>> run streaming (unbounded) Beam programs. That would give people
>>>> something tangible until the actual Beam code is available.
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Max
>>>>
>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>>
>>
>> --
>>  Best Regards
>>BakeyPan


Re: Flink Runner - Current State & Roadmap

2016-02-15 Thread Maximilian Michels
Hi,

Thank you all for the positive feedback!

@Mark: Yes, the current GitHub version relies on 1.0.0 of the
Dataflow SDK. Naturally, some things have changed since that release,
but we decided to freeze on it until more users request a newer
version. Thanks a lot for the offer. I think it would be great if you
tried to adapt the Flink Runner while doing the refactoring of the
final to-be-contributed Beam code. As long as the code is not out yet,
this is totally fine. Afterwards, we'll continue the development
within the Beam community.

We deliberately chose the low-level approach of translating the API
because we think that semantics and correctness are the top priority.
An optimization towards a more direct translation (and possibly
improved performance) would be the next step. The open question is
whether Beam runners should implement more of this functionality
themselves (like windows or triggers), or whether we leave that as an
optional optimization that a runner may choose to do. That'll be one
of the things we will have to figure out.
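To make the "low-level translation" idea a bit more concrete, here is a
minimal, self-contained sketch (the type names are hypothetical, not the
actual SDK or Runner classes): a Dataflow-style DoFn gets wrapped so that
a Flink-style flatMap can drive it, leaving the user function's
semantics untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical minimal types, sketching how a Dataflow-style DoFn could be
// wrapped for a Flink-style flatMap (not the actual SDK/Runner classes).
interface SimpleDoFn<I, O> {
    void processElement(I element, Consumer<O> output);
}

class DoFnFlatMapWrapper<I, O> {
    private final SimpleDoFn<I, O> fn;

    DoFnFlatMapWrapper(SimpleDoFn<I, O> fn) {
        this.fn = fn;
    }

    /** What a Flink flatMap call would do: run the wrapped DoFn per element. */
    List<O> flatMap(I element) {
        List<O> out = new ArrayList<>();
        fn.processElement(element, out::add);  // collect the DoFn's output
        return out;
    }
}
```

The point of the wrapper style is that the user's function is executed
unmodified, so the Dataflow semantics are preserved even before any
runner-specific optimization happens.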

Best,
Max



On Sat, Feb 13, 2016 at 5:28 PM, bakey pan  wrote:
> Hi Max,
> I am reading the source code of Beam and Flink.
> I am also interested in contributing code to the Flink runner. Maybe we
> can talk more about which features are most suitable for me.
>
> 2016-02-13 15:17 GMT+08:00 Jean-Baptiste Onofré :
>
>> Hi Max,
>>
>> it sounds good to me !
>>
>> Thanks,
>> Regards
>> JB
>>
>> On 02/12/2016 08:06 PM, Maximilian Michels wrote:
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>
>
> --
>  Best Regards
>BakeyPan


Flink Runner - Current State & Roadmap

2016-02-12 Thread Maximilian Michels
Hi Beamers,

Now that things are getting started and we discuss the technical
vision of Beam, we would like to contribute the Flink runner and start
by sharing some details about the status and the roadmap.

The Flink Runner integrates deeply and naturally with the Dataflow SDK
(the Beam precursor), because the Flink DataStream API shares many
concepts with the Dataflow model.
Depending on whether the program input is bounded or unbounded, the
program runs against Flink's DataSet or DataStream API, respectively.
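As a toy illustration of that dispatch (plain Java with made-up names,
not the actual Runner code — the real translator of course works on the
pipeline graph rather than a single flag):

```java
// Toy sketch of the bounded/unbounded dispatch described above.
// Names are hypothetical; the real Runner translates the pipeline graph.
class PipelineDispatcher {
    enum Boundedness { BOUNDED, UNBOUNDED }

    /** Picks the Flink API a runner would translate the pipeline to. */
    static String chooseApi(Boundedness input) {
        // Bounded inputs map onto Flink's DataSet (batch) API,
        // unbounded inputs onto the DataStream (streaming) API.
        return input == Boundedness.BOUNDED ? "DataSet" : "DataStream";
    }
}
```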

A quick preview of some of the nice features of the runner:

  - Support for stream transformations, event time, watermarks
  - The full Dataflow windowing semantics, including fixed/sliding
time windows, and session windows
  - Integration with Flink's streaming sources (Kafka, RabbitMQ, ...)

  - Batch execution (bounded sources) integrates fully with Flink's
managed memory techniques and out-of-core algorithms, supporting huge
joins and aggregations.
  - Integration with Flink's batch API sources (plain text, CSV, Avro,
JDBC, HBase, ...)

  - Integration with Flink's fault tolerance - both batch and
streaming programs recover from failures
  - After upgrading the dependency to Flink 1.0, one could even use
the Flink Savepoints feature (saving streaming state for later
resuming) with Dataflow programs.
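To give a feel for the session-window semantics mentioned in the list
above, here is a small, self-contained sketch (a deliberately simplified
model, not the Runner's implementation): events whose timestamps lie
within the gap duration of each other merge into the same session.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of session windows (not the Runner's implementation):
// consecutive timestamps closer than the gap fall into one session.
class SessionWindows {
    /** Groups sorted, non-empty event timestamps (ms) into [start, end] sessions. */
    static List<long[]> assign(long[] sortedTimestamps, long gapMillis) {
        List<long[]> sessions = new ArrayList<>();
        long start = sortedTimestamps[0];
        long end = sortedTimestamps[0];
        for (int i = 1; i < sortedTimestamps.length; i++) {
            long t = sortedTimestamps[i];
            if (t - end < gapMillis) {
                end = t;                           // extend the current session
            } else {
                sessions.add(new long[] {start, end});
                start = t;                         // gap exceeded: open a new session
                end = t;
            }
        }
        sessions.add(new long[] {start, end});     // close the last session
        return sessions;
    }
}
```

For example, with a gap of 10 ms, the timestamps 0, 5, 20, 22 form two
sessions: [0, 5] and [20, 22].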

Attached you can find the document we drew up with more information
about the current state of the Runner and the roadmap for its upcoming
features:

https://docs.google.com/document/d/1QM_X70VvxWksAQ5C114MoAKb1d9Vzl2dLxEZM4WYogo/edit?usp=sharing

The Runner executes the quasi-complete Beam streaming model (well, the
Dataflow model, actually, because Beam is not there yet).

Given the current excitement and buzz around Beam, we could add this
runner to the Beam repository and link it as a "preview" for people
who want to get a feel for what it will be like to write and run
streaming (unbounded) Beam programs. That would give people something
tangible until the actual Beam code is available.

What do you think?

Best,
Max


Re: Apache Beam blog

2016-02-12 Thread Maximilian Michels
+1 Looks nice.

I'm sure we'll iterate over the design :)

On Fri, Feb 12, 2016 at 6:58 PM, Tyler Akidau  wrote:
> +1
>
> -Tyler
>
> On Fri, Feb 12, 2016 at 9:57 AM Amit Sela  wrote:
>
>> +1
>>
>> I think we could also publish users' use-case examples and stories: "How we
>> are using Beam" or something like that.
>>
>> On Fri, Feb 12, 2016, 19:49 James Malone 
>> wrote:
>>
>> > Hello everyone,
>> >
>> > Now that we have a skeleton website up (hooray!) I wanted to raise the
>> idea
>> > of a "Beam Blog." I am thinking this blog is where we can show news, cool
>> > new Beam-things, examples for Apache Beam, and so on. This blog would
>> live
>> > under the project website (http://beam.incubator.apache.org).
>> >
>> > To that end, I would like to poll how the larger community feels about
>> > this. Additionally, if people think it's a good idea, I'd like to start
>> > thinking about content which we would want to feature on the blog.
>> >
>> > I happily volunteer to coordinate, review, and assist with the blog.
>> >
>> > Cheers!
>> >
>> > James
>> >
>>


Re: status update

2016-02-11 Thread Maximilian Michels
Hi Frances,

Thank you for the documents. The structure of the repository looks
good. I wonder if "core" could even be divided further, e.g. into API
and runtime modules. For the CI, we could check out Apache's Jenkins
or Travis CI. As for the /develop branch, I would suggest making it
mandatory to keep it in a usable state at all times.
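For reference, a minimal Travis CI configuration for a Maven-based Java
project would look roughly like this (a sketch only — the actual JDK and
build settings would be decided by the community later):

```yaml
# Minimal .travis.yml sketch for a Maven-based Java project (illustrative only)
language: java
jdk:
  - oraclejdk8
script: mvn clean verify
```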

Considering the rework of the Dataflow SDK, it would be great if you
could give some status updates while you're doing it. That would help
us prepare the runners for any major breaking changes (I'm
particularly thinking of the Flink runner because I've been working on
it).

Best,
Max

On Tue, Feb 9, 2016 at 5:01 PM, Jean-Baptiste Onofré  wrote:
> Hi Frances,
>
> and thanks for the update.
>
> The repository structure looks good to me.
>
> Maybe we can add a section about the PR workflow (PR/review/push). WDYT ?
>
> For the Jira, no problem. I will add some tasks as well related to the
> roadmap (especially the DSLs, new IO, and DataIntegration part).
>
> Thanks !
> Regards
> JB
>
>
> On 02/09/2016 04:46 PM, Frances Perry wrote:
>>
>> Hi Beamers!
>>
>> Here’s the Apache Beam: Technical Vision
>>
>> 
>> document I shared last week with a number of you. (Now we have a dev@ list
>> to share it more widely -- yay!)
>>
>> I just wanted to give you a little visibility into some of the work we’ve
>> been doing within Google over the last week:
>>
>> * Refactoring the DataflowJavaSDK: We’re hard at work separating out the
>> user-facing portions of the DataflowJavaSDK from the Google-specific
>> worker
>> harness. This will ensure that all runners (Cloud Dataflow, Spark, Flink)
>> are on equal footing with clear APIs to implement. Due to the
>> complications
>> that come with doing that while supporting our current users, we won’t be
>> able to push those changes to GitHub for a couple of weeks or so.
>>
>> * Repository structure: As we get ready to start moving different chunks
>> of
>> code into the new repo, we need to figure out the right way to structure
>> it. Here’s a proposal
>>
>> 
>> -- please provide feedback!
>>
>> * Issue tracking: Thanks to JB for getting the Beam JIRA
>>  set up. We were thinking
>> that
>> it makes sense to put in components that match the repository structure
>> (see above). And then we’ll go ahead and start transitioning our internal
>> Google bug tracking into JIRA.
>>
>> Frances
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: PPMC

2016-02-05 Thread Maximilian Michels
Hello Tyler,

Thanks for summing it up for the newly registered. Thanks to everyone
at Google for being so straightforward on this matter.

Best,
Max

On Thu, Feb 4, 2016 at 11:18 PM, Tyler Akidau  wrote:
> Hello Beamers!
>
>
> To summarize a discussion that started while infrastructure was being set
> up:
>
>- Google folks proposed a PPMC composed of a subset of the initial
>committers.
>- Multiple folks pointed out that the rules say the PPMC is to be
>initial committers + mentors.
>- Multiple folks also noted they would be fine with the more limited
>PPMC.
>- Google folks agreed a PPMC of initial committers + mentors is what the
>rules state, so we should go with that.
>
> Please feel free to jump in and correct anything I've misrepresented in my
> summary, or if you feel there's anything further to discuss. Given that
> we're simply going with what is stated in the rules, it seems there is no
> need for a vote of any kind.
>
> -Tyler