Re: [PROPOSAL] Test performance of basic Apache Beam operations

2018-09-05 Thread Rafael Fernandez
neat! left a comment or two

On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy  wrote:

> Hi all!
>
> I'm bumping this (in case you missed it). Any feedback and questions are
> welcome!
>
> Best regards,
> Łukasz
>
> On Mon, Aug 13, 2018 at 1:51 PM Jean-Baptiste Onofré
> wrote:
>
>> Hi Lukasz,
>>
>> Thanks for the update, and the abstract looks promising.
>>
>> Let me take a look at the doc.
>>
>> Regards
>> JB
>>
>> On 13/08/2018 13:24, Łukasz Gajowy wrote:
>> > Hi all,
>> >
>> > since Synthetic Sources API has been introduced in Java and Python SDK,
>> > it can be used to test some basic Apache Beam operations (i.e.
>> > GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with SideInput) in
>> > terms of performance. This, in brief, is why we'd like to share the
>> > below proposal:
>> >
>> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
>> >
>> > Let us know what you think in the document's comments. Thank you in
>> > advance for all the feedback!
>> >
>> > Łukasz
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Rafael Fernandez
+1, thanks for volunteering, Charles!

On Mon, Aug 20, 2018 at 12:09 PM Charles Chen  wrote:

> Thank you Andrew for pointing out my mistake.  We should follow the
> calendar and aim to cut on 8/29, not 9/7 as I incorrectly wrote earlier.
>
> On Mon, Aug 20, 2018 at 12:02 PM Andrew Pilloud 
> wrote:
>
>> +1 Thanks for volunteering! The calendar I have puts the cut date at
>> August 29th, which looks to be 6 weeks from when 2.6.0 was cut. Do I have
>> the wrong calendar?
>>
>> See:
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com=America%2FLos_Angeles
>>
>> Andrew
>>
>> On Mon, Aug 20, 2018 at 11:43 AM Connell O'Callaghan 
>> wrote:
>>
>>> +1 Charles thank you for taking this up and helping us maintain this
>>> schedule.
>>>
>>> On Mon, Aug 20, 2018 at 11:29 AM Charles Chen  wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> Our release calendar indicates that the process for the 2.7.0 Beam
>>>> release should start on September 7.
>>>>
>>>> I volunteer to perform this release and propose the following schedule:
>>>>
>>>>    - We start triaging issues in JIRA this week.
>>>>    - I will cut the initial 2.7.0 release branch on September 7.
>>>>    - After September 7, any blockers will need to be manually
>>>>    cherry-picked into the release branch.
>>>>    - After tests pass and blockers are fully addressed, I will move on
>>>>    and perform other release tasks.
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Charles

>>>
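
The date arithmetic behind the cut-date correction in this thread can be checked with a short sketch. It assumes a fixed six-week gap between release-branch cuts, as Andrew describes; the function names are made up for illustration, and the 2.6.0 cut date it prints is only what the six-week figure implies, not a date stated in the thread.

```python
from datetime import date, timedelta

# Assumption: release branches are cut exactly six weeks apart.
RELEASE_CADENCE = timedelta(weeks=6)


def previous_cut(cut: date) -> date:
    """Return the cut date one release cycle before `cut`."""
    return cut - RELEASE_CADENCE


def next_cut(cut: date) -> date:
    """Return the cut date one release cycle after `cut`."""
    return cut + RELEASE_CADENCE


cut_2_7_0 = date(2018, 8, 29)        # corrected 2.7.0 cut date from the thread
mistaken_cut = date(2018, 9, 7)      # date first proposed by mistake

print(previous_cut(cut_2_7_0))           # implied 2.6.0 cut: 2018-07-18
print(next_cut(cut_2_7_0))               # following cut under the same cadence
print((mistaken_cut - cut_2_7_0).days)   # how far off the first proposal was
```

Under these assumptions the first proposal was 9 days later than the calendar date.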




Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

2018-08-13 Thread Rafael Fernandez
Strictly speaking, they are not necessarily tables either. We could also
introduce something like CREATE EXTERNAL DATA SOURCE (à la T-SQL),
if it's somehow advantageous for us to leverage access patterns or restrict
DML statements.

I think your idea of CREATE EXTERNAL TABLE is practical :)

On Mon, Aug 13, 2018 at 2:12 PM Rui Wang  wrote:

> Hi Community,
>
> BeamSQL allows CREATE TABLE statements to register virtual tables from
> external storage systems (e.g. BigQuery).
>
> BeamSQL is not a storage system, so any table registered by a "CREATE TABLE"
> statement is essentially equivalent to one registered by "CREATE EXTERNAL
> TABLE", which requires the user to provide a LOCATION; BeamSQL then
> registers the table, which lives outside of the current execution
> environment, based on that LOCATION.
>
> So I propose adding the EXTERNAL keyword to "CREATE TABLE" in BeamSQL to help
> users understand that they are registering tables, and that BeamSQL does not
> create non-existent tables when running CREATE TABLE (at least on some
> storage systems, if not all).
>
> We can make the EXTERNAL keyword either required or optional.
>
> If we make the EXTERNAL keyword required:
>
> Pros:
> a. We can get rid of the table-registering semantics on CREATE TABLE.
> b. We keep room to add CREATE TABLE back in the future if we want
> CREATE TABLE to create tables in BeamSQL, rather than only register them.
>
> Cons:
> 1. CREATE TABLE syntax will no longer be supported, so existing BeamSQL
> pipelines that use CREATE TABLE will require changes.
> 2. Users have to type the tedious EXTERNAL keyword every time, especially
> in the SQL Shell.
>
> If we make the EXTERNAL keyword optional, the pros and cons above are
> reversed.
>
> Any thoughts on adding the EXTERNAL keyword, and on whether to make it
> required or optional?
>
>
> Thanks,
> Rui
>
>
>
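
The difference between the "required" and "optional" variants in the proposal above can be illustrated with a toy statement checker. This is a hypothetical sketch only: the regular expression, the function name and the sample statements are invented for illustration and have nothing to do with how BeamSQL actually parses DDL.

```python
import re

# Hypothetical pattern for illustration; NOT BeamSQL's real grammar.
# It recognizes "CREATE TABLE" with an optional EXTERNAL keyword in between.
_CREATE_TABLE = re.compile(
    r"^\s*CREATE\s+(?P<external>EXTERNAL\s+)?TABLE\b", re.IGNORECASE
)


def accepts(statement: str, external_required: bool) -> bool:
    """Return True if the statement would be accepted under the chosen policy."""
    match = _CREATE_TABLE.match(statement)
    if match is None:
        return False  # not a CREATE [EXTERNAL] TABLE statement at all
    has_external = match.group("external") is not None
    # Required: only the EXTERNAL form passes. Optional: both forms pass.
    return has_external or not external_required


for required in (True, False):
    for stmt in ("CREATE TABLE orders (id INT)",
                 "CREATE EXTERNAL TABLE orders (id INT)"):
        print(required, stmt, accepts(stmt, external_required=required))
```

With the keyword required, existing pipelines that use plain CREATE TABLE are rejected, which is the first con listed above; with it optional, both spellings are accepted.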




Re: [Discussion] Clarify the support story for released Beam versions

2018-08-13 Thread Rafael Fernandez
I think this will be great for the project! It's worked well for others (such
as Ubuntu). I like that this remains compatible with our desire to release
every six weeks, while keeping the support/patch load manageable.

Release: +1 single process. This is just a statement of what we commit to
service.

On Mon, Aug 13, 2018 at 12:31 PM Ahmet Altay  wrote:

> I was not proposing any additional changes to the release process. If we
> think that release process could be improved it would make sense to apply
> it to all releases.
>
> On Mon, Aug 13, 2018 at 11:01 AM, Lukasz Cwik  wrote:
>
>> Charles, I would keep the process the same with respect to releasing.
>>
>> On Mon, Aug 13, 2018 at 11:00 AM Charles Chen  wrote:
>>
>>> (sending to the dev@ list thread as this is more relevant here than
>>> users@)
>>>
>>> Will we be using a different / potentially more rigorous process for
>>> releasing LTS releases?  Or do we feel that any validations that could
>>> possibly be done should already be incorporated into each release?
>>>
>>> On Mon, Aug 13, 2018 at 10:57 AM Ahmet Altay  wrote:
>>>
>>>> Update:
>>>>
>>>> I sent out an email to user@ to collect their feedback [1]. I will
>>>> encourage everyone here to collect feedback from the other channels
>>>> available to you. To facilitate the discussion I drafted my proposal in a
>>>> PR [2].
>>>>
>>>> Ahmet
>>>>
>>>> [1]
>>>> https://lists.apache.org/thread.html/7d890d6ed221c722a95d9c773583450767b79ee0c0c78f48a56c7eba@%3Cuser.beam.apache.org%3E
>>>> [2] https://github.com/apache/beam-site/pull/537
>>>>
>>>> On Fri, Aug 10, 2018 at 5:20 PM, Lukasz Cwik  wrote:

> Thanks, I can see the reasoning for LTS releases based upon some
> enterprise customers' needs.
>
> Forgot about the 2.1.1 Python release. Thanks for pointing that out.
>
> On Fri, Aug 10, 2018 at 4:47 PM Ahmet Altay  wrote:
>
>>
>> On Fri, Aug 10, 2018 at 12:33 PM, Lukasz Cwik 
>> wrote:
>>
>>> I like the ideas that you're proposing but am wondering what value, if
>>> any, supporting LTS releases adds. We maintain semantic versioning and I
>>> would expect that most users would be using the latest released version, if
>>> not the release just before that. There is likely a long tail of users who
>>> will use a specific version and are unlikely to ever upgrade.
>>>
>>
>> I believe there is a category of enterprise users who would continue
>> to use a specific version as long as they know they can get support for it.
>> This usually stems from the need to have a stable environment. There is
>> also the aspect of validating a new product before using it. I know some
>> companies have validation cycles longer than 6 weeks. They will still
>> upgrade but they would like to upgrade much less frequently.
>>
>> I was hoping that defining LTS releases will signal these types of
>> users what releases are worth upgrading to if they have a high cost of
>> upgrading.
>>
>> This comes from my anecdotal evidence and I may be wrong.
>>
>>
>>>
>>> I believe it would be valuable to ask our users what is most
>>> important to them with respect to the policy (after we have discussed it a
>>> little bit) as well since ultimately our goal is to help our users.
>>>
>>
>> I agree with this. Since I am referring to enterprise users primarily
>> I think some of it will require the companies here to collect that feedback.
>>
>>
>>> This could then be documented and we could provide guidance to
>>> customers as to how to reach out to the group for big bugs. Also note that
>>> Apache has a security policy[1] in place which we should direct users to.
>>>
>>
>> I think documenting what could be expected of Beam in terms of support
>> would be very valuable by itself. It will also help us figure out what we
>> could drop. For example in the recent discussion to drop old API docs,
>> there was no clear guidance on which SDKs are still supported and should
>> have their API docs hosted.
>>
>> I think we reference the Apache security policy on our website. If
>> not, I agree we should add a reference to it.
>>
>>
>>>
>>> Also, we don't have any experience in patching a release as we
>>> haven't yet done one patch version bump. All issues that have been brought
>>> up were always fixed in the next minor version bump.
>>>
>>
>> I agree. There was the Python 2.1.1 but that is the only example I
>> could remember.
>>
>>
>>>
>>> 1: http://www.apache.org/security/
>>>
>>>
>>>
>>>
>>> On Fri, Aug 10, 2018 at 11:50 AM Pablo Estrada 
>>> wrote:
>>>
 I think this all sounds reasonable, and I think it would be a good
 story for our users. We don't have much 

Re: [HELP] Blog post for 2.6.0 Release

2018-08-13 Thread Rafael Fernandez
Howdy! https://github.com/apache/beam-site/pull/536 should take care of
this.

Cheers,
r


On Mon, Aug 13, 2018 at 9:14 AM Austin Bennett 
wrote:

> Alexey,
>
> I believe I'm seeing the same.
>
> Best,
> Austin
>
> On Mon, Aug 13, 2018 at 8:54 AM, Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Pablo,
>>
>> Thank you for taking care of it this time!
>>
>> Btw, the page with a list of blog posts [1] looks a bit strange to me -
>> it shows the full text of Pablo’s last post (I believe it should be only a
>> preview with the first paragraph) and every preview of other posts starts
>> with the Apache License text, which should actually be commented out. It
>> looks like the html converter substitutes html comments with html entities
>> for an unknown reason.
>>
>> I checked this in two browsers and I see the same picture in both. Is it
>> only me, or does everybody see the same issue?
>>
>> [1] https://beam.apache.org/blog/
>>
>> On 11 Aug 2018, at 01:26, Pablo Estrada  wrote:
>>
>> Hello everyone,
>> here's a PR for the blog post:
>> https://github.com/apache/beam-site/pull/533
>> Last call for opinions : )
>>
>> Best
>> -P.
>>
>> On Fri, Aug 10, 2018 at 9:47 AM Pablo Estrada  wrote:
>>
>>> Hello everyone,
>>> a bump here on asking for your contributions to the 2.6.0 release blog
>>> post.
>>> Best
>>> -P.
>>>
>>> On Wed, Aug 8, 2018 at 4:20 PM Pablo Estrada  wrote:
>>>
>>>> Hello all,
>>>> During my work on the release, I missed that we have started doing blog
>>>> posts for every release. I decided to announce the release, and start the
>>>> blog post afterwards - the blog post will only be late for a few days.
>>>>
>>>> Please add all your release notes and comments in this doc:
>>>> https://docs.google.com/document/d/1Jwz5AxInSm9C6z0TZqer6JYE2gpbJ_dZn88X37VvQEY/edit?usp=sharing
>>>>
>>>> Thanks everyone for your help and work for this release!
>>>>
>>>> Best
>>>> -P.
>>>> --
>>>> Got feedback? go/pabloem-feedback
>>>>

>>> --
>>> Got feedback? go/pabloem-feedback
>>> 
>>>
>> --
>> Got feedback? go/pabloem-feedback
>> 
>>
>>
>>
>




[DISCUSS] Communicating in dev@

2018-08-09 Thread Rafael Fernandez
Hi,

I think it's important we all discuss in the community what we want to
do to make sure we communicate effectively. I ask because I've seen
preferences expressed in dev@, and want to make sure we're conscious
about our practices.

I think we all want discussions to be accessible to all members of the
community, and we need to make sure decisions are recorded in the dev@
list. If we are not doing this well, we need to flag this. I hope this
thread allows us to do so.

Some questions I have:

- Are we sharing docs that require installing software or forcing
creation of accounts? I don't think this is happening, but let's make
sure this is not the case.
- Are we having technical discussions and collaborating in tools such
as Google Docs without circling back to record decisions in the dev
list? If so, let's try our best to circle back to the dev list.
- Are we sharing trivially short information in doc form when an email
would suffice? If so, we can try our best to avoid that. Save a Beamer
a click and a new tab! :)

Thoughts?
r




Re: [VOTE] Community Examples Repository

2018-08-09 Thread Rafael Fernandez
Here is Rose's, David's, and Gris's proposal in text form. I hope
the copy/paste helps:


Apache Beam Examples Repository

Authors: Rose Nguyen (rtngu...@google.com), David Cavazos
(dcava...@google.com), Gris Cuevas (g...@apache.org)

Status: Proposal
Created: 2018-07-30
Updated: 2018-07-30

Summary

The Apache Beam Community creates and contributes examples to the core
Apache Beam GitHub repository. We want to make the process easier and
less dependent on the core repository by creating a separate repo
dedicated solely to Community examples, writing contribution guidelines,
and adding the examples to the website.

Background

The original batch of examples on the Apache Beam GitHub repository
was donated by Cloud Dataflow at the time of Java SDK 1.x to
demonstrate the capability of this programming model. These initial
examples were intended to demonstrate how a user can put together
their code components and try out Beam. Since then, there have been
numerous updates, increased Python parity, and new features that do
not have accompanying examples employing best practices and
demonstrating an end-to-end experience for new users. We would like to
leverage the existing examples by raising their visibility and
auditing them. This is also an opportunity to establish
contribution/maintenance guidelines for community contributions and to
start hosting the examples on the Beam site in an official repository.
Attracting and retaining new users necessitates updated, concrete
examples that exhibit the range of capabilities of Beam.

Proposed Tasks

We would like to create a new GitHub Repository under the Apache
Software Foundation Org page for Apache Beam Community Examples. This
repo would be similar to apache/beam-site. The name we’d like to have
is apache/beam-examples. We will also move all current examples to
this repo, perform an audit to outline best practices and guidelines,
and then publish them on the Apache Beam website.


Here is an outlined list of tasks we propose:

1. Send Apache Beam Example Repository proposal to the mailing list
(David) - July 31

2. Create the GitHub Repo (PMC would need to do this) - Request help
after proposal is refined/accepted

3. Move current examples to new repo (David) -- 2 weeks after item 2 is completed

4. Add a note to let people know we need to audit for best practices

5. Audit current examples and define best practices (David, Rose, Gris)
-- Target date: week of 8/20

6. Write guidelines on adding new examples and maintaining them (Gris,
Rose) -- Week after audit is completed

7. Add examples to website (Rose) -- 1 week after guidelines are written

8. Publish guidelines on website (Rose) -- 1 week after guidelines are written



On Thu, Aug 9, 2018 at 6:22 AM Łukasz Gajowy  wrote:
>
> I'd also vote for 3: I don't see much added value in separating the repos and
> I see much additional effort in maintaining extra repo(s)
> (updating examples when a new version of the Beam SDK comes out) and their
> infrastructure (Jenkins, etc.). What Lukasz Cwik said about mvn archetypes and
> how easy it can be to get starter examples from a common repo only
> strengthens my opinion.
>
> Regarding 2: I think it's not good to have some official examples here and
> some there - IMO it can give a false impression (user experience) that some
> examples are less important than the others. Maybe a good idea is to
> encourage users to share their (independent, non-official) examples and
> create a list of such on the Beam site instead of 2?
>
> Łukasz
>
> On Thu, Aug 9, 2018 at 11:35 AM Alexey Romanenko
> wrote:
>>
>> 3 - I agree with JB's, Charles's and Lukasz's arguments above for why we need
>> to have examples and main code in the same repository (+ the website code base
>> will move there soon). I don’t see any huge benefit in keeping the examples
>> separate and, at the same time, it will bring additional complexity and burden
>> for project support.
>>
>> On 9 Aug 2018, at 08:18, Jean-Baptiste Onofré  wrote:
>>
>> Hi guys,
>>
>> For this kind of discussion, I would prefer to avoid Google Doc and
>> directly put the point/proposal on the mailing list.
>>
>> It's easier for the community to follow.
>>
>> The statement is more for 3 because it's more convenient for users to
>> easily find the examples and include in the distribution.
>>
>> Regards
>> JB
>>
>> On 08/08/2018 23:25, Charles Chen wrote:
>>
>> It looks like the main claim is that 1 and 2 have the benefit of
>> increasing visibility for examples on the Beam site.  I agree with
>> Robert's comments on the doc which claim that this is orthogonal to
>> whether a separate repository is created (the comments are unresolved:
>> https://docs.google.com/a/google.com/document/d/1vhcKJlP0qH1C7NZPDjohT2PUbOD-k71avv1CjEYapdw/edit?disco=BzifZxY).
>>
>> I would add that the maintenance and testing burden has not been
>> adequately addressed in the proposal (i.e. are we creating new Jenkins
>> jobs?; will postcommits on the main Beam repo run examples 

Re: [Vote] Dev wiki engine

2018-07-19 Thread Rafael Fernandez
-1 .md! :-)

On Thu, Jul 19, 2018 at 6:38 PM Gaurav Thakur  wrote:

> +1 for confluence
>
> On Fri, Jul 20, 2018 at 11:42 AM Thomas Weise  wrote:
>
>> +1 for Confluence
>>
>>
>> On Thu, Jul 19, 2018 at 4:00 PM Kai Jiang  wrote:
>>
>>> +1 Apache Confluence
>>>
>>> On Thu, Jul 19, 2018, 15:18 Lukasz Cwik  wrote:
>>>
>>>> +1 for confluence.
>>>>
>>>> On Thu, Jul 19, 2018 at 3:17 PM Anton Kedin  wrote:

> +1 for Confluence
>
>
> On Thu, Jul 19, 2018 at 2:56 PM Andrew Pilloud 
> wrote:
>
>> +1 Apache Confluence
>>
>> Because .md files in code repo require code review and commit.
>>
>> On Thu, Jul 19, 2018, 2:22 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> There is a long-running discussion ongoing about starting a Beam Dev Wiki.
>>> It seems that the only remaining question is which engine to use for the
>>> wiki. So far we have two suggestions: Confluence and .md files in the repo.
>>>
>>> A quick summary can also be found in the following doc.
>>>
>>> I suggest we vote on which approach to use:
>>> 1. Apache Confluence
>>> 2. .md files in code repository (Those can be rendered by Github)
>>>
>>> --Mikhail
>>>
>>>




Re: CODEOWNERS for apache/beam repo

2018-07-10 Thread Rafael Fernandez
+1!

On Tue, Jul 10, 2018 at 8:51 AM Robert Burke  wrote:

> +1
> If non-committers are welcome in the file, I'm happy to assist Henning
> with Go SDK reviews.(@lostluck)
>
>
> On Tue, Jul 10, 2018, 8:47 AM Alexey Romanenko 
> wrote:
>
>> +1,
>> Udi, thank you for taking care of this!
>> I added myself as a reviewer of some IO components.
>>
>>
>> On 10 Jul 2018, at 17:00, Henning Rohde  wrote:
>>
>> +1. Sounds like a useful improvement.
>>
>> Udi -- do the reviewers in this file need to be committers for the PR
>> auto-assignment to work?
>>
>> On Tue, Jul 10, 2018 at 1:59 AM Łukasz Gajowy 
>> wrote:
>>
>>> +1. It will certainly be useful. I added myself (and a fellow
>>> contributor) to some components (IO testing related mostly).
>>>
>>> Thanks,
>>> Łukasz
>>>
>>> On Tue, Jul 10, 2018 at 2:06 AM Udi Meiri  wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I'm proposing to add auto-reviewer-assignment using GitHub's CODEOWNERS
>>>> mechanism.
>>>> Initial version is here: https://github.com/apache/beam/pull/5909/files
>>>>
>>>> I need help from the community in determining owners for each component.
>>>> Feel free to directly edit the PR (if you have permission) or add a
>>>> comment.
>>>>
>>>>
>>>> Background
>>>> The idea is to:
>>>> 1. Document good review candidates for each component.
>>>> 2. Help choose reviewers using the auto-assignment mechanism. The
>>>> suggestion is in no way binding.

>>




Re: Parallelizing test runs

2018-07-03 Thread Rafael Fernandez
Summary for all folks following this story -- and many thanks for
explaining configs to me and pointing me to files and such.

- Scott made changes to the config and we can now run 3
ValidatesRunner.Dataflow in parallel (each run is about 2 hours)
- With the latest quota changes, we peaked at ~70% capacity in concurrent
Dataflow jobs when running those
- I've been keeping an eye on quota peaks for all resources today and have
not seen any worrisome limits overall.
- Also note there are improvements planned to the ValidatesRunner.Dataflow
test so various items get batched and the test itself runs faster -- I
believe it's on Alan's radar
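
A back-of-the-envelope sketch of what 3 parallel runs buy us. It assumes (a simplification, not a measurement) that every run takes a fixed 2 hours, that queued runs start in FIFO order, and that all slots become free at the same moment. The function name and parameters are made up for illustration:

```python
def hours_until_start(runs_ahead: int, concurrency: int,
                      run_hours: float = 2.0) -> float:
    """Estimate how long a newly queued run waits before it starts.

    Assumptions (illustrative only): every run takes `run_hours`, runs start
    in FIFO order, and all `concurrency` slots are free at time zero.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Runs ahead of us are processed in full batches of `concurrency`;
    # we start once every full batch in front of us has finished.
    full_batches_ahead = runs_ahead // concurrency
    return full_batches_ahead * run_hours


for slots in (1, 3):
    wait = hours_until_start(runs_ahead=3, concurrency=slots)
    print(f"{slots} slot(s): wait {wait}h, done after {wait + 2.0}h")
```

Under these assumptions, with 3 runs ahead of you the wait drops from 6 hours to 2 hours when going from 1 slot to 3.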

Cheers,
r

On Mon, Jul 2, 2018 at 4:23 PM Rafael Fernandez  wrote:

> Done!
>
> On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner  wrote:
>
>> Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota [1]. Can
>> you take a look? I've filed [BEAM-4722]:
>> https://issues.apache.org/jira/browse/BEAM-4722
>>
>> [1] https://github.com/apache/beam/pull/5861#issuecomment-401963630
>>
>> On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez 
>> wrote:
>>
>>> OK, Scott just sent https://github.com/apache/beam/pull/5860 . Quotas
>>> should not be a problem; if they are, please file a JIRA under gcp-quota.
>>>
>>> Cheers,
>>> r
>>>
>>> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles  wrote:
>>>
>>>> One thing that is nice when you do this is to be able to share your
>>>> results. Though if all you are sharing is "they passed" then I guess we
>>>> don't have to insist on evidence.
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner  wrote:
>>>>
>>>>> A few thoughts:
>>>>>
>>>>> * The Jenkins job getting backed up
>>>>> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>>>>> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
>>>>> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
>>>>> is idle more often than backlogged.
>>>>>
>>>>> * It's difficult to reason about our exact quota needs because
>>>>> Dataflow jobs get launched from various Jenkins jobs that have different
>>>>> parallelism configurations. If we have budget, we could enable concurrent
>>>>> execution of this job and increase our quota enough to give some breathing
>>>>> room. If we do this, I recommend limiting the max concurrency via
>>>>> throttleConcurrentBuilds [2] to some reasonable limit.
>>>>>
>>>>> * This test suite is meant to be an exhaustive post-commit validation
>>>>> of Dataflow runner, and tests a lot of different aspects of a runner. It
>>>>> would be more efficient to run locally only the tests affected by your
>>>>> change. Note that this requires having access to a GCP project with
>>>>> billing, but most Dataflow developers probably have access to this already.
>>>>> The command for this is:
>>>>>
>>>>> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner
>>>>> -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot
>>>>> --tests "org.apache.beam.MyTestClass"
>>>>>
>>>>> [1]
>>>>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
>>>>> [2]
>>>>> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>>>>>
>>>>>
>>>>> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik  wrote:
>>>>>
>>>>>> The validates runner test parallelism is controlled here and is
>>>>>> currently set to be "unlimited":
>>>>>>
>>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>>>>>
>>>>>> Each test fork is run on a different gradle worker, so the number of
>>>>>> parallel test runs is limited to the max number of workers configured, which
>>>>>> is controlled here:
>>>>>>
>>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50

Re: Parallelizing test runs

2018-07-02 Thread Rafael Fernandez
Done!

On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner  wrote:

> Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota [1]. Can
> you take a look? I've filed [BEAM-4722]:
> https://issues.apache.org/jira/browse/BEAM-4722
>
> [1] https://github.com/apache/beam/pull/5861#issuecomment-401963630
>
> On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez 
> wrote:
>
>> OK, Scott just sent https://github.com/apache/beam/pull/5860 . Quotas
>> should not be a problem; if they are, please file a JIRA under gcp-quota.
>>
>> Cheers,
>> r
>>
>> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles  wrote:
>>
>>> One thing that is nice when you do this is to be able to share your
>>> results. Though if all you are sharing is "they passed" then I guess we
>>> don't have to insist on evidence.
>>>
>>> Kenn
>>>
>>> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner  wrote:
>>>
>>>> A few thoughts:
>>>>
>>>> * The Jenkins job getting backed up
>>>> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>>>> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
>>>> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
>>>> is idle more often than backlogged.
>>>>
>>>> * It's difficult to reason about our exact quota needs because Dataflow
>>>> jobs get launched from various Jenkins jobs that have different parallelism
>>>> configurations. If we have budget, we could enable concurrent execution of
>>>> this job and increase our quota enough to give some breathing room. If we
>>>> do this, I recommend limiting the max concurrency via
>>>> throttleConcurrentBuilds [2] to some reasonable limit.
>>>>
>>>> * This test suite is meant to be an exhaustive post-commit validation
>>>> of Dataflow runner, and tests a lot of different aspects of a runner. It
>>>> would be more efficient to run locally only the tests affected by your
>>>> change. Note that this requires having access to a GCP project with
>>>> billing, but most Dataflow developers probably have access to this already.
>>>> The command for this is:
>>>>
>>>> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner
>>>> -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot
>>>> --tests "org.apache.beam.MyTestClass"
>>>>
>>>> [1]
>>>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
>>>> [2]
>>>> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>>>>
>>>>
>>>> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik  wrote:
>>>>
>>>>> The validates runner test parallelism is controlled here and is
>>>>> currently set to be "unlimited":
>>>>>
>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>>>>
>>>>> Each test fork is run on a different gradle worker, so the number of
>>>>> parallel test runs is limited to the max number of workers configured, which
>>>>> is controlled here:
>>>>>
>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>>>>> It is currently configured to 3 * number of CPU cores.
>>>>>
>>>>> We are already running up to 48 Dataflow jobs in parallel.
>>>>>
>>>>>
>>>>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez 
>>>>> wrote:
>>>>>
>>>>>> - How many resources do ValidatesRunner tests use?
>>>>>> - Where are those settings?
>>>>>>
>>>>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax  wrote:
>>>>>>
>>>>>>> The specific issue only affects Dataflow ValidatesRunner tests. We
>>>>>>> currently allow only one of these to run at a time, to control usage of
>>>>>>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>>>>>>> issue.
>>>>>>>
>>>>>>> I would like to see if it's possible to increase Dataflow q

Re: Parallelizing test runs

2018-07-02 Thread Rafael Fernandez
OK, Scott just sent https://github.com/apache/beam/pull/5860 . Quotas
should not be a problem; if they are, please file a JIRA under gcp-quota.

Cheers,
r

On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles  wrote:

> One thing that is nice when you do this is to be able to share your
> results. Though if all you are sharing is "they passed" then I guess we
> don't have to insist on evidence.
>
> Kenn
>
> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner  wrote:
>
>> A few thoughts:
>>
>> * The Jenkins job getting backed up
>> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
>> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
>> is idle more often than backlogged.
>>
>> * It's difficult to reason about our exact quota needs because Dataflow
>> jobs get launched from various Jenkins jobs that have different parallelism
>> configurations. If we have budget, we could enable concurrent execution of
>> this job and increase our quota enough to give some breathing room. If we
>> do this, I recommend limiting the max concurrency via
>> throttleConcurrentBuilds [2] to some reasonable limit.
>>
>> * This test suite is meant to be an exhaustive post-commit validation of
>> Dataflow runner, and tests a lot of different aspects of a runner. It would
>> be more efficient to run locally only the tests affected by your change.
>> Note that this requires having access to a GCP project with billing, but
>> most Dataflow developers probably have access to this already. The command
>> for this is:
>>
>> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner
>> -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot
>> --tests "org.apache.beam.MyTestClass"
>>
>> [1]
>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
>> [2]
>> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>>
>>
>> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik  wrote:
>>
>>> The validates runner test parallelism is controlled here and is
>>> currently set to be "unlimited":
>>>
>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>>
>>> Each test fork is run on a different gradle worker, so the number of
>>> parallel test runs is limited to the max number of workers configured which
>>> is controlled here:
>>>
>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>>> It is currently configured to 3 * number of CPU cores.
>>>
>>> We are already running up to 48 Dataflow jobs in parallel.
>>>
>>>
>>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez 
>>> wrote:
>>>
>>>> - How many resources do ValidatesRunner tests use?
>>>> - Where are those settings?
>>>>
>>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax  wrote:
>>>>
>>>>> The specific issue only affects Dataflow ValidatesRunner tests. We
>>>>> currently allow only one of these to run at a time, to control usage of
>>>>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>>>>> issue.
>>>>>
>>>>> I would like to see if it's possible to increase Dataflow quota so we
>>>>> can run more of these in parallel. It took me 8 hours end to end to run
>>>>> these tests (about 6 hours for the run to be scheduled). If there was a
>>>>> failure, I would have had to repeat the whole process. In the worst case,
>>>>> this process could have taken me days. While this is not as pressing as
>>>>> some other issues (as most people don't need to run the Dataflow tests on
>>>>> every PR), fixing it would make such changes much easier to manage.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez 
>>>>> wrote:
>>>>>
>>>>>> +Reuven Lax  told me yesterday that he was waiting
>>>>>> for some test to be scheduled and run, and it took 6 hours or so. I would
>>>>>> like to help reduce these wait times by increasing parallelism. I need help
>>>>>> understanding the continuous minimum of what we use. It seems the following
>>>>>> is true:
>>>>>>
>>>>>>
>>>>>>- There seems to always be 16 jenkins machines on (16 CPUs each)
>>>>>>- There seems to be three GKE machines always on (1 CPU each)
>>>>>>- Most (if not all) unit tests run on 1 machine, and seem to run
>>>>>>one-at-a-time <-- I think we can safely parallelize this to 20.
>>>>>>
>>>>>> With current quotas, if we parallelize to 20 concurrent unit tests,
>>>>>> we still have room for 80 other concurrent dataflow jobs to execute, with
>>>>>> 75% of CPU capacity.
>>>>>>
>>>>>> Thoughts? Additional data?
>>>>>>
>>>>>> Thanks,
>>>>>> r
>>>>>>
>>>>>




Re: Parallelizing test runs

2018-06-30 Thread Rafael Fernandez
- How many resources do ValidatesRunner tests use?
- Where are those settings?

On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax  wrote:

> The specific issue only affects Dataflow ValidatesRunner tests. We
> currently allow only one of these to run at a time, to control usage of
> Dataflow and of GCE quota. Other types of tests do not suffer from this
> issue.
>
> I would like to see if it's possible to increase Dataflow quota so we can
> run more of these in parallel. It took me 8 hours end to end to run these
> tests (about 6 hours for the run to be scheduled). If there was a failure,
> I would have had to repeat the whole process. In the worst case, this
> process could have taken me days. While this is not as pressing as some
> other issues (as most people don't need to run the Dataflow tests on every
> PR), fixing it would make such changes much easier to manage.
>
> Reuven
>
> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez 
> wrote:
>
>> +Reuven Lax  told me yesterday that he was waiting for
>> some test to be scheduled and run, and it took 6 hours or so. I would like
>> to help reduce these wait times by increasing parallelism. I need help
>> understanding the continuous minimum of what we use. It seems the following
>> is true:
>>
>>
>>- There seems to always be 16 jenkins machines on (16 CPUs each)
>>- There seems to be three GKE machines always on (1 CPU each)
>>- Most (if not all) unit tests run on 1 machine, and seem to run
>>one-at-a-time <-- I think we can safely parallelize this to 20.
>>
>> With current quotas, if we parallelize to 20 concurrent unit tests, we
>> still have room for 80 other concurrent dataflow jobs to execute, with 75%
>> of CPU capacity.
>>
>> Thoughts? Additional data?
>>
>> Thanks,
>> r
>>
>




Parallelizing test runs

2018-06-30 Thread Rafael Fernandez
+Reuven Lax  told me yesterday that he was waiting for
some test to be scheduled and run, and it took 6 hours or so. I would like
to help reduce these wait times by increasing parallelism. I need help
understanding the continuous minimum of what we use. It seems the following
is true:


   - There seems to always be 16 jenkins machines on (16 CPUs each)
   - There seems to be three GKE machines always on (1 CPU each)
   - Most (if not all) unit tests run on 1 machine, and seem to run
   one-at-a-time <-- I think we can safely parallelize this to 20.

With current quotas, if we parallelize to 20 concurrent unit tests, we
still have room for 80 other concurrent dataflow jobs to execute, with 75%
of CPU capacity.

Thoughts? Additional data?

Thanks,
r
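
A rough tally of the always-on capacity listed above, as a minimal sketch. The machine and CPU counts are the ones reported in the message; the function name is made up for illustration, and nothing here models the Dataflow or GCE quotas, which the thread does not quantify.

```python
def always_on_cpus(machine_groups):
    """Sum CPUs over (machine_count, cpus_per_machine) pairs."""
    return sum(count * cpus for count, cpus in machine_groups)


# Figures reported in the message: 16 Jenkins machines with 16 CPUs each,
# plus three GKE machines with 1 CPU each.
reported = [(16, 16), (3, 1)]
print(always_on_cpus(reported))  # 259
```

Almost all of the standing CPU capacity sits on the Jenkins machines, which is presumably where any extra unit-test parallelism would land.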




Re: [DISCUSS] Automation for Java code formatting

2018-06-27 Thread Rafael Fernandez
On Wed, Jun 27, 2018 at 9:31 AM Kenneth Knowles  wrote:

> Luke: the proposal here solves exactly what you are talking about.
>
> The problem you describe happens when the PR author uses autoformat but
> the baseline is not already autoformatted. What I am proposing is to make
> sure the baseline is already autoformatted, so PRs never have extraneous
> formatting changes.
>
> Rafael: the default setting on GitHub is "allow edits by maintainers" so
> actually a committer can run spotless on behalf of a contributor and push
> the fixup. I have done this. It also lets a committer fix up a good PR
> and merge it even if the contributor is, say, asleep.
>

This is a great practice: review the technical part, use the tool to
address the mechanical part.



>
> Kenn
>
> On Wed, Jun 27, 2018 at 9:24 AM Rafael Fernandez 
> wrote:
>
>> Luke: Anything that helps contributors and reviewers work better together
>> - +1! :D
>>
>>
>>
>> On Wed, Jun 27, 2018 at 9:04 AM Lukasz Cwik  wrote:
>>
>>> If spotless is run against a PR that is already well formatted it's a
>>> non-issue as the formatting changes are usually related to the change but I
>>> have reviewed a few PRs that have 100s of lines of formatting change which
>>> really obfuscates the work.
>>> Instead of asking contributors to run spotless, can we have a cron job
>>> run it across the project like once a day/week/... and cut a PR?
>>>
>>> On Wed, Jun 27, 2018 at 8:07 AM Kenneth Knowles  wrote:
>>>
>>>> Good points, Dan. Checkstyle will still run, but just focused on the
>>>> things that go beyond format.
>>>>
>>>> Kenn
>>>>
>>>> On Wed, Jun 27, 2018 at 8:03 AM Etienne Chauchot 
>>>> wrote:
>>>>
>>>>> +1 !
>>>>> It's my custom to avoid reformatting to spare meaningless diff burden
>>>>> to the reviewer. Now it will be over, thanks.
>>>>>
>>>>> Etienne
>>>>>
>>>>> Le mardi 26 juin 2018 à 21:15 -0700, Kenneth Knowles a écrit :
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I like readable code, but I don't like formatting it myself. And I
>>>>> _really_ don't like discussing in code review. "Spotless" [1] can enforce -
>>>>> and automatically apply - automatic formatting for Java, Groovy, and some
>>>>> others.
>>>>>
>>>>> This is not about style or wanting a particular layout. This is about
>>>>> automation, contributor experience, and streamlining review
>>>>>
>>>>>  - Contributor experience: MUCH better than checkstyle: error message
>>>>> just says "run ./gradlew :beam-your-module:spotlessApply" instead of
>>>>> telling them to go in and manually edit.
>>>>>
>>>>>  - Automation: You want to use autoformat so you don't have to format
>>>>> code by hand. But if you autoformat a file that was in some other format,
>>>>> then you touch a bunch of unrelated lines. If the file is already
>>>>> autoformatted, it is much better.
>>>>>
>>>>>  - Review: Never talk about code formatting ever again. A PR also
>>>>> needs baseline to already be autoformatted or formatting will make it
>>>>> unclear which lines are really changed.
>>>>>
>>>>> This is already available via applyJavaNature(enableSpotless: true)
>>>>> and it is turned on for SQL and our buildSrc gradle plugins. It is very
>>>>> nice. There is a JIRA [2] to turn it on for the whole code base. Personally,
>>>>> I think (a) every module could make a different choice if the main
>>>>> contributors feel strongly and (b) it is objectively better to always
>>>>> autoformat :-)
>>>>>
>>>>> WDYT? If we do it, it is trivial to add it module-at-a-time or
>>>>> globally. If someone conflicts with a massive autoformat commit, they can
>>>>> just keep their changes and autoformat them and it is done.
>>>>>
>>>>> Kenn
>>>>>
>>>>> [1] https://github.com/diffplug/spotless/tree/master/plugin-gradle
>>>>> [2] https://issues.apache.org/jira/browse/BEAM-4394
>>>>>
>>>>>




Re: Jenkins Groovy files naming convention

2018-06-27 Thread Rafael Fernandez
+1 CamelCase. Feels "normal" to me in Groovy.

On Wed, Jun 27, 2018 at 9:18 AM Lukasz Cwik  wrote:

> I don't really have a strong preference.
>
> On Wed, Jun 27, 2018 at 9:13 AM Łukasz Gajowy 
> wrote:
>
>> Hi all,
>>
>> I think we should change the naming convention that we have in
>> jenkins .groovy files. AFAIK, groovy is CamelCase, and we use snake_case
>> names there. I suppose this is because we wanted to reflect jenkins job
>> names (do we need this?)
>>
>> IMO, the convention should be CamelCase for all .groovy files (both
>> actual job files and helper class files).
>>
>> WDYT?
>>
>> Łukasz
>>
>




Re: [DISCUSS] Automation for Java code formatting

2018-06-27 Thread Rafael Fernandez
Luke: Anything that helps contributors and reviewers work better together -
+1! :D



On Wed, Jun 27, 2018 at 9:04 AM Lukasz Cwik  wrote:

> If spotless is run against a PR that is already well formatted it's a
> non-issue as the formatting changes are usually related to the change but I
> have reviewed a few PRs that have 100s of lines of formatting change which
> really obfuscates the work.
> Instead of asking contributors to run spotless, can we have a cron job run
> it across the project like once a day/week/... and cut a PR?
>
> On Wed, Jun 27, 2018 at 8:07 AM Kenneth Knowles  wrote:
>
>> Good points, Dan. Checkstyle will still run, but just focused on the
>> things that go beyond format.
>>
>> Kenn
>>
>> On Wed, Jun 27, 2018 at 8:03 AM Etienne Chauchot 
>> wrote:
>>
>>> +1 !
>>> It's my custom to avoid reformatting to spare meaningless diff burden to
>>> the reviewer. Now it will be over, thanks.
>>>
>>> Etienne
>>>
>>> Le mardi 26 juin 2018 à 21:15 -0700, Kenneth Knowles a écrit :
>>>
>>> Hi all,
>>>
>>> I like readable code, but I don't like formatting it myself. And I
>>> _really_ don't like discussing in code review. "Spotless" [1] can enforce -
>>> and automatically apply - automatic formatting for Java, Groovy, and some
>>> others.
>>>
>>> This is not about style or wanting a particular layout. This is about
>>> automation, contributor experience, and streamlining review
>>>
>>>  - Contributor experience: MUCH better than checkstyle: error message
>>> just says "run ./gradlew :beam-your-module:spotlessApply" instead of
>>> telling them to go in and manually edit.
>>>
>>>  - Automation: You want to use autoformat so you don't have to format
>>> code by hand. But if you autoformat a file that was in some other format,
>>> then you touch a bunch of unrelated lines. If the file is already
>>> autoformatted, it is much better.
>>>
>>>  - Review: Never talk about code formatting ever again. A PR also needs
>>> baseline to already be autoformatted or formatting will make it unclear
>>> which lines are really changed.
>>>
>>> This is already available via applyJavaNature(enableSpotless: true) and
>>> it is turned on for SQL and our buildSrc gradle plugins. It is very nice.
>>> There is a JIRA [2] to turn it on for the whole code base. Personally, I
>>> think (a) every module could make a different choice if the main
>>> contributors feel strongly and (b) it is objectively better to always
>>> autoformat :-)
>>>
>>> WDYT? If we do it, it is trivial to add it module-at-a-time or globally.
>>> If someone conflicts with a massive autoformat commit, they can just keep
>>> their changes and autoformat them and it is done.
>>>
>>> Kenn
>>>
>>> [1] https://github.com/diffplug/spotless/tree/master/plugin-gradle
>>> [2] https://issues.apache.org/jira/browse/BEAM-4394
>>>
>>>




Re: [DISCUSS] Automation for Java code formatting

2018-06-26 Thread Rafael Fernandez
+1! Remove guesswork :D



On Tue, Jun 26, 2018 at 9:15 PM Kenneth Knowles  wrote:

> Hi all,
>
> I like readable code, but I don't like formatting it myself. And I
> _really_ don't like discussing in code review. "Spotless" [1] can enforce -
> and automatically apply - automatic formatting for Java, Groovy, and some
> others.
>
> This is not about style or wanting a particular layout. This is about
> automation, contributor experience, and streamlining review
>
>  - Contributor experience: MUCH better than checkstyle: error message just
> says "run ./gradlew :beam-your-module:spotlessApply" instead of telling
> them to go in and manually edit.
>
>  - Automation: You want to use autoformat so you don't have to format code
> by hand. But if you autoformat a file that was in some other format, then
> you touch a bunch of unrelated lines. If the file is already autoformatted,
> it is much better.
>
>  - Review: Never talk about code formatting ever again. A PR also needs
> baseline to already be autoformatted or formatting will make it unclear
> which lines are really changed.
>
> This is already available via applyJavaNature(enableSpotless: true) and it
> is turned on for SQL and our buildSrc gradle plugins. It is very nice.
> There is a JIRA [2] to turn it on for the whole code base. Personally, I
> think (a) every module could make a different choice if the main
> contributors feel strongly and (b) it is objectively better to always
> autoformat :-)
>
> WDYT? If we do it, it is trivial to add it module-at-a-time or globally.
> If someone conflicts with a massive autoformat commit, they can just keep
> their changes and autoformat them and it is done.
>
> Kenn
>
> [1] https://github.com/diffplug/spotless/tree/master/plugin-gradle
> [2] https://issues.apache.org/jira/browse/BEAM-4394
>
>




Re: Auto closing stale PRs label

2018-06-26 Thread Rafael Fernandez
Neat! Thanks for showing me where the options are.
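
For illustration, the two-step timing described below (60 days to stale, then 7 more days to close) can be sketched as follows. This is a simplified model and not the bot's actual logic: the function and status names are hypothetical, and it measures both thresholds from the last activity date, whereas the real bot may count the closing window from the moment it posts its warning.

```python
from datetime import date, timedelta


def pr_status(last_activity, today, days_until_stale=60, days_until_close=7):
    """Classify a PR by how long it has been idle (hypothetical helper)."""
    idle_days = (today - last_activity).days
    if idle_days < days_until_stale:
        return "active"
    if idle_days < days_until_stale + days_until_close:
        return "stale"  # warning posted, still open
    return "closed"


today = date(2018, 6, 26)
for idle in (10, 60, 67):
    print(idle, pr_status(today - timedelta(days=idle), today))
```

Any comment on the PR resets the last activity date, which is what gives a contributor the chance to keep a small fix alive.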

On Tue, Jun 26, 2018 at 7:24 PM Kenneth Knowles  wrote:

> That's actually already how it works. We can configure how long it waits
> after the message. Currently it is set for 60 days to stale and then 7 days
> to close. You can see the options we've set up here; there may be more:
> https://github.com/apache/beam/blob/master/.github/stale.yml
>
> Kenn
>
> On Tue, Jun 26, 2018 at 6:42 PM Rafael Fernandez 
> wrote:
>
>> The new label makes sense to me, but Ismael: I want to make sure your
>> concern is fully addressed. I see your point about making sure we are not
>> shutting the door on a small fix that perhaps went unattended for benign
>> reasons. Perhaps a step before closure is feasible? Something like getting a
>> nice message in the PR, "Ahoy! This PR hasn't moved in [X time]. If you're
>> still working on it, can you comment? Otherwise, our highly sophisticated
>> AI will declutter and close it in [Y days]".
>>
>> Thoughts?
>>
>>
>> On Mon, Jun 25, 2018 at 8:23 AM Kenneth Knowles  wrote:
>>
>>> Totally agree.
>>>
>>> By the way, these seem to be default labels for issue tracking. So I got
>>> rid of the ones that don't seem to make sense. Any committer can hack them
>>> I think. I just left "stale" for this purpose and "help wanted" since that
>>> makes sense on a PR. But probably we don't need any since we don't have a
>>> plan for them.
>>>
>>> Kenn
>>>
>>> On Mon, Jun 25, 2018 at 8:12 AM Ismaël Mejía  wrote:
>>>
>>>> Thanks Kenn, much better.
>>>>
>>>> Yes, closing stale PRs is worthwhile, but our ultimate goal should be to get
>>>> contributions in, so we should keep this in mind and try, when it is worth it,
>>>> to rescue fixes that could be lost because of minor review issues or
>>>> contributor inactivity.
>>>>
>>>> On Mon, Jun 25, 2018 at 4:23 PM Kenneth Knowles  wrote:
>>>>
>>>>> It is configured by just a file so alteration is very transparent. I
>>>>> agree with your point about the label. I made a new one for it. Here:
>>>>> https://github.com/apache/beam/pull/5750
>>>>>
>>>>> So far I have been satisfied that it closed many _very_ stale PRs. I
>>>>> have been watching it and didn't see any that seemed wrong.
>>>>>
>>>>> Kenn
>>>>>
>>>>>
>>>>> On Mon, Jun 25, 2018 at 12:52 AM Ismaël Mejía 
>>>>> wrote:
>>>>>
>>>>>> I saw some PRs auto closed recently and I was wondering if we could
>>>>>> adjust the  label that is added to the autoclosed PRs, currently it is
>>>>>> 'wontfix' but this label sends a fake (and negative) message. Can we
>>>>>> parametrize the bot to put something closer to the intention like
>>>>>> 'autoclosed'?
>>>>>>
>>>>>> Who can take care of this?
>>>>>> Any other opinion/suggestion after these first days of the stale bot?
>>>>>>
>>>>>> I have the impression that the time between the staleness warning and
>>>>>> the close is relatively short, of course PRs can be reopened but we
>>>>>> (committers) should pay attention that a PR that is marked as stale is
>>>>>> not stale because of unfinished reviews.
>>>>>>
>>>>>




Re: Auto closing stale PRs label

2018-06-26 Thread Rafael Fernandez
The new label makes sense to me, but Ismael: I want to make sure your
concern is fully addressed. I see your point about making sure we are not
shutting the door on a small fix that perhaps went unattended for benign
reasons. Perhaps a step before closure is feasible? Something like getting a
nice message in the PR, "Ahoy! This PR hasn't moved in [X time]. If you're
still working on it, can you comment? Otherwise, our highly sophisticated
AI will declutter and close it in [Y days]".

Thoughts?


On Mon, Jun 25, 2018 at 8:23 AM Kenneth Knowles  wrote:

> Totally agree.
>
> By the way, these seem to be default labels for issue tracking. So I got
> rid of the ones that don't seem to make sense. Any committer can hack them
> I think. I just left "stale" for this purpose and "help wanted" since that
> makes sense on a PR. But probably we don't need any since we don't have a
> plan for them.
>
> Kenn
>
> On Mon, Jun 25, 2018 at 8:12 AM Ismaël Mejía  wrote:
>
>> Thanks Kenn, much better.
>>
>> Yes, closing stale PRs is worthwhile, but our ultimate goal should be to get
>> contributions in, so we should keep this in mind and try, when it is worth it,
>> to rescue fixes that could be lost because of minor review issues or
>> contributor inactivity.
>>
>> On Mon, Jun 25, 2018 at 4:23 PM Kenneth Knowles  wrote:
>>
>>> It is configured by just a file so alteration is very transparent. I
>>> agree with your point about the label. I made a new one for it. Here:
>>> https://github.com/apache/beam/pull/5750
>>>
>>> So far I have been satisfied that it closed many _very_ stale PRs. I have
>>> been watching it and didn't see any that seemed wrong.
>>>
>>> Kenn
>>>
>>>
>>> On Mon, Jun 25, 2018 at 12:52 AM Ismaël Mejía  wrote:
>>>
>>>> I saw some PRs auto closed recently and I was wondering if we could
>>>> adjust the label that is added to the autoclosed PRs, currently it is
>>>> 'wontfix' but this label sends a fake (and negative) message. Can we
>>>> parametrize the bot to put something closer to the intention like
>>>> 'autoclosed'?
>>>>
>>>> Who can take care of this?
>>>> Any other opinion/suggestion after these first days of the stale bot?
>>>>
>>>> I have the impression that the time between the staleness warning and
>>>> the close is relatively short, of course PRs can be reopened but we
>>>> (committers) should pay attention that a PR that is marked as stale is
>>>> not stale because of unfinished reviews.
>>>>
>>>




Re: [Design Proposal] Improving Beam code review

2018-06-26 Thread Rafael Fernandez
I did a quick pass on the doc and left minor comments, thanks! I have some
feedback and thoughts:

   - For metrics and tools, there ought to be mature OSS projects out there
   we can learn from. I believe Kubernetes has a very healthy practice, it'd
   be ideal to learn from them. +Griselda Cuevas  can
   connect you (and people working on this).
   - I really like the idea of a style guide (which can evolve) for the
   various areas - presumably Java, Python, Go, etc. have their own. The
   reason I like it is because reviews become easier -- the reviewer will have
   an easier time working with the contributor to make sure together they can
   introduce great code that is consistent with the codebase (so they can
   focus on functionality and scale discussions, not style, which is
   published).
   - I think setting review expectations is hard. Many of us in the
   community have various degrees of time devoted to development - some of us
   are paid to work on Beam full time, others part time, others are gifting
   their time and talent. I find inspiration in the Apache Code of Conduct [1]
   to instead empower people to communicate clearly. A company or a developer
   may choose to say "This is what you can expect from me", and say, opt-in to
   email reminders and such. And when something is time sensitive, we should
   trust reviewers to be Apache-y and do a micro version of "*Step down
   considerately*" -- "I can't commit to reviewing this by Friday, I suggest
   another person.", for example.

I think at the end of the day we all need to eliminate guesswork and
promote the healthiest communication we can so we can all continue to grow
the project as fast as we want.

r

[1] https://www.apache.org/foundation/policies/conduct.html

On Tue, Jun 26, 2018 at 5:48 PM Huygaa Batsaikhan  wrote:

> Reuven, that's great. In this thread, we can continue discussing the usage
> of review tools, dashboards, and metrics.
>
> On Tue, Jun 26, 2018 at 5:27 PM Reuven Lax  wrote:
>
>> So I suggested a while ago that we create a code-review guidelines doc,
>> and in fact I was coincidentally just now drafting up a proposal doc. I'll
>> share my proposal doc with the dev list soon.
>>
>> On Tue, Jun 26, 2018 at 5:18 PM Huygaa Batsaikhan 
>> wrote:
>>
>>> Hi, I've been looking into ways to improve Beam's code review process
>>> based on previous discussions on dev list and summits, and I would like to
>>> propose improvement ideas. Please take a look at:
>>> https://s.apache.org/beam-code-review.
>>>
>>> Main proposals suggested in the doc are:
>>>
>>>1. Create a code review guideline document.
>>>2. Build/setup code review tools and dashboards for Beam.
>>>3. Collect metrics to monitor Beam's code review health.
>>>
>>> Feel free to add comments in the doc. I am looking for all sorts of
>>> suggestions including existing code review guidelines, potential code
>>> review tools etc.
>>>
>>> Thanks so much,
>>> Huygaa
>>>
>>




Re: bad logger import?

2018-06-26 Thread Rafael Fernandez
Filed https://issues.apache.org/jira/browse/BEAM-4644 for this. I assigned
it to +Ankur Goenka  because it's the first name in
history :p (please reroute where appropriate).

Thanks!
r
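
Until checkstyle or a similar tool enforces this, a quick scan can point at suspect files. The sketch below is an assumption about how one might look for java.util.logging (JUL) imports in Java source text. The regex and function name are illustrative only; it is not an existing Beam check and it does not catch fully qualified uses without an import.

```python
import re

# Matches lines such as "import java.util.logging.Logger;"
JUL_IMPORT = re.compile(
    r"^[ \t]*import[ \t]+(?:static[ \t]+)?java\.util\.logging\.[\w.*]+[ \t]*;",
    re.MULTILINE,
)


def find_jul_imports(java_source):
    """Return the JUL import statements found in a Java source string."""
    return [match.group(0).strip() for match in JUL_IMPORT.finditer(java_source)]


sample = """package org.example;

import java.util.logging.Logger;
import org.slf4j.LoggerFactory;
"""
print(find_jul_imports(sample))  # ['import java.util.logging.Logger;']
```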

On Tue, Jun 26, 2018 at 8:23 AM Lukasz Cwik  wrote:

> That is an internal class to the Flink runner. Runners are allowed to
> choose whichever logging framework they want to use with the understanding
> that the SDK and shared libraries use SLF4J, but most likely it's a simple
> typo.
>
> On Tue, Jun 26, 2018 at 7:22 AM Kenneth Knowles  wrote:
>
>> Seems like a legit bug to me. Perhaps we can adjust checkstyle, or some
>> other more semantic analysis, to forbid it.
>>
>> Kenn
>>
>> On Tue, Jun 26, 2018 at 6:48 AM Rafael Fernandez 
>> wrote:
>>
>>> +Lukasz Cwik  , +Henning Rohde 
>>>
>>> On Tue, Jun 26, 2018 at 1:25 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> answering a question on slack i realized flink
>>>> ExecutableStageDoFnOperator.java uses JUL instead of SLF4J, not sure it is
>>>> intended so thought I would mention it here.
>>>>
>>>> Side note: archetype and some test code does as well but it is less an
>>>> issue.
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>> <http://rmannibucau.wordpress.com> | Github
>>>> <https://github.com/rmannibucau> | LinkedIn
>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>
>>>




Re: bad logger import?

2018-06-26 Thread Rafael Fernandez
+Lukasz Cwik  , +Henning Rohde 

On Tue, Jun 26, 2018 at 1:25 AM Romain Manni-Bucau 
wrote:

> Hi guys,
>
> answering a question on slack i realized flink
> ExecutableStageDoFnOperator.java uses JUL instead of SLF4J, not sure it is
> intended so thought I would mention it here.
>
> Side note: archetype and some test code does as well but it is less an
> issue.
>
> Romain Manni-Bucau
> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>




Re: Going on leave for a bit

2018-06-26 Thread Rafael Fernandez
Have a great time, Kenn!

On Tue, Jun 26, 2018 at 5:49 AM Ismaël Mejía  wrote:

> Enjoy your family time.
>
> Best wishes,
> Ismael
>
>
> On Tue, Jun 26, 2018 at 12:13 PM Pei HE  wrote:
>
>> (A late) Congrats for the newborn!
>> --
>> Pei
>>
>> On Tue, Jun 26, 2018 at 1:42 PM, Kenneth Knowles  wrote:
>> > Hi friends,
>> >
>> > I think I did not mention on dev@ at the time, but my child #2 arrived March
>> > 14 (Pi day!) and I took some weeks off. Starting ~July 4 I will be taking a
>> > more significant absence, until ~October 1, trying my best to be totally
>> > offline.
>> >
>> > JFYI so that you know why JIRAs and PRs are not being addressed. I am also
>> > unassigning my JIRAs so that I am not holding any mutexes, and I will close
>> > PRs so they don't get stale.
>> >
>> > Any questions or pressing issues, I will be online this week and a little
>> > bit next week.
>> > Kenn
>>
>




Re: FYI on Slack Channels

2018-06-26 Thread Rafael Fernandez
Ah! Didn't know -- thanks Romain!

Done for all channels I could find. Also, here is a list of channels:

#beam
#beam-events-meetups
#beam-go
#beam-java
#beam-portability
#beam-python
#beam-sql
#beam-testing


On Tue, Jun 26, 2018 at 1:18 AM Romain Manni-Bucau 
wrote:

> +1 sounds very good
>
> side note: any channel must invite @asfarchivebot, I did it for the ones
> before "etc" but if you add others please ensure it is done
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github
> <https://github.com/rmannibucau> | LinkedIn
> <https://www.linkedin.com/in/rmannibucau> | Book
> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
>
> Le mar. 26 juin 2018 à 01:05, Lukasz Cwik  a écrit :
>
>> +u...@beam.apache.org 
>>
>> On Mon, Jun 25, 2018 at 4:04 PM Rafael Fernandez 
>> wrote:
>>
>>> Hello!
>>>
>>> I took the liberty to create area-specific channels (such as #beam-java,
>>> #beam-python, #beam-go, etc.) As our project and community grows, I am
>>> seeing more and more "organic" interest groups forming -- this may help us
>>> chat more online. If they don't, we can delete later.
>>>
>>> Any thoughts? (I am having second thoughts... #beam-go should probably
>>> be #beam-burrow ;p )
>>>
>>> Cheers,
>>> r
>>>
>>




Re: scala/scio

2018-06-25 Thread Rafael Fernandez
It seems there is support in the community - what is the next step and who
is on it? How can I help?

On Mon, Jun 25, 2018 at 9:25 AM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> I already proposed per-module releases a while ago. It doesn't need any
> change to the existing modules, just different versioning.
>
> It's a little bit more work for the release manager to do a full
> release, but it gives more flexibility. It also allows us to use several
> branches for the same IO, for instance.
>
> I'm already doing this for the Apache ServiceMix Bundles and Specs releases,
> so I can prepare some details about that.
>
> Regards
> JB
>
> On 25/06/2018 17:48, Ismaël Mejía wrote:
> > Agree with Kenn, granularity would be key for per-module Beam releases.
> >
> > We can easily think today of releasing separately IOs and extensions
> > because they depend only on the public API of the SDK which has not
> > changed much since 2.0.0. So scio will probably fit this category.
> >
> > A different story would be for runners at least at this moment with all
> > the ongoing work on portability, there could be important benefits on
> > having fixes and refactors applied globally when multiple internal
> > changes are happening.
> >
> > In the end separating modules for releases has its pros/cons. On the pro
> > side it encourages the project to be more stable and robust, but it adds
> > the weight of guaranteeing that everything works together given the
> > multiple combinations of released versions, which of course has a price
> > (this can be a burden for some users if we consider that users also
> > have to deal with their own other dependencies). Today Beam users
> > get this for free with the monolithic release approach.
> >
> > As usual with software, it's all trade-offs.
> >
> >
> >
> >
> >
> > On Fri, Jun 22, 2018 at 7:53 PM Kenneth Knowles <k...@google.com> wrote:
> >
> > I'm generally in favor of greater decoupling / decentralization
> > where possible. It is easy to imagine a world in which Beam consists
> > of a few fairly autonomous projects, some centralized policy, opt-in
> > infrastructure, just like ~all non-tiny software
> > projects/foundations/companies.
> >
> > The granularity matters quite a lot. But in my experience if you
> > choose subprojects to match clear self-sustaining subcommunities
> > then each one moves faster and has a stronger community than when
> > you try to erase those distinctions in a giant monoproject.
> >
> > Starting with an active project that already exists and has users
> > and maintainers works perfectly for getting the granularity right.
> > I'm for trying it if/when the opportunity arises. There are
> > technical challenges around shared infrastructure and global quality
> > assurance. It would be an opportunity to make them concrete and
> > address them.
> >
> > Kenn
> >
> >
> >
> > On Fri, Jun 22, 2018 at 10:24 AM Rafael Fernandez
> > <rfern...@google.com> wrote:
> >
> > I like the idea of per-module releases for Beam. I know Henning
> > and others have thought about that space as well.
> >
> > BTW, I'm a big fan of scio, and happy to help in any way
> > possible if they are interested in turning "de facto" into "de
> > jure" :D
> >
> > In such a world, Scio could be a very good first use case to
> > drive the mechanisms to enable per-module releases. I think it
> > allows us to scale better and sets a healthy path for special
> > interest groups to naturally emerge and collaborate with their
> > own scope in mind.
> >
> >
> > On Thu, Jun 21, 2018 at 11:48 PM alistair.m...@googlemail.com
> > <alistair.m...@googlemail.com> wrote:
> >
> >
> >
> > On 2018/06/21 17:17:36, Reuven Lax  wrote:
> > > In that case things have changed since I talked to Neville about it
> > > last November.
> > >
> > > On Thu, Jun 21, 2018 at 10:16 AM Rafal Wojdyla wrote:
> > >
> > > > Nope - it uses standard runners and is fully Beam compliant.
> > > >
> > > > On Thu, Jun 21, 2018 a

Re: [RESULT][VOTE] Apache Beam, version 2.5.0, release candidate #2

2018-06-25 Thread Rafael Fernandez
Neat! Thanks! (No guesswork! Just look at the calendar! :D)
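
The cadence arithmetic in the quoted discussion can be reproduced with a few lines, as a minimal sketch. It assumes the June 6 cut date for release-2.5.0 that Kenn cites (Alan's message suggests June 5) and a fixed six-week step; the function name is hypothetical and this is not how the shared calendar was built.

```python
from datetime import date, timedelta


def next_branch_cuts(previous_cut, count, cadence_weeks=6):
    """Return the next `count` branch-cut dates after `previous_cut`."""
    step = timedelta(weeks=cadence_weeks)
    return [previous_cut + step * i for i in range(1, count + 1)]


# Starting from the release-2.5.0 cut date given in the thread.
for cut in next_branch_cuts(date(2018, 6, 6), 2):
    print(cut.isoformat())
# 2018-07-18
# 2018-08-29
```

Counting from the previous cut rather than from the publish date keeps the schedule fixed even when a release takes longer than expected to finish.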

On Mon, Jun 25, 2018 at 2:47 PM Kenneth Knowles  wrote:

> I quickly put together https://s.apache.org/beam-release-calendar. I
> started from the day that the 2.5.0 release branch was created.
>
> Kenn
>
> On Mon, Jun 25, 2018 at 2:10 PM Lukasz Cwik  wrote:
>
>> If our release process is taking longer than 6 weeks, we should probably
>> start the next release since we expect that it will take a long time as
>> well even though the prior one is not yet finished. This will help get to
>> an average of one release every 6 weeks.
>>
>> On Mon, Jun 25, 2018 at 1:52 PM Chamikara Jayalath 
>> wrote:
>>
>>> I think the idea was to include 6 weeks' worth of changes in
>>> subsequent releases. So cutting the 2.6.0 release branch 6 weeks from the
>>> date 2.5.0 branch was cut sounds proper to me. I think we should
>>> consistently cut release branches every six weeks even though some of the
>>> releases might take longer than expected (hopefully not six weeks :)).
>>>
>>> - Cham
>>>
>>> On Mon, Jun 25, 2018 at 12:54 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> If I recall from the 2.4 discussion, the 2.5 branch should have been
>>>> cut in late April. It was cut ~5 weeks late. Keeping to the 6 week release
>>>> cadence, we are already late cutting the 2.6 release branch, which should
>>>> be cut immediately. Then 2.7 should be cut in mid July.
>>>>
>>>> Andrew
>>>>
>>>> On Mon, Jun 25, 2018 at 12:32 PM Kenneth Knowles 
>>>> wrote:
>>>>
>>>>> Specifically, I mean that since we cut release-2.5.0 branch on Jun 6
>>>>> we would cut release-2.6.0 on July 18.
>>>>>
>>>>> This time around, we should cut first, cherry-pick second.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Mon, Jun 25, 2018 at 12:21 PM Jean-Baptiste Onofré 
>>>>> wrote:
>>>>>
>>>>>> It makes sense.
>>>>>>
>>>>>> So, I will start the 2.6.0 process on July 17.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 25/06/2018 20:47, Alan Myrvold wrote:
>>>>>> > It would be a more predictable cadence to have a consistent timing
>>>>>> > between when the release branches are cut, and not when the release
>>>>>> is
>>>>>> > published.
>>>>>> >
>>>>>> > If 2.5.0 was cut on June 5, then 2.6.0 could be cut July 17?
>>>>>> >
>>>>>> > On Mon, Jun 25, 2018 at 10:57 AM Ahmet Altay <al...@google.com> wrote:
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Jun 25, 2018 at 10:49 AM, Ahmet Altay <al...@google.com> wrote:
>>>>>> >
>>>>>> > JB, thank you for making this release happen.
>>>>>> >
>>>>>> > I noticed that python artifacts are not deployed to pypi
>>>>>> yet.
>>>>>> > Would you like me to do that?
>>>>>> >
>>>>>> > Thank you,
>>>>>> > Ahmet
>>>>>> >
>>>>>> > On Sat, Jun 23, 2018 at 6:45 AM, Rafael Fernandez
>>>>>> > <rfern...@google.com> wrote:
>>>>>> >
>>>>>> > Great news! Thanks so much to our Release Manager and
>>>>>> > everybody who helped iron out the wrinkles!
>>>>>> >
>>>>>> > If you haven't seen it already, look for the thread
>>>>>> > "[PROPOSAL] Add a blog post for Beam release 2.5.0" [1]
>>>>>> > in dev@ - Alexey Romanenko has put together a very nice
>>>>>> > summary of all the good stuff in 2.5.0.
>>>>>> >
>>>>>> >
>>>>>> > [1]
>>>>>> https://lists.apache.org/thread.html/ae3284ca051b800b3edd73ad0f7f62344e26d3957b46794149bf1fb2@%3Cdev.beam.apache.org%3E
>>>
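
The cadence arithmetic discussed in this thread (cut each release branch six weeks after the previous branch cut, regardless of when the release itself ships) can be sketched in a few lines of Python. This is only an illustration: the function name `next_branch_cut` and the constant `RELEASE_CADENCE` are hypothetical and not part of any Beam tooling; the input dates are the ones quoted in the thread.

```python
from datetime import date, timedelta

# Assumption for illustration: a fixed six-week interval between branch cuts.
RELEASE_CADENCE = timedelta(weeks=6)


def next_branch_cut(previous_cut: date) -> date:
    """Return the date six weeks after the previous release branch cut."""
    return previous_cut + RELEASE_CADENCE


if __name__ == "__main__":
    # Branch cut on June 6, as stated by Kenn.
    print(next_branch_cut(date(2018, 6, 6)))  # 2018-07-18
    # Branch cut on June 5, as assumed by Alan.
    print(next_branch_cut(date(2018, 6, 5)))  # 2018-07-17
```

Counting from the branch cut rather than from the publication date keeps the schedule fixed even when a release vote runs long, which is the point several participants make above.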

Re: Fixing flaky tests and infrastructure stability

2018-06-25 Thread Rafael Fernandez
On Mon, Jun 25, 2018 at 3:49 PM Kenneth Knowles  wrote:

> There's also a "flake" label and a saved search here:
> https://issues.apache.org/jira/issues/?filter=12343195
>
> Kenn
>
> On Mon, Jun 25, 2018 at 12:48 PM Mikhail Gryzykhin 
> wrote:
>
>> Hello everyone,
>>
>> I have assembled a short list of JIRA tickets for current issues with
>> post-commit tests.
>> https://issues.apache.org/jira/issues/?jql=parent%3DBEAM-4627
>>
>> Currently, the list contains:
>> Flaky SQL test: https://issues.apache.org/jira/browse/BEAM-4628
>> Flaky examples test: https://issues.apache.org/jira/browse/BEAM-4637
>> Failed tests are reported as unstable job:
>> https://issues.apache.org/jira/browse/BEAM-4638
>> Failed tests are reported as success on detailed view:
>> https://issues.apache.org/jira/browse/BEAM-4638
>>
>> Additionally, we have a quota issue:
>> https://issues.apache.org/jira/browse/BEAM-4630
>>
>
Thanks for flagging - this should now be resolved.



>
>>
>> All of these issues degrade the quality of our post-commit test signal.
>>
>> Can owners or active committers in the corresponding areas work on fixing
>> these issues with high priority?
>> (If you pick up a task, please update it to "in progress" in JIRA and/or
>> respond to this thread.)
>>
>> If you need any help resolving these issues or more information, feel free
>> to let me know, I'll be happy to help.
>>
>> Thank you,
>> --Mikhail
>>
>>




FYI on Slack Channels

2018-06-25 Thread Rafael Fernandez
Hello!

I took the liberty of creating area-specific channels (such as #beam-java,
#beam-python, #beam-go, etc.). As our project and community grow, I am
seeing more and more "organic" interest groups forming -- this may help us
chat more online. If the channels don't help, we can delete them later.

Any thoughts? (I am having second thoughts... #beam-go should probably be
#beam-burrow ;p )

Cheers,
r




Re: [RESULT][VOTE] Apache Beam, version 2.5.0, release candidate #2

2018-06-23 Thread Rafael Fernandez
Great news! Thanks so much to our Release Manager and everybody who helped
iron out the wrinkles!

If you haven't seen it already, look for the thread "[PROPOSAL] Add a blog
post for Beam release 2.5.0" [1] in dev@ - Alexey Romanenko has put together
a very nice summary of all the good stuff in 2.5.0.

[1]
https://lists.apache.org/thread.html/ae3284ca051b800b3edd73ad0f7f62344e26d3957b46794149bf1fb2@%3Cdev.beam.apache.org%3E



On Fri, Jun 22, 2018 at 8:33 PM Jean-Baptiste Onofré 
wrote:

> I meant August (not July) for next release cycle.
>
> Regards
> JB
>
> On 23/06/2018 05:17, Jean-Baptiste Onofré wrote:
> > Hi all,
> >
> > I'm happy to announce that we have unanimously approved this release.
> >
> > There are 12 approving votes, 5 of which are binding:
> > * Ahmet Altay
> > * Jean-Baptiste Onofré
> > * Lukasz Cwik
> > * Reuven Lax
> > * Robert Bradshaw
> >
> > There are no disapproving votes.
> >
> > I'm finalizing the release.
> >
> > Thanks everyone!
> >
> > The 2.6.0 release process is expected to begin in 6 weeks. So we should
> > start the Jira triage on Saturday, 4th July and I would like to start
> > the release process on Tuesday 7th.
> >
> > Regards
> > JB
> >
> >
> > On 17/06/2018 07:18, Jean-Baptiste Onofré wrote:
> >> Hi everyone,
> >>
> >> Please review and vote on the release candidate #2 for the version
> >> 2.5.0, as follows:
> >>
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> NB: this is the first release using Gradle, so don't be too harsh ;) A
> >> PR about the release guide will follow thanks to this release.
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2], which is signed with the key with fingerprint C8282E76 [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag "v2.5.0-RC2" [5],
> >> * website pull request listing the release and publishing the API
> >> reference manual [6].
> >> * Java artifacts were built with Gradle 4.7 (wrapper) and OpenJDK/Oracle
> >> JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
> >> * Python artifacts are deployed along with the source release to the
> >> dist.apache.org [2].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Thanks,
> >> JB
> >>
> >> [1]
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12342847
> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1043/
> >> [5] https://github.com/apache/beam/tree/v2.5.0-RC2
> >> [6] https://github.com/apache/beam-site/pull/463
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>




Re: [PROPOSAL] Add a blog post for Beam release 2.5.0

2018-06-22 Thread Rafael Fernandez
+1 and no additional suggestions. Great stuff!

On Fri, Jun 22, 2018, 2:08 PM Kenneth Knowles  wrote:

> +1 and added some too.
>
> On Fri, Jun 22, 2018 at 1:18 PM Ahmet Altay  wrote:
>
>> Thank you Alexey!
>>
>> It is a great idea. I added my suggestions to the doc.
>>
>> On Fri, Jun 22, 2018 at 1:01 PM, Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I propose to continue the tradition of publishing a new blog post on the
>>> Beam web site (as it was well received by the community before) announcing
>>> the key features/fixes incorporated in the new release. To do that for the
>>> upcoming release (2.5.0), I created a draft document of this future post:
>>>
>>>
>>> https://docs.google.com/document/d/1BeqHuH1U8iOFJWTfFPW_4O2HtLRZm9rlnEUyMB6Eq7M/edit?usp=sharing
>>>
>>> I added details there based on the June newsletter and a Jira filter, but
>>> this document still misses key items for several important features, such
>>> as Beam SQL, Schema-Aware PCollections and Portability. I'd kindly ask the
>>> contributors who were involved in developing these features to add
>>> some details about what was included in this release and what deserves to
>>> be mentioned in this blog post.
>>>
>>> Also, if any other topics should be added, removed or completed,
>>> please feel free to do that too. At the same time, as we discussed before,
>>> this post should cover only major things since the detailed report will be
>>> included in the release notes.
>>>
>>> Once this document is completed and agreed upon, I'll transfer it into
>>> a markdown page for the Beam web site.
>>>
>>> Please, let me know if there are any objections or suggestions about
>>> that.
>>>
>>> WBR,
>>> Alexey
>>>
>>>
>>>
>>




Re: scala/scio

2018-06-22 Thread Rafael Fernandez
I like the idea of per-module releases for Beam. I know Henning and others
have thought about that space as well.

BTW, I'm a big fan of scio, and happy to help in any way possible if they
are interested in turning "de facto" into "de jure" :D

In such a world, Scio could be a very good first use case to drive the
mechanisms to enable per-module releases. I think it allows us to scale
better and sets a healthy path for special interest groups to naturally
emerge and collaborate with their own scope in mind.


On Thu, Jun 21, 2018 at 11:48 PM alistair.m...@googlemail.com <
alistair.m...@googlemail.com> wrote:

>
>
> On 2018/06/21 17:17:36, Reuven Lax  wrote:
> > In that case things have changed since I talked to Neville about it last
> > November.
> >
> > On Thu, Jun 21, 2018 at 10:16 AM Rafal Wojdyla  wrote:
> >
> > > Nope - it uses standard runners and is fully Beam compliant.
> > >
> > > On Thu, Jun 21, 2018 at 1:12 PM, Reuven Lax  wrote:
> > >
> > >> My understanding was that under the covers it used the low-level
> Dataflow
> > >> service API to run the evaluations.
> > >>
> > >> On Thu, Jun 21, 2018 at 10:10 AM Rafal Wojdyla 
> wrote:
> > >>
> > >>> Hi.
> > >>> Reuven - sorry to hijack the thread - regarding REPL - what do you
> mean
> > >>> by it being very Dataflow specific?
> > >>>
> > >>> On Thu, Jun 21, 2018 at 12:04 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> > >>> wrote:
> > >>>
> > > >>>> As the code is not hosted at Apache as Beam, I would not consider
> > > >>>> SCIO as the "official" Scala DSL.
> > > >>>>
> > > >>>> However, I agree that it's "de facto" Scala DSL for Beam ;)
> > > >>>>
> > > >>>> Just "wording" ;)
> > > >>>>
> > > >>>> Regards
> > > >>>> JB
> > > >>>>
> > > >>>> On 21/06/2018 18:00, Robert Bradshaw wrote:
> > > >>>> > I might go so far as to say Scio *is* the official Scala API for
> > > >>>> > Beam. We point to it on our website, and have no plans to create
> > > >>>> > another. It just happens to not be maintained and released by us.
> > > >>>> > On Thu, Jun 21, 2018 at 7:37 AM Jean-Baptiste Onofré
> > > >>>> > <j...@nanthrax.net> wrote:
> > > >>>> >>
> > > >>>> >> Hi Alistair,
> > > >>>> >>
> > > >>>> >> we discussed several times in the past with SCIO guys (especially
> > > >>>> >> Neville), but it seems there's no strong plan right now about a
> > > >>>> >> donation of SCIO in Beam.
> > > >>>> >> I think one of the concerns is the release cycle, but I think it
> > > >>>> >> makes sense to think about a release per module in Beam. It would
> > > >>>> >> allow us to release DSLs, IOs/extensions independently. But
> > > >>>> >> that's another story ;)
> > > >>>> >>
> > > >>>> >> Regards
> > > >>>> >> JB
> > > >>>> >>
> > > >>>> >> On 21/06/2018 16:34, alistair.m...@googlemail.com wrote:
> > > >>>> >>> Hi,
> > > >>>> >>>
> > > >>>> >>> Is there any plan to make scio an official scala API for beam?
> > > >>>> >>> If not, is there any plan to have a scala API?
> > > >>>> >>>
> > > >>>> >>> Thanks,
> > > >>>> >>> Alistair
> > > >>>> >>>
> > > >>>> >>
> > > >>>> >> --
> > > >>>> >> Jean-Baptiste Onofré
> > > >>>> >> jbono...@apache.org
> > > >>>> >> http://blog.nanthrax.net
> > > >>>> >> Talend - http://www.talend.com
> > > >>>>
> > > >>>> --
> > > >>>> Jean-Baptiste Onofré
> > > >>>> jbono...@apache.org
> > > >>>> http://blog.nanthrax.net
> > > >>>> Talend - http://www.talend.com
> > > >>>>
> > >>>
> > >>>
> > >
> > Thanks for the responses everyone. I'm essentially looking for some
> reassurance scio is still going to be supported in the long term. We'd like
> to use it for a big project.
>
> It states in the scio repo readme that from v0.3.0 it depends on beam and
> not on dataflow.
>
> Thanks,
> Alistair
>




Re: [PROPOSAL] Merge samza-runner to master

2018-06-22 Thread Rafael Fernandez
I think it's great to go ahead and merge it, so it can continue evolving.
As with all things, it'll adopt new stuff as it becomes ready (in fact, it
may even prove to be a great example of how to port an existing "legacy"
runner to the portability stuff when ready).

It seems the immediate blocker (gradle) was addressed, and there is great
future work planned. Exciting!

On Thu, Jun 21, 2018 at 8:00 PM Kenneth Knowles  wrote:

> *Contributors*
> Agree with Robert's concern. But this is a nice opportunity for Beam to
> connect. It is a different sort of backend and a different sort of
> community that we are linking in.
>
> Consider the Gearpump and Apex runners: both had resumes that met the
> requirements, but might not today. But they haven't been a burden. I have
> some hope the Samza runner might have a better chance recruiting users and
> contributors, since the value add for Samza users is unique among Beam
> runners, and likewise the Samza community is unique among backend
> communities.
>
> *Portability*
> My take is that we shouldn't _start_ any runner down the legacy path. But
> this runner predates portability. I don't think the Java SDK is ready to
> provide feature parity, much less adequate performance, so it doesn't seem
> reasonable to require using it. Community > code as well.
>
> Kenn
>
> On Thu, Jun 21, 2018 at 3:34 PM Robert Bradshaw 
> wrote:
>
>> Neat to see a new runner on board!
>>
>> I would like to make it a requirement for all new runners to support
>> the portability API, but given that it's still somewhat of a moving
>> target, and you have ongoing work in this direction, that may not be a
>> hard requirement.
>>
>> I'm a bit concerned that there are only two contributors (by the
>> git logs): you and Kenn. But you do indicate there are others
>> interested in working on this.
>>
>> Other than that, this looks great.
>>
>> - Robert
>>
>>
>> On Thu, Jun 21, 2018 at 3:14 PM Xinyu Liu  wrote:
>> >
>> > I updated the merge PR with the gradle integration (there was some
>> Jenkins Java tests failure with google cloud quota issues. It seems not
>> related to this patch). Please feel free to ping me if anything else is
>> needed.
>> >
>> > Thanks,
>> > Xinyu
>> >
>> > On Mon, Jun 18, 2018 at 5:44 PM, Xinyu Liu 
>> wrote:
>> >>
>> >> @Kenn: I am going to add the build.gradle. Is there anything else?
>> >>
>> >> @Ahmet, @Robert: here are more details about the samza runner right
>> now:
>> >>
>> >> - Missing pieces: timer support in ParDo is not there yet and I plan
>> to add it soon. SplittableParDo is missing but we don't have a use case so
>> far. We are on par with the other runners for the rest of the Java features.
>> >> - Work in Progress: implement the portable pipeline runner logic.
>> >> - Future plans: supporting Python is our next goal. Hopefully we will get
>> a prototype working sometime next quarter :).
>> >>
>> >> Btw, thanks everyone for the comments!
>> >>
>> >> Thanks,
>> >> Xinyu
>> >>
>> >> On Mon, Jun 18, 2018 at 4:59 PM, Robert Burke 
>> wrote:
>> >>>
>> >>> This is exciting! Is it implemented as a portability framework runner
>> too?
>> >>>
>> >>>
>> >>> On Mon, Jun 18, 2018, 4:36 PM Pablo Estrada 
>> wrote:
>> >>>>
>> >>>> It's very exciting to see a new runner making it into master. : )
>> >>>>
>> >>>> Best
>> >>>> -P.
>> >>>>
>> >>>> On Mon, Jun 18, 2018 at 3:38 PM Rafael Fernandez <
>> rfern...@google.com> wrote:
>> >>>>>
>> >>>>> I've just read this and wanted to share my excitement :D
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Jun 18, 2018 at 3:10 PM Kenneth Knowles 
>> wrote:
>> >>>>>>
>> >>>>>> One thing that will be necessary is porting the build to Gradle.
>> >>>>>>
>> >>>>>> Kenn
>> >>>>>>
>> >>>>>> On Mon, Jun 18, 2018 at 11:57 AM Xinyu Liu 
>> wrote:
>> >>>>>>>
>> >>>>>>> Hi, Folks,
>> >>>>>>>
>> >>>>>>> On behalf of the Samza team, I would like to propose to merge t

Re: [DISCUSS] Releasing Beam in the presence of emergencies

2018-06-19 Thread Rafael Fernandez
>>>> Thanks,
>>>> Cham
>>>>
>>>> On Thu, Jun 14, 2018 at 10:29 PM Jean-Baptiste Onofré 
>>>> wrote:
>>>>
>>>>> Hi Rafael,
>>>>>
>>>>> It's a good point but I don't see anything more to do on our side: if an
>>>>> emergency issue is detected, then we have to address it and release a
>>>>> fix release (x.y.z where z is the specific release fixing the issue).
>>>>> The commitment is best effort, as in any community: if an emergency
>>>>> issue is detected, qualified and accepted, then we do our best to
>>>>> provide a fix and do the fix release.
>>>>>
>>>>> So, for me, it's already handled.
>>>>>
>>>>> By the way, just a quick reminder in terms of releases:
>>>>>
>>>>> - now that gradle release seems ok, we resume our release cycle every ~
>>>>> 6 weeks
>>>>> - we can cut release anytime if required, especially to address
>>>>> emergency issues.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 14/06/2018 22:33, Rafael Fernandez wrote:
>>>>> > Hi Beam devs,
>>>>> >
>>>>> > Emergencies can and will happen. As Apache Beam adoption continues to
>>>>> > grow, the user community will naturally expect the Beam developer
>>>>> > community to react to critical issues, such as security
>>>>> vulnerabilities
>>>>> > in our dependencies. I want to make sure the dev community is in
>>>>> > agreement that we follow the ASF Vulnerability Handling processes [1]
>>>>> > for such occurrences.
>>>>> >
>>>>> >
>>>>> > In addition, I'd like to discuss cases in which data correctness/loss
>>>>> > may warrant an expedited release (i.e., we did not wait 72 hours),
>>>>> as we
>>>>> > did in 2.1.1 [2].  Concretely:
>>>>> >
>>>>> >
>>>>> >  1.
>>>>> >
>>>>> > Do we need to add anything to our project website so the user
>>>>> > community knows how we react to such issues?
>>>>> >
>>>>> >  2.
>>>>> >
>>>>> > Should we have an entry in the contributor guide to address
>>>>> critical
>>>>> > point releases, so we eliminate any guesswork in the event of an
>>>>> > emergency? (Example text [3])
>>>>> >
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > r
>>>>> >
>>>>> >
>>>>> > [1]
>>>>> >
>>>>> > _https://apache.org/security/committers.html#vulnerability-handling_
>>>>> >
>>>>> > [2]
>>>>> https://lists.apache.org/list.html?dev@beam.apache.org:lte=40M:2.1.1
>>>>> >
>>>>> > *
>>>>> >
>>>>> > [3] Example text for the contributor guideline:
>>>>> >
>>>>> >
>>>>> > What requires a critical point release?
>>>>> >
>>>>> >   *
>>>>> >
>>>>> > A data loss bug
>>>>> >
>>>>> >   *
>>>>> >
>>>>> > A data corruption bug
>>>>> >
>>>>> >   *
>>>>> >
>>>>> > A processing correctness bug
>>>>> >
>>>>> >   *
>>>>> >
>>>>> > For security vulnerabilities, please follow
>>>>> >
>>>>> https://apache.org/security/committers.html#vulnerability-handling .
>>>>> >
>>>>> >
>>>>> > What do we do a critical point release on?
>>>>> >
>>>>> > Our first priority is to stop the bleeding. We ought to prioritize a
>>>>> > point release for the latest Beam version, based on the release
>>>>> branch,
>>>>> > that cherrypicks only the intended fix.
>>>>> >
>>>>> >   *
>>>>> >
>>>>> > We've done it before! Remember 2.1.1
>>>>> > <
>>>>> https://lists.apache.org/list.html?dev@beam.apache.org:lte=40M:2.1.1>?
>>>>> >
>>>>> >   o
>>>>> >
>>>>> > Since this is a critical release, we can relax our usual 72
>>>>> hour
>>>>> > voting policy. It worked well for 2.1.1, we should make it
>>>>> > repeatable: Propose, have PMC folks do due diligence on the
>>>>> > request, and sign off. Since this is critical, we may want to
>>>>> > have more than one person working on the release.
>>>>> >
>>>>> >   *
>>>>> >
>>>>> > Once we get it out, the community can discuss which previous
>>>>> > releases would benefit from a potential point release.
>>>>> >
>>>>> >
>>>>> > Who proposes a critical point release?
>>>>> >
>>>>> > Any member of the community. 3 PMC +1 votes are sufficient to get the
>>>>> > process rolling.
>>>>> >
>>>>> > *
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>




Re: [PROPOSAL] Merge samza-runner to master

2018-06-18 Thread Rafael Fernandez
I've just read this and wanted to share my excitement :D



On Mon, Jun 18, 2018 at 3:10 PM Kenneth Knowles  wrote:

> One thing that will be necessary is porting the build to Gradle.
>
> Kenn
>
> On Mon, Jun 18, 2018 at 11:57 AM Xinyu Liu  wrote:
>
>> Hi, Folks,
>>
>> On behalf of the Samza team, I would like to propose to merge the
>> samza-runner branch into master. The branch was created in January when we
>> first introduced the Samza Runner [1], and we've been adding features and
>> refining it afterwards. Now the runner satisfies the criteria outlined in
>> [2], and merging it to master will give more visibility to other
>> contributors and users.
>>
>> 1. Have at least 2 contributors interested in maintaining it, and 1
>> committer interested in supporting it: *Both Chris and me have been making
>> contributions and I am going to sign up for the support. There are more
>> folks in the Samza team interested in contributing to it. Thanks Kenn for
>> all the help and reviews for the runner!*
>> 2. Provide both end-user and developer-facing documentation: *The PR for
>> the samza-runner doc has runner user guide, capability matrix, and tutorial
>> using WordCount examples.*
>> 3. Have at least a basic level of unit test coverage: *Unit tests are
>> here [3].*
>> 4. Run all existing applicable integration tests with other Beam components
>> and create additional tests as appropriate: *Enabled ValidatesRunner tests.*
>> 5. Be able to handle a subset of the model that addresses a significant
>> set of use cases, such as ‘traditional batch’ or ‘processing time
>> streaming’: *We have test Beam jobs running in Yarn using event-time
>> processing of Kafka streams.*
>> 6. Update the capability matrix with the current status. *Same as #2.*
>> 7. Add a webpage under documentation/runners. *Same as #2.*
>>
>> The PR for the samza-runner merge:
>> https://github.com/apache/beam/pull/5668
>> The PR for the samza-runner doc:
>> https://github.com/apache/beam-site/pull/471
>>
>> Thanks,
>> Xinyu
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-3079
>> [2] https://beam.apache.org/contribute/
>> [3]
>> https://github.com/apache/beam/tree/samza-runner/runners/samza/src/test
>>
>




[DISCUSS] Releasing Beam in the presence of emergencies

2018-06-14 Thread Rafael Fernandez
Hi Beam devs,

Emergencies can and will happen. As Apache Beam adoption continues to grow,
the user community will naturally expect the Beam developer community to
react to critical issues, such as security vulnerabilities in our
dependencies. I want to make sure the dev community is in agreement that we
follow the ASF Vulnerability Handling processes [1] for such occurrences.

In addition, I'd like to discuss cases in which data correctness/loss may
warrant an expedited release (i.e., we did not wait 72 hours), as we did in
2.1.1 [2].  Concretely:


   1. Do we need to add anything to our project website so the user community
      knows how we react to such issues?
   2. Should we have an entry in the contributor guide to address critical
      point releases, so we eliminate any guesswork in the event of an
      emergency? (Example text [3])


Thanks,

r


[1] https://apache.org/security/committers.html#vulnerability-handling

[2] https://lists.apache.org/list.html?dev@beam.apache.org:lte=40M:2.1.1








[3] Example text for the contributor guideline:

What requires a critical point release?

   - A data loss bug
   - A data corruption bug
   - A processing correctness bug
   - For security vulnerabilities, please follow
     https://apache.org/security/committers.html#vulnerability-handling .

What do we do a critical point release on?

Our first priority is to stop the bleeding. We ought to prioritize a point
release for the latest Beam version, based on the release branch, that
cherrypicks only the intended fix.

   - We've done it before! Remember 2.1.1
     <https://lists.apache.org/list.html?dev@beam.apache.org:lte=40M:2.1.1>?
      - Since this is a critical release, we can relax our usual 72 hour
        voting policy. It worked well for 2.1.1, we should make it
        repeatable: Propose, have PMC folks do due diligence on the request,
        and sign off. Since this is critical, we may want to have more than
        one person working on the release.
   - Once we get it out, the community can discuss which previous releases
     would benefit from a potential point release.

Who proposes a critical point release?

Any member of the community. 3 PMC +1 votes are sufficient to get the
process rolling.




Re: [VOTE] Go SDK

2018-05-22 Thread Rafael Fernandez
+1 !

On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:

> +1 (binding)
>
> On Tue, May 22, 2018 at 6:16 AM Robert Burke  wrote:
>
>> +1 (non-binding)
>>
>> I'm looking forward to helping gophers solve their big data problems in
>> their language of choice, and runner of choice!
>>
>> Next stop, a non-java portability runner?
>>
>> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:
>>
>>> +1 (binding)
>>>
>>> This is great. Feels like a phase change in the life of Apache Beam,
>>> having three languages, with multiple portable runners on the horizon.
>>>
>>> Kenn
>>>
>>> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía  wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> Go SDK brings new language support for a community not well supported in
>>>> the Big Data world, the Go developers, so this is great. Also the fact
>>>> that this is the first SDK integrated with the portability work makes it
>>>> an interesting project to learn lessons from for future languages.
>>>>
>>>> Now it is time to start building a community around the Go SDK; this is
>>>> the most important task now, and the only way to do it is to have the
>>>> SDK as an official part of Beam, so +1.
>>>>
>>>> Congrats to Henning and all the other contributors for this important
>>>> milestone.
>>>> On Tue, May 22, 2018 at 10:21 AM Holden Karau wrote:
>>>>
>>>> > +1 (non-binding), I've had a chance to work with the SDK and it's
>>>> > pretty neat to see Beam add support for a language before most of the
>>>> > big data ecosystem.
>>>>
>>>> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré
>>>> > <j...@nanthrax.net> wrote:
>>>>
>>>> >> Hi Henning,
>>>>
>>>> >> SGA has been filed for the entire project during the incubation
>>>> >> period.
>>>>
>>>> >> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>>>>
>>>> >> We don't have a lot to do, just checked that we are clean on this
>>>> >> front.
>>>>
>>>> >> Regards
>>>> >> JB
>>>>
>>>> >> On 22/05/2018 06:42, Henning Rohde wrote:
>>>>
>>>> >>> Thanks everyone!
>>>>
>>>> >>> Davor -- regarding your two comments:
>>>> >>> * Robert mentioned that "SGA should have probably already been
>>>> >>> filed" in the previous thread. I got the impression that nothing
>>>> >>> further was needed. I'll follow up.
>>>> >>> * The standard Go tooling basically always pulls directly from
>>>> >>> github, so there is no real urgency here.
>>>>
>>>> >>> Thanks,
>>>> >>>    Henning
>>>>
>>>>
>>>> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré
>>>> >>> <j...@nanthrax.net> wrote:
>>>>
>>>> >>>  +1 (binding)
>>>>
>>>> >>>  I just want to check about SGA/IP/Headers.
>>>>
>>>> >>>  Thanks !
>>>> >>>  Regards
>>>> >>>  JB
>>>>
>>>> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
>>>> >>>   > Hi everyone,
>>>> >>>   >
>>>> >>>   > Now that the remaining issues have been resolved as discussed,
>>>> >>>   > I'd like to propose a formal vote on accepting the Go SDK into
>>>> >>>   > master. The main practical difference is that the Go SDK would
>>>> >>>   > be part of the Apache Beam release going forward.
>>>> >>>   >
>>>> >>>   > Highlights of the Go SDK:
>>>> >>>   >   * Go user experience with natively-typed DoFns with
>>>> >>>   >     (simulated) generic types
>>>> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten,
>>>> >>>   >     Combine, Windowing, ..
>>>> >>>   >   * Includes several IO connectors: Datastore, BigQuery, PubSub,
>>>> >>>   >     extensible textio.
>>>> >>>   >   * Supports the portability framework for both batch and
>>>> >>>   >     streaming, notably the upcoming portable Flink runner
>>>> >>>   >   * Supports a direct runner for small batch workloads and
>>>> >>>   >     testing.
>>>> >>>   >   * Includes pre-commit tests and post-commit integration tests.
>>>> >>>   >
>>>> >>>   > And last but not least
>>>> >>>   >   * includes contributions from several independent users and
>>>> >>>   >     developers, notably an IO connector for Datastore!
>>>> >>>   >
>>>> >>>   > Website: https://beam.apache.org/documentation/sdks/go/
>>>> >>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
>>>> >>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
>>>> >>>   >
>>>> >>>   > Please vote:
>>>> >>>   > [ ] +1, Approve that the Go SDK becomes an official part of Beam
>>>> >>>   > [ ] -1, Do not approve (please provide specific comments)
>>>> >>>   >
>>>> >>>   > Thanks,
>>>> >>>   >   The Gophers of Apache Beam
>>>> >>>   >
>>>> >>>   >
>>>>
>>>> > --
>>>> > Twitter: https://twitter.com/holdenkarau
>>>>
>>>



Re: [ANNOUCEMENT] New Foundation members!

2018-03-30 Thread Rafael Fernandez
Congratulations!!!

On Fri, Mar 30, 2018 at 8:29 PM Aviem Zur  wrote:

> Congrats!
>
> On Sat, Mar 31, 2018 at 2:30 AM Ahmet Altay  wrote:
>
>> Congratulations to all of you!
>>
>>
>> On Fri, Mar 30, 2018, 4:29 PM Pablo Estrada  wrote:
>>
>>> Congratulations y'all! Very cool.
>>> Best
>>> -P.
>>>
>>> On Fri, Mar 30, 2018 at 4:09 PM Davor Bonaci  wrote:
>>>
>>>> Now that this is public... please join me in welcoming three newly
>>>> elected members of the Apache Software Foundation with ties to this
>>>> community, who were elected during the most recent Members' Meeting.
>>>>
>>>> * Ismaël Mejía (Beam PMC)
>>>>
>>>> * Josh Wills (Crunch Chair; Beam, DataFu PMC)
>>>>
>>>> * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
>>>> contributor)
>>>>
>>>> These individuals demonstrated merit in Foundation's growth, evolution,
>>>> and progress. They were recognized, nominated, and elected by existing
>>>> membership for their significant impact to the Foundation as a whole, such
>>>> as the roots of project-related and cross-project activities.
>>>>
>>>> As members, they now become legal owners and shareholders of the
>>>> Foundation. They can vote for the Board, incubate new projects, nominate
>>>> new members, participate in any PMC-private discussions, and contribute to
>>>> any project.
>>>>
>>>> (For the Beam community, this election nearly doubles the number of
>>>> Foundation members. The new members are joining Jean-Baptiste Onofré,
>>>> Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>>>>
>>>> I'm happy to be able to call all three of you my fellow members.
>>>> Congratulations!
>>>>
>>>>
>>>> Davor
>>>>
>>> --
>>> Got feedback? go/pabloem-feedback
>>> 
>>>
>>




Re: Proposed improvements to our documentation

2018-02-28 Thread Rafael Fernandez
Thanks for all the feedback. I have filed a JIRA [1] to get started.

[1] https://issues.apache.org/jira/browse/BEAM-3763


On Wed, Feb 28, 2018 at 12:11 PM Reuven Lax <re...@google.com> wrote:

> +1 - too many things are documented only in Javadoc. While there are some
> users who are more likely to read Javadoc (e.g. via an IDE), we should try
> and have this part of our public documentation. This will help us document
> the other languages as well. I've noticed that some basic things (e.g. how
> do I access the current window inside a ParDo) are not easy to discover in
> our documentation.
>
> Also strong +1 to Eugene's proposal. Much of our documentation is
> base-level documentation. i.e. we document the low-level concepts such as
> PCollection, etc. However there's a strong need for use-case based
> documentation.
>
> Reuven
>
>
> On Wed, Feb 28, 2018 at 11:58 AM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> +1 sounds reasonable.
>>
>> A couple more areas where our documentation could use some work:
>>
>> - I'm feeling very strongly that the documentation of windowing/triggers
>> is due for a complete rewrite. It was written when Beam was first being
>> revealed to the world, and now we have both extensive experience with it
>> ourselves, as well as extensive experience explaining it to users and
>> seeing what users get wrong in practice.
>>
>> - It'd be good if we had in-depth articles in the documentation on common
>> but broad topics, such as "How do I enrich a stream", "How do I join two
>> streams", "How do I efficiently call an external REST service", "How do I
>> express sequencing, do X then Y", "How do I maintain a running
>> sliding-window aggregation" etc.
>>
>> On Wed, Feb 28, 2018 at 11:00 AM Chamikara Jayalath <chamik...@google.com>
>> wrote:
>>
>>> +1
>>>
>>> A per-transform reference will definitely help Python (and Go ?) since
>>> some transforms lack detailed documentation compared to Java. Additionally
>>> it might be a good idea to compare Java/Py/Go docs in general to make sure
>>> there are no inconsistencies.
>>>
>>> - Cham
>>>
>>> On Wed, Feb 28, 2018 at 10:53 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Feb 28, 2018 at 10:46 AM, Kenneth Knowles <k...@google.com>
>>>> wrote:
>>>>
>>>>> Yes! I love the idea of having a good cross-language transform
>>>>> reference on the web site. Very good idea to get started now and provide
>>>>> the skeleton, then fill out additional transforms and additional languages
>>>>> incrementally.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Feb 28, 2018 at 10:23 AM, Rafael Fernandez <
>>>>> rfern...@google.com> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I think we've all seen a few areas of improvement here and there in
>>>>>> our docs. For example, one can find a Javadoc entry with outdated
>>>>>> content here and there [1], or "sample" code snippets that have
>>>>>> problems, such as not compiling [2].
>>>>>>
>>>>>> I think a good thing to do is to invest in extending our
>>>>>> documentation to having a robust per-transform reference, which has
>>>>>> samples and a good description of what the transform does, and keep
>>>>>> JavaDoc as a solid source of API documentation. I believe similar
>>>>>> approaches can benefit Python and other languages.
>>>>>>
>>>>>> What do you think? I'm happy to spend some time now and then and
>>>>>> incrementally move in this direction. I would like some help from the
>>>>>> community with reviews, suggestions (and perhaps picking up associated
>>>>>> JIRAs as I file them.) Good idea? Bad? Try? +1?
>>>>>>
>>>>>> Thanks,
>>>>>> r
>>>>>>
>>>>>> [1] See
>>>>>> https://github.com/apache/beam/blob/a629f73ee4e64c470e0c78cc6f51b8625d781b41/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineWithContext.java
>>>>>> , which contains a stale reference to KeyedCombineFn .
>>>>>>
>>>>>> [2]
>>>>>> https://github.com/apache/beam/blob/5fb30ec8265c841cd8c4e6ae16b43be1f171eabb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/FlatMapElements.java#L65
>>>>>>
>>>>>
>>>>>
>>>>




Proposed improvements to our documentation

2018-02-28 Thread Rafael Fernandez
Hi folks,

I think we've all seen a few areas of improvement here and there in our
docs. For example, one can find a Javadoc entry with outdated content
here and there [1], or "sample" code snippets that have problems, such as
not compiling [2].

I think a good thing to do is to invest in extending our documentation to
having a robust per-transform reference, which has samples and a good
description of what the transform does, and keep JavaDoc as a solid source
of API documentation. I believe similar approaches can benefit Python and
other languages.

What do you think? I'm happy to spend some time now and then and
incrementally move in this direction. I would like some help from the
community with reviews, suggestions (and perhaps picking up associated
JIRAs as I file them.) Good idea? Bad? Try? +1?

Thanks,
r

[1] See
https://github.com/apache/beam/blob/a629f73ee4e64c470e0c78cc6f51b8625d781b41/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineWithContext.java
, which contains a stale reference to KeyedCombineFn .

[2]
https://github.com/apache/beam/blob/5fb30ec8265c841cd8c4e6ae16b43be1f171eabb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/FlatMapElements.java#L65




Re: Beam 2.4.0

2018-02-20 Thread Rafael Fernandez
+1 on having release trains scheduled.

Romain: Do you have a list of PRs that could benefit from increased focus
if they want to make it on the upcoming train?


On Tue, Feb 20, 2018 at 3:30 PM Ahmet Altay  wrote:

> +1 for having regular release cycles. Finalizing a release takes time in
> the order of a few weeks and starting a new release soon after the previous
> one is a reliable way for having releases every 6 weeks.
>
> On Tue, Feb 20, 2018 at 2:30 PM, Robert Bradshaw 
> wrote:
>
>> Yep. I am starting the "Let's do a 2.4.0 release" thread almost
>> exactly 6 weeks after JB first started the 2.3.0 release thread.
>>
>> On Tue, Feb 20, 2018 at 2:20 PM, Charles Chen  wrote:
>> > I would like to +1 the faster release cycle process JB and Robert have
>> > been advocating and implementing, and thank JB for releasing 2.3.0
>> > smoothly. When we block for specific features and increase the time
>> > between releases, we increase the urgency for PR authors to push for
>> > their change to go into an upcoming release, which is a feedback loop
>> > that results in our releases taking months instead of weeks.  We should
>> > however try to get pending PRs wrapped up.
>> >
>> > On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>> > wrote:
>> >>
>> >> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is
>> >> just out, so 1 week is a bit fast IMHO.
>> >>
>> >> On Feb 20, 2018 at 23:13, "Robert Bradshaw" wrote:
>> >>>
>> >>> One of the main shifts that I think helped this release was explicitly
>> >>> not being feature driven, rather releasing what's already in the
>> >>> branch. That doesn't mean it's not a good call to action to try and
>> >>> get long-pending PRs or similar wrapped up.
>> >>>
>> >>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
>> >>>  wrote:
>> >>> > There are a lot of long-pending PRs; it would be good to merge them
>> >>> > before 2.4.
>> >>> > Some are bringing tests for the 2.3 release which can be critical to
>> >>> > include.
>> >>> >
>> >>> > Maybe we should list the PRs and JIRAs we want in before picking a
>> >>> > date?
>> >>> >
>> >>> > On Feb 20, 2018 at 22:02, "Konstantinos Katsiapis"
>> >>> > <katsia...@google.com> wrote:
>> >>> >>
>> >>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6
>> >>> >> (and the latter already has an RC out, so we will likely be blocked
>> >>> >> on Beam).
>> >>> >>
>> >>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw wrote:
>> >>> >>>
>> >>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all
>> >>> >>> that made this happen!) It'd be great to keep the ball rolling for
>> >>> >>> a similarly well-executed 2.4. A lot has gone in [1] since we made
>> >>> >>> the 2.3 cut, and to keep our cadence up I would propose a
>> >>> >>> time-based cut date early next week (say the 28th).
>> >>> >>>
>> >>> >>> I'll volunteer to do this release.
>> >>> >>>
>> >>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
>> >>> >> 650-918-7487
>>
>
>
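
The six-week cadence discussed in this thread can be checked against the thread dates. A minimal sketch, assuming the start dates are the archive timestamps (2018-01-08 for the 2.3.0 thread, 2018-02-20 for the 2.4.0 proposal); the variable names are illustrative only:

```python
from datetime import date, timedelta

# Dates assumed from the archive timestamps, not from any release tooling.
thread_2_3_0 = date(2018, 1, 8)    # JB's "Towards Beam 2.3.0" message
thread_2_4_0 = date(2018, 2, 20)   # Robert's 2.4.0 proposal
cadence = timedelta(weeks=6)       # the cadence the thread is aiming for

gap = thread_2_4_0 - thread_2_3_0          # actual time between the threads
expected_next = thread_2_3_0 + cadence     # where a strict 6-week train lands
drift_days = (thread_2_4_0 - expected_next).days

print(gap.days, expected_next.isoformat(), drift_days)
```

The gap comes out within a day or so of six weeks, which matches the "almost exactly 6 weeks" remark above.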




Re: rename: BeamRecord -> Row

2018-02-02 Thread Rafael Fernandez
Very strong +1


On Fri, Feb 2, 2018 at 1:24 PM Reuven Lax  wrote:

> We're looking at renaming the BeamRecord class, which is used for columnar
> data. There was enough discussion on the naming that I want to make sure
> the dev list is aware of the naming plans here.
>
> BeamRecord is a columnar, field-based record. Currently it's used by
> BeamSQL, and the plan is to use it for schemas as well. "Record" is a
> confusing name for this class, as all elements in the Beam model are
> referred to as "records," whether or not they have schemas. "Row" is a much
> clearer name.
>
> There was a lot of discussion whether to name this BeamRow or just plain
> Row (in the org.apache.beam.values namespace). The argument in favor of
> BeamRow was so that people aren't forced to qualify their type names in the
> case of a conflict with a Row from another package. The argument in favor
> of Row was that it's a better name, it's in the Beam namespace anyway, and
> it's what the rest of the world (Cassandra, Hive, Spark, etc.) calls
> similar classes.
>
> Right now consensus on the PR is leaning to Row. If you feel strongly,
> please speak up :)
>
> Reuven
>




Re: [DISCUSS] Towards Beam 2.3.0

2018-01-08 Thread Rafael Fernandez
+1! I like the predictability a schedule would bring. And I think it helps
feature users to budget their time a little better -- there's always the
next scheduled train, so no need to stress out to ship in the current one.




On Mon, Jan 8, 2018 at 10:37 AM Kenneth Knowles  wrote:

> +1 to not holding except for critical bugs and regressions. Using 2.3.0 to
> improve automation is a great idea.
>
> Features can make the next release, and backwards incompatible refinements
> should have quiesced long before a feature comes out of @Experimental
> status.
>
> On Mon, Jan 8, 2018 at 10:32 AM, Reuven Lax  wrote:
>
>> +1 - this is definitely one of the (multiple) things that delayed 2.2.0.
>> In my opinion releases should be held up for critical bug fixes, but not
>> for features. Any feature work can always go into 2.4.0, and with any luck
>> we can get 2.3.0 out much faster than 2.2.0.
>>
>> Reuven
>>
>> On Mon, Jan 8, 2018 at 10:01 AM, Robert Bradshaw 
>> wrote:
>>
>>> +1 for starting the 2.3 ball rolling.
>>>
>>> In general, I'd like to avoid holding up releases for specific
>>> features/PRs. This is (one of the things) that holds up releases, which
>>> then becomes a vicious cycle of more people wanting to make their feature
>>> a condition of the next release, etc. (Bugs, regressions, and
>>> backwards-incompatible refinements to new features are fair candidates
>>> as the need arises...)
>>>
>>> On Mon, Jan 8, 2018 at 7:04 AM, Jean-Baptiste Onofré 
>>> wrote:
>>> > Hi Romain,
>>> >
>>> > no problem: let's try a best effort and define the target version in the
>>> > Jira.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 01/08/2018 03:51 PM, Romain Manni-Bucau wrote:
>>> >>
>>> >> Hi JB,
>>> >>
>>> >> I'd like https://github.com/apache/beam/pull/4235 to be integrated if
>>> >> possible
>>> >>
>>> >> Also the JUnit 5 PR brings some light changes which can be worth the
>>> >> "3" digit upgrade so if anyone has some time to review it can be a good
>>> >> candidate too.
>>> >>
>>> >> Thanks for driving it
>>> >>
>>> >>
>>> >> Romain Manni-Bucau
>>> >> @rmannibucau | Blog | Old Blog | Github
>>> >> <https://github.com/rmannibucau> | LinkedIn
>>> >>
>>> >> 2018-01-08 15:37 GMT+01:00 Jean-Baptiste Onofré:
>>> >>
>>> >> Hi guys,
>>> >>
>>> >> In a previous discussion thread, we agreed that we should have a
>>> >> regular pace in terms of releases.
>>> >>
>>> >> We released Beam 2.2.0 on the 16th of November '17, but the release
>>> >> took a pretty long time.
>>> >>
>>> >> I think it's reasonable to think about Beam 2.3.0 in the coming weeks.
>>> >> I would like to propose targeting Beam 2.3.0 for end of
>>> >> January/beginning of February.
>>> >>
>>> >> I volunteer to do this release.
>>> >>
>>> >> Thoughts?
>>> >>
>>> >> Regards
>>> >> JB
>>> >> --
>>> >> Jean-Baptiste Onofré
>>> >> jbono...@apache.org
>>> >> http://blog.nanthrax.net
>>> >> Talend - http://www.talend.com
>>> >>
>>> >
>>> > --
>>> > Jean-Baptiste Onofré
>>> > jbono...@apache.org
>>> > http://blog.nanthrax.net
>>> > Talend - http://www.talend.com
>>>
>>
>>
>

